Transformers of the handwritten word by AI

Source: https://mbzuai.ac.ae/news/transformers-of-the-handwritten-word/

Handwriting is an ancient technology. Perhaps the oldest evidence of writing is a set of artifacts found in what is present-day Iraq that feature characters of the Sumerian language. These pieces are thought to have been penned more than 5,000 years ago. Millennia later, even with access to gadgets like keyboards and speech-to-text software, many people still make language legible the old-fashioned way: by writing it down by hand.

A team of researchers at MBZUAI is combining ancient and contemporary technologies in an artificial intelligence program that can learn the handwriting style of a person and generate text scrawled in what looks like their hand.

The inventors were recently granted a patent by the United States Patent and Trademark Office for the tool, which could help people who have injuries that prevent them from taking up a pen. It could also be used to efficiently generate a large amount of data to improve machine learning models’ ability to process handwritten script.

Can it be done?

Like many scientific endeavors, the project started with curiosity, said Hisham Cholakkal, assistant professor of computer vision at MBZUAI and one of the inventors of the technology: “We wanted to know if you gave a model a few samples of someone’s handwriting if the model could learn about the style of that person and then write anything in the handwriting style of that person.”

Cholakkal and colleagues shared their initial research findings in 2021 at the International Conference on Computer Vision (ICCV).

The team also included Assistant Professor of Computer Vision Rao Muhammad Anwer, Associate Professor of Computer Vision Salman Khan, Deputy Department Chair of Computer Vision and Professor of Computer Vision Fahad Shahbaz Khan, and Ankan Kumar Bhunia.

In that presentation, the researchers noted that previous approaches to mimicking a person’s handwriting style had been developed using a machine learning technique called a generative adversarial network, or GAN.

Handwriting generated by GANs captures the overall, general style of a writer, for example the slant with which a person composes letters or the width of the strokes that make up letters. But GANs struggle to recreate how people form individual characters and the lines, known as ligatures, that tie characters together.

Instead of GANs, the researchers used vision transformers, which are a type of neural network designed for computer vision tasks. Their study was the first use of vision transformers to mimic handwriting.

The proposed vision transformer-based solution differs from GANs in that vision transformers can process what are known as long-range dependencies: meaningful relationships between parts of an image that are spatially distant from each other.

“To mimic someone’s handwriting style, we want to look at the whole text, and only then will we start to understand how the writer ligated characters, how the writer connected letters, or spaced words,” Fahad Khan said. “All these tasks require a kind of global receptive field, which is not easy using convolutional neural networks. We identified this gap in existing methods and adopted this transformer-based method.”
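The contrast between a local and a global receptive field can be illustrated with a toy example. The sketch below is not the researchers' model: it uses a one-dimensional list of made-up scalar features, a single convolution with a three-element kernel, and a bare-bones self-attention layer with identity projections, all of which are simplifying assumptions. The helper names (`conv1d_same`, `self_attention`, `changes_at_first_position`) are hypothetical. It checks whether changing the last feature alters the output at the first position.

```python
import math

def conv1d_same(x, kernel):
    """One 1-D convolution with zero padding: each output sees only nearby inputs."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out

def self_attention(x):
    """Toy single-head self-attention on scalars (identity query/key/value projections)."""
    out = []
    for q in x:
        scores = [q * k for k in x]                    # similarity to every position
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]    # numerically stable softmax
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, x)))
    return out

def changes_at_first_position(layer, x, far_index, delta=1.0):
    """Return True if perturbing x[far_index] changes the layer's output at position 0."""
    perturbed = list(x)
    perturbed[far_index] += delta
    return abs(layer(x)[0] - layer(perturbed)[0]) > 1e-12

# Made-up feature values standing in for a row of image patches.
features = [0.2, -0.1, 0.4, 0.3, -0.5, 0.1, 0.6, -0.2]
kernel = [0.25, 0.5, 0.25]

conv_sees_far = changes_at_first_position(
    lambda s: conv1d_same(s, kernel), features, far_index=7
)
attn_sees_far = changes_at_first_position(self_attention, features, far_index=7)
print("convolution affected by distant input:", conv_sees_far)
print("self-attention affected by distant input:", attn_sees_far)
```

In this toy setting the single convolution is unaffected by the distant change, while the attention output shifts because every position is weighted against every other. Stacking many convolutional layers does widen the receptive field, but only gradually, which is one way to read the gap described in the quote above.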

While the initial study focused on generating handwriting in English, the researchers are also interested in applying their technology to other languages, like Arabic, which is challenging to analyze due to the way Arabic letters are connected in handwritten script.

Even better than the real thing?

In the study, the scientists compared their handwritten text image generation approach, which they shorten to HWT, to two other handwriting generation technologies. They showed text generated by the three models to 100 people and asked which one they preferred. The participants in the study preferred HWT to the other text generators 81% of the time.

A qualitative comparison of HWT to two other handwriting generators, GANwriting and Davis et al. All three generators were instructed to produce the same text: “No two people can write precisely the same way just as no two people can have the same fingerprints.” All three applications were trained on samples of handwritten text (far-left column) by six different writers. Davis et al. captures the overall style of a writer, for example the slant of the text, but struggles to mimic the character-specific style details. GANwriting is limited by the length of words it can mimic and was unable to complete the provided textual content; for example, it generated the word “precise” instead of “precisely.” The approach by researchers at MBZUAI better mimics both global and local style patterns, generating more realistic handwriting. From Handwriting Transformers, presented at the International Conference on Computer Vision (ICCV) 2021.

“We also showed the handwriting mimicking that was generated to humans to compare it to the benchmark, and to our surprise the result of the generated handwriting was quite good. They could not distinguish the mimicked handwriting from the actual handwriting, and it was satisfying to see that kind of validation of the performance,” Salman Khan said.

The researchers’ model doesn’t require much data to be trained. A few paragraphs of original handwriting were all it needed.

But innovation always carries risk. “We are very cautious about it because it could be misused,” Anwer said. “Handwriting represents a person’s identity, so we are thinking carefully about this before deploying it.”

And while there are risks, new findings can also raise awareness of potential threats. “It’s important to be aware that it’s possible to use AI to generate handwriting that matches the style of an individual,” Cholakkal said.
