Google’s DeepMind has added another achievement to its record: its self-learning AI has now shown its mastery of lip reading.

Lip reading, the ability to understand speech from visual cues alone, is a challenging task even for the best in the field, let alone a novice. A recent collaboration between Google’s DeepMind division and the University of Oxford has produced the most accurate lip-reading software to date, known as “Watch, Listen, Attend and Spell”. As stated in their published research paper, the researchers' primary goal was “to recognize phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focused on identifying a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in the wild videos.”

The software was trained on more than 5,000 hours of video footage from six television shows aired on the BBC.

To compare the model's performance with that of a human, a professional lip reader with over a decade of experience was called in to decipher the footage. Even when allowed ten times the video's duration to work through it, the lip reader achieved only 12.4% accuracy. The program achieved 46.8% accuracy, almost four times that of its human counterpart.

An earlier lip-reading program called LipNet achieved 93.4% accuracy in tests, a far better figure than DeepMind’s. However, LipNet was tested on specially recorded footage of volunteers speaking formulaic sentences, whereas DeepMind’s software was given the more daunting task of annotating unscripted conversations from British television.


The most significant change to come from an AI learning to lip-read will be in how speech recognition works. Mouthing words at your phone’s camera in public may not be ideal, but it certainly beats shouting at your phone over background noise.

Though the intended future of the program has been made clear, the possibility of its use for surveillance and security raises red flags. Even so, the software is a leap forward in the field of artificial intelligence and a glimpse of the monumental changes yet to come. Scientists believe it could help hearing-impaired people follow conversations, annotate or re-dub old silent films, and improve speech recognition as a whole.