LipNet

LipNet is a deep neural network for audio-visual speech recognition (AVSR). It was created by University of Oxford researchers Yannis Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas.^[1] The researchers stated that it could match mouth movements to text with 93 percent accuracy,^[2] though the test was criticized for relying on a dataset with a limited vocabulary and grammar.^[3] It was used in Co-Pilot, Nvidia's autonomous "backseat driver" prototype.^[4]

References

[1] Assael, Yannis M.; Shillingford, Brendan; Whiteson, Shimon; de Freitas, Nando (2016-12-16). "LipNet: End-to-End Sentence-level Lipreading". arXiv:1611.01599 [cs.LG].
[2] "AI that lip-reads 'better than humans'". BBC News. 2016-11-08. Retrieved 2026-05-25.
[3] Vincent, James (2016-11-07). "Can deep learning help solve lip reading?". The Verge.
[4] Quach, Katyanna. "Revealed: How Nvidia's 'backseat driver' AI learned to read lips". The Register.
