Most már YouTube-videókban is egész jól olvas szájról a mesterséges intelligencia | collaboration | Scoop.it

The researchers started with 140,000 hours of YouTube videos of people talking in diverse situations. Then, they designed a program that created clips a few seconds long with the mouth movement for each phoneme, or word sound, annotated. The program filtered out non-English speech, nonspeaking faces, low-quality video, and video that wasn’t shot straight ahead. Then, they cropped the videos around the mouth. That yielded nearly 4000 hours of footage, including more than 127,000 English words.