Patent References:
Sound-synchronized video system
Method and apparatus for producing audio-visual synthetic speech, Patent #: 5657426

Application No.: 019514, filed on 02/05/1998
US Classes: 704/276 (Pattern display), 704/235 (Speech to image), 704/270 (Application)
Examiners: Primary: Hudspeth, David; Assistant: Opsasnick, Michael N.
International Class: G10L 003/00

Abstract

The present invention utilizes a novel approach to facial imaging synchronized with synthetic speech. Mapping viseme images to a diphone requires the same "transitioning" in that the imaging associated with a diphone is not a static image, but rather a series of images which dynamically depict, with lip, teeth and tongue positioning, the sound transition occurring in the relevant diphone. Each series of lip, teeth, and tongue positioning transitions is referred to herein as a "diseme." A diseme (like a diphone) thus begins somewhere during one viseme (phone) and ends somewhere during a following viseme (phone). Due to lip, teeth and tongue position imaging commonality, phones are grouped into archiphonic families. A single diseme, which depicts the transition from a phone in one archiphonic family to another phone in a different archiphonic family, can be used for displaying the transition between any phone in the first archiphonic family and any phone in the second archiphonic family. In this way, the approximately 1800 diphones in General American English can be visually depicted by a relatively small number of disemes, again due to their similarity in lip, teeth, and tongue image positioning. This results in a mapping between synthetic speech and facial imaging which more accurately reflects the speech transitional movements of a realistic speaker image.
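
The many-to-one mapping from diphones to disemes described in the abstract can be illustrated with a minimal Python sketch. The family names, the phones placed in each family, and the helper names (`ARCHIPHONIC_FAMILIES`, `diseme_for`, `count_disemes`) are all hypothetical and chosen only for illustration; the abstract does not list the actual families, and the way same-family transitions are handled here is likewise an assumption.

```python
from itertools import permutations

# Hypothetical grouping, NOT taken from the patent: phones that look
# alike on the lips, teeth, and tongue share one archiphonic family.
ARCHIPHONIC_FAMILIES = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
    "open_vowel": ["aa", "ae"],
}

# Reverse lookup: phone -> name of its archiphonic family.
PHONE_TO_FAMILY = {
    phone: family
    for family, phones in ARCHIPHONIC_FAMILIES.items()
    for phone in phones
}


def diseme_for(diphone):
    """Return the diseme key (family pair) for a (phone, phone) diphone.

    Assumption: a diseme is identified solely by the pair of families
    that its two phones belong to.
    """
    first, second = diphone
    return (PHONE_TO_FAMILY[first], PHONE_TO_FAMILY[second])


def count_disemes(diphones):
    """Count the distinct disemes needed to display the given diphones."""
    return len({diseme_for(d) for d in diphones})


# Every ordered pair of distinct phones in this toy inventory.
all_diphones = list(permutations(PHONE_TO_FAMILY, 2))
```

With this toy inventory, `diseme_for(("p", "aa"))` and `diseme_for(("b", "ae"))` return the same key, so a single image sequence would serve both diphones. Calling `count_disemes(all_diphones)` shows how far the set of image sequences shrinks relative to `len(all_diphones)`.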