High-Quality Compression, Enhancement, and Personalization of Text-to-Speech Voices
eagle-i ID
http://ohsu.eagle-i.net/i/0000012c-3caa-b66e-cc1a-f59980000000
Resource Type
Properties
Resource Description
"The vast variability of the human speech signal remains a central challenge for Text-to-Speech (TTS) systems. The objective of this research is to develop TTS technologies that focus on elimination of concatenation errors, and accurate speech modifications in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics. The investigators are exploring an asynchronous interpolation model (AIM), which promises to provide for high-quality and flexible TTS. The core idea of AIM is to represent a short region of speech as a composition of several types of features called streams. Each stream is computed by asynchronous interpolation of basis vectors.
Each basis vector is associated with a particular phoneme, allophone, or more specialized unit. Thus, the speech region is described by the varying degrees of influence of several types of preceding and following acoustic features. Using AIM, the investigators are also developing methods to optimally compress the acoustic inventories of TTS systems, given a size or a quality constraint, and to adapt the system to a new voice, given a few training samples. The system being researched forms a hybrid between traditional concatenative and formant-based synthesis, having advantages of both, resulting in a high-quality, optimized TTS system with voice adaptation capabilities. TTS has generally recognized societal benefits for universal access, education, and information access by voice. Our research will make it possible, for example, to build personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds."
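The description above can be illustrated with a minimal sketch of asynchronous interpolation. It is not the investigators' implementation: the sigmoid weight shape, the function names, and the two example streams ("formants" and "energy") are all assumptions made for illustration. What it shows is the core idea only: each stream moves from a preceding basis vector to a following one on its own timing.

```python
import math


def transition_weights(n_frames, center, slope=1.0):
    """Per-frame influence of the following unit, rising from near 0 to near 1.

    The sigmoid shape is an assumption; the source does not specify the
    form of the interpolation weights.
    """
    return [1.0 / (1.0 + math.exp(-slope * (t - center))) for t in range(n_frames)]


def interpolate_stream(left, right, weights):
    """Blend two basis vectors frame by frame using the given weights."""
    return [[(1.0 - w) * a + w * b for a, b in zip(left, right)] for w in weights]


def synthesize_region(streams, n_frames):
    """Compute every stream independently over the same speech region.

    `streams` maps a (hypothetical) stream name to a tuple of
    (left basis vector, right basis vector, transition center, slope).
    Different centers per stream are what make the interpolation asynchronous.
    """
    return {
        name: interpolate_stream(left, right, transition_weights(n_frames, center, slope))
        for name, (left, right, center, slope) in streams.items()
    }


# Hypothetical example: the formant stream switches early, the energy stream late.
region = synthesize_region(
    {
        "formants": ([500.0, 1500.0], [300.0, 2200.0], 2, 2.0),
        "energy": ([0.8], [0.4], 5, 2.0),
    },
    n_frames=8,
)
```

Here `region["formants"]` and `region["energy"]` each hold eight frames, and the two streams reach their midpoints at different frames.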
Uses the following algorithms:
- ESPS get_formant algorithm
- Dynamic time warping (DTW) algorithm
- Pitch-synchronous sinusoidal synthesis algorithm
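For readers unfamiliar with the second item in the list above, the following is a textbook sketch of dynamic time warping on one-dimensional sequences. It is a generic illustration, not the resource's own code; the function name and the absolute-difference local cost are assumptions.

```python
def dtw_distance(a, b):
    """Return the minimum cumulative alignment cost between sequences a and b.

    Classic dynamic-programming DTW with steps (i-1, j), (i, j-1), (i-1, j-1)
    and absolute difference as the local cost (an assumption for this sketch).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one frame may be matched to several frames of the other sequence, two contours that differ only in timing can align at low cost.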
Used by
Center for Spoken Language Understanding
Related Publication or Documentation
Unit-Selection Text-to-Speech Synthesis Using an Asynchronous Interpolation Model
Website(s)
http://nsf.gov/awardsearch/showAward.do?AwardNumber=0713617
http://www.ohsu.edu/xd/education/schools/school-of-medicine/departments/basic-science-departments/biomedical-engineering/center-for-spoken-language-understanding/high-quality-compression.cfm?WT_rank=1
Developed by
Leen, Todd K., Ph.D.
Kain, Alexander, Ph.D.
Software license
Open source software license