
High-Quality Compression, Enhancement, and Personalization of Text-to-Speech Voices

eagle-i ID

http://ohsu.eagle-i.net/i/0000012c-3caa-b66e-cc1a-f59980000000

Resource Type

Algorithmic software suite

Properties

Resource Description

"The vast variability of the human speech signal remains a central challenge for Text-to-Speech (TTS) systems. The objective of this research is to develop TTS technologies that focus on elimination of concatenation errors, and accurate speech modifications in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics. The investigators are exploring an asynchronous interpolation model (AIM), which promises to provide for high-quality and flexible TTS. The core idea of AIM is to represent a short region of speech as a composition of several types of features called streams. Each stream is computed by asynchronous interpolation of basis vectors. Each basis vector is associated with a particular phoneme, allophone, or more specialized unit. Thus, the speech region is described by the varying degrees of influence of several types of preceding and following acoustic features. Using AIM, the investigators are also developing methods to optimally compress the acoustic inventories of TTS systems, given a size or a quality constraint, and to adapt the system to a new voice, given a few training samples. The system being researched forms a hybrid between traditional concatenative and formant-based synthesis, having advantages of both, resulting in a high-quality, optimized TTS system with voice adaptation capabilities. TTS has generally recognized societal benefits for universal access, education, and information access by voice. Our research will make it possible, for example, to build personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds." Uses following algorithms: - ESPS get_formant algorithm - Dynamic time warping (DTW) algorithm - Pitch-synchronous sinusoidal synthesis algorithm
Used by

Center for Spoken Language Understanding
Related Publication or Documentation

Unit-Selection Text-to-Speech Synthesis Using an Asynchronous Interpolation Model
Website(s)

http://nsf.gov/awardsearch/showAward.do?AwardNumber=0713617
Website(s)

http://www.ohsu.edu/xd/education/schools/school-of-medicine/departments/basic-science-departments/biomedical-engineering/center-for-spoken-language-understanding/high-quality-compression.cfm?WT_rank=1
Developed by

Leen, Todd K., Ph.D.
Developed by

Kain, Alexander, Ph.D.
Software license

Open source software license

Inferred Types from the eagle-i Ontology

Provenance Metadata About This Resource Record

workflow state

Published
contributor

nvasilevsky (Nicole Vasilevsky)
created

2010-11-11T14:41:27.936-06:00
creator

mhan (Mikyung Han)
modified

2012-11-20T17:42:55.522-06:00