eagle-i Oregon Health & Science University

High-Quality Compression, Enhancement, and Personalization of Text-to-Speech Voices

eagle-i ID


Resource Type

  1. Algorithmic software suite


  1. Resource Description
    "The vast variability of the human speech signal remains a central challenge for Text-to-Speech (TTS) systems. The objective of this research is to develop TTS technologies that focus on elimination of concatenation errors, and accurate speech modifications in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics. The investigators are exploring an asynchronous interpolation model (AIM), which promises to provide for high-quality and flexible TTS. The core idea of AIM is to represent a short region of speech as a composition of several types of features called streams. Each stream is computed by asynchronous interpolation of basis vectors. Each basis vector is associated with a particular phoneme, allophone, or more specialized unit. Thus, the speech region is described by the varying degrees of influence of several types of preceding and following acoustic features. Using AIM, the investigators are also developing methods to optimally compress the acoustic inventories of TTS systems, given a size or a quality constraint, and to adapt the system to a new voice, given a few training samples. The system being researched forms a hybrid between traditional concatenative and formant-based synthesis, having advantages of both, resulting in a high-quality, optimized TTS system with voice adaptation capabilities. TTS has generally recognized societal benefits for universal access, education, and information access by voice. Our research will make it possible, for example, to build personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds."
    Uses the following algorithms:
    - ESPS get_formant algorithm
    - Dynamic time warping (DTW) algorithm
    - Pitch-synchronous sinusoidal synthesis algorithm
  2. Used by
    Center for Spoken Language Understanding
  3. Related Publication or Documentation
    Unit-Selection Text-to-Speech Synthesis Using an Asynchronous Interpolation Model
  4. Website(s)
  5. Developed by
    Leen, Todd K., Ph.D.
    Kain, Alexander, Ph.D.
  6. Software license
    Open source software license
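The record describes AIM only at a high level. The sketch below illustrates the core idea under stated assumptions: each stream blends one preceding and one following basis vector frame by frame, x_s(t) = (1 - w_s(t)) b_prev,s + w_s(t) b_next,s, and "asynchronous" means every stream follows its own weight trajectory w_s(t). The sigmoid shape of w_s(t), the parameters center and slope, the function names, and the stream names and values in the usage note are hypothetical illustrations, not the investigators' actual model.

```python
import numpy as np

def transition_weights(num_frames, center, slope):
    """Hypothetical sigmoid weight w(t) in (0, 1) over a region normalized to t in [0, 1].
    `center` is where the stream is halfway between its two basis vectors."""
    t = np.linspace(0.0, 1.0, num_frames)
    return 1.0 / (1.0 + np.exp(-slope * (t - center)))

def interpolate_stream(prev_basis, next_basis, weights):
    """Frame-by-frame blend x(t) = (1 - w(t)) * prev + w(t) * next."""
    prev_basis = np.asarray(prev_basis, dtype=float)
    next_basis = np.asarray(next_basis, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]  # (frames, 1) broadcasts over feature dimensions
    return (1.0 - w) * prev_basis + w * next_basis

def synthesize_region(streams, num_frames):
    """Compose a speech region from independently timed streams.
    `streams` maps a (hypothetical) stream name to (prev_basis, next_basis, center, slope);
    giving each stream its own center/slope is what makes the interpolation asynchronous."""
    region = {}
    for name, (prev_basis, next_basis, center, slope) in streams.items():
        w = transition_weights(num_frames, center, slope)
        region[name] = interpolate_stream(prev_basis, next_basis, w)
    return region
```

For example, synthesize_region({"formants": ([500.0, 1500.0], [300.0, 2300.0], 0.3, 12.0), "energy": ([60.0], [70.0], 0.7, 12.0)}, 10) returns a (10, 2) formant track that moves toward the following unit early in the region while the energy stream lags behind; these numbers are made up for illustration.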
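The record lists dynamic time warping among the algorithms used but does not say where in the pipeline it is applied. As a generic illustration only, a textbook DTW aligns two feature sequences of different lengths, such as two renditions of the same utterance, by finding the cheapest monotonic frame-to-frame path. This sketch assumes Euclidean frame distance and the standard three-step recursion D(i, j) = d(i, j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1)); the function name dtw is hypothetical.

```python
import numpy as np

def _as_frames(seq):
    """Treat a 1-D sequence as n frames of one feature each; leave (n, d) arrays as-is."""
    arr = np.asarray(seq, dtype=float)
    return arr[:, None] if arr.ndim == 1 else arr

def dtw(x, y):
    """Textbook DTW between sequences x (n, d) and y (m, d).
    Returns the total alignment cost and the warping path as (i, j) frame-index pairs."""
    x, y = _as_frames(x), _as_frames(y)
    n, m = len(x), len(y)
    # Euclidean distance between every frame of x and every frame of y: shape (n, m).
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # acc[i, j] = cheapest cost of aligning x[:i] with y[:j]; row/column 0 are inf borders.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from (n, m), always stepping to the cheapest predecessor cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1, j - 1], (i - 1, j): acc[i - 1, j], (i, j - 1): acc[i, j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return float(acc[n, m]), path
```

For instance, dtw([1, 2, 3], [1, 2, 2, 3]) returns (0.0, [(0, 0), (1, 1), (1, 2), (2, 3)]): the repeated 2 in the second sequence is absorbed by mapping frame 1 of the first sequence onto two frames. The double loop is O(nm) and kept simple for readability; practical systems often add a band constraint to limit the search.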
Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016