eagle-i Oregon Health & Science University

Software for semi-supervised discriminative training of language models

eagle-i ID


Resource Type

  1. Software


  1. Related grant number
    NSF Award Abstract #0964102
  2. Resource Description
    "This project is conducting fundamental research in statistical language modeling to improve human language technologies, including automatic speech recognition (ASR) and machine translation (MT). A language model (LM) is conventionally optimized, using text in the target language, to assign high probability to well-formed sentences. This method has a fundamental shortcoming: the optimization does not explicitly target the kinds of distinctions necessary to accomplish the task at hand, such as discriminating (for ASR) between different words that are acoustically confusable or (for MT) between different target-language words that express the multiple meanings of a polysemous source-language word. Discriminative optimization of the LM, which would overcome this shortcoming, requires large quantities of paired input-output sequences: speech and its reference transcription for ASR or source-language (e.g. Chinese) sentences and their translations into the target language (say, English) for MT. Such resources are expensive, and limit the efficacy of discriminative training methods. In a radical departure from convention, this project is investigating discriminative training using easily available, *unpaired* input and output sequences: un-transcribed speech or monolingual source-language text and unpaired target-language text. Two key ideas are being pursued: (i) unlabeled input sequences (e.g. speech or Chinese text) are processed to learn likely confusions encountered by the ASR or MT system; (ii) unpaired output sequences (English text) are leveraged to discriminate between these well-formed sentences from the (supposed) ill-formed sentences the system could potentially confuse them with. This self-supervised discriminative training, if successful, will advance machine intelligence in fundamental ways that impact many other applications."
  3. Used by
    Center for Spoken Language Understanding
  4. Website(s)
  5. Developed by
    Roark, Brian E., Ph.D.
    Shafran, Izhak, Ph.D.
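
The two ideas in the resource description can be illustrated with a small sketch. This is not the project's software. It is a minimal example under stated assumptions: the confusion table is hypothetical and supplied by hand (the project proposes learning confusions from unlabeled input), the features are plain bigram counts, and the learner is a standard perceptron update that rewards the well-formed sentence and penalizes its highest-scoring confusable variant. All function names are illustrative.

```python
from collections import defaultdict


def bigram_features(words):
    """Count the bigrams of a sentence, with start and end markers."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = defaultdict(int)
    for left, right in zip(padded, padded[1:]):
        feats[(left, right)] += 1
    return feats


def simulate_confusions(words, confusions):
    """Return variants of the sentence with exactly one word swapped.

    `confusions` is a hypothetical table mapping a word to the words a
    system might confuse it with. Here it is assumed to be given.
    """
    variants = []
    for i, word in enumerate(words):
        for alt in confusions.get(word, []):
            variants.append(list(words[:i]) + [alt] + list(words[i + 1:]))
    return variants


def score(weights, words):
    """Linear score: sum of feature weights times feature counts."""
    return sum(weights.get(f, 0.0) * c
               for f, c in bigram_features(words).items())


def train(sentences, confusions, epochs=5):
    """Perceptron training on unpaired text plus simulated confusions."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sent in sentences:
            rivals = simulate_confusions(sent, confusions)
            if not rivals:
                continue  # nothing confusable in this sentence
            rival = max(rivals, key=lambda r: score(weights, r))
            # Update when the rival ties or beats the well-formed sentence.
            if score(weights, rival) >= score(weights, sent):
                for f, c in bigram_features(sent).items():
                    weights[f] += c
                for f, c in bigram_features(rival).items():
                    weights[f] -= c
    return weights
```

Trained on a few sentences with a table such as {"their": ["there"], "there": ["their"]}, the model learns to score each original sentence above its confused variant. No paired input-output data is used, only target-language text and the assumed confusion table.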

Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02.