Investigation of multilingual speech-to-text systems for use in spoken term detection
The development of high-performance speech processing systems for low-resource languages is a challenging research area. One approach to addressing the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features or hybrid systems trained on multilingual data for speech-to-text (STT) systems. This talk presents an overview of these approaches for STT and their performance for both speech recognition and spoken term detection. Experiments will be presented based on the IARPA Babel limited language pack corpora (10 hours/language), using 7 languages for multilingual system development and 3 held-out target languages.
