Interspeech practice session
If you have a question about this talk, please contact Rogier van Dalen.

12:00 – 12:40  Oral presentations
12:40 – 13:30  Posters and sandwiches
Abstracts

Shakti P. Rath, Daniel Povey, Karel Vesely, Jan Cernocky
Improved Feature Processing for Deep Neural Networks

In this paper, we investigate alternative ways of processing MFCC-based features to use as the input to Deep Neural Networks (DNNs). Our baseline is a conventional feature pipeline that involves splicing the 13-dimensional front-end MFCCs across 9 frames, followed by applying LDA to reduce the dimension to 40 and then further decorrelation using MLLT. Confirming the results of other groups, we show that speaker adaptation applied on top of these features using feature-space MLLR is helpful. The fact that the number of parameters of a DNN is not strongly sensitive to the input feature dimension (unlike GMM-based systems) motivated us to investigate ways to increase the dimension of the features. In this paper, we investigate several approaches to derive higher-dimensional features and verify their performance with DNNs. Our best result is obtained by splicing our baseline 40-dimensional speaker-adapted features again across 9 frames, followed by reducing the dimension to 200 or 300 using another LDA. Our final result is about 3% absolute better than our best GMM system, which is a discriminatively trained model.

Yongqiang Wang and Mark Gales
An Explicit Independence Constraint for Factorised Adaptation in Speech Recognition

Speech signals are usually affected by multiple acoustic factors, such as speaker characteristics and environment differences. Usually, the combined effect of these factors is modelled by a single transform. Acoustic factorisation splits the transform into several factor transforms, each modelling only one factor. This allows, for example, estimating a speaker transform in one noise condition and applying the same speaker transform in a different noise condition. To achieve this factorisation, it is crucial to keep factor transforms independent of each other.
Previous work on acoustic factorisation relies on using different forms of factor transforms and/or attributes of the data to enforce this independence. In this work, the independence is formulated mathematically, and an explicit constraint is derived to enforce it. Using factorised cluster adaptive training (fCAT) as an application, experimental results demonstrate that the proposed explicit independence constraint helps factorisation when imbalanced adaptation data is used.

Y. Long, M.J.F. Gales, P. Lanchantin, X. Liu, M.S. Seigel, P.C. Woodland
Improving Lightly Supervised Training for Broadcast Transcription

This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses decoding hypotheses derived automatically using a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality of standard lightly supervised hypotheses can be poor. To address this issue, word- and segment-level combination approaches are applied between the lightly supervised transcripts and the original programme scripts, which yield improved transcriptions. Experimental results show that systems trained using these improved transcriptions consistently outperform those trained using only the original lightly supervised decoding hypotheses. This is shown to be the case for both maximum likelihood and minimum phone error trained systems.

Jingzhou Yang, Rogier van Dalen, Mark Gales
Infinite Support Vector Machines in Speech Recognition

Generative feature spaces provide an elegant way to apply discriminative models in speech recognition, and system performance has been improved by adopting this framework. However, the classes in the feature space may not be linearly separable. Applying a linear classifier then limits performance. Instead of a single classifier, this paper applies a mixture of experts.
This model trains different classifiers as experts, each focusing on a different region of the feature space. However, the number of experts is not known in advance. This problem can be bypassed by employing a Bayesian non-parametric model. In this paper, a specific mixture of experts based on the Dirichlet process, namely the infinite support vector machine, is studied. Experiments conducted on the noise-corrupted continuous digit task AURORA 2 show the advantages of this Bayesian non-parametric approach.

This talk is part of the CUED Speech Group Seminars series.
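The splice-and-project pipeline from the first abstract can be sketched in a few lines of numpy. This is only an illustration of the data flow: the projection matrices below are random stand-ins for the LDA (and MLLT/fMLLR) transforms, which a real system would estimate from labelled data. The dimensions follow the abstract: 13-dimensional MFCCs spliced across 9 frames (context of ±4), reduced to 40, then spliced and reduced again to 200.

```python
import numpy as np

def splice(feats, context=4):
    """Splice each frame with its +/- context neighbours (9 frames in
    total for context=4), repeating the first/last frame at the edges."""
    T, _ = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

# Hypothetical 13-dim MFCC features for 100 frames.
mfcc = np.random.randn(100, 13)

spliced = splice(mfcc)            # 100 x 117 (13 dims x 9 frames)

# Random stand-in for the LDA projection to 40 dimensions.
lda = np.random.randn(117, 40)
reduced = spliced @ lda           # 100 x 40

# Second splice-and-reduce stage from the paper's best configuration:
# splice the 40-dim features across 9 frames again, project to 200.
spliced2 = splice(reduced)        # 100 x 360
lda2 = np.random.randn(360, 200)
dnn_input = spliced2 @ lda2       # 100 x 200, fed to the DNN
```

The point the abstract makes is visible in the shapes: the DNN input dimension (200) can be made much larger than a GMM system's feature dimension without a large growth in model parameters.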
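The "number of experts is not known in advance" point in the last abstract is what the Dirichlet process prior handles. A truncated stick-breaking construction (a standard sketch, not code from the paper) shows how mixture weights over experts are generated without fixing their count up front:

```python
import numpy as np

def stick_breaking(alpha, n_max, rng):
    """Truncated stick-breaking construction of Dirichlet-process
    mixture weights: repeatedly break off a Beta(1, alpha) fraction
    of the remaining stick. Smaller alpha concentrates mass on
    fewer experts; the truncation level n_max is only a cap."""
    betas = rng.beta(1.0, alpha, size=n_max)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=2.0, n_max=20, rng=rng)
```

In the infinite SVM, each weight gates one expert (here, an SVM-like classifier) over a region of the feature space, so the effective number of experts is inferred from the data rather than chosen beforehand.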