Towards Weaker Supervision and Simpler Pipelines in Speech Recognition
If you have a question about this talk, please contact Louise Segar.
Traditionally, speech recognition required feature engineering, fine-grained transcription, a phonetics step, and long pipelines to glue all of these together. Progress in computational power and dataset size has enabled the success of a less rigid class of deep learning models. We will present our recent work towards coarser annotation: either training acoustic models directly on pairs of same or different words (with neither class nor phonetic information) using siamese neural networks, or training on sentences annotated as bags of words using large convolutional neural networks and temporal pooling. We will also show promising results on unsupervised acoustic model training. Finally, we will present results towards training directly from the raw waveform to graphemes with an efficient sequence-based loss. This will span joint work with Neil Zeghidour, Ronan Collobert, Dimitri Palaz, Nicolas Usunier, Christian Puhrsch, and Emmanuel Dupoux.
This talk is part of the Machine Learning @ CUED series.