
Very deep convolutional neural networks for speech recognition


If you have a question about this talk, please contact Anton Ragni.

Convolutional neural networks are one of the main drivers of the recent deep learning explosion, beginning with the AlexNet (2012) result in the ImageNet competition and continuing with successive models such as OverFeat (2013), VGG (2014), GoogLeNet (2014), and residual networks (2015). In the speech recognition domain, CNNs with two convolutional layers were introduced around 2012 and have not seen major architectural updates since.

We will present a number of recent architectural advances in CNNs for speech recognition. We introduce a very deep convolutional network architecture with up to 14 weight layers, in which multiple convolutional layers with small 3×3 kernels precede each pooling layer, inspired by the VGG ImageNet 2014 architecture. We will discuss how the design choices of strided pooling and zero-padding along the time direction render convolutional evaluation of full sequences highly inefficient. This can be phrased in computer vision terminology as classification versus dense pixelwise prediction. We define the architectural constraints that make efficient evaluation of full utterances possible. This allows batch normalization to be adopted during full-utterance sequence training, resulting in faster training and improved performance.

We show state-of-the-art results on the benchmark 2000-hour Switchboard dataset (Hub5 evaluation). We also adapted the architecture to the multilingual setting and obtained strong results on the Babel OP3 surprise language after multilingual training on 25 languages.
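
To make the architectural ideas above concrete, the following is a minimal PyTorch sketch, not the speakers' implementation: several 3×3 convolutions with batch normalization before each pooling layer, with zero-padding and pooling strides applied only along the frequency axis, so that a full utterance of arbitrary length can be evaluated convolutionally in a single pass. All layer widths, the 40-dimensional log-mel input, and the 9000-target output are illustrative assumptions rather than the configuration reported in the talk.

import torch
import torch.nn as nn


def conv_block(c_in, c_out, n_convs):
    """n_convs 3x3 convolutions, then 2x pooling along frequency only."""
    layers = []
    for i in range(n_convs):
        layers += [
            # padding=(0, 1): pad the frequency axis, never the time axis
            nn.Conv2d(c_in if i == 0 else c_out, c_out,
                      kernel_size=3, padding=(0, 1)),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        ]
    # pool/stride only along frequency so the time resolution is preserved
    layers.append(nn.MaxPool2d(kernel_size=(1, 2)))
    return layers


class VeryDeepCNN(nn.Module):
    """Illustrative VGG-style acoustic model (layer sizes are assumptions)."""

    def __init__(self, n_targets=9000):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),     # input: static, delta, delta-delta channels
            *conv_block(64, 128, 2),
            *conv_block(128, 256, 3),
            *conv_block(256, 512, 3),  # 10 convolutional weight layers in total
        )
        # 1x1 convolutions act as fully connected layers applied per frame,
        # keeping the whole network convolutional along time.
        self.classifier = nn.Sequential(
            nn.Conv2d(512, 2048, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(2048, 2048, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(2048, n_targets, kernel_size=1),
        )

    def forward(self, x):
        # x: (batch, 3, n_frames + context, 40 log-mel bins)
        h = self.features(x)
        h = self.classifier(h)
        # average the remaining frequency bins; one score vector per frame
        return h.mean(dim=3).transpose(1, 2)  # (batch, frames, n_targets)


if __name__ == "__main__":
    model = VeryDeepCNN()
    utterance = torch.randn(1, 3, 500, 40)  # a 5-second utterance at 10 ms frames
    print(model(utterance).shape)           # per-frame outputs for the whole utterance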

This talk is part of the CUED Speech Group Seminars series.
