
Very deep convolutional neural networks for speech recognition


If you have a question about this talk, please contact Anton Ragni.

Convolutional neural networks are one of the main drivers of the recent deep learning explosion, beginning with the AlexNet (2012) result in the ImageNet competition and continuing with successive models such as OverFeat (2013), VGG (2014), GoogLeNet (2014), and residual networks (2015). In the speech recognition domain, CNNs with two convolutional layers were introduced around 2012 and have not seen major architectural updates since.

We will present a number of recent architectural advances in CNNs for speech recognition. We introduce a very deep convolutional network architecture with up to 14 weight layers, in which multiple convolutional layers with small 3×3 kernels precede each pooling layer, inspired by the VGG ImageNet 2014 architecture. We will discuss how the design choices of strided pooling and zero-padding along the time direction render convolutional evaluation of full sequences highly inefficient. This can be phrased in computer vision terminology as classification versus dense pixelwise prediction. We define the architectural constraints that make efficient evaluation of full utterances possible. This allows batch normalization to be adopted during full-utterance sequence training, resulting in faster training and improved performance.

We show state-of-the-art results on the benchmark 2000-hour Switchboard dataset (Hub5 evaluation). We also adapted the architecture to the multilingual setting and obtained strong results on the Babel OP3 surprise language after multilingual training on 25 languages.
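
To make the architectural ideas above concrete, the following is a minimal PyTorch sketch, not the speakers' implementation: several 3×3 convolutions with batch normalization before each pooling layer, with zero-padding and pooling strides applied only along the frequency axis, so that a full utterance of arbitrary length can be evaluated convolutionally in a single pass. All layer widths, the 40-dimensional log-mel input, and the 9000-target output are illustrative assumptions rather than the configuration reported in the talk.

import torch
import torch.nn as nn


def conv_block(c_in, c_out, n_convs):
    """n_convs 3x3 convolutions, then 2x pooling along frequency only."""
    layers = []
    for i in range(n_convs):
        layers += [
            # padding=(0, 1): pad the frequency axis, never the time axis
            nn.Conv2d(c_in if i == 0 else c_out, c_out,
                      kernel_size=3, padding=(0, 1)),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        ]
    # pool/stride only along frequency so the time resolution is preserved
    layers.append(nn.MaxPool2d(kernel_size=(1, 2)))
    return layers


class VeryDeepCNN(nn.Module):
    """Illustrative VGG-style acoustic model (layer sizes are assumptions)."""

    def __init__(self, n_targets=9000):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),     # input: static, delta, delta-delta channels
            *conv_block(64, 128, 2),
            *conv_block(128, 256, 3),
            *conv_block(256, 512, 3),  # 10 convolutional weight layers in total
        )
        # 1x1 convolutions act as fully connected layers applied per frame,
        # keeping the whole network convolutional along time.
        self.classifier = nn.Sequential(
            nn.Conv2d(512, 2048, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(2048, 2048, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(2048, n_targets, kernel_size=1),
        )

    def forward(self, x):
        # x: (batch, 3, n_frames + context, 40 log-mel bins)
        h = self.features(x)
        h = self.classifier(h)
        # average the remaining frequency bins; one score vector per frame
        return h.mean(dim=3).transpose(1, 2)  # (batch, frames, n_targets)


if __name__ == "__main__":
    model = VeryDeepCNN()
    utterance = torch.randn(1, 3, 500, 40)  # a 5-second utterance at 10 ms frames
    print(model(utterance).shape)           # per-frame outputs for the whole utterance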

This talk is part of the CUED Speech Group Seminars series.
