Deep Structured Prediction for Handwriting Recognition

If you have a question about this talk, please contact Alessandro Davide Ialongo.

Work in collaboration with P. M. Olmos and J.C.A. Jaramillo


Structured prediction, or structured (output) learning, is an umbrella term for supervised machine learning techniques that predict structured objects rather than scalar discrete or real values. Application domains include bioinformatics, natural language processing, speech recognition, and computer vision. Convolutional networks are useful for these tasks, but recurrent neural networks (RNNs) are preferred when the input contains long-term dependencies. However, vanishing and exploding gradients make simple RNNs hard to train, so long short-term memory (LSTM) networks are widely adopted instead. In this talk we focus on the handwritten text recognition problem. We review the LSTM as the state-of-the-art solution among deep learning architectures. Then, to cope with the problem of labelling the letters in an image, we explain connectionist temporal classification (CTC), a useful tool whose cost function can be incorporated into the network so that gradients can be computed. We discuss some results obtained in an attempt to put this theory to work with TensorFlow.
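To make the CTC idea concrete, here is a minimal pure-Python sketch of how the probability of a label sequence can be computed from per-frame network outputs. It is an illustration only, not the speakers' implementation: the function names, the choice of index 0 for the blank symbol, and the toy probabilities are all assumptions. The CTC cost would then be the negative logarithm of this probability.

```python
from itertools import product

BLANK = 0  # assumption of this sketch: index 0 is the CTC blank symbol


def collapse(path, blank=BLANK):
    """Map a frame-level path to labels: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out


def ctc_probability(probs, target, blank=BLANK):
    """Probability of `target` via the CTC forward recursion.

    probs[t][k] is the probability the network assigns to symbol k at frame t.
    """
    # Extended target with blanks interleaved: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # At the first frame a path may start in the leading blank or the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def brute_force_probability(probs, target, blank=BLANK):
    """Reference value: sum the probability of every path that collapses to `target`."""
    num_symbols = len(probs[0])
    total = 0.0
    for path in product(range(num_symbols), repeat=len(probs)):
        if collapse(path, blank) == list(target):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


if __name__ == "__main__":
    # Toy example: 3 frames, 3 symbols (blank, label 1, label 2).
    toy = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.2, 0.7]]
    print(ctc_probability(toy, [1, 2]))
    print(brute_force_probability(toy, [1, 2]))
```

The two printed values should agree up to floating-point rounding. The recursion costs time proportional to the number of frames times the target length, whereas the brute-force sum grows exponentially with the number of frames. Since every step is a sum or product of network outputs, the resulting cost is differentiable, which is what allows it to be trained by gradient descent together with the LSTM.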

Reading list:

No previous reading is really needed beyond general concepts of deep neural networks.

Useful references:

- I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”, MIT Press, 2016: chapters 6 to 9 for concepts on deep learning, and chapter 10 in particular for recurrent networks (but see below for LSTMs).

- M. Görner, “TensorFlow and Deep Learning without a PhD”, for a quick review of the main concepts and examples of deep learning

- Z. C. Lipton, J. Berkowitz, C. Elkan, “A Critical Review of Recurrent Neural Networks for Sequence Learning”, 2015

- K. Cho, “Natural Language Understanding with Distributed Representation”, 2016, for a detailed explanation of LSTMs

- C. Olah, “Understanding LSTM Networks”, 2015, for a good explanation of LSTMs

- A. Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”, Ph.D. thesis, 2008, for an explanation of CTC

This talk is part of the Machine Learning Reading Group @ CUED series.


