Generating Natural-Language Video Descriptions using LSTM Recurrent Neural Networks
If you have a question about this talk, please contact Kris Cao.

We present a method for automatically generating English sentences describing short videos using deep neural networks. Specifically, we apply convolutional and Long Short-Term Memory (LSTM) recurrent networks to translate videos into English descriptions using an encoder/decoder framework. A sequence of image frames (represented using deep visual features) is first mapped to a vector encoding the full video, and this encoding is then mapped to a sequence of words. We have also explored how statistical linguistic knowledge mined from large text corpora, specifically LSTM language models and lexical embeddings, can improve the descriptions. Experimental evaluation on a corpus of short YouTube videos and movie clips annotated by Descriptive Video Service demonstrates the capabilities of the technique by comparing its output to human-generated descriptions.

This talk is part of the NLIP Seminar Series.
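To make the encoder/decoder framework described in the abstract concrete, here is a minimal PyTorch sketch of the general idea: precomputed CNN frame features are encoded by an LSTM into a single video representation, which then conditions an LSTM decoder that emits words. This is an illustration under assumed sizes and names (FEAT_DIM, HIDDEN, VOCAB, VideoCaptioner), not the speaker's implementation.

```python
# Minimal sketch of the encoder/decoder idea from the abstract:
# frame features -> LSTM encoder -> video vector -> LSTM decoder -> words.
# All sizes and names are illustrative assumptions, not the talk's actual model.
import torch
import torch.nn as nn

FEAT_DIM, HIDDEN, VOCAB = 4096, 512, 10000  # e.g. CNN fc-layer features; assumed sizes

class VideoCaptioner(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.LSTM(FEAT_DIM, HIDDEN, batch_first=True)
        self.embed = nn.Embedding(VOCAB, HIDDEN)   # could be initialized from lexical embeddings
        self.decoder = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, n_frames, FEAT_DIM) precomputed CNN features
        _, state = self.encoder(frame_feats)     # final LSTM state encodes the full video
        words = self.embed(captions)             # (batch, n_words, HIDDEN)
        dec_out, _ = self.decoder(words, state)  # decoder conditioned on the video encoding
        return self.out(dec_out)                 # per-step vocabulary logits

model = VideoCaptioner()
feats = torch.randn(2, 30, FEAT_DIM)             # 2 clips, 30 frames each
caps = torch.randint(0, VOCAB, (2, 8))           # 8-token captions (teacher forcing)
logits = model(feats, caps)                      # (2, 8, VOCAB)
```

At inference time the decoder would instead be unrolled one token at a time (e.g. greedy or beam search), feeding each predicted word back in as the next input.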