General teacher-student learning for automatic speech recognition
If you have a question about this talk, please contact Anton Ragni.

Teacher-student learning is a general framework that can be used to transfer knowledge from one or more models to another. It has found various applications in automatic speech recognition, for tasks such as compressing a large model or an ensemble of models, and for domain adaptation. In its standard form, teacher-student learning propagates information from one or more teacher models to a student model by minimising the KL-divergence between their per-frame state-cluster posterior distributions, at the neural network (NN) outputs.

This form of teacher-student learning is limited in two aspects. First, only frame-level posterior information is propagated from the teachers to the student. This form of information may not effectively capture the sequential nature of speech data, or the interactions between the acoustic, alignment, and language models. Second, all models are required to use the same set of state clusters. This in turn requires that all models also use the same set of sub-word units, Hidden Markov Model (HMM) alignment model topology, context-dependency, and language model. Furthermore, all models are required to use the NN-HMM topology. This restricts the situations in which teacher-student learning may be applied. In particular, it limits the forms of diversity allowed within an ensemble that is to be compressed using teacher-student learning.

This talk presents several proposals to generalise the teacher-student learning framework and overcome these limitations. Different sets of state clusters can be allowed between the teacher and student models, by minimising the KL-divergence between per-frame logical context-dependent state posteriors. The sequential nature of speech data can be taken into account by using sequence-level criteria. These sequence-level criteria can potentially also remove all restrictions on the required topological similarities between the teacher and student models.
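As a point of reference, the standard frame-level criterion can be written down in a few lines. The sketch below is illustrative only, not taken from the talk: it assumes PyTorch, logits of shape (batch, frames, clusters), and hypothetical function names.

    import torch
    import torch.nn.functional as F

    def frame_level_ts_loss(student_logits, teacher_logits):
        # KL(teacher || student) between per-frame state-cluster posteriors.
        # Assumes teacher and student share the same state-cluster set and
        # frame rate, exactly the restriction the talk sets out to relax.
        log_q = F.log_softmax(student_logits, dim=-1)   # student log-posteriors
        p = F.softmax(teacher_logits.detach(), dim=-1)  # fixed teacher targets
        # Flatten (batch, frames) so "batchmean" averages over every frame.
        return F.kl_div(log_q.flatten(0, 1), p.flatten(0, 1),
                        reduction="batchmean")

    # With an ensemble of teachers, the target is typically an average of
    # the individual teachers' per-frame posterior distributions:
    def ensemble_target(teacher_logits_list):
        return torch.stack(
            [F.softmax(l.detach(), dim=-1) for l in teacher_logits_list]
        ).mean(dim=0)

The generalisations described above replace these targets: posteriors over logical context-dependent states remove the shared state-cluster requirement, and sequence-level criteria compare distributions over hypothesis sequences rather than individual frames.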
This talk is part of the CUED Speech Group Seminars series.