End-to-end contextual speech recognition with Tree-constrained pointer generator

If you have a question about this talk, please contact Simon Webster McKnight.

Contextual knowledge is vital to end-to-end automatic speech recognition (ASR) systems, especially for the long-tailed word problem, where performance degrades on rare or unseen words that are relevant to the context and carry important information. Integrating such knowledge into end-to-end systems is both necessary and challenging, because contextual knowledge changes dynamically while neural systems rely on a static set of trained parameters. In ASR, dynamic contextual knowledge is often incorporated via contextual biasing, in which a list of rare words or phrases likely to appear in a given context, called a biasing list of biasing words, is supplied to the system. A word is more likely to be recognised correctly if it is included in the biasing list. This talk introduces the tree-constrained pointer generator (TCPGen), an effective neural biasing component for end-to-end contextual ASR. TCPGen integrates contextual knowledge via a pointer generator mechanism and efficiently structures biasing lists into prefix trees. The talk covers the TCPGen approach in detail, the use of graph neural networks for tree encodings, and its application to Whisper models.
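To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the speaker's implementation. It assumes a character-level prefix tree (a real system would more likely operate on subword units) and a generic pointer-generator-style interpolation between a model distribution and a distribution over tree-valid symbols. The function names `build_prefix_tree`, `valid_next_symbols`, and `interpolate`, the end-of-word marker `"$"`, and the example words are all hypothetical.

```python
from typing import Dict, List, Set


def build_prefix_tree(biasing_words: List[str]) -> Dict:
    """Build a nested-dict prefix tree; the key "$" marks a complete word."""
    root: Dict = {}
    for word in biasing_words:
        node = root
        for symbol in word:
            node = node.setdefault(symbol, {})
        node["$"] = {}  # hypothetical end-of-word marker
    return root


def valid_next_symbols(tree: Dict, prefix: str) -> Set[str]:
    """Return the symbols that can follow `prefix` inside the tree.

    An empty set means the prefix has left the biasing list, so no
    biasing would be applied at this step.
    """
    node = tree
    for symbol in prefix:
        if symbol not in node:
            return set()
        node = node[symbol]
    return set(node.keys())


def interpolate(
    model_probs: Dict[str, float],
    pointer_probs: Dict[str, float],
    p_gen: float,
) -> Dict[str, float]:
    """Mix two distributions: (1 - p_gen) * model + p_gen * pointer.

    This is an assumed, generic pointer-generator form; `p_gen` would be
    predicted by the network in a trained system.
    """
    symbols = set(model_probs) | set(pointer_probs)
    return {
        s: (1.0 - p_gen) * model_probs.get(s, 0.0)
        + p_gen * pointer_probs.get(s, 0.0)
        for s in symbols
    }


# Example biasing list (made-up words for illustration only).
tree = build_prefix_tree(["turner", "turin", "tcpgen"])
allowed = valid_next_symbols(tree, "tur")  # symbols that extend "tur"
mixed = interpolate({"a": 0.5, "b": 0.5}, {"a": 1.0}, p_gen=0.2)
```

At each decoding step, only the symbols returned by `valid_next_symbols` would receive pointer probability, which is what keeps the lookup cost tied to the tree branch rather than to the full biasing list.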

This talk is part of the CUED Speech Group Seminars series.


