A Comparison of VTLN and Gender-Dependent Models
If you have a question about this talk, please contact Dr Marcus Tomalin.

After an introduction to Multimodal Technologies, Inc. (MModal) of Pittsburgh, PA, I will describe the current challenges in dictation-based health care documentation. This will be followed by an overview of MModal’s contribution in this space: a unique blend of speech recognition and natural language processing technologies for turning conversational dictations of clinical encounters into structured and encoded clinical documents. A centralized, hosted architecture based on a web services infrastructure allows us to collect vast amounts of audio and proofread textual data, which in turn lets us use highly speaker-specific models. Rapid adaptation to new speakers with minimal or no impact on physicians’ workflow is an important factor in the acceptability of the solution.

One difference between speakers is the variation in the length of the vocal tract. It is well established that this can be partially compensated for with gender-dependent or vocal-tract-length-normalized acoustic models. I will present several ways of building gender-dependent models, either by splitting the database by gender or by using a gender question in the context cluster tree. These are then compared with Vocal Tract Length Normalized (VTLN) acoustic models using data from a radiology reporting domain. Although gender-dependent models yield considerable gains, they did not outperform VTLN. From a business point of view, scalability is an important issue, and in addition to better performance, practical constraints also favor VTLN. For example, it is possible to estimate the VTLN parameters with a simple Gaussian Mixture Model during front-end processing, which allows single-pass decoding while still adapting quickly if an unexpected speaker change occurs.

I will end the presentation with a selection of research topics that arise from running an automatic transcription service.
This talk is part of the Machine Intelligence Laboratory Speech Seminars series.