Kōrero Māori - indigenous language revitalisation powered by machine learning
If you have a question about this talk, please contact Anton Ragni.

Te Reo Irirangi o Te Hiku o Te Ika (Te Hiku Media) is a non-profit organisation whose mission is to preserve and promote te reo Māori, the indigenous language of New Zealand. Over the past 30 years we’ve recorded thousands of hours of the stories of our people, most of whom were native speakers. These stories are rich in culture and traditional knowledge around science, the environment, and traditional Māori medicine. Today, we operate in digital industries creating technology to help document, conserve, and share the language and knowledge in novel ways.

Central to the development of technology and the collection of data is the formalisation of our cultural practices into our Kaitiakitanga License (1). The license outlines the way that people are able to access data gathered and acknowledges the value of open source technologies but recognises the impact of colonisation on indigenous peoples’ ability to access those technologies.

This discussion will provide insight into the Kōrero Māori (2) project and its progress to date in creating speech to text, text to speech, and pronunciation tools. We demonstrate how innovation in language revitalisation succeeds when an indigenous organization leads the corpus collection and technology development. We collected more than 300 hours of labeled corpus in ten days. This enabled the creation of an automatic speech recognition (ASR) tool for te reo Māori using Mozilla’s DeepSpeech (3) project with a word error rate of 14%. The ASR tool is being used to speed up the transcription of our native speaker archives (4).

(1) https://github.com/tehikumedia/corpora#license-kaitiakitanga
(2) https://koreromaori.com
(3) https://github.com/mozilla/DeepSpeech
(4) https://koreromaori.io

This talk is part of the CUED Speech Group Seminars series.