Kōrero Māori - indigenous language revitalisation powered by machine learning

If you have a question about this talk, please contact Anton Ragni.

Te Reo Irirangi o Te Hiku o Te Ika (Te Hiku Media) is a non-profit organisation whose mission is to preserve and promote te reo Māori, the indigenous language of New Zealand. Over the past 30 years we have recorded thousands of hours of the stories of our people, most of whom were native speakers. These stories are rich in culture and in traditional knowledge of science, the environment, and traditional Māori medicine. Today we work in digital industries, creating technology that helps document, conserve, and share the language and knowledge in novel ways.

Central to our technology development and data collection is the formalisation of our cultural practices into our Kaitiakitanga License (1). The license sets out how people may access the data we gather. It acknowledges the value of open-source technologies while recognising the impact of colonisation on indigenous peoples’ ability to access those technologies.

This talk will provide insight into the Kōrero Māori (2) project and its progress to date in creating speech-to-text, text-to-speech, and pronunciation tools. We demonstrate how innovation in language revitalisation succeeds when an indigenous organisation leads the corpus collection and technology development. We collected more than 300 hours of labelled speech in ten days. This enabled us to build an automatic speech recognition (ASR) tool for te reo Māori, based on Mozilla’s DeepSpeech (3) project, that achieves a word error rate of 14%. The ASR tool is being used to speed up the transcription of our native speaker archives (4).

(1) (2) (3) (4)

This talk is part of the CUED Speech Group Seminars series.


© 2006-2024, University of Cambridge.