A Graph-Based Framework for Structured Prediction Tasks in Sanskrit
If you have a question about this talk, please contact James Thorne.

Note unusual time.

Join Zoom Meeting: https://cl-cam-ac-uk.zoom.us/j/96198886046?pwd=Ui8rRG1UTkZtdVQyZSswcWN6T0hVUT09
Meeting ID: 961 9888 6046
Passcode: 695236

We propose a framework using Energy-Based Models for multiple structured prediction tasks in morphologically rich, free-word-order languages, with a focus on Sanskrit. Ours is an arc-factored model, similar to graph-based parsing approaches. We consider the tasks of word segmentation, morphological parsing, dependency parsing, syntactic linearisation and prosodification, a prosody-level task we introduce in this work. The framework is search-based: it expects a graph as input, in which the nodes encode the relevant linguistic information and the edges indicate the associations between those nodes.

State-of-the-art models for morphosyntactic tasks in morphologically rich languages typically still rely on hand-crafted features for their performance. Here, we instead automate the learning of the feature function. The learnt feature function, together with the search space we construct, encodes the linguistic information relevant to the tasks we consider. This enables us to reduce the training data requirements substantially, to as low as 10% of the data required by the neural state-of-the-art models.

Our experiments in Czech and Sanskrit show the language-agnostic nature of the framework: we train highly competitive models for both languages. Moreover, the framework enables us to incorporate language-specific constraints to prune the search space and to filter candidates during inference. Incorporating such constraints yields significant improvements in morphosyntactic tasks for Sanskrit. In all the tasks we discuss for Sanskrit, we either achieve state-of-the-art results or ours is the only data-driven solution for those tasks.
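The arc-factored, search-based setup described in the abstract can be illustrated with a toy sketch. This is not the speakers' implementation: the function names, the hand-written edge energies and the exhaustive search are all assumptions made for illustration. In the framework itself the energies come from a learnt feature function, and the search space would be pruned with language-specific constraints. The sketch only shows the core idea that a structure's energy is the sum of its arc energies and that inference picks the lowest-energy valid structure from an input graph.

```python
from itertools import product

def tree_energy(heads, edge_energy):
    """Arc-factored energy: the sum of the energies of the chosen arcs."""
    return sum(edge_energy[(h, d)] for d, h in heads.items())

def is_tree(heads, root=0):
    """True if every node reaches the root by following head links (no cycles)."""
    for node in heads:
        seen = set()
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True

def min_energy_tree(n_nodes, edge_energy, root=0):
    """Exhaustively search head assignments; return the lowest-energy tree."""
    dependents = [d for d in range(n_nodes) if d != root]
    best, best_energy = None, float("inf")
    for choice in product(range(n_nodes), repeat=len(dependents)):
        heads = dict(zip(dependents, choice))
        # Skip self-loops and arcs that are absent from the input graph.
        if any(h == d or (h, d) not in edge_energy for d, h in heads.items()):
            continue
        if not is_tree(heads, root):
            continue
        energy = tree_energy(heads, edge_energy)
        if energy < best_energy:
            best, best_energy = heads, energy
    return best, best_energy

# Hypothetical input graph: node 0 is an artificial root, nodes 1-3 are words.
# Keys are (head, dependent) arcs; values are made-up energies (lower is better).
edge_energy = {
    (0, 1): 1.0, (0, 2): 3.0, (0, 3): 4.0,
    (1, 2): 0.5, (2, 1): 0.25, (2, 3): 0.75,
    (3, 2): 0.125, (1, 3): 2.0,
}
best_heads, best_energy = min_energy_tree(4, edge_energy)
print(best_heads, best_energy)
```

Exhaustive enumeration is exponential in the number of nodes and is used here only to keep the example short; a practical system would use a dedicated search or spanning-tree algorithm over the same arc-factored scores.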
This talk is part of the NLIP Seminar Series.