Knowledge Representation and Extraction at Scale
If you have a question about this talk, please contact Andrew Caines.

These days, most general-knowledge question-answering systems rely on large-scale knowledge bases comprising billions of facts about millions of entities. Having a structured source of semantic knowledge means that we can answer questions involving single static facts (e.g. “Who was the 8th president of the US?”) or dynamically generated ones (e.g. “How old is Donald Trump?”). More importantly, we can answer questions involving multiple inference steps (“Is the queen older than the president of the US?”). In this talk, I will discuss some of the unique challenges involved in building and maintaining a consistent knowledge base for Alexa, extending it with new facts, and using it to serve answers in multiple languages. I will focus on three recent projects from our group.

First, a way of measuring the completeness of a knowledge base based on usage patterns. Usage of the KB is defined in terms of the relation distribution of entities seen in question-answer logs. Instead of directly estimating the relation distribution of each individual entity, it is generalized to a “class signature” shared by all entities of that class. For example, users ask for baseball players’ height, age, and batting average, so a knowledge base is complete (with respect to baseball players) if every such entity has facts for those three relations (a rough sketch follows the abstract).

Second, an investigation into fact extraction from unstructured text. I will present a method for creating distant (weak) supervision labels for training a large-scale relation extraction system (the standard heuristic is sketched below). I will also discuss the effectiveness of neural network approaches by decoupling the model architecture from the feature design of a state-of-the-art neural network system. Surprisingly, a much simpler classifier trained on similar features performs on par with the highly complex neural network system, with a 75x reduction in training time, suggesting that the features are the bigger contributor to the final performance (see the third sketch below).

Finally, I will present the Fact Extraction and VERification (FEVER) dataset and challenge. The dataset comprises more than 185,000 human-generated claims extracted from Wikipedia pages. False claims were generated by mutating true claims in a variety of ways, some of which were meaning-altering. During the verification step, annotators were required to label each claim for its validity and to supply full-sentence textual evidence from (potentially multiple) Wikipedia articles for that label (an illustrative record format is shown below). With FEVER, we aim to help create a new generation of transparent and interpretable knowledge extraction systems.

This talk is part of the NLIP Seminar Series.
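A minimal sketch, in Python, of the usage-based completeness idea described in the abstract. This is not the speaker's implementation: the log format, class names, and the frequency threshold are all illustrative assumptions.

    from collections import Counter

    def class_signature(query_log, entity_class, min_freq=0.1):
        """Relations users commonly ask about for entities of a class.
        query_log is assumed to be (entity_class, relation) pairs."""
        counts = Counter(rel for cls, rel in query_log if cls == entity_class)
        total = sum(counts.values())
        if total == 0:
            return set()
        return {rel for rel, c in counts.items() if c / total >= min_freq}

    def kb_completeness(kb, entities, signature):
        """Fraction of entities holding a fact for every signature relation."""
        done = sum(1 for e in entities if signature <= set(kb.get(e, {})))
        return done / len(entities) if entities else 0.0

    # Hypothetical usage: baseball players are queried for height, age,
    # and batting average, so those three relations form the signature.
    log = [("baseball_player", "height"), ("baseball_player", "age"),
           ("baseball_player", "batting_average"), ("baseball_player", "height")]
    sig = class_signature(log, "baseball_player")
    kb = {"player_1": {"height": 1.88, "age": 31, "batting_average": 0.3},
          "player_2": {"height": 1.80}}
    print(kb_completeness(kb, ["player_1", "player_2"], sig))  # -> 0.5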
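The abstract does not spell out the labelling method itself; the sketch below shows the classic distant-supervision heuristic such systems typically start from: any sentence mentioning an entity pair that the KB relates is labelled, noisily, with that relation. Entity linking is assumed to have happened already, and all names are hypothetical.

    def distant_labels(sentences, kb_facts):
        """Label each (sentence, entity pair) with the KB relation for that
        pair, or NO_RELATION if the KB holds no fact about the pair."""
        labelled = []
        for sent, (e1, e2) in sentences:
            rel = kb_facts.get((e1, e2), "NO_RELATION")
            labelled.append((sent, e1, e2, rel))
        return labelled

    # Hypothetical KB facts and entity-linked sentences.
    facts = {("Barack Obama", "Honolulu"): "place_of_birth"}
    sents = [("Barack Obama was born in Honolulu.", ("Barack Obama", "Honolulu")),
             ("Barack Obama visited Honolulu in 2010.", ("Barack Obama", "Honolulu"))]
    for row in distant_labels(sents, facts):
        print(row)  # note the second sentence gets a noisy positive label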
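The "much simpler classifier" is not named in the abstract. As an assumption, the third sketch uses a scikit-learn logistic regression over hand-designed features of the kind a neural relation extractor also consumes, only to illustrate what "similar features, simpler model" means; the feature names are invented.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical hand-designed features: lexicalised dependency path
    # between the entity mentions, plus the two entity types.
    X = [{"dep_path": "nsubjpass-born-in", "e1_type": "PERSON", "e2_type": "CITY"},
         {"dep_path": "nsubj-visited-dobj", "e1_type": "PERSON", "e2_type": "CITY"}]
    y = ["place_of_birth", "NO_RELATION"]

    model = make_pipeline(DictVectorizer(), LogisticRegression())
    model.fit(X, y)
    print(model.predict([{"dep_path": "nsubjpass-born-in",
                          "e1_type": "PERSON", "e2_type": "CITY"}]))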
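For concreteness, a FEVER-style record pairs a claim with one of three labels (SUPPORTS, REFUTES, NOT ENOUGH INFO) and with sentence-level evidence pointers into Wikipedia. The field layout below is a simplified, illustrative version of the released JSONL format, not an exact schema.

    import json

    # One FEVER-style record: evidence is a list of evidence sets, each
    # sufficient on its own, given as (Wikipedia page, sentence index)
    # pointers; NOT ENOUGH INFO claims carry no evidence.
    record = json.loads('''{
      "id": 1,
      "claim": "FEVER contains more than 185,000 claims.",
      "label": "SUPPORTS",
      "evidence": [[["FEVER_(dataset)", 0]]]
    }''')

    for evidence_set in record["evidence"]:
        for page, sent_idx in evidence_set:
            print(f"{record['label']}: {page}, sentence {sent_idx}")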