Data Mining and Information Extraction for CiteSeerX and Friends
University of Cambridge, NLIP Seminar Series
If you have a question about this talk, please contact Ekaterina Kochmar.

Cyberinfrastructure, or e-science, has become crucial in many areas of science where data access often defines scientific progress. Open source (OS) systems have greatly facilitated the design and implementation of supporting cyberinfrastructure, permitting the design of specialized integrated search engines and digital libraries that offer many opportunities for domain-relevant information and knowledge extraction, such as citation extraction, automated indexing and ranking, chemical formulae search, and table indexing. We describe the open source SeerSuite architecture, a modular, extensible system built on successful OS projects such as Lucene/Solr, and discuss issues in building domain-specific enterprise search and cyberinfrastructure for the sciences and academia. Because of the large amount of information crawled and/or searched, there are many problems of scale in information extraction and data mining, such as author and entity disambiguation and data extraction and ranking. We highlight application domains with examples from computer science (CiteSeerX), chemistry (ChemXSeer), and related problem areas. Because such enterprise systems require unique information extraction approaches, several different machine learning methods, such as conditional random fields, support vector machines, mutual information based feature selection, and sequence mining, are critical for performance. We draw lessons for other e-science and cyberinfrastructure systems in terms of design, implementation and research, and discuss future directions in systems and research.

This talk is part of the NLIP Seminar Series.