University of Cambridge > Talks.cam > Computer Laboratory Systems Research Group Seminar > A Scalable Approach for Managing Unstructured Information
A Scalable Approach for Managing Unstructured Information
If you have a question about this talk, please contact Eiko Yoneki.

Digital data is being generated in mind-boggling amounts: 15 petabytes, more than 8x the information contained in all US libraries, is created daily. The data landscape is shifting: in addition to structured data in databases, organizations are increasingly dealing with unstructured data such as email, documents, spreadsheets, blogs, Web pages and media files. Unstructured information comprises 80% of most organizations’ information today, and it is growing at an annual rate of 60%.

Users are demanding increasing sophistication in the level of information processing that storage and information management systems provide. In addition to the traditional challenges of storing the bytes and searching and classifying the content, they need to leverage their information to provide relevant and timely insights that improve the outcomes of the tasks that they undertake.

In this talk, I will describe recent work at HP Labs on unstructured information management, including SCAN-lite, an extensible framework for gathering structured metadata from unstructured documents, and LazyBase, a scalable database system for ingesting, storing and querying the resulting metadata. Leveraging the high degree of replication present in the enterprise, SCAN-lite uses a two-phase scanning policy (e.g., an initial phase to identify duplicate content and a second phase to do more complicated analysis) that considers client priority classes and idle time to minimize the impact on client foreground workloads. LazyBase is a scalable NoSQL database system that provides extremely high ingest rates, a strong consistency model (as contrasted with eventual consistency), and an explicit per-query tradeoff between freshness and query speed.

Bio: Dr. Kimberly Keeton is a Principal Researcher in the Storage and Information Management Platform group at HP Labs in Palo Alto, CA, USA. Her research focuses on simplifying the management of enterprise information systems, including system design and implementation, modeling, and optimization techniques to automatically design systems to meet users’ goals (e.g., dependability or information quality).

This talk is part of the Computer Laboratory Systems Research Group Seminar series.