
Conservation Evidence


If you have a question about this talk, please contact lyr24.

Abstract:

Grey literature is inherently difficult to discover, as it is typically hidden deep within websites; to analyse, as it follows no standard file formats or structures; and to process, owing to the sheer volume of existing and actively produced literature. This creates a massive cost and time problem for organisations that rely on such literature to function.

We devise and implement a pipeline that uses Common Crawl internet archives to locate and scrape potential grey literature, then processes it for a multistage machine learning pipeline that classifies the documents and outputs the relevant ones.
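
As a rough illustration of the "locate" step only, the sketch below builds a query for the public Common Crawl URL index and filters the returned JSON lines down to successful PDF captures. It is not the speakers' implementation, which the abstract does not describe. The crawl identifier, the choice of PDF as the target format, and the helper names build_index_query and select_pdf_captures are all assumptions made for this example.

```python
import json
from urllib.parse import urlencode

# Assumed example crawl identifier; substitute any published Common Crawl crawl ID.
CRAWL_ID = "CC-MAIN-2024-10"


def build_index_query(url_pattern, crawl_id=CRAWL_ID):
    """Build a Common Crawl URL-index query for captures matching url_pattern."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"


def select_pdf_captures(index_lines):
    """Keep successful captures whose recorded MIME type is PDF.

    Each input line is one JSON record from the index response. The field
    names used here ("status", "mime", "mime-detected") are assumed to match
    the index output, with "status" given as a string.
    """
    captures = []
    for line in index_lines:
        record = json.loads(line)
        mime = record.get("mime-detected") or record.get("mime", "")
        if record.get("status") == "200" and mime == "application/pdf":
            captures.append(record)
    return captures
```

Each retained record would then point to the archived file to fetch and pass on to the later processing and classification stages.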

Bios:

Radhika Iyer is a second-year Computer Science student at Murray Edwards College.

Kacper Michalik is a second-year Computer Science student at Pembroke College.

This talk is part of the Energy and Environment Group, Department of CST series.

