
Digging into (Historical) Data: Tracking global commodity trading in the nineteenth century


If you have a question about this talk, please contact David Greaves.

Trade in the nineteenth century focused primarily on commodities, usually raw materials. The global economy expanded as Western nations colonized frontiers rich with natural resources and began to reshape the environment, introducing plants and animals into new ecosystems, and transporting natural resources back to consumer markets at home.

The talk will describe Trading Consequences, a collaborative project funded by JISC's Digging into Data programme that uses text mining to explore digitised historical documents related to commodity trading during the nineteenth century. Prior historical research into commodity flows has focused on a small handful of widely traded natural resources, and one of our goals is to build a more accurate picture of how the many hundreds of different commodity types were transported across the world. The data that we extract from text is used to populate a relational database. Through both a conventional query interface and an interactive visual interface, this database will allow historians to explore global trends in commodity trading at different times and locations, while still being able to investigate mentions of individual commodities in context.
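To make the database idea concrete, here is a minimal sketch of how extracted mentions might be stored and queried. It is not the project's actual schema: the table names, column names, and helper functions are all assumptions chosen for illustration, and it uses an in-memory SQLite database only because that keeps the example self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commodity (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE mention (
            id           INTEGER PRIMARY KEY,
            commodity_id INTEGER NOT NULL REFERENCES commodity(id),
            location     TEXT    NOT NULL,
            year         INTEGER NOT NULL,
            sentence     TEXT    NOT NULL  -- source sentence, kept for context
        );
        """
    )
    return conn


def add_mention(conn, commodity, location, year, sentence):
    """Record one extracted mention, creating the commodity row if needed."""
    conn.execute("INSERT OR IGNORE INTO commodity (name) VALUES (?)", (commodity,))
    (commodity_id,) = conn.execute(
        "SELECT id FROM commodity WHERE name = ?", (commodity,)
    ).fetchone()
    conn.execute(
        "INSERT INTO mention (commodity_id, location, year, sentence) "
        "VALUES (?, ?, ?, ?)",
        (commodity_id, location, year, sentence),
    )


def mentions_per_decade(conn, commodity):
    """Aggregate view: count mentions of a commodity by decade and location."""
    return conn.execute(
        """
        SELECT (m.year / 10) * 10 AS decade, m.location, COUNT(*)
        FROM mention AS m
        JOIN commodity AS c ON c.id = m.commodity_id
        WHERE c.name = ?
        GROUP BY decade, m.location
        ORDER BY decade, m.location
        """,
        (commodity,),
    ).fetchall()


def mentions_in_context(conn, commodity):
    """Detail view: return each mention with the sentence it came from."""
    return conn.execute(
        """
        SELECT m.year, m.location, m.sentence
        FROM mention AS m
        JOIN commodity AS c ON c.id = m.commodity_id
        WHERE c.name = ?
        ORDER BY m.year, m.id
        """,
        (commodity,),
    ).fetchall()
```

The two query functions mirror the two uses described above: one aggregates mentions into trends over time and place, the other returns individual mentions with their surrounding text. Any rows loaded into this sketch would be placeholders, not project data.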

One of the main challenges faced in processing the historical text corpora available to us is the low quality of the Optical Character Recognition (OCR) output, and the talk will describe some of our attempts to mitigate this problem. A second challenge is the paucity of resources that would allow us to recognise mentions of commodities in text. Our current approach takes as its starting point a list of commodities that we have manually extracted from nineteenth-century British customs records. This list of terms is incorporated into a SKOS thesaurus; most of the terms can be linked to DBpedia concepts, which are already grouped into Wikipedia-derived page categories. The initial thesaurus is then expanded by querying DBpedia's SPARQL endpoint for all instances of these page categories. We are currently evaluating the performance of the expanded thesaurus against a manually annotated portion of the historical text.
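The expansion step can be sketched as a SPARQL query that asks for every resource filed under a given page category. This is an illustration only, not the project's code: the function names are hypothetical, "Spices" is just an example category, and the sketch assumes the public endpoint at https://dbpedia.org/sparql accepts a `format` parameter for JSON results. It builds the query and request URL without contacting the network.

```python
from urllib.parse import urlencode

# Assumed public endpoint; the project may have used a different one.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def category_members_query(category, limit=100):
    """Build a SPARQL query listing resources in one DBpedia page category.

    `category` is the category name as it appears in the URI, for example
    "Spices". Only letters, digits and underscores are accepted, so the
    name can be placed inside the URI without further escaping.
    """
    if not category.replace("_", "").isalnum():
        raise ValueError(f"unsupported category name: {category!r}")
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource ?label WHERE {\n"
        # dct:subject links a DBpedia resource to its page categories.
        f"  ?resource dct:subject <http://dbpedia.org/resource/Category:{category}> ;\n"
        "            rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def category_members_url(category, limit=100):
    """Return a GET URL that would run the query against the endpoint."""
    params = {
        "query": category_members_query(category, limit),
        "format": "application/sparql-results+json",  # assumed to be supported
    }
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(category_members_query("Spices", limit=10))
```

Each English label returned by such a query would be a candidate term for the thesaurus. Whether a candidate is actually a commodity still has to be checked, which is one reason evaluation against manually annotated text matters.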

This talk is part of the Wednesday Seminars - Department of Computer Science and Technology series.



© 2006-2024, University of Cambridge.