Benchmarking and evaluation in contemporary machine learning

If you have a question about this talk, please contact Elre Oldewage.

Abstract: Machine learning is primarily considered an empirical field, relying on experiments to compare methods and measure progress. These experiments often take the form of “benchmarks” with a standardized setup and set of evaluation criteria. In this reading group we will discuss the advantages and disadvantages of this approach, drawing largely on material from 3 papers (see below). These papers describe different undesirable aspects of the interplay between benchmarks and the machine learning community, particularly how benchmarks may not reward ideas according to their “true” underlying potential. This calls for more care and thought when judging any work on the basis of its benchmark results, especially during the peer-review process.

This reading group session will be a discussion (not a presentation) on benchmarking and evaluation in machine learning, drawing on content from 3 papers. While we encourage everybody to read all 3 papers (it should take under 2 hours), we have picked out the most important subsections of each paper to make up fewer than 10 pages of light required reading (no math). Please do the reading before the session: the discussion will be much better if everybody is familiar with the key ideas of these papers. We’ve also shortlisted some “bonus” parts of the papers, which are recommended but not required.

The discussion will be hybrid, but the audio quality in the CBL seminar group can sometimes be low, so be warned that if you join via Zoom it may be hard to participate fully in the discussion.

Reading:

1. Testing heuristics: We have it all wrong (https://link.springer.com/article/10.1007/BF02430364)
  • Required: [beginning, section 2). 3 pages
  • Bonus: section 4
2. The Benchmark Lottery (http://arxiv.org/abs/2107.07002)
  • Required: sections 1, [2, 2.1), [4, 4.1), 5, [6, 6.1). 5 pages
  • Bonus: section 7
3. The hardware lottery (http://arxiv.org/abs/2009.06489)
  • Required: abstract
  • Bonus: sections [1, 3.1)

Here, [A, B) means read from A up to the start of B (i.e. excluding B).

This talk is part of the Machine Learning Reading Group @ CUED series.
