
Benchmarking and evaluation in contemporary machine learning

Austin Tripp and Shoaib Siddiqui, University of Cambridge
Wednesday 26 October 2022, 11:00-12:30
Cambridge University Engineering Department, CBL Seminar room BE4-38.

If you have a question about this talk, please contact Elre Oldewage.

Abstract: Machine learning is primarily considered an empirical field, relying on experiments to compare methods and measure progress. These experiments often take the form of “benchmarks”: a standardized setup with a fixed set of evaluation criteria. In this reading group we will discuss the advantages and disadvantages of this approach, drawing largely on material from three papers (see below). Each paper describes a different undesirable aspect of the interplay between benchmarks and the machine learning community, in particular how benchmarks may not reward ideas according to their “true” underlying potential. This calls for more care and thought when evaluating or judging work on the basis of benchmark results, especially during peer review.

This reading group session will be a discussion (not a presentation) on benchmarking and evaluation in machine learning, drawing on content from three papers. While we encourage everybody to read all three papers in full (this should take under 2 hours), we have picked out the most important subsections to make fewer than 10 pages of light required reading (no math). Please do the reading before the session: the discussion will be much better if everybody is familiar with the key ideas of these papers. We have also shortlisted some “bonus” parts of the papers, which are recommended but not required.

The discussion will be hybrid, but the audio quality in the CBL seminar room can sometimes be poor, so be warned that if you join via Zoom it may be hard to participate fully in the discussion.

Reading:

1. Testing heuristics: We have it all wrong (https://link.springer.com/article/10.1007/BF02430364)

Required: [beginning, section 2). 3 pages
Bonus: section 4

2. The Benchmark Lottery (http://arxiv.org/abs/2107.07002)

Required: sections 1, [2, 2.1), [4, 4.1), 5, [6, 6.1). 5 pages
Bonus: section 7

3. The Hardware Lottery (http://arxiv.org/abs/2009.06489)

Required: abstract
Bonus: sections [1, 3.1)

Here, [A, B) means read from the start of A up to, but not including, the start of B.

This talk is part of the Machine Learning Reading Group @ CUED series.
