
Evaluation with LLMs - Theoretical and Practical Insights


If you have a question about this talk, please contact Suchir Salhan.

Abstract: As large language models (LLMs) continue to evolve, assessing their performance becomes increasingly crucial and complex, and LLMs themselves are increasingly used to evaluate the quality of other models. In this talk, I will explore LLM-as-a-Judge, combining theoretical foundations with practical insights from industry. Topics include benchmark design, pre-LLM metrics, common pitfalls illustrated with real examples, methods for automatically tuning evaluation metrics, and the gaps between industry and academia. I will conclude with a vision for the future of robust and meaningful LLM assessment.
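For readers unfamiliar with the LLM-as-a-Judge setup the abstract refers to, a minimal illustrative sketch follows (in Python; the call_llm function is a hypothetical stand-in for any chat-completion API, and the prompt wording and score parsing are assumptions for illustration, not material from the talk):

    # Minimal LLM-as-a-Judge sketch: ask one model to score another model's answer.
    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in; replace with a real chat-completion call."""
        raise NotImplementedError("wire up an LLM provider here")

    JUDGE_PROMPT = (
        "You are an impartial judge. Rate the answer to the question below "
        "for correctness and helpfulness on a scale of 1 to 5.\n"
        "Question: {question}\nAnswer: {answer}\n"
        "Reply with a single integer from 1 to 5."
    )

    def judge(question: str, answer: str) -> int:
        reply = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
        score = int(reply.strip().split()[0])  # naive parsing, itself a common pitfall
        if not 1 <= score <= 5:
            raise ValueError(f"judge returned an out-of-range score: {score}")
        return score

Even this toy version exposes the kinds of issues the abstract alludes to, such as prompt sensitivity and brittle score parsing.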

Bio: Dr. Eyal Kolman is a Senior Researcher at Microsoft and an adjunct lecturer at Tel Aviv University and Bar-Ilan University, where he teaches courses in Deep Learning. He holds a Ph.D. in Electrical Engineering from Tel Aviv University and has over 25 years of experience in machine learning and artificial intelligence. His work spans evaluation methodologies, applied AI systems, and large-scale learning models. Dr. Kolman has authored numerous research papers, holds dozens of patents, and wrote the book Knowledge-Based Neurocomputing: A Fuzzy Logic Approach.

This talk is part of the NLIP Seminar Series.


 
