Evaluation with LLMs - Theoretical and Practical insights
If you have a question about this talk, please contact Suchir Salhan.

Abstract: As large language models (LLMs) continue to evolve, assessing their performance becomes increasingly crucial and complex, and LLMs are themselves being used to evaluate the quality of other models. In this talk, I will explore LLM-as-a-Judge, combining theoretical foundations with practical insights from industry. Topics include benchmark design, pre-LLM metrics, common pitfalls illustrated with real examples, methods for automatic tuning of evaluation metrics, and the gaps between industry and academia. I will conclude with a vision for the future of robust and meaningful LLM assessment.

Bio: Dr. Eyal Kolman is a Senior Researcher at Microsoft and an adjunct lecturer at Tel Aviv University and Bar-Ilan University, where he teaches courses in Deep Learning. He holds a Ph.D. in Electrical Engineering from Tel Aviv University and has over 25 years of experience in machine learning and artificial intelligence. His work spans evaluation methodologies, applied AI systems, and large-scale learning models. Dr. Kolman has authored numerous research papers, holds dozens of patents, and is the author of Knowledge-Based Neurocomputing: A Fuzzy Logic Approach.

This talk is part of the NLIP Seminar Series.