Adaptive Resource Allocation for Low-Latency LLM Serving in Dynamic Environments
If you have a question about this talk, please contact Srinivasan Keshav.

This talk has been canceled/deleted.

Large language models (LLMs) face significant challenges in achieving low-latency inference. Techniques such as speculative decoding and chunked prefill can help reduce latency, but their effectiveness depends heavily on algorithmic parameters that are sensitive to fluctuating system conditions. As a result, static parameter settings often lead to suboptimal performance under dynamic workloads. To address this issue, we propose dynamic parameter optimization methods that adapt to evolving environments to maximize performance. In this talk, we present the technical details of these methods along with initial evaluation results.

Masayuki Usui received his bachelor’s and master’s degrees in computer science from the University of Tokyo, Japan. He is currently pursuing a Ph.D. degree at the University of Tokyo. His research interests include LLM inference serving and computer architecture.

Shinya Takamaeda-Yamazaki received his B.E., M.E., and D.E. degrees from the Tokyo Institute of Technology, Japan, in 2009, 2011, and 2014, respectively. Since 2019, he has been an Associate Professor at the University of Tokyo, Japan. In 2025, he also became a Team Leader at RIKEN AIP, Japan. His research interests include computer architecture, hardware design technologies, and machine learning systems.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.
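The abstract does not describe the speakers' methods, so the following is only an illustrative sketch of the general idea it motivates: retuning a decoding parameter online instead of fixing it statically. It adapts the speculative-decoding draft length `k` using a commonly used simplified model in which each drafted token is accepted independently with probability α and one draft-model step costs a fixed fraction `draft_cost` of a target-model step. The acceptance rate is tracked with an exponentially weighted moving average. The class name `AdaptiveDraftLength`, its parameters, and all default values are hypothetical assumptions, not taken from the talk.

```python
from dataclasses import dataclass


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step when k draft
    tokens are proposed and each is accepted independently with probability alpha.
    Includes the one token the target model always contributes."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Modeled throughput relative to plain autoregressive decoding.
    draft_cost is the cost of one draft-model step as a fraction of one
    target-model step (assumed constant)."""
    return expected_tokens_per_step(alpha, k) / (draft_cost * k + 1.0)


@dataclass
class AdaptiveDraftLength:
    """Hypothetical controller that re-picks the draft length k as the
    observed acceptance rate drifts, instead of keeping k fixed."""
    draft_cost: float = 0.1   # assumed draft/target cost ratio
    max_k: int = 8            # assumed upper bound on draft length
    smoothing: float = 0.2    # EWMA weight given to the newest observation
    alpha_estimate: float = 0.5

    def observe(self, proposed: int, accepted: int) -> None:
        """Update the acceptance-rate estimate after one verification step."""
        if proposed <= 0:
            return
        # Drafts are accepted as a prefix: 'accepted' successes, plus one
        # rejection trial if the step stopped before exhausting the draft.
        trials = accepted + (1 if accepted < proposed else 0)
        rate = accepted / trials
        self.alpha_estimate = ((1.0 - self.smoothing) * self.alpha_estimate
                               + self.smoothing * rate)

    def choose_k(self) -> int:
        """Return the draft length with the highest modeled speedup."""
        return max(range(1, self.max_k + 1),
                   key=lambda k: expected_speedup(self.alpha_estimate, k,
                                                  self.draft_cost))


controller = AdaptiveDraftLength()
print(controller.choose_k())   # 2 with the default alpha_estimate of 0.5
for _ in range(50):            # a stretch where every drafted token is accepted
    controller.observe(proposed=4, accepted=4)
print(controller.choose_k())   # 8
for _ in range(50):            # a stretch where the first drafted token is rejected
    controller.observe(proposed=4, accepted=0)
print(controller.choose_k())   # 1
```

Under these assumed costs, the controller drafts aggressively while acceptance is high and falls back to `k = 1` when drafts keep being rejected. The toy model ignores other system conditions, such as batch size and accelerator load, that a real serving system would also have to account for.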