
Perplexity AI: Under the Hood of LLM Inference


If you have a question about this talk, please contact Ben Karniely.

Abstract: Perplexity is a search and answer engine that leverages LLMs to provide high-quality, citation-backed answers. The AI Inference team within the company is responsible for serving the models behind the product, ranging from single-GPU embedding models to multi-node sparse Mixture-of-Experts language models. This talk offers a look inside the in-house runtime behind inference at Perplexity, with a particular focus on efficiently serving some of the largest open-source models available.

Biography: Nandor Licker is an AI Inference Engineer at Perplexity, focusing on LLM runtime implementation and GPU performance optimization.

Register for the talk at the following link: https://luma.com/dx1ggxgk

Some catering will be provided after the talk.

This talk is part of the Technical Talks - Department of Computer Science and Technology series.


© 2006-2025 Talks.cam, University of Cambridge.