Perplexity AI: Under the Hood of LLM Inference
If you have a question about this talk, please contact Ben Karniely.
Abstract: Perplexity is a search and answer engine which leverages LLMs to provide high-quality citation-backed answers.
The AI Inference team within the company is responsible for serving the models behind the product, ranging from single-GPU embedding models to multi-node sparse Mixture-of-Experts language models.
This talk offers an inside look at the in-house runtime behind inference at Perplexity, with a particular focus on efficiently serving some of the largest available open-source models.
Biography: Nandor Licker is an AI Inference Engineer at Perplexity, focusing on LLM runtime implementation and GPU performance optimization.
Register for the talk at the following link: https://luma.com/dx1ggxgk
Some catering will be provided after the talk.
This talk is part of the Technical Talks - Department of Computer Science and Technology series.