
Paying Attention to Efficiency: LLM Deployment on Mobile and Edge Devices


If you have a question about this talk, please contact Mateja Jamnik.

Transformers have recently sparked significant interest in AI, driving advancements in accuracy and enabling a wide range of applications, from multi-modal intelligent assistants to autonomous systems. While their scaling laws promise even greater capabilities, the demands on hardware and data present significant challenges. In response, there is growing interest in compressing these models into smaller, more efficient forms, making them feasible to deploy with lower resource requirements. As edge and mobile devices integrate increasingly powerful Systems-on-Chip (SoCs), deploying these models locally becomes viable, enabling new use cases while enhancing privacy, sustainability and task-specific customization.

In this talk, I will touch upon two areas: first, measuring the execution efficiency and deployability of Large Language Models (LLMs) on mobile and edge devices; and second, optimising deep neural network (DNN) workloads for efficiency through low-rank decompositions. I will introduce MELT (MobiCom’24), a benchmarking framework designed to assess the computational, memory, energy, and thermal characteristics of LLMs running on device and to identify the associated bottlenecks. Following this, I will present Maestro (ICML’24), a novel approach that uses trainable low-rank decompositions, together with data-informed progressive shrinking of networks, to make the training and deployment of DNNs more efficient.
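As a rough illustration of why low-rank decompositions can reduce cost, the sketch below compares a dense m x n weight matrix with a generic rank-r factorisation W ≈ UV. It is not Maestro's training procedure, which the abstract does not detail; the function names, the layer size and the rank are all hypothetical and chosen only for illustration.

```python
# Illustrative only: a generic rank-r factorisation, not Maestro's method.

def dense_params(m, n):
    """Number of parameters in a dense m x n weight matrix."""
    return m * n

def low_rank_params(m, n, r):
    """Parameters when W (m x n) is replaced by U (m x r) times V (r x n)."""
    return r * (m + n)

def matvec(a, x):
    """Multiply matrix a (a list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def low_rank_matvec(u, v, x):
    """Compute (U V) x as U (V x), without materialising the full matrix."""
    return matvec(u, matvec(v, x))

# Hypothetical layer size and rank (assumptions, not figures from the talk).
m, n, r = 4096, 4096, 64
print(dense_params(m, n))        # 16777216
print(low_rank_params(m, n, r))  # 524288

# Tiny rank-1 check: U V = [[3, 4], [6, 8]], so (U V) [1, 1] = [7, 14].
print(low_rank_matvec([[1], [2]], [[3, 4]], [1, 1]))  # [7, 14]
```

Under these assumed sizes the factorised layer stores 32 times fewer parameters. Whether accuracy is preserved depends on how well the chosen rank captures the original weights.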

You can also join us on Zoom

This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.

