Log in

Cambridge users (raven) details

Other users details

No account? details

Information on

Subscribing to talks details

Finding a talk details

Adding a talk details

Disseminating talks details

Help and Documentation details

Paying Attention to Efficiency: LLM Deployment on Mobile and Edge Devices

Add to your list(s) Download to your calendar using vCal

Stefanos Laskaridis, Brave Software
Tuesday 05 November 2024, 13:00-14:00
Lecture Theatre 2, Computer Laboratory, William Gates Building.

If you have a question about this talk, please contact Mateja Jamnik.

Transformers have recently sparked significant interest in AI, driving advancements in accuracy and enabling a wide range of applications, from multi-modal intelligent assistants to autonomous systems. While their scaling laws promise even greater capabilities, the demands on hardware and data present significant challenges. In response, there is growing interest in compressing these models to smaller, more efficient forms, making them feasible for deployment with lower resource requirements. As edge and mobile devices are integrating increasingly powerful System-On-Chips (SoCs), deploying these models locally becomes viable, thus enabling new use-cases while enhancing privacy, sustainability and task-specific customization.

In this talk, I will be touching upon two areas: first, measuring the execution efficiency and deployability of Large Language Models (LLMs) on mobile and edge devices; and second optimising DNN workloads for efficiency through low-rank decompositions. I will introduce MELT (MobiCom’24), a benchmarking framework designed to assess the computational, memory, energy, and thermal characteristics of LLMs running on device, identifying associated bottlenecks. Following this, I will present Maestro (ICML’24), a novel approach leveraging trainable low-rank decompositions to enable more efficient training and deployment of DNNs, enabled via data-informed progressive shrinking of networks.

You can also join us on Zoom

This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.

This talk is included in these lists:

Note that ex-directory lists are not shown.

Log in

Information on

Paying Attention to Efficiency: LLM Deployment on Mobile and Edge Devices

This talk is included in these lists:

Other lists

Other talks