Two generations of Many-Core Computational Arrays

If you have a question about this talk, please contact Robert Mullins.

Note unusual time

The Asynchronous Array of Simple Processors (AsAP) is a programmable and reconfigurable processing system that enables high throughput and high energy efficiency, is well matched to workloads containing many varied DSP tasks, and is well suited to deep-submicron VLSI fabrication technologies. The AsAP platform is composed of a large number of programmable "reduced complexity" processing elements designed to capture the targeted task kernels with very little additional overhead. Each processor contains its own digitally tunable clock oscillator that operates completely independently of the others (globally asynchronous, locally synchronous, or GALS), and processors communicate through a reconfigurable full-rate 2-D mesh network. An oscillator fully halts within 9 cycles when there is no work to do, and restarts at full speed in less than one cycle once work becomes available.

A chip containing 36 programmable processors was fabricated in 0.18 µm CMOS using standard cells and is fully functional. Each 0.66 mm² processor operates at up to 610 MHz at 2.0 V. While executing applications, it dissipates an average of 32 mW at 475 MHz and 1.8 V, and 2.4 mW at 116 MHz and 0.9 V [ISSCC06].

Several dozen DSP and general tasks have been coded, including 32- to 1024-point complex FFTs, a k=7 Viterbi decoder, a JPEG encoder, a full-rate HDTV H.264 CAVLC encoder, and a fully compliant IEEE 802.11a/11g wireless LAN baseband transmitter and receiver. Power, throughput, and area results compare very well with existing programmable DSP processors. A recently completed C compiler and automatic mapping tool greatly simplify programming.

A second-generation 65 nm CMOS design contains 167 processors and has many new architectural features, including dedicated FFT, Viterbi, and video motion estimation processors; 16 KB shared memories; and long-distance inter-processor interconnect.
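The clock-halting behavior described above can be illustrated with a toy, cycle-level model. This is a sketch under stated assumptions, not the AsAP hardware: it assumes work availability is sampled once per nominal clock period, that the oscillator keeps running through 9 consecutive idle periods before it stops, and that restarting costs nothing (approximating "less than one cycle"). The function `clocked_cycles` and its `work_trace` input are hypothetical names chosen for illustration.

```python
from typing import Iterable

# From the abstract: an oscillator fully halts within 9 idle cycles.
HALT_DELAY = 9


def clocked_cycles(work_trace: Iterable[bool], halt_delay: int = HALT_DELAY) -> int:
    """Count the nominal clock periods during which the local oscillator runs.

    Hypothetical toy model: each entry of work_trace is one nominal period,
    True if the processor has work (e.g. data waiting in an input buffer).
    Restart is modeled as instantaneous, approximating "< 1 cycle".
    """
    running = 0   # periods in which the clock is toggling
    idle_run = 0  # length of the current run of idle periods
    for has_work in work_trace:
        if has_work:
            idle_run = 0
            running += 1  # clock restarts (or keeps running) immediately
        else:
            idle_run += 1
            if idle_run <= halt_delay:
                running += 1  # inside the halt window: clock not yet stopped
    return running


if __name__ == "__main__":
    # A burst of work, a long idle gap, then another burst.
    trace = [True] * 10 + [False] * 100 + [True] * 10
    active = clocked_cycles(trace)
    print(f"clock ran for {active} of {len(trace)} periods")
```

For this trace the clock runs for 29 of 120 periods (10 busy, 9 winding down, 10 busy). Idle gaps shorter than the halt delay are never halted in this model, so the benefit shows up mainly for bursty workloads with long gaps.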
In the 65 nm chip, each programmable processor can individually and dynamically change its supply voltage (choosing among VddHi, VddLo, or disconnected) and its clock frequency. The chip is fully functional, and early measurements show the programmable processors operating at up to 1.2 GHz while dissipating 59 mW at 1.3 V. At a supply voltage of 0.675 V, they operate at 66 MHz while dissipating only 608 µW.
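One way to compare the operating points quoted in this abstract is energy per clock cycle, E = P / f. The script below only performs that arithmetic on the published figures. It assumes the 59 mW figure for the 65 nm chip was measured at 1.2 GHz, and the results are per-processor, workload-dependent estimates rather than numbers reported by the authors. The names `OPERATING_POINTS` and `energy_per_cycle_pj` are hypothetical.

```python
# Energy per cycle from the power/frequency pairs quoted in the abstract.
# Assumption: the 59 mW (65 nm) figure was measured at 1.2 GHz.
OPERATING_POINTS = [
    # (chip, supply voltage in V, clock frequency in Hz, power in W)
    ("0.18 um AsAP", 1.8, 475e6, 32e-3),
    ("0.18 um AsAP", 0.9, 116e6, 2.4e-3),
    ("65 nm", 1.3, 1.2e9, 59e-3),
    ("65 nm", 0.675, 66e6, 608e-6),
]


def energy_per_cycle_pj(freq_hz: float, power_w: float) -> float:
    """Average energy per clock cycle in picojoules: E = P / f."""
    return power_w / freq_hz * 1e12


if __name__ == "__main__":
    for chip, vdd, freq, power in OPERATING_POINTS:
        e_pj = energy_per_cycle_pj(freq, power)
        print(f"{chip:13s} {vdd:5.3f} V {freq / 1e6:7.1f} MHz -> {e_pj:5.1f} pJ/cycle")
```

This gives roughly 67.4 and 20.7 pJ per cycle for the two 0.18 µm points, and 49.2 and 9.2 pJ per cycle for the two 65 nm points. Lowering only the clock frequency would leave ideal dynamic energy per cycle unchanged, so the drop between the two points of each chip comes largely from the lower supply voltage.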

This talk is part of the Computer Laboratory Computer Architecture Group Meeting series.


© 2006-2024 Talks.cam, University of Cambridge