Project

Spatio-Temporal Memory Streaming

Applicant Falsafi Babak
Number 127021
Funding scheme Project funding
Research institution Laboratoire de design et media EPFL IC ISIM LDM
Institution of higher education EPF Lausanne - EPFL
Main discipline Information Technology
Start/End 01.09.2010 - 31.08.2013
Approved amount 305'940.00

Keywords (6)

streaming; temporal correlation; spatial correlation; spatio-temporal; cache; prefetching

Lay Summary (English)

Lay summary
We propose Spatio-Temporal Memory Streaming (STeMS), a memory system in which data are fetched, placed, and replaced as groups of spatially or temporally correlated data, which we call spatio-temporal streams, rather than as individual cache blocks. STeMS can extract streams on the fly in hardware by observing and recording a program's memory accesses, or it can let software form the streams and communicate them to hardware.

STeMS helps bridge the processor/memory performance gap in three ways. First, streaming helps hide memory access latency and increase memory-level parallelism for arbitrary memory access patterns (including dependent accesses). Second, streaming helps improve storage utilization by throttling the movement of data across storage levels and matching the fetch rate to the processor's consumption rate. Finally, STeMS optimizes pin bandwidth by moving only data that are likely to be referenced and by making replacement decisions on whole data groups, which avoids unnecessary replacement traffic.

The specific contributions of this project will be:

Extracting spatio-temporal streams. We will develop mechanisms that detect and extract spatial and temporal correlation in memory access sequences to construct spatio-temporal streams. We will describe both transparent streaming mechanisms that extract streams dynamically in hardware and software-assisted streaming mechanisms that allow the programmer or compiler to specify streams.

Transferring and placing spatio-temporal streams. We will present novel hardware designs that throttle the rate at which spatio-temporal streams are loaded on chip to hide the latency of off-chip accesses. Furthermore, we will design mechanisms that intelligently allocate, place, and replace streamed data on chip to keep frequently accessed streams in low-latency storage.

Integrated hardware/software streaming systems. We will deliver integrated hardware/software system prototypes for important server applications, such as online transaction processing, that leverage the synergies possible with explicit software-specified streams. We will design HW/SW interfaces that allow the programmer or compiler to communicate a data structure's access patterns directly to the memory system.
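
To make the idea of temporal correlation concrete, the following is a minimal software sketch, not the STeMS hardware design. It assumes a simple scheme in which misses are logged in order and, when a miss address recurs, the addresses that followed its previous occurrence are replayed as a stream. The class name `TemporalStreamer`, the method `on_miss`, and the parameter `stream_len` are hypothetical names chosen for illustration.

```python
class TemporalStreamer:
    """Toy model of temporal streaming (illustrative assumption, not STeMS itself).

    Misses are appended to an ordered history. When a miss address is seen
    again, the addresses that followed its previous occurrence are returned
    as prefetch candidates.
    """

    def __init__(self, stream_len=4):
        self.history = []       # ordered log of observed miss addresses
        self.last_index = {}    # miss address -> most recent position in history
        self.stream_len = stream_len

    def on_miss(self, addr):
        """Record a miss and return the list of addresses to stream in."""
        prefetch = []
        prev = self.last_index.get(addr)
        if prev is not None:
            # Replay what followed this address the last time it missed.
            prefetch = self.history[prev + 1 : prev + 1 + self.stream_len]
        self.last_index[addr] = len(self.history)
        self.history.append(addr)
        return prefetch


# Usage: a pointer-chasing traversal visits unrelated addresses in a fixed order.
streamer = TemporalStreamer(stream_len=3)
for a in (0xA0, 0x1F40, 0x330, 0x7C0):
    streamer.on_miss(a)                 # first traversal: nothing to replay yet
replayed = streamer.on_miss(0xA0)       # second traversal starts
# replayed == [0x1F40, 0x330, 0x7C0]
```

Because the replayed addresses come from a recorded sequence rather than from an address calculation, this kind of scheme can cover dependent accesses such as linked-list traversals, where each address is only known after the previous load returns.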
Last update: 21.02.2013

Publications

Publication
Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache
Jevdjic Djordje, Volos Stavros, Falsafi Babak (2013), Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache, in Proceedings of the 40th Annual International Symposium on Computer Architecture.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
Ferdman Michael, Adileh Almutaz, Kocberber Onur, Volos Stavros, Alisafaee Mohammad, Jevdjic Djordje, Kaynak Cansu, Popescu Adrian Daniel, Ailamaki Anastasia, Falsafi Babak (2012), Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware, in Proceedings of the seventeenth international conference on Architectural Support for Programming Lan.
Proactive Instruction Fetch
Ferdman Michael, Kaynak Cansu, Falsafi Babak (2011), Proactive Instruction Fetch, in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, in press.
Quantifying the Mismatch Between Emerging Scale-Out Applications and Modern Processors
Ferdman Michael, Adileh Almutaz, Kocberber Onur, Volos Stavros, Alisafaee Mohammad, Jevdjic Djordje, Kaynak Cansu, Popescu Adrian Daniel, Ailamaki Anastasia, Falsafi Babak, Quantifying the Mismatch Between Emerging Scale-Out Applications and Modern Processors, in ACM Transactions on Computer Systems (TOCS).
SHIFT: Shared History Instruction Fetch for Manycore Lean-Core Server Processors
Kaynak Cansu, Grot Boris, Falsafi Babak, SHIFT: Shared History Instruction Fetch for Manycore Lean-Core Server Processors, in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture.

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
39th International Symposium on Computer Architecture | Individual talk | | 09.06.2012 | Portland, OR, United States of America | Volos Stavros; Falsafi Babak; Kaynak Ilknur Cansu
17th International Conference on Architectural Support for Programming Languages and Operating Systems | Individual talk | | 03.03.2012 | London, Great Britain and Northern Ireland | Kaynak Ilknur Cansu; Volos Stavros; Falsafi Babak
44th Annual IEEE/ACM International Symposium on Microarchitecture | Individual talk | | 04.12.2011 | Porto Alegre, Brazil | Falsafi Babak; Kaynak Ilknur Cansu
38th International Symposium on Computer Architecture | Individual talk | | 06.06.2011 | San Jose, CA, United States of America | Kaynak Ilknur Cansu; Volos Stavros; Falsafi Babak


Awards

Title | Year
Best Paper Award by ASPLOS'12 Program Committee | 2012
The ASPLOS'12 paper (Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware) was invited for publication in ACM Transactions on Computer Systems (TOCS'12). | 2012

Associated projects

Number | Title | Start | Funding scheme
145020 | Plant metabolite analysis by liquid chromatography-mass spectrometry | 01.10.2013 | R'EQUIP

Abstract

We propose Spatio-Temporal Memory Streaming (STeMS), a memory system in which data are fetched, placed, and replaced as groups of spatially or temporally correlated data, which we call spatio-temporal streams, rather than as individual cache blocks. STeMS can extract streams on the fly in hardware by observing and recording a program’s memory accesses, or it can let software form the streams and communicate them to hardware.

STeMS helps bridge the processor/memory performance gap in three ways. First, streaming helps hide memory access latency and increase memory-level parallelism for arbitrary memory access patterns (including dependent accesses). Second, streaming helps improve storage utilization by throttling the movement of data across storage levels and matching the fetch rate to the processor’s consumption rate. Finally, STeMS optimizes pin bandwidth by moving only data that are likely to be referenced and by making replacement decisions on whole data groups, which avoids unnecessary replacement traffic.

The specific contributions of this project will be:

Extracting spatio-temporal streams. We will develop mechanisms that detect and extract spatial and temporal correlation in memory access sequences to construct spatio-temporal streams. We will describe both transparent streaming mechanisms that extract streams dynamically in hardware and software-assisted streaming mechanisms that allow the programmer or compiler to specify streams.

Transferring and placing spatio-temporal streams. We will present novel hardware designs that throttle the rate at which spatio-temporal streams are loaded on chip to hide the latency of off-chip accesses. Furthermore, we will design mechanisms that intelligently allocate, place, and replace streamed data on chip to keep frequently accessed streams in low-latency storage.

Integrated hardware/software streaming systems. We will deliver integrated hardware/software system prototypes for important server applications, such as online transaction processing, that leverage the synergies possible with explicit software-specified streams. We will design HW/SW interfaces that allow the programmer or compiler to communicate a data structure’s access patterns directly to the memory system.
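
As an illustration of spatial correlation, the sketch below records which cache blocks inside a fixed-size memory region are touched and reuses that footprint when a similar first access occurs in another region. It is a simplified model under stated assumptions, not the project's mechanism: the 64-byte block size, the 32-block region, the choice to index patterns by the triggering program counter and block offset, and all names (`SpatialFootprintPredictor`, `access`, `end_generation`) are hypothetical.

```python
BLOCK_BYTES = 64                 # assumed cache block size
BLOCKS_PER_REGION = 32           # assumed region size: 32 blocks = 2 KiB
REGION_BYTES = BLOCK_BYTES * BLOCKS_PER_REGION


class SpatialFootprintPredictor:
    """Toy spatial-pattern recorder (illustrative assumption, not STeMS itself)."""

    def __init__(self):
        self.active = {}     # region number -> (trigger key, set of touched block offsets)
        self.patterns = {}   # trigger key -> learned frozenset of block offsets

    def access(self, pc, addr):
        """Observe one access; return block addresses to prefetch (possibly empty)."""
        region = addr // REGION_BYTES
        offset = (addr % REGION_BYTES) // BLOCK_BYTES
        if region in self.active:
            # Region is already being recorded: extend its footprint.
            self.active[region][1].add(offset)
            return []
        # First access to the region: it acts as the trigger.
        key = (pc, offset)
        self.active[region] = (key, {offset})
        learned = self.patterns.get(key, frozenset())
        base = region * REGION_BYTES
        return [base + o * BLOCK_BYTES for o in sorted(learned) if o != offset]

    def end_generation(self, region):
        """Stop recording a region (e.g. on eviction) and store its footprint."""
        key, offsets = self.active.pop(region)
        self.patterns[key] = frozenset(offsets)


# Usage: learn a footprint in region 0, then predict it for region 2.
predictor = SpatialFootprintPredictor()
predictor.access(pc=0x400, addr=0)      # trigger access, block 0
predictor.access(pc=0x404, addr=128)    # block 2
predictor.access(pc=0x408, addr=320)    # block 5
predictor.end_generation(0)
predicted = predictor.access(pc=0x400, addr=4096)
# predicted == [4224, 4416], i.e. blocks 2 and 5 of the new region
```

Fetching only the predicted blocks, rather than the whole region, reflects the stated goal of moving only data that is likely to be referenced.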