Stream-Based Trace Compression 
              | 
           
Contact:
       Aleksandar Milenkovic
       Assistant Professor of Electrical and Computer Engineering
       The University of Alabama in Huntsville
       301 Sparkman Drive, Huntsville, AL 35899
       Email: milenka@ece.uah.edu
       URL: http://www.ece.uah.edu/~milenka
Trace-driven simulations have been widely used in quantitative evaluations
   of new techniques in computer architecture. To offer a faithful representation
   of a specific workload, traces are very large, encompassing billions of
 memory  references and/or instructions. To efficiently store and use even
 a small  collection of traces, its size must be reduced beyond traditional
 compression  techniques. 
       Stream-based compression (SBC) is a new method for single-pass compression
   of address traces and various extended trace formats. The SBC algorithm
 relies  on extracting instruction streams. An instruction stream is defined
 as a sequential run of instructions, from the target of a taken branch to
 the first taken branch in sequence. A stream table is created during compression,
  encompassing all relevant information about streams, such as starting address,
  stream length, instruction words and their types. All instructions from
a  stream are replaced by its index in the stream table, creating a trace
of  instruction streams. 
       Unlike instruction addresses, data address for a memory referencing
 instruction   stays rarely constant during program execution, but it a can
 have a regular   stride. The SBC trace compression features a very simple
 and yet efficient   on-line algorithm for compression of data address references.
 The compressed   data address trace encompasses the data address stride
and  the number or  repetitions for each memory-referencing instruction in
a stream.  A change  in a data address stride results in another compressed
record. Compressed  records are ordered by the corresponding stream appearances
in the trace.   
       
The SBC algorithm exploits several inherent characteristics of program
   execution traces. Instruction traces consist of a fairly limited number
 of  different instruction streams, and the most of memory references exhibit
  strong spatial and/or temporal locality, for example, a load having a constant
  address stride across loop iterations. 
        Figure 1 illustrates the compression flow assuming Dinero+ input. 
A  Dinero+  trace record has fixed length fields: the header field (0 – data 
 read, 1 – data write, and 2 – instruction read), the address field, and the
 instruction  word field for the instruction read type. The stream-based compression
 results  in three files: Stream Table File (STF), Stream-Based Instruction
 Trace (SBIT),  and Stream-Based Data Trace (SBDT). Figure 2 demonstrates
the compression  process using a short excerpt of a trace. Figure 3 shows
the SBC format for  data address references.
       
       
       
       Header file: mytrace.h 
    
Dinero+ to SBC conversion: din_plus2sbc.c
SBC to Dinero+ conversion: sbc2din_plus.c
SBC to Dinero conversion: sbc2din.c
A. Milenkovic, M. Milenkovic, "Exploiting Streams in Instruction and
   Data Address Trace Compression," 
           Proceedings of the IEEE 6th Annual Workshop on Workload Characterization,
   Austin, TX, USA, October 27, 2003, pp. 99-107.
        [PDF] 
   [ps.gz] 
   [PPT 
  slides]
Last update: March 4, 2004