Cache Memories

References

Outline

Recall: Locality

Recall: Memory Hierarchy

[Figure: Memory hierarchy]

Recall: General Cache Concepts

[Figure: Cache concepts]

Recall: General Cache Concepts

Cache Memories

[Figure: Cache bus]

General Cache Organization (S, E, B)

[Figure: Cache organization]

Cache Read
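
A cache read begins by splitting the address into a tag, a set index, and a block offset. The sketch below shows that split for a cache with S = 2^s sets and B = 2^b bytes per block. The names `addr_parts`, `split_address`, and `cache_capacity` are hypothetical helpers introduced only for illustration, and the code assumes S and B are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields a cache extracts from an address. */
typedef struct {
    uint64_t tag;     /* remaining high-order bits, compared against line tags */
    uint64_t set;     /* s bits selecting one of S = 2^s sets */
    uint64_t offset;  /* b bits selecting a byte within a B = 2^b byte block */
} addr_parts;

/* Split addr for a cache with 2^s sets and 2^b bytes per block. */
addr_parts split_address(uint64_t addr, unsigned s, unsigned b)
{
    addr_parts p;

    assert(s + b < 64);  /* keep every shift below the word width */
    p.offset = addr & ((UINT64_C(1) << b) - 1);
    p.set    = (addr >> b) & ((UINT64_C(1) << s) - 1);
    p.tag    = addr >> (s + b);
    return p;
}

/* Data capacity in bytes: C = S * E * B (valid and tag bits not counted). */
uint64_t cache_capacity(uint64_t S, uint64_t E, uint64_t B)
{
    return S * E * B;
}
```

For example, with s = 2 and b = 1, address 7 (binary 0111) yields tag 0, set 3, offset 1. The read then hits only if some valid line in set 3 holds tag 0.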

Example: Direct-Mapped Cache

[Figure: Direct-mapped cache]

Example: Direct-Mapped Cache

[Figure: Direct-mapped cache, set index selection]

Example: Direct-Mapped Cache

[Figure: Direct-mapped cache, tag match]

Direct-Mapped Cache Simulation
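
A direct-mapped simulation can be sketched in a few lines of C. The parameters here are assumptions chosen to keep the trace small: 4-bit addresses, S = 4 sets, E = 1 line per set, and B = 2 bytes per block. The `dm_` names are hypothetical, and only hits and misses are tracked, not the cached data.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed toy parameters: 4-bit addresses, S = 4 sets, E = 1 line per set,
 * B = 2 bytes per block, so b = 1 offset bit and s = 2 index bits. */
enum { DM_SETS = 4, DM_OFFSET_BITS = 1, DM_INDEX_BITS = 2 };

typedef struct {
    bool valid;
    unsigned tag;
} dm_line;

typedef struct {
    dm_line lines[DM_SETS];  /* one line per set */
} dm_cache;

void dm_init(dm_cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);  /* every line starts invalid */
}

/* Simulate one access. Returns true on a hit, false on a miss.
 * On a miss the line is filled, evicting whatever it held before. */
bool dm_access(dm_cache *c, unsigned addr)
{
    unsigned set = (addr >> DM_OFFSET_BITS) & (DM_SETS - 1);
    unsigned tag = addr >> (DM_OFFSET_BITS + DM_INDEX_BITS);
    dm_line *line = &c->lines[set];

    if (line->valid && line->tag == tag)
        return true;
    line->valid = true;
    line->tag = tag;
    return false;
}
```

Under these assumptions, the address trace 0, 1, 7, 8, 0 produces miss, hit, miss, miss, miss. The final miss is a conflict miss: addresses 0 and 8 map to the same set and keep evicting each other.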

Example: E-way Set Associative Cache

2-way Set Associative Cache Simulation
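
A 2-way set associative simulation differs from the direct-mapped case in two ways: every line in the selected set is searched, and a replacement policy picks the victim on a miss. The sketch below assumes 4-bit addresses, S = 2 sets, E = 2 lines per set, B = 2 bytes per block, and LRU replacement. The `sa_` names and the timestamp-based LRU are illustrative choices, not a description of real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed toy parameters: 4-bit addresses, S = 2 sets, E = 2 lines per set,
 * B = 2 bytes per block, so b = 1 offset bit and s = 1 index bit. */
enum { SA_SETS = 2, SA_WAYS = 2, SA_OFFSET_BITS = 1, SA_INDEX_BITS = 1 };

typedef struct {
    bool valid;
    unsigned tag;
    unsigned long last_used;  /* access count at the most recent use; 0 if never used */
} sa_line;

typedef struct {
    sa_line lines[SA_SETS][SA_WAYS];
    unsigned long clock;      /* incremented on every access */
} sa_cache;

void sa_init(sa_cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);  /* every line starts invalid with last_used == 0 */
}

/* Simulate one access. Returns true on a hit, false on a miss.
 * On a miss the least recently used line of the set is replaced. */
bool sa_access(sa_cache *c, unsigned addr)
{
    unsigned set = (addr >> SA_OFFSET_BITS) & (SA_SETS - 1);
    unsigned tag = addr >> (SA_OFFSET_BITS + SA_INDEX_BITS);
    sa_line *ways = c->lines[set];
    sa_line *victim = &ways[0];

    c->clock++;  /* valid lines therefore always have last_used >= 1 */
    for (int w = 0; w < SA_WAYS; w++) {
        if (ways[w].valid && ways[w].tag == tag) {
            ways[w].last_used = c->clock;
            return true;
        }
        /* Invalid lines have last_used == 0, so they are chosen first. */
        if (ways[w].last_used < victim->last_used)
            victim = &ways[w];
    }
    victim->valid = true;
    victim->tag = tag;
    victim->last_used = c->clock;
    return false;
}
```

With these parameters the trace 0, 1, 7, 8, 0 produces miss, hit, miss, miss, hit. Addresses 0 and 8 still share a set, but the second line in that set lets both blocks stay resident.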

Cache Writes

Intel Core i7 Cache Hierarchy

[Figure: Core i7 caches]

Intel Core i7 Cache Hierarchy

Cache Performance Metrics

How Bad Can a Few Cache Misses Be?
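
The effect of a small change in miss rate can be made concrete with the usual average-access-time model: hit time plus miss rate times miss penalty. The sketch below is illustrative only. The function name `avg_access_cycles` is hypothetical, and the 1-cycle hit time and 100-cycle miss penalty used in the example are assumed round numbers, not measurements.

```c
#include <assert.h>

/* Average cycles per access = hit_time + miss_rate * miss_penalty.
 * miss_rate is a fraction in [0, 1]; the other arguments are in cycles. */
double avg_access_cycles(double hit_time, double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}
```

With the assumed numbers, a 97% hit rate gives 1 + 0.03 * 100 = 4 cycles per access, while a 99% hit rate gives 1 + 0.01 * 100 = 2 cycles. A two-point change in hit rate halves the average access time, which is why miss rate is the more informative metric.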

Writing Cache Friendly Code
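
A common illustration of cache-friendly code is summing a 2D array in two loop orders. Both functions below compute the same result. The first walks memory with stride 1, the second with stride N. The dimensions M and N and the function names are hypothetical, chosen only to keep the sketch small.

```c
enum { M = 4, N = 8 };  /* hypothetical small dimensions */

/* Row-wise traversal: the inner loop touches consecutive addresses
 * (stride 1), so each cache block is fully used once it is loaded. */
long sum_array_rows(int a[M][N])
{
    long sum = 0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-wise traversal: the inner loop jumps N elements per step
 * (stride N), so for large arrays nearly every access can miss. */
long sum_array_cols(int a[M][N])
{
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```

At these toy sizes the whole array fits in cache, so no timing difference should be expected. The gap appears only once a row is larger than the cache can hold across iterations.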

The Memory Mountain

Memory Mountain Test Function

long data[MAXELEMS];  /* Global array to traverse */

/* test - Iterate over first "elems" elements of
 *        array "data" with stride of "stride",
 *        using 4x4 loop unrolling.
 */ 
int test(int elems, int stride) {
    long i, sx2=stride*2, sx3=stride*3, sx4=stride*4;
    long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    long length = elems, limit = length - sx4;

    /* Combine 4 elements at a time */
    for (i = 0; i < limit; i += sx4) {
        acc0 = acc0 + data[i];
        acc1 = acc1 + data[i+stride];
        acc2 = acc2 + data[i+sx2];
        acc3 = acc3 + data[i+sx3];
    }

    /* Finish any remaining elements */
    for (; i < length; i += stride) {
        acc0 = acc0 + data[i];
    }
    return ((acc0 + acc1) + (acc2 + acc3));
}

The Memory Mountain

[Figure: Memory mountain]

Matrix Multiplication Example

for (i=0; i<n; i++)  {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Miss Rate Analysis for Matrix Multiply

Layout of C Arrays in Memory (review)
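
C stores multidimensional arrays in row-major order: the elements of each row are contiguous, and rows follow one another in memory. The sketch below shows the resulting index arithmetic. The helper name `row_major_index` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Linear element index of a[i][j] in a row-major array with ncols columns.
 * The byte offset from the start of the array is this index times the
 * element size. */
size_t row_major_index(size_t i, size_t j, size_t ncols)
{
    assert(j < ncols);
    return i * ncols + j;
}
```

Incrementing j moves one element forward (stride 1), while incrementing i moves ncols elements forward. That is why an inner loop over `b[k][j]` with k varying scans a column with stride n, and an inner loop over `a[i][k]` with k varying scans a row with stride 1.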

Matrix Multiplication (ijk)

for (i=0; i<n; i++)  {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Matrix Multiplication (kij)

for (k=0; k<n; k++) {
  for (i=0; i<n; i++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];
  }
}

Matrix Multiplication (jki)

for (j=0; j<n; j++) {
  for (k=0; k<n; k++) {
    r = b[k][j];
    for (i=0; i<n; i++)
      c[i][j] += a[i][k] * r;
  }
}

Summary of Matrix Multiplication

Core i7 Matrix Multiply Performance

[Figure: Core i7 matrix multiply performance]

Matrix Multiplication (Again)

c = (double *) calloc(n*n, sizeof(double));  /* zero-initialized n x n result */

/* Multiply n x n matrices a and b  */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

Cache Miss Analysis
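
The miss count for the unblocked loop can be estimated under some stated assumptions. These are assumptions of the sketch, not values given in this section: matrix elements are doubles, a cache block holds 8 doubles, and the cache is much smaller than one n-element row, so nothing useful survives from one (i, j) iteration to the next. Each iteration then reads one row of a with stride 1 (n/8 misses) and one column of b with stride n (n misses):

```latex
\[
\text{misses per } (i,j) \text{ iteration} = \frac{n}{8} + n = \frac{9n}{8},
\qquad
\text{total misses} = \frac{9n}{8} \cdot n^2 = \frac{9}{8}\,n^3
\]
```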

Blocked Matrix Multiplication

c = (double *) calloc(n*n, sizeof(double));  /* zero-initialized n x n result */

/* Multiply n x n matrices a and b in L x L blocks.
 * L is the block size (defined elsewhere); assumes n is a multiple of L. */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k, i1, j1, k1;
    for (i = 0; i < n; i+=L)
        for (j = 0; j < n; j+=L)
            for (k = 0; k < n; k+=L)
                /* L x L mini matrix multiplications */
                for (i1 = i; i1 < i+L; i1++)
                    for (j1 = j; j1 < j+L; j1++)
                        for (k1 = k; k1 < k+L; k1++)
                            c[i1*n + j1] += a[i1*n + k1] * b[k1*n + j1];
}

Cache Miss Analysis
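
The same estimate can be sketched for the blocked version, again under assumptions that are not stated in this section: matrix elements are doubles, a cache block holds 8 doubles, n is a multiple of the block size L, and three L x L blocks fit in the cache at once (3L^2 doubles is less than the cache capacity). Loading one L x L block costs L^2/8 misses, and each (i, j) block iteration reads n/L blocks of a and n/L blocks of b:

```latex
\[
\text{misses per block iteration} = \frac{2n}{L} \cdot \frac{L^2}{8} = \frac{nL}{4},
\qquad
\text{total misses} = \frac{nL}{4} \cdot \left(\frac{n}{L}\right)^2 = \frac{n^3}{4L}
\]
```

Under these assumptions the miss count falls as L grows, so the largest L that still satisfies the three-block fit condition is the natural choice.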

Blocking Summary

Cache Summary