Linear Sum:
    float total = 0;
    for (int i = 0; i < size; ++i) {
        int idx = linearIndex[i];
        total += values[idx];
    }
    return total;

Random Sum:
    float total = 0;
    for (int i = 0; i < size; ++i) {
        int idx = randomIndex[i];
        total += values[idx];
    }
    return total;
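The two loops above differ only in the index array they walk. A minimal sketch of how one might set them up and time them follows; the class name `AccessPatternDemo`, the helper names, the array size, and the seed are all assumptions made for illustration, and a single `System.nanoTime()` measurement is only a rough indication, not a careful benchmark.

```java
import java.util.Arrays;
import java.util.Random;

class AccessPatternDemo {
    // Sums values[] in the order given by index[], as in the two loops above.
    static float sumByIndex(float[] values, int[] index) {
        float total = 0;
        for (int i = 0; i < index.length; ++i) {
            total += values[index[i]];
        }
        return total;
    }

    // Identity order: 0, 1, ..., size - 1 (sequential access).
    static int[] linearIndex(int size) {
        int[] index = new int[size];
        for (int i = 0; i < size; ++i) {
            index[i] = i;
        }
        return index;
    }

    // Fisher-Yates shuffle of the identity order (scattered access).
    static int[] randomIndex(int size, long seed) {
        int[] index = linearIndex(size);
        Random rng = new Random(seed);
        for (int i = size - 1; i > 0; --i) {
            int j = rng.nextInt(i + 1);
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
        }
        return index;
    }

    public static void main(String[] args) {
        int size = 1 << 22; // assumed size: about 4M floats (16 MB)
        float[] values = new float[size];
        Arrays.fill(values, 1.0f);
        int[] linear = linearIndex(size);
        int[] random = randomIndex(size, 42L);

        long t0 = System.nanoTime();
        float a = sumByIndex(values, linear);
        long t1 = System.nanoTime();
        float b = sumByIndex(values, random);
        long t2 = System.nanoTime();

        System.out.println("linear sum " + a + " took " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("random sum " + b + " took " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both calls do the same number of additions on the same data; only the order of the reads from `values` changes, so any timing gap comes from the access pattern.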
arr: a large array

On read/write of arr[i], search for arr[i] successively in the levels of the memory hierarchy.
Copy arr[i] and surrounding values to L1 cache: arr[i-a],...,arr[i+b] ends up in L1.
This process is called paging.

Be aware of your program’s memory access pattern
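As an illustration of why the access pattern matters, here is a small sketch that sums the same two-dimensional array in two traversal orders. The class name `TraversalOrderDemo`, the method names, and the matrix size are assumptions for this example. In Java a `float[][]` is an array of separate row arrays, so stepping along a row touches neighboring memory, while stepping down a column jumps to a different row array on every read.

```java
class TraversalOrderDemo {
    // Visits m[i][0], m[i][1], ...: consecutive elements of one row array.
    static double sumRowMajor(float[][] m) {
        double total = 0;
        for (int i = 0; i < m.length; ++i) {
            for (int j = 0; j < m[i].length; ++j) {
                total += m[i][j];
            }
        }
        return total;
    }

    // Visits m[0][j], m[1][j], ...: every read lands in a different row array.
    // Assumes a rectangular (non-jagged) matrix.
    static double sumColumnMajor(float[][] m) {
        double total = 0;
        if (m.length == 0) {
            return total;
        }
        for (int j = 0; j < m[0].length; ++j) {
            for (int i = 0; i < m.length; ++i) {
                total += m[i][j];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int size = 2048; // assumed size for illustration
        float[][] m = new float[size][size];
        for (float[] row : m) {
            java.util.Arrays.fill(row, 1.0f);
        }

        long t0 = System.nanoTime();
        double a = sumRowMajor(m);
        long t1 = System.nanoTime();
        double b = sumColumnMajor(m);
        long t2 = System.nanoTime();

        System.out.println("row-major    sum " + a + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("column-major sum " + b + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```

The two methods return the same value; a single timing run like this is only suggestive, since JIT warm-up and other system activity affect the numbers.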
float[][] shortcuts = new float[size][size];
for (int i = 0; i < size; ++i) {
    for (int j = 0; j < size; ++j) {
        float min = Float.MAX_VALUE;
        for (int k = 0; k < size; ++k) {
            float x = matrix[i][k];
            float y = matrix[k][j];
            float z = x + y;
            if (z < min)
                min = z;
        }
        shortcuts[i][j] = min;
    }
}

Questions.
Which accesses to matrix are sequential? Which are not?
How could we make all memory accesses sequential?
Which operations can be (easily) parallelized?
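
One possible direction for the last two questions, offered as a sketch rather than as the intended lab solution: copy the matrix into a transposed array so that the inner loop reads two rows sequentially, and hand out the rows of the result to a parallel stream, since each row of shortcuts is computed independently. The class name `ShortcutsSketch`, the method name, and the choice of `IntStream.parallel()` are assumptions made for this example.

```java
import java.util.stream.IntStream;

class ShortcutsSketch {
    // Computes shortcuts[i][j] = min over k of matrix[i][k] + matrix[k][j]
    // for a square matrix, reading only along rows in the inner loop.
    static float[][] transposedParallel(float[][] matrix) {
        int size = matrix.length;

        // transpose[j][k] == matrix[k][j], so column j becomes a row.
        float[][] transpose = new float[size][size];
        for (int i = 0; i < size; ++i) {
            for (int j = 0; j < size; ++j) {
                transpose[j][i] = matrix[i][j];
            }
        }

        float[][] shortcuts = new float[size][size];
        // Each value of i writes only to shortcuts[i], so rows can be
        // processed by different threads without synchronization.
        IntStream.range(0, size).parallel().forEach(i -> {
            float[] row = matrix[i];
            for (int j = 0; j < size; ++j) {
                float[] col = transpose[j];
                float min = Float.MAX_VALUE;
                for (int k = 0; k < size; ++k) {
                    float z = row[k] + col[k];
                    if (z < min)
                        min = z;
                }
                shortcuts[i][j] = min;
            }
        });
        return shortcuts;
    }

    public static void main(String[] args) {
        float[][] matrix = {
            {0f, 3f, 8f},
            {3f, 0f, 1f},
            {8f, 1f, 0f}
        };
        float[][] result = transposedParallel(matrix);
        System.out.println(java.util.Arrays.deepToString(result));
    }
}
```

The transpose costs one extra pass over the matrix and one extra array of the same size, which is small next to the size³ additions in the main loops.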
My Benchmark (HPC cluster):
[wrosenbaum@hpc-login1 lab02-shortcuts]$ cat shortcutTest.out
|------|------------------|-------------|------------------|---------|
| size | avg runtime (ms) | improvement | iteration per us | passed? |
|------|------------------|-------------|------------------|---------|
|  128 |              184 |        0.05 |               11 |     yes |
|  256 |               56 |        0.82 |              294 |     yes |
|  512 |               19 |        9.22 |             6972 |     yes |
| 1024 |               85 |       33.15 |            12497 |     yes |
| 2048 |              257 |       88.33 |            33317 |     yes |
| 4096 |             1124 |      324.66 |            61095 |     yes |
|------|------------------|-------------|------------------|---------|