We’ve assumed:
Computer architecture is not so simple!
When reading or writing:
Heuristic:
Cache (in)consistency
It takes time to propagate changes to values
Counter Example Revisited
Previously, we saw the Counter object
Its behavior makes sense once we account for caches
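The original Counter code is not reproduced in this section, so the following is a hypothetical sketch (the class and field names are illustrative, not the course's). Several threads increment a plain `int` field and an `AtomicInteger`. `plainCount++` is a separate read, add, and write. Without synchronization, a thread can work with a stale value of the field, so increments can be lost. `AtomicInteger.incrementAndGet()` performs the update atomically and makes it visible to other threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter demo; not the course's original Counter class.
final class CounterDemo {
    // Plain field: plainCount++ is read, add, write, so concurrent increments can be lost.
    static int plainCount = 0;
    // Atomic field: incrementAndGet() updates atomically and is visible to all threads.
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Starts `threads` threads that each increment both counters `perThread` times.
    // Returns the atomic count, which should always equal threads * perThread.
    static int run(int threads, int perThread) {
        plainCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; ++i) {
                    plainCount++;                  // racy update
                    atomicCount.incrementAndGet(); // safe update
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes each worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return atomicCount.get();
    }

    public static void main(String[] args) {
        int atomic = run(4, 1_000_000);
        System.out.println("expected: " + 4 * 1_000_000);
        System.out.println("plain:    " + plainCount + " (may be lower)");
        System.out.println("atomic:   " + atomic);
    }
}
```

The plain count varies from run to run. The atomic count does not.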
Heuristic: if we access arr[i], the likely next access is arr[i+1], or arr[i+c] for some small c
Fetch many values from higher levels at a time: when arr[i] is fetched from a higher level, also fetch arr[i+1], ..., arr[i+c]
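The fetch-a-block heuristic can be modeled by counting how often an access pattern has to go to the next level up. The following toy sketch rests on several assumptions that are not from the source. It assumes 64-byte lines (16 four-byte floats per line) and a small fully associative cache with least-recently-used eviction. The helper names `countMisses`, `linearIndices`, and `shuffledIndices` are hypothetical. Real caches differ in line size, capacity, and associativity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Toy model, not a real cache: assumes 64-byte lines holding 16 four-byte floats
// and a small fully associative cache with least-recently-used (LRU) eviction.
final class CacheLineModel {
    static final int FLOATS_PER_LINE = 16;

    // Returns how many times `indices` must fetch a line from the next level up.
    static int countMisses(int[] indices, int capacityLines) {
        // accessOrder = true orders entries from least to most recently used.
        Map<Integer, Boolean> cache = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacityLines; // evict the least recently used line
            }
        };
        int misses = 0;
        for (int i : indices) {
            int line = i / FLOATS_PER_LINE;    // which line holds arr[i]
            if (cache.get(line) == null) {     // get() refreshes recency on a hit
                misses++;
                cache.put(line, Boolean.TRUE); // fetch the whole line
            }
        }
        return misses;
    }

    static int[] linearIndices(int n) {
        int[] idx = new int[n];
        for (int i = 0; i < n; ++i) {
            idx[i] = i;
        }
        return idx;
    }

    // Fisher-Yates shuffle of 0..n-1 with a fixed seed.
    static int[] shuffledIndices(int n, long seed) {
        int[] idx = linearIndices(n);
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; --i) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    public static void main(String[] args) {
        int n = 1 << 16; // 65,536 floats = 4,096 lines
        System.out.println("linear misses:   " + countMisses(linearIndices(n), 64)); // 4096
        System.out.println("shuffled misses: " + countMisses(shuffledIndices(n, 42), 64));
    }
}
```

In the linear order, only the first access to each line misses (n / 16 misses). In the shuffled order, almost every access lands on a line that has already been evicted.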
So write programs that access memory linearly!
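A common instance of this advice is the loop order over a 2D array. In Java, a `float[][]` is an array of row arrays, and each row is contiguous. Traversing row by row therefore reads memory linearly, while traversing column by column jumps to a different row on every step. The following is a minimal sketch with illustrative method names. Single timed runs include JIT warm-up, so treat the printed times as rough.

```java
import java.util.Arrays;

// Sums the same matrix in two orders. Each row m[i] is a contiguous array,
// so the row-by-row order reads memory linearly.
final class TraversalOrder {
    // Inner loop walks along one row: linear access.
    static float sumRowByRow(float[][] m) {
        float sum = 0;
        for (int i = 0; i < m.length; ++i) {
            for (int j = 0; j < m[i].length; ++j) {
                sum += m[i][j];
            }
        }
        return sum;
    }

    // Inner loop moves to a different row on every step: non-linear access.
    // Assumes a non-empty rectangular matrix.
    static float sumColumnByColumn(float[][] m) {
        float sum = 0;
        for (int j = 0; j < m[0].length; ++j) {
            for (int i = 0; i < m.length; ++i) {
                sum += m[i][j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2048;
        float[][] m = new float[n][n];
        for (float[] row : m) {
            Arrays.fill(row, 1.0f);
        }
        long t0 = System.nanoTime();
        float a = sumRowByRow(m);
        long t1 = System.nanoTime();
        float b = sumColumnByColumn(m);
        long t2 = System.nanoTime();
        System.out.printf("row by row:       %.1f ms (sum %.0f)%n", (t1 - t0) / 1e6, a);
        System.out.printf("column by column: %.1f ms (sum %.0f)%n", (t2 - t1) / 1e6, b);
    }
}
```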
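Suppose float[] arr is a reasonably large array, and int[] indices holds the indices of arr to be accessed:
    float sum = 0;
    // Visit the elements of arr in the order given by indices
    for (int i : indices) {
        sum += arr[i];
    }
This loop runs faster with
indices = {0, 1, 2, 3, ...}
than with
indices = {2743, 1, 9932, 4952, ...}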
How much faster?
Investigate how much access order affects the efficiency of a program
How did array size affect run-time?
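One way to run this investigation is a small timing harness. The following minimal sketch times the summation loop above over linear and shuffled index arrays of several sizes. The class name, the helper methods, and the choice of sizes are all assumptions made for illustration. It takes the best of a few runs to reduce JIT and timer noise, and it stores each result in a `volatile` field so the JIT cannot discard the loop. A serious measurement would use a benchmarking tool instead.

```java
import java.util.Random;

// Rough harness for comparing linear and shuffled access orders.
// Names and sizes are illustrative; timings from one JVM run are noisy.
final class AccessOrderBenchmark {
    static volatile float sink; // keeps the JIT from discarding unused sums

    // The summation loop: visit arr in the order given by indices.
    static float sumAt(float[] arr, int[] indices) {
        float sum = 0;
        for (int i : indices) {
            sum += arr[i];
        }
        return sum;
    }

    static int[] linearIndices(int n) {
        int[] idx = new int[n];
        for (int i = 0; i < n; ++i) {
            idx[i] = i;
        }
        return idx;
    }

    // Fisher-Yates shuffle of 0..n-1 with a fixed seed.
    static int[] shuffledIndices(int n, long seed) {
        int[] idx = linearIndices(n);
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; --i) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    // Best-of-`runs` wall-clock time in milliseconds.
    static double bestTimeMs(float[] arr, int[] indices, int runs) {
        double best = Double.MAX_VALUE;
        for (int r = 0; r < runs; ++r) {
            long start = System.nanoTime();
            sink = sumAt(arr, indices);
            best = Math.min(best, (System.nanoTime() - start) / 1e6);
        }
        return best;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1 << 12, 1 << 16, 1 << 20, 1 << 22}) {
            float[] arr = new float[n];
            for (int i = 0; i < n; ++i) {
                arr[i] = i % 10;
            }
            double linear = bestTimeMs(arr, linearIndices(n), 5);
            double shuffled = bestTimeMs(arr, shuffledIndices(n, 42), 5);
            System.out.printf("n = %8d: linear %.3f ms, shuffled %.3f ms, ratio %.1f%n",
                    n, linear, shuffled, shuffled / linear);
        }
    }
}
```

Varying `n` addresses the array-size question. Arrays small enough to fit in cache typically show a smaller gap between the two orders.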
If the access pattern is not linear, it may be more efficient to reorganize the data structure so that the access pattern is linear
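One simple reorganization applies when the same scattered index list is reused many times. The values can be copied into access order once, and every later pass then reads the copy linearly. The following hypothetical sketch shows this pattern (`gather`, `sumScattered`, and `sumLinear` are illustrative names, not from the source). It only pays off if the one-time scattered copy is amortized over several passes.

```java
// Hypothetical example: pay for the scattered reads once by copying the values
// into access order, then read the reorganized copy linearly.
final class GatherOnce {
    // One pass of scattered reads: out[j] = arr[indices[j]].
    static float[] gather(float[] arr, int[] indices) {
        float[] out = new float[indices.length];
        for (int j = 0; j < indices.length; ++j) {
            out[j] = arr[indices[j]];
        }
        return out;
    }

    // Scattered access, as in the summation loop.
    static float sumScattered(float[] arr, int[] indices) {
        float sum = 0;
        for (int i : indices) {
            sum += arr[i];
        }
        return sum;
    }

    // Linear access over the reorganized copy. Values are added in the same order,
    // so the result is identical to sumScattered.
    static float sumLinear(float[] packed) {
        float sum = 0;
        for (float v : packed) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] arr = {10f, 20f, 30f, 40f, 50f};
        int[] indices = {3, 0, 4, 1};
        float[] packed = gather(arr, indices); // {40, 10, 50, 20}
        System.out.println(sumScattered(arr, indices) + " " + sumLinear(packed)); // 120.0 120.0
    }
}
```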
Heuristic:
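for (int i = 0; i < size; ++i) {
    for (int j = 0; j < size; ++j) {
        // shortcuts[i][j] = minimum over k of matrix[i][k] + matrix[k][j]
        float min = Float.MAX_VALUE;
        for (int k = 0; k < size; ++k) {
            float x = matrix[i][k]; // walks along row i: linear access
            float y = matrix[k][j]; // walks down column j: a different row on every step
            float z = x + y;
            if (z < min) {
                min = z;
            }
        }
        shortcuts[i][j] = min;
    }
}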
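The section does not show the optimized code behind the tables that follow. One common way to make the loop above linear is to apply the reorganization heuristic to the matrix itself, though this is an assumption here, not something the source confirms. The idea is to build the transpose once, so that `t[j][k] == matrix[k][j]`. The inner loop then reads row `i` of `matrix` and row `j` of `t`, both contiguously. The class and method names below are illustrative. `naive` copies the loop above for comparison. Both versions perform identical float additions and comparisons, so their results match exactly.

```java
import java.util.Arrays;

final class Shortcuts {
    // The loop above: matrix[k][j] steps down column j (non-linear).
    static float[][] naive(float[][] matrix) {
        int size = matrix.length;
        float[][] shortcuts = new float[size][size];
        for (int i = 0; i < size; ++i) {
            for (int j = 0; j < size; ++j) {
                float min = Float.MAX_VALUE;
                for (int k = 0; k < size; ++k) {
                    float z = matrix[i][k] + matrix[k][j];
                    if (z < min) {
                        min = z;
                    }
                }
                shortcuts[i][j] = min;
            }
        }
        return shortcuts;
    }

    // Reorganized: store the transpose once, then read both operands linearly.
    static float[][] withTranspose(float[][] matrix) {
        int size = matrix.length;
        float[][] t = new float[size][size];
        for (int i = 0; i < size; ++i) {
            for (int j = 0; j < size; ++j) {
                t[j][i] = matrix[i][j];
            }
        }
        float[][] shortcuts = new float[size][size];
        for (int i = 0; i < size; ++i) {
            float[] row = matrix[i];
            for (int j = 0; j < size; ++j) {
                float[] col = t[j]; // column j of matrix, now stored contiguously
                float min = Float.MAX_VALUE;
                for (int k = 0; k < size; ++k) {
                    float z = row[k] + col[k];
                    if (z < min) {
                        min = z;
                    }
                }
                shortcuts[i][j] = min;
            }
        }
        return shortcuts;
    }

    public static void main(String[] args) {
        float[][] d = {{0, 8, 2}, {1, 0, 9}, {4, 5, 0}};
        System.out.println(Arrays.deepToString(withTranspose(d)));
        // [[0.0, 7.0, 2.0], [1.0, 0.0, 3.0], [4.0, 5.0, 0.0]]
    }
}
```

The transpose costs one extra size × size pass and one extra matrix of memory. That cost is small next to the size³ iterations of the main loop.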
|------|------------------|-------------|-------------------|---------|
| size | avg runtime (ms) | improvement | iterations per µs | passed? |
|------|------------------|-------------|-------------------|---------|
| 128 | 10 | 1.07 | 201 | yes |
| 256 | 42 | 1.23 | 396 | yes |
| 512 | 317 | 0.75 | 422 | yes |
| 1024 | 680 | 6.91 | 1578 | yes |
| 2048 | 6249 | 9.98 | 1374 | yes |
| 4096 | 57592 | 11.05 | 1193 | yes |
|------|------------------|-------------|------------------|---------|
Up to 11x speedup, before multithreading
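The iterations-per-µs column is consistent with counting one innermost-loop iteration per (i, j, k) triple, which gives size³ iterations, divided by the average runtime converted to microseconds. Checked against the size 4096 row of the first table:

$$
\frac{\text{iterations}}{\mu\text{s}} = \frac{n^3}{1000\,t_{\text{ms}}},
\qquad
\frac{4096^3}{1000 \times 57592} = \frac{68\,719\,476\,736}{57\,592\,000} \approx 1193
$$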
|------|------------------|-------------|-------------------|---------|
| size | avg runtime (ms) | improvement | iterations per µs | passed? |
|------|------------------|-------------|-------------------|---------|
| 128 | 24 | 0.80 | 86 | yes |
| 256 | 6 | 18.94 | 2623 | yes |
| 512 | 21 | 13.11 | 6132 | yes |
| 1024 | 157 | 31.70 | 6828 | yes |
| 2048 | 1123 | 55.11 | 7647 | yes |
|------|------------------|-------------|------------------|---------|
|------|------------------|-------------|-------------------|---------|
| size | avg runtime (ms) | improvement | iterations per µs | passed? |
|------|------------------|-------------|-------------------|---------|
| 128 | 310 | 0.04 | 6 | yes |
| 256 | 52 | 1.14 | 317 | yes |
| 512 | 18 | 14.71 | 7105 | yes |
| 1024 | 87 | 57.84 | 12257 | yes |
| 2048 | 464 | 146.15 | 18478 | yes |
| 4096 | 3120 | 190.11 | 22020 | yes |
|------|------------------|-------------|------------------|---------|
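The first table is labeled "before multithreading", but the multithreaded code is not shown. The following is a minimal sketch of one common approach, not necessarily the one behind these numbers. It assumes the transposed layout and parallelizes the outer `i` loop with `IntStream.range(...).parallel()`. Each `i` writes only to `shortcuts[i]` and only reads shared data, so no locking is needed. The class name is illustrative.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class ParallelShortcuts {
    static float[][] compute(float[][] matrix) {
        int size = matrix.length;
        // Transpose once so the inner loop reads both operands linearly.
        float[][] t = new float[size][size];
        for (int i = 0; i < size; ++i) {
            for (int j = 0; j < size; ++j) {
                t[j][i] = matrix[i][j];
            }
        }
        float[][] shortcuts = new float[size][size];
        // Rows are independent: row i reads matrix and t, and writes only shortcuts[i].
        IntStream.range(0, size).parallel().forEach(i -> {
            float[] row = matrix[i];
            float[] out = shortcuts[i];
            for (int j = 0; j < size; ++j) {
                float[] col = t[j];
                float min = Float.MAX_VALUE;
                for (int k = 0; k < size; ++k) {
                    float z = row[k] + col[k];
                    if (z < min) {
                        min = z;
                    }
                }
                out[j] = min;
            }
        });
        // The terminal forEach waits for all rows, so their writes are visible here.
        return shortcuts;
    }

    public static void main(String[] args) {
        float[][] d = {{0, 8, 2}, {1, 0, 9}, {4, 5, 0}};
        System.out.println(Arrays.deepToString(compute(d)));
        // [[0.0, 7.0, 2.0], [1.0, 0.0, 3.0], [4.0, 5.0, 0.0]]
    }
}
```

Splitting by rows keeps each thread's inner loop linear, so the memory-access improvement and the multithreading improvement can compound.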