Lecture 05: Limits of Parallelism and Locks
COSC 272: Parallel and Distributed Computing
Spring 2023
Announcements
- Lab Assignment 01 Due Today
- Written Homework 01 Posted Sunday
Outline
- Limitations of Parallelism
- Mutual Exclusion
Last Time
Embarrassingly Parallel Problems
- can be broken into many simple computations, (almost) all of which can be performed in parallel
Example: Monte Carlo Estimation
Area of a disk: $A = \pi r^2$: estimate $\pi$!
Question
Why is Monte Carlo estimation embarrassingly parallel?
Another Question
How much performance increase with $k$ cores?
- What if $k \approx$ number of samples taken?
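As a concrete illustration, here is a minimal sketch (class and method names are made up, not the course's reference code): every sample is an independent dart thrown at the unit square, so samples can be drawn on as many cores as are available, and only the final tally requires coordination.
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class MonteCarloPi {

    static double estimatePi(int samples) {
        long hits = IntStream.range(0, samples)
            .parallel()                        // samples are independent, so run them in parallel
            .filter(i -> {
                double x = ThreadLocalRandom.current().nextDouble();
                double y = ThreadLocalRandom.current().nextDouble();
                return x * x + y * y <= 1.0;   // did the dart land inside the quarter disk?
            })
            .count();
        return 4.0 * hits / samples;           // (hits / samples) estimates pi / 4
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(10_000_000));
    }
}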
Not So Parallel
Dependencies?
a1 = b1 + c1;
a2 = b2 - c2;
d = a1 * a2;
Dependency relation: directed acyclic graph (DAG)
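Here a1 and a2 do not depend on each other, so they may be computed in parallel, while d must wait for both. A minimal sketch using CompletableFuture (only the variable names come from the slide; everything else is illustrative):
import java.util.concurrent.CompletableFuture;

public class DagExample {
    public static void main(String[] args) {
        int b1 = 1, c1 = 2, b2 = 5, c2 = 3;

        // a1 and a2 have no dependency on each other: they may run concurrently
        CompletableFuture<Integer> a1 = CompletableFuture.supplyAsync(() -> b1 + c1);
        CompletableFuture<Integer> a2 = CompletableFuture.supplyAsync(() -> b2 - c2);

        // d depends on both a1 and a2, so it must wait for them to finish
        int d = a1.thenCombine(a2, (x, y) -> x * y).join();
        System.out.println(d);   // (1 + 2) * (5 - 3) = 6
    }
}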
More Generally
Consider a program that requires
- $N$ elementary operations
- $T$ time to run sequentially
Suppose
- a $p$-fraction of operations can be performed in parallel
- $1-p$ fraction must be performed sequentially
Question: how long could the program take with $n$ parallel machines?
Idea
With $n$ parallel machines:
- perform the $p$-fraction of parallelizable ops in parallel on all $n$ machines
- total time $\frac{T \cdot p}{n}$
- perform remaining ops sequentially on a single machine
- total time $T \cdot (1 - p)$
Total time: $T \cdot (1 - p) + T \cdot \frac{p}{n} = T \cdot \left(1 - p + \frac p n\right)$
How Much Improvement?
The speedup is the ratio of the original time $T$ to the parallel time $T \cdot \left(1 - p + \frac p n\right)$:
- $S = \frac{1}{1 - p + \frac p n}$
This relation is called Amdahl’s Law
This is the best performance improvement possible in principle
- may not be achievable in practice!
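As a quick sanity check, the formula is easy to tabulate in code (an illustrative sketch; the class name and values are not from the lab):
// Illustrative only: tabulate Amdahl's Law, S = 1 / (1 - p + p/n)
public class AmdahlTable {
    static double speedup(double p, int n) {
        return 1.0 / ((1 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 0.6;   // parallelizable fraction (as in the onion example below)
        for (int n = 1; n <= 6; n++) {
            System.out.printf("n = %d: S = %.2f%n", n, speedup(p, n));
        }
        // As n grows, S approaches 1 / (1 - p) but never reaches it
    }
}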
Example
1 person can chop 1 onion per minute
Recipe calls for:
- chop 6 onions
- saute onions for 4 minutes
Note:
- chopping onions can be done in parallel
- sauteing
- takes 4 minutes no matter what
- must be accomplished after chopping
Example (continued)
How much can the cooking process be sped up by $n$ cooks?
Example (continued)
- For one chef, $T = 6 + 4 = 10$
- Only chopping onions is parallelizable, so $p = 6 / 10 = 0.6$
- Amdahl’s Law:
- $S = \frac{1}{1 - p + \frac{p}{n}} = \frac{1}{0.4 + \frac{0.6}{n}}$
- So:
- $n = 2 \implies S = 1.43$
- $n = 3 \implies S = 1.67$
- $n = 6 \implies S = 2$
- Always have $S < 1 / (1 - p) = 2.5$
Speedup Improvement by Adding More Processors
- Second processor: 43%
- Third processor: 17%
- Fourth processor: 9%
- Fifth processor: 6%
- Sixth processor: 4%
Latency vs Number of Processors
How does latency $T$ scale with $n$?
- Adding more processors has declining marginal utility:
- each additional processor has a smaller effect on total performance
- at some point, adding more processors to a computation is wasteful
- Another consideration:
- after parallel ops have been performed, extra processors are idle (potentially wasteful!)
Back to Counter
Example
The problem with
public void increment () {
    ++count;
}
The operation ++count is not atomic
- consists of:
- read count value
- increment value in register
- write updated value
- these operations can be interleaved for concurrent executions
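A minimal sketch of the resulting lost-update problem (class and method names are illustrative, not the course's Counter class): incrementSlowly spells out the three steps that ++count performs implicitly, and two threads interleaving those steps usually lose updates.
public class NonAtomicIncrement {
    static long count = 0;

    static void incrementSlowly() {
        long temp = count;   // 1. read the current value
        temp = temp + 1;     // 2. increment it in a register/local
        count = temp;        // 3. write the new value back
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads, each incrementing 1_000_000 times: the final value is
        // typically well below 2_000_000 because interleaved read-modify-write
        // steps overwrite each other's updates
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) incrementSlowly();
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(count);   // typically < 2000000
    }
}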
A Strategy
Fix the issue by locking the count
To increment the Counter:
- check if Counter is locked
- if so, wait until it is unlocked
- lock the Counter
- no other thread can modify it while locked
- increment the counter
- unlock the Counter
An Attempt
public class LockedCounter {
    long count = 0;
    boolean locked = false;

    public long getCount () { return count; }
    public void increment () { count++; }
    public void reset () { count = 0; }

    public void lock (int id) {
        while (locked) { }   // spin (busy-wait) until the counter is unlocked
        locked = true;       // then claim the lock
    }

    public void unlock () { locked = false; }
    public boolean isLocked () { return locked; }
}
Running the Locked Counter
public void run () {
    for (long i = 0; i < times; i++) {
        counter.lock(id);
        try {
            counter.increment();
        }
        finally {
            counter.unlock();
        }
    }
}
LockedCounterTester
Demo!
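The tester itself is not reproduced on the slides; a hypothetical harness (names and constants here are guesses, not the actual LockedCounterTester) would launch several threads running the loop above and print the final count:
public class LockedCounterTesterSketch {
    public static void main(String[] args) throws InterruptedException {
        final int numThreads = 4;
        final long times = 1_000_000;
        LockedCounter counter = new LockedCounter();

        Thread[] threads = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (long i = 0; i < times; i++) {
                    counter.lock(id);
                    try {
                        counter.increment();
                    } finally {
                        counter.unlock();
                    }
                }
            });
        }
        for (Thread th : threads) th.start();
        for (Thread th : threads) th.join();

        // Expected numThreads * times if the lock worked correctly;
        // the in-class demo suggests a different outcome
        System.out.println("count = " + counter.getCount());
    }
}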
Question
What happened? Can we make the locked counter idea work?
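One plausible explanation, sketched below as a comment trace (an assumption about what the demo exhibits, not a transcript of it): checking and setting the flag inside lock() are two separate steps, so lock() has the same kind of race as ++count.
// Inside lock(), testing and setting the flag are separate steps:
//
//     while (locked) { }   // step 1: observe locked == false
//     locked = true;       // step 2: claim the lock
//
// With the lock free, threads A and B can interleave as follows:
//
//     A: sees locked == false and exits the while loop
//     B: sees locked == false and exits the while loop
//     A: sets locked = true    (A believes it holds the lock)
//     B: sets locked = true    (B also believes it holds the lock)
//
// Both threads then run increment() at the same time, so updates can be
// lost exactly as with the unlocked Counter. (The non-volatile locked
// field is a separate visibility issue on real hardware.)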
Morals
- Empirical testing is not enough!
- Must understand correctness formally
Next Week
Two threads:
- Mutual Exclusion
- Locality of Reference