Lecture 05: Limits of Parallelism and Locks
COSC 272: Parallel and Distributed Computing
Spring 2023
Announcements
 Lab Assignment 01 Due Today
 Written Homework 01 Posted Sunday
Outline
 Limitations of Parallelism
 Mutual Exclusion
Last Time
Embarrassingly Parallel Problems
 can be broken into many simple computations, (almost) all of which can be performed in parallel
Example: Monte Carlo Estimation
Area of a disk: $A = \pi r^2$: estimate $\pi$!
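A minimal sketch of the estimation, assuming the usual setup: sample points uniformly in the unit square and count the fraction that land inside the quarter disk of radius 1, which approaches $\pi / 4$. The class name, method names, batch sizes, and seeds below are illustrative choices, not taken from the lecture. Each batch uses its own random generator and shares no state with the others, so batches can run on separate cores.

```java
import java.util.SplittableRandom;
import java.util.stream.IntStream;

class MonteCarloPi {
    // Counts how many of `samples` random points in the unit square
    // fall inside the quarter disk x^2 + y^2 <= 1.
    static long hitsInBatch(long samples, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                hits++;
            }
        }
        return hits;
    }

    // Runs independent batches in parallel and combines the counts.
    // The only coordination between batches is the final sum.
    static double estimatePi(int batches, long samplesPerBatch) {
        long hits = IntStream.range(0, batches)
                .parallel()
                .mapToLong(b -> hitsInBatch(samplesPerBatch, 1000L + b))
                .sum();
        return 4.0 * hits / ((double) batches * samplesPerBatch);
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(8, 250_000));
    }
}
```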
Question
Why is Monte Carlo estimation embarrassingly parallel?
Another Question
How much does performance increase with $k$ cores?
 What if $k \approx$ number of samples taken?
Not So Parallel
Dependencies?
a1 = b1 + c1;
a2 = b2 - c2;
d = a1 * a2;
Dependency relation: directed acyclic graph (DAG)
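As an illustration of the dependency structure, here is a sketch using `CompletableFuture`. It assumes the second statement is a subtraction; the class and method names are hypothetical. The two intermediate values do not depend on each other, so they may be computed concurrently, while the product has to wait for both.

```java
import java.util.concurrent.CompletableFuture;

class DependencyDemo {
    static int compute(int b1, int c1, int b2, int c2) {
        // a1 and a2 are independent nodes in the DAG: they may run in parallel.
        CompletableFuture<Integer> a1 = CompletableFuture.supplyAsync(() -> b1 + c1);
        CompletableFuture<Integer> a2 = CompletableFuture.supplyAsync(() -> b2 - c2);
        // d has edges from both a1 and a2, so it runs only after both complete.
        return a1.thenCombine(a2, (x, y) -> x * y).join();
    }

    public static void main(String[] args) {
        System.out.println(compute(1, 2, 5, 3)); // (1 + 2) * (5 - 3) = 6
    }
}
```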
More Generally
Consider a program that requires
 $N$ elementary operations
 $T$ time to run sequentially
Suppose
 a $p$-fraction of operations can be performed in parallel
 a $(1 - p)$-fraction must be performed sequentially
Question: how long could the program take with $n$ parallel machines?
Idea
With $n$ parallel machines:
 perform the $p$-fraction of parallelizable ops in parallel on all $n$ machines
  total time $\frac{T \cdot p}{n}$
 perform remaining ops sequentially on a single machine
  total time $T \cdot (1 - p)$
Total time: $T \cdot (1 - p) + T \cdot \frac{p}{n} = T \cdot \left(1 - p + \frac{p}{n}\right)$
How Much Improvement?
The speedup is the ratio of the original time $T$ to the parallel time $T \cdot \left(1 - p + \frac{p}{n}\right)$:
 $S = \frac{1}{1 - p + \frac{p}{n}}$
This relation is called Amdahl’s Law
This is the best performance improvement possible in principle
 may not be achievable in practice!
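The bound is easy to evaluate directly. The sketch below is only an illustration: the class name, the method name, and the sample value $p = 0.9$ are assumptions, not part of the lecture.

```java
class Amdahl {
    // Best-case speedup when a p-fraction of the work is spread over n machines
    // and the remaining (1 - p)-fraction runs sequentially.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 0.9; // illustrative parallel fraction
        for (int n : new int[] {1, 2, 4, 8, 1000}) {
            System.out.printf("n = %d: S = %.3f%n", n, speedup(p, n));
        }
    }
}
```

As $n$ grows, the term $p / n$ vanishes and the speedup approaches $1 / (1 - p)$, so the sequential fraction alone caps the achievable improvement.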
Example
1 person can chop 1 onion per minute
Recipe calls for:
 chop 6 onions
 saute onions for 4 minutes
Note:
 chopping onions can be done in parallel
 sauteing
 takes 4 minutes no matter what
 must be accomplished after chopping
Example (continued)
How much can the cooking process be sped up by $n$ cooks?
Example (continued)
 For one chef, $T = 6 + 4 = 10$
 Only chopping onions is parallelizable, so $p = 6 / 10 = 0.6$
 Amdahl’s Law:
 $S = \frac{1}{1 - p + \frac{p}{n}} = \frac{1}{0.4 + \frac{0.6}{n}}$
 So:
 $n = 2 \implies S = 1.43$
 $n = 3 \implies S = 1.67$
 $n = 6 \implies S = 2$
 Always have $S < 1 / (1 - p) = 2.5$
Speedup Improvement by Adding More Processors
 Second processor: 43%
 Third processor: 17%
 Fourth processor: 9%
 Fifth processor: 6%
 Sixth processor: 4%
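One way to read these percentages is as the relative gain from the $n$th processor, $S(n) / S(n - 1) - 1$, where $S(n) = \frac{1}{0.4 + 0.6 / n}$ is the speedup with $n$ cooks (the notation $S(n)$ is introduced here for convenience). For example, for the fourth processor:
 $S(3) = \frac{1}{0.4 + 0.2} = \frac{1}{0.6} \approx 1.67$
 $S(4) = \frac{1}{0.4 + 0.15} = \frac{1}{0.55} \approx 1.82$
 $\frac{S(4)}{S(3)} - 1 = \frac{0.6}{0.55} - 1 \approx 0.09$, i.e. 9%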
Latency vs Number of Processors
How does latency $T$ scale with $n$?
 Adding more processors has declining marginal utility:
 each additional processor has a smaller effect on total performance
 at some point, adding more processors to a computation is wasteful
 Another consideration:
 after parallel ops have been performed, extra processors are idle (potentially wasteful!)
Back to Counter
Example
The problem with
public void increment () {
++count;
}
The operation ++count is not atomic
 consists of:
  read count value
  increment value in register
  write updated value
 these operations can be interleaved for concurrent executions
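To make the interleaving concrete, the sketch below replays one bad schedule by hand in a single thread, so the outcome is deterministic. The local variables stand in for each thread's register; the class and variable names are hypothetical.

```java
class LostUpdateDemo {
    static long count = 0;

    // Replays two increments whose read / increment / write steps interleave.
    static long replayBadInterleaving() {
        count = 0;
        long registerA = count;     // thread A: read count (sees 0)
        long registerB = count;     // thread B: read count (also sees 0)
        registerA = registerA + 1;  // thread A: increment value in register
        registerB = registerB + 1;  // thread B: increment value in register
        count = registerA;          // thread A: write updated value (1)
        count = registerB;          // thread B: write updated value (1), overwriting A
        return count;
    }

    public static void main(String[] args) {
        System.out.println(replayBadInterleaving()); // prints 1, not 2
    }
}
```

Two increments were performed, but the final count is 1: one update was lost.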
A Strategy
Fix the issue by locking the count
To increment the Counter:
 check if Counter is locked
  if so, wait until it is unlocked
 lock the Counter
  no other thread can modify while locked
 increment the counter
 unlock the Counter
An Attempt
public class LockedCounter {
    long count = 0;
    boolean locked = false;

    public long getCount () { return count; }
    public void increment () { count++; }
    public void reset () { count = 0; }

    public void lock (int id) {
        while (locked) { }
        locked = true;
    }

    public void unlock () { locked = false; }
    public boolean isLocked () { return locked; }
}
Running the Locked Counter
public void run () {
    for (long i = 0; i < times; i++) {
        counter.lock(id);
        try {
            counter.increment();
        }
        finally {
            counter.unlock();
        }
    }
}
LockedCounterTester
Demo!
Question
What happened? Can we make the locked counter idea work?
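One likely explanation is that the lock method has the same flaw as ++count: checking `locked` and setting `locked = true` are separate steps, so two threads can both see `false` and both proceed. In addition, `locked` is a plain field, so one thread's write is not guaranteed to become visible to another. One possible repair, sketched below under the assumption that an atomic test-and-set is available, performs the check and the set as a single step with `AtomicBoolean.compareAndSet`. The class name and the `runThreads` helper are hypothetical, and this is not necessarily the approach the course takes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CasLockedCounter {
    private long count = 0;
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Atomically flip false -> true; retry while another thread holds the lock.
        while (!locked.compareAndSet(false, true)) {
            Thread.yield();
        }
    }

    public void unlock() { locked.set(false); }
    public void increment() { count++; }
    public long getCount() { return count; }

    // Starts numThreads threads that each increment the counter `times` times.
    static long runThreads(int numThreads, long times) {
        CasLockedCounter counter = new CasLockedCounter();
        Thread[] threads = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            threads[t] = new Thread(() -> {
                for (long i = 0; i < times; i++) {
                    counter.lock();
                    try {
                        counter.increment();
                    } finally {
                        counter.unlock();
                    }
                }
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 100_000));
    }
}
```

A passing run of this sketch is still only empirical evidence; arguing that no two threads can hold the lock at once requires reasoning about the atomicity of `compareAndSet`.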
Morals
 Empirical testing is not enough!
 Must understand correctness formally
Next Week
Two threads:
 Mutual Exclusion
 Locality of Reference