Lecture 02: Limitations of Distributed Control and Parallelism
Outline
- Problems with Distributed Control & Braess’s Paradox
- Limits of Parallelism & Amdahl’s Law
A Distributed Problem
- Not all processors/agents are necessarily under centralized control
- Agents may have individual incentives
- Each agent acts autonomously to maximize their utility
- e.g., tuning internet routers to maximize internet speed
- choosing driving routes to minimize commute time
Traffic Routing
Two types of roads
- Country road: fixed maximum speed
- Freeway: congestion decreases speed
- more cars \(\implies\) longer commutes
- travel time proportional to number of cars on road
Example
- 100 cars want to travel from city A to city D
- 2 possible routes:
- northern route through city B
- southern route through city C
- Freeways from A to B and from C to D
- Country roads from A to C and B to D
A Simple Map
Blue = freeway, Red = country road
Commute Times
- Commute time on (blue) freeways = proportion of cars on freeway
- e.g., if 60 cars take freeway from A to B, commute time from A to B is 0.6 hr
- Commute time on (red) country roads constant: 1.01 hr
Question 1
Which route should I choose if I know how many cars are taking each route?
Question 2
How to choose a route if I don’t know how many cars take each route?
Observations
- If I choose a route at random, my average commute time is 1.51 hr, regardless of what everyone else does.
- Centralized control can guarantee that everyone’s commute is exactly 1.51 hr:
- route 50 cars through B and 50 cars through C
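The two observations above can be checked numerically. The following is a minimal sketch, assuming 100 cars in total and that the randomly choosing driver's own car is already counted in the split; the function names are illustrative and not part of the lecture.

```python
COUNTRY_ROAD_HOURS = 1.01  # constant country-road time from the slides
TOTAL_CARS = 100


def route_times(cars_via_b, total_cars=TOTAL_CARS):
    """Return (time via B, time via C) in hours for a given split of cars."""
    cars_via_c = total_cars - cars_via_b
    freeway_ab = cars_via_b / total_cars  # freeway A -> B: proportion of cars
    freeway_cd = cars_via_c / total_cars  # freeway C -> D: proportion of cars
    north = freeway_ab + COUNTRY_ROAD_HOURS  # A -> B -> D
    south = COUNTRY_ROAD_HOURS + freeway_cd  # A -> C -> D
    return north, south


def expected_time_random_choice(cars_via_b, total_cars=TOTAL_CARS):
    """Average commute for a driver who flips a fair coin between the routes."""
    north, south = route_times(cars_via_b, total_cars)
    return 0.5 * north + 0.5 * south
```

Whatever the split, the two freeway proportions sum to 1, so the coin-flip average is always 0.5 + 1.01 = 1.51 hr. With a 50/50 split both routes take 1.51 hr, which is the centralized solution.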
The Free Market
Elon Musk thinks this map is ridiculous!
“We will build a hyperloop connecting cities B and C!!! The commute will be free and (almost) instantaneous!!!”
The New Picture
What is the Best Strategy?
Previously:
- Choosing B/C route randomly gives expected commute time of 1.51 hr
- Centralized control guaranteed commute times of 1.51 hr
These options are still viable, but we now have more choices:
- A to B to C to D: commute time \(X_B + X_C\), where \(X_B\) and \(X_C\) are the commute times on the A to B and C to D freeways
- A to C to B to D: commute time 2.02
Which route should you choose? What is the aggregate behavior?
Dominant Strategy
Observation:
- \(X_B, X_C \leq 1\) no matter what, so each freeway is always faster than a 1.01 hr country road
Independent of everyone else’s choices, your fastest route is A to B to C to D:
- Total cost: \(X_B + X_C\)
This is the dominant strategy
Collective Behavior
If everyone chooses according to self interest:
- all cars commute A to B to C to D
- \[X_B = X_C = 1\]
- everyone’s commute takes 2 hours!
Without the hyperloop, expected commute time was 1.51 hours…
- adding the hyperloop increased commute times
In real life: closing Times Square to car traffic in NYC decreased avg. commute time in Manhattan. (source)
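The collective behavior described above can be reproduced with a small best-response simulation. This is a sketch under stated assumptions, not the lecture's own method: the hyperloop is treated as exactly free, drivers start from the 50/50 split, and they update one at a time. All names are hypothetical.

```python
COUNTRY_HOURS = 1.01    # each country road (A-C and B-D), from the slides
HYPERLOOP_HOURS = 0.0   # assumption: the B-C hyperloop is exactly free
ROUTES = ("ABD", "ACD", "ABCD", "ACBD")


def route_time(route, counts, total):
    """Commute time in hours on `route`, given how many cars use each route."""
    x_b = (counts["ABD"] + counts["ABCD"]) / total  # freeway A -> B
    x_c = (counts["ACD"] + counts["ABCD"]) / total  # freeway C -> D
    times = {
        "ABD": x_b + COUNTRY_HOURS,
        "ACD": COUNTRY_HOURS + x_c,
        "ABCD": x_b + HYPERLOOP_HOURS + x_c,
        "ACBD": COUNTRY_HOURS + HYPERLOOP_HOURS + COUNTRY_HOURS,
    }
    return times[route]


def time_if_joined(route, counts, total):
    """Time a driver would experience after adding their own car to `route`."""
    trial = dict(counts)
    trial[route] += 1
    return route_time(route, trial, total)


def best_response_dynamics(total=100):
    """Drivers take turns switching to their fastest route until nobody moves."""
    choices = ["ABD"] * (total // 2) + ["ACD"] * (total - total // 2)
    counts = {route: choices.count(route) for route in ROUTES}
    changed = True
    while changed:
        changed = False
        for driver in range(total):
            current = choices[driver]
            counts[current] -= 1  # temporarily take this driver off the road
            best = min(ROUTES, key=lambda r: time_if_joined(r, counts, total))
            best_time = time_if_joined(best, counts, total)
            current_time = time_if_joined(current, counts, total)
            # Switch only for a strict improvement, so ties keep the current route.
            if best_time < current_time - 1e-12:
                choices[driver] = best
                changed = True
            counts[choices[driver]] += 1
    return counts
```

Starting from the centrally optimal split, every driver defects to A, B, C, D because the second freeway never takes longer than 1 hr while the country road always takes 1.01 hr. Once all 100 cars are on that route, no single driver can do better by switching.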
The Price of Anarchy
With hyperloop and everyone choosing their fastest route:
- commute time is 2 hr for everyone
Optimal global strategy
- send half of cars A, B, D and half A, C, D
- commute time is 1.51 hr for everyone
Rational, self-interested distributed control makes everyone’s commute about 32% longer (2 hr instead of 1.51 hr)! This is the Price of Anarchy.
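As a worked ratio, using only the two commute times stated above (the symbol \(\mathrm{PoA}\) is introduced here for illustration):

\[\mathrm{PoA} = \frac{\text{commute time when everyone is selfish}}{\text{commute time under the optimal global strategy}} = \frac{2}{1.51} \approx 1.32\]

So the selfish outcome is roughly a third slower than the centrally planned one.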
The Morals
- Having more options can lead to worse outcomes, even when everyone acts rationally in their self interest.
- Individual rational behavior can lead to worse outcomes for everyone than centralized control.
COVID as Tragedy of the Commons
can lead to disastrous outcomes
- This issue is inherent to distributed systems, and not merely a result of human or political flaws
Computational Limits of Parallelism
Limits of Parallelism
How much can parallelism speed up computation?
- Sometimes a lot
- Monte Carlo simulation to estimate \(\pi\)
- other “embarrassingly parallel” tasks
- Sometimes not?
- given a program and a number \(T\), determine if the program terminates after at most \(T\) steps
- this problem is conjectured not to be parallelizable
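To illustrate why the Monte Carlo estimate of \(\pi\) is embarrassingly parallel, here is a minimal sketch: each chunk of samples uses its own seeded generator and needs nothing from the other chunks, and the results are combined by a single sum. The chunk sizes, seeds, and function names are assumptions made for illustration.

```python
import random


def count_hits(task):
    """Count random points in the unit square that land inside the quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # private generator: no state shared between chunks
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_chunks=8, samples_per_chunk=25_000):
    """Estimate pi from independent chunks; only the final sum combines them."""
    tasks = [(seed, samples_per_chunk) for seed in range(num_chunks)]
    # The chunks are independent, so this built-in map could be swapped for a
    # parallel map (for example multiprocessing.Pool.map) without other changes.
    total_hits = sum(map(count_hits, tasks))
    return 4.0 * total_hits / (num_chunks * samples_per_chunk)
```

The built-in `map` runs the chunks one after another here; the point of the sketch is only that no chunk depends on another, which is what makes distributing them across machines straightforward.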
Quantifying Speedup
Consider a program that
- consists of \(N\) elementary operations,
- takes time \(T\) to run sequentially
Suppose:
- a fraction \(p\) of operations can be performed in parallel
- (remaining \(1 - p\) fraction must be performed sequentially)
Question: how long to complete computation with \(n\) parallel machines?
Idea
With \(n\) parallel machines:
- perform \(p\)-fraction of parallelizable ops in parallel on all \(n\) machines
- total time \(T \cdot \frac{p}{n}\)
- perform remaining ops sequentially on a single machine
- total time \(T \cdot (1 - p)\)
Total time: \(T \cdot (1 - p) + T \cdot \frac{p}{n} = T \cdot \left(1 - p + \frac p n\right)\)
How Much Improvement?
The speedup is the ratio of the original time \(T\) to the parallel time \(T \cdot \left(1 - p + \frac p n\right)\):
- \[S = \frac{1}{1 - p + \frac p n}\]
This is the best performance improvement possible in principle
- may not be achievable in practice!
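The speedup formula translates directly into a small function. This is a sketch; the function name and the input checks are illustrative choices.

```python
def amdahl_speedup(p, n):
    """Best-case speedup S = 1 / (1 - p + p / n).

    p: fraction of operations that can run in parallel (between 0 and 1)
    n: number of parallel machines (at least 1)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)
```

For any fixed \(p < 1\), the value stays below \(1 / (1 - p)\) however large \(n\) becomes, since the sequential term \(1 - p\) never shrinks.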
Example
1 person can chop 1 onion per minute
Recipe calls for:
- chop 6 onions
- saute onions for 4 minutes
Note:
- chopping onions can be done in parallel
- sauteing
- takes 4 minutes no matter what
- must be accomplished after chopping
Example (continued)
How much can the cooking process be sped up by \(n\) cooks?
Example (continued)
- For one chef, \(T = 6 + 4 = 10\)
- Only chopping onions is parallelizable, so \(p = 6 / 10 = 0.6\)
- Amdahl’s Law:
- \[S = \frac{1}{1 - p + \frac{p}{n}} = \frac{1}{0.4 + \frac{0.6}{n}}\]
- So:
- \[n = 2 \implies S \approx 1.43\]
- \[n = 3 \implies S \approx 1.67\]
- \[n = 6 \implies S = 2\]
- Always have \(S < 1 / (1 - p) = 2.5\)
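The onion example also shows why the formula's speedup may not be achievable in practice. The sketch below adds one assumption that is not in the recipe: a single onion cannot be split between cooks, so chopping takes a whole number of minutes. Names are illustrative.

```python
import math

ONIONS = 6
CHOP_MINUTES_PER_ONION = 1
SAUTE_MINUTES = 4


def amdahl_minutes(cooks):
    """Idealized time: chopping work divides perfectly among the cooks."""
    return ONIONS * CHOP_MINUTES_PER_ONION / cooks + SAUTE_MINUTES


def whole_onion_minutes(cooks):
    """Assumed model: each onion is chopped by one cook, so rounds are whole minutes."""
    rounds = math.ceil(ONIONS / cooks)
    return rounds * CHOP_MINUTES_PER_ONION + SAUTE_MINUTES


def speedup(minutes):
    """Speedup relative to the single-cook time of 10 minutes."""
    return whole_onion_minutes(1) / minutes
```

With 2, 3, or 6 cooks the onions divide evenly and both models agree. With 4 cooks the idealized time is 5.5 minutes, but two rounds of chopping are still needed under the whole-onion assumption, giving 6 minutes. Beyond 6 cooks nothing improves further, since the 4 minutes of sauteing remain.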
Speedup Improvement by Adding More Processors
- Second processor: 43%
- Third processor: 17%
- Fourth processor: 9%
- Fifth processor: 6%
- Sixth processor: 4%
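These percentages can be reproduced by comparing the speedup with \(n\) processors against the speedup with \(n - 1\). A minimal sketch for \(p = 0.6\), with illustrative function names:

```python
def speedup(p, n):
    """Amdahl speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def marginal_gains(p, max_n):
    """Relative improvement from the n-th processor over having n - 1 of them."""
    return [
        (n, speedup(p, n) / speedup(p, n - 1) - 1.0)
        for n in range(2, max_n + 1)
    ]


if __name__ == "__main__":
    for n, gain in marginal_gains(0.6, 6):
        print(f"processor {n}: {gain:.0%}")
```

Each gain is measured relative to the previous configuration, not relative to a single processor, which is why the figures fall so quickly.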
Latency vs Number of Processors
How does latency \(T\) scale with \(n\)?
- Adding more processors has declining marginal utility:
- each additional processor has a smaller effect on total performance
- at some point, adding more processors to a computation is wasteful
- Another consideration:
- after parallel ops have been performed, extra processors are idle (potentially wasteful!)
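One way to quantify the idle time is sketched below, under an assumption the slides do not state: all \(n\) processors stay reserved for the whole run, so during the sequential phase \(n - 1\) of them sit idle. The function name is illustrative.

```python
def idle_fraction(p, n):
    """Share of reserved processor-time spent idle.

    Assumes all n processors are held for the full run: the parallel phase
    keeps every processor busy for T * p / n, and the sequential phase keeps
    one busy (and n - 1 idle) for T * (1 - p). T cancels out of the ratio.
    """
    run_time = (1.0 - p) + p / n        # total run time in units of T
    reserved = n * run_time             # processor-time held
    idle = (n - 1) * (1.0 - p)          # processor-time wasted in sequential phase
    return idle / reserved
```

For the onion example (\(p = 0.6\), \(n = 6\)) this gives two thirds: the speedup is 2, but six cooks are reserved to achieve it.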
Homework Assignment 1
- You’ll be asked to take a more nuanced approach to computing an optimal schedule for parallel processing