Lecture 02: Limitations of Distributed Control and Parallelism
Outline
 Problems with Distributed Control & Braess’s Paradox
 Limits of Parallelism & Amdahl’s Law
A Distributed Problem
 All processors/agents may not be under centralized control
 Agents may have individual incentives
 Each agent acts autonomously to maximize their utility
 e.g., tuning internet routers to maximize internet speed
 choosing driving routes to minimize commute time
Traffic Routing
Two types of roads
 Country road: fixed maximum speed
 Freeway: congestion decreases speed
 more cars \(\implies\) longer commutes
 travel time proportional to number of cars on road
Example
 100 cars want to travel from city A to city D
 2 possible routes:
 northern route through city B
 southern route through city C
 Freeways from A to B and from C to D
 Country roads from A to C and B to D
A Simple Map
Blue = freeway, Red = country road
Commute Times
 Commute time on (blue) freeways = proportion of cars on freeway
 e.g., if 60 cars take freeway from A to B, commute time from A to B is 0.6 hr
 Commute time on (red) country roads constant: 1.01 hr
Question 1
Which route, if I know how many cars are taking each route?
Question 2
How to choose a route if I don’t know how many cars take each route?
Observations

If I choose route randomly, my average commute time is 1.51 hr, regardless of what everyone else does.

Centralized control can guarantee that everyone’s commute is exactly 1.51 hr:
 route 50 cars through B and 50 cars through C
The Free Market
Elon Musk thinks this map is ridiculous!
“We will build a hyperloop connecting cities B and C!!! The commute will be free and (almost) instantaneous!!!”
The New Picture
What is the Best Strategy?
Previously:
 Choosing B/C route randomly gives expected commute time of 1.51 hr
 Centralized control guaranteed commute times of 1.51 hr
These options are still viable, but now get more choices:
 A to B to C to D: commute time \(X_B + X_C\)
 A to C to B to D: commute time 2.02
Which route should you choose? What is the aggregate behavior?
Best Strategy?
Which route should you choose? What is the aggregate behavior?
Dominant Strategy
Observation:
 \(X_B, X_C \leq 1\) no matter what
Independent of everyone else’s choices, your fastest route is A to B to C to D:
 Total cost: \(X_B + X_C\)
This is the dominant strategy
Collective Behavior
If everyone chooses according to self interest:
 all cars commute A to B to C to D

\[X_B = X_C = 1\]
 everyone’s commute takes 2 hours!
Without the hyperloop, expected commute time was 1.51 hours…
 adding the hyperloop increased commute times
In real life: closing Times Square to car traffic in NYC decreased avg. commute time in Manhattan. (source)
The Price of Anarchy
With hyperloop and everyone choosing their fastest route:
 commute time is 2 hr for everyone
Optimal global strategy
 send half of cars A, B, D and half A, C, D
 commute time is 1.51 hr for everyone
Rational, selfinterested distributed control leads to 33% worse efficiency for everyone! This is the Price of Anarchy.
The Morals

Having more options can lead to worse outcomes, even when everyone acts rationally in their self interest.

Individual rational behavior can lead to worse outcomes for everyone than centralized control.
COVID as Tragedy of the Commons
can lead to disasterous outcomes
 This issue is inherent to distributed systems, and not merely a result of human or political flaws
Computational Limits of Parallelism
Limits of Parallelism
How much can parallelism speed up computation?
 Sometimes a lot
 Monte Carlo simulation to estimate \(\pi\)
 other “embarrassinlgy parallel” tasks
 Sometimes not?
 given a program and a number \(T\), determine if the program terminates after at most \(T\) steps
 this problem is conjectured not to be parallelizable
Quantifying Speedup
Consider a program consisting of
 \(N\) elementary operations,
 takes time \(T\) to run sequentially
Suppose:
 a fraction \(p\) of operations can be performed in parallel
 (remaining \(1  p\) fraction must be performed sequentially)
Question: how long to complete computation with \(n\) parallel machines?
Idea
With \(n\) parallel machines:
 perform \(p\)fraction of parallelizable ops in parallel on all \(n\) machines
 total time \(T \cdot \frac{p}{n}\)
 perform remaining ops sequentially on a single machine
 total time \(T \cdot (1  p)\)
Total time: \(T \cdot (1  p) + T \cdot \frac{p}{n} = T \cdot \left(1  p + \frac p n\right)\)
How Much Improvement?
The speedup is the ratio of the original time \(T\) to the parallel time \(T \cdot \left(1  p + \frac p n\right)\):

\[S = \frac{1}{1  p + \frac p n}\]
This is the best performance improvement possible in principle
 may not be achievable in practice!
Example
1 person can chop 1 onion per minute
Recipe calls for:
 chop 6 onions
 saute onions for 4 minutes
Note:
 chopping onions can be done in parallel
 sauteing
 takes 4 minutes no matter what
 must be accomplished after chopping
Example (continued)
How much can the cooking process be sped up by \(n\) cooks?
Example (continued)
 For one chef, \(T = 6 + 4 = 10\)
 Only chopping onions is parallelizable, so \(p = 6 / 10 = 0.6\)
 Amdahl’s Law:

\[S = \frac{1}{1  p  \frac{p}{n}} = \frac{1}{0.4 + \frac 1 n 0.6}\]
 So:

\[n = 2 \implies S = 1.43\]

\[n = 3 \implies S = 1.67\]

\[n = 6 \implies S = 2\]
 Always have \(S < 1 / (1  p) = 2.5\)
Speedup Improvement by Adding More Processors
 Second processor: 43%
 Third processor: 17%
 Fourth processor: 9%
 Fifth processor: 6%
 Sixth processor 4%
Latency vs Number of Processors
How does latency \(T\) scale with \(n\)?
 Adding more processors has declining marginal utility:
 each additional processor has a smaller effect on total performance
 at some point, adding more processors to a computation is wasteful
 Another consideration:
 after parallel ops have been performed, extra processors are idle (potentially wasteful!)
Homework Assignment 1
 You’ll be asked to take a more nuanced approach to computing an optimal schedule for parallel processing