Lecture 19: $k$-Means Clustering

COSC 225: Algorithms and Visualization

Spring, 2023


the unreasonable effectiveness of simple procedures

  1. Generic problem: cluster analysis
  2. A simple procedure: k-means
  3. Implementation
  4. Demo and explanation

Motivating Problems

Problem 1

Given locations of houses in a sparsely populated county,

Find locations to put mail drop boxes to serve the community

  • fixed budget for number of drop boxes

Problem 2

Given a large body of texts (e.g., books),

Find collections of similar books (e.g., similar style, topic, etc.)

Problem 3

Given an image (e.g., digital photo),

Find a palette of representative colors similar to those in the image.


Question. What do these problems have in common?

Generic Problems


Input:

  • a large collection of items

Output:

  • a partition of the collection into clusters containing similar items
  • a small collection of representative elements for the collection

Assumptions:

  • similarity measure between items
  • features of new items can be synthesized from existing items

Formalizing the Input

Representing items:

  • items can be represented as geometric points
  • different features correspond to different dimensions

Examples: houses, texts, colors

Representing distances:

  • $a$ has coordinates $(a_1, a_2, \ldots, a_d)$
  • $b$ has coordinates $(b_1, b_2, \ldots, b_d)$
  • $\mathrm{dist}(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_d - b_d)^2}$
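
The distance formula above translates almost directly into code. The following is a minimal sketch, assuming points are stored as plain arrays of numbers; the function name `dist` mirrors the notation on the slide and is not taken from the demo code.

```javascript
// Euclidean distance between two points given as equal-length arrays
// of coordinates, e.g. [x, y] or [r, g, b].
function dist(a, b) {
  if (a.length !== b.length) {
    throw new Error("points must have the same number of coordinates");
  }
  let sumOfSquares = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sumOfSquares += diff * diff;
  }
  return Math.sqrt(sumOfSquares);
}

console.log(dist([0, 0], [3, 4])); // 5
```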

Output-size parameter: $k$ = number of desired parts/clusters/representative elements

Desired Outputs

  1. Clusters $C_1, C_2, \ldots, C_k$ that partition the elements in the collection

  2. Points $m_1, m_2, \ldots, m_k$ that are representative of the clusters

From representative points to clusters:

  • each point $m_i$ represents a cluster $C_i$
  • $C_i$ is the set of all data points $a$ such that $m_i$ is the closest representative point
    • $a$ in $C_i$ if $i = \arg\min_j \{\mathrm{dist}(a, m_j)\}$
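
The arg-min rule above can be sketched as follows. This is an illustration, not the demo's code: the names `squaredDist`, `closestMeanIndex`, and `assignClusters` are hypothetical, points are assumed to be arrays of numbers, and ties are broken in favor of the smaller index. Comparing squared distances gives the same arg min as comparing distances and avoids the square root.

```javascript
// Squared Euclidean distance between two equal-length coordinate arrays.
function squaredDist(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return sum;
}

// Index j of the representative point m_j closest to a.
function closestMeanIndex(a, means) {
  let best = 0;
  let bestDist = Infinity;
  for (let j = 0; j < means.length; j++) {
    const d = squaredDist(a, means[j]);
    if (d < bestDist) {   // strict "<" keeps the first index on ties
      bestDist = d;
      best = j;
    }
  }
  return best;
}

// For each data point, the index of the cluster it belongs to.
function assignClusters(points, means) {
  return points.map((a) => closestMeanIndex(a, means));
}

const points = [[0, 0], [1, 0], [9, 9], [10, 10]];
const means = [[0, 0], [10, 10]];
console.log(assignClusters(points, means)); // [ 0, 0, 1, 1 ]
```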

Question. How to choose good representative points?

From Clusters to Representative Points

Question. Given clusters, how could we choose appropriate representative points?

Idea. Choose the centroid (a.k.a. the mean) of each cluster

  • centroid’s coordinates are averages of cluster’s points
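
As a small illustration of the coordinate-wise average, here is a sketch that assumes a cluster is a non-empty array of points, each an array of numbers. The function name `centroid` is chosen for this example only.

```javascript
// Centroid (mean) of a non-empty list of points: each coordinate of the
// result is the average of that coordinate over all points.
function centroid(points) {
  if (points.length === 0) {
    throw new Error("centroid of an empty cluster is undefined");
  }
  const dim = points[0].length;
  const sums = new Array(dim).fill(0);
  for (const p of points) {
    for (let i = 0; i < dim; i++) {
      sums[i] += p[i];
    }
  }
  return sums.map((s) => s / points.length);
}

// Corners of a 2-by-2 square: the centroid is the center of the square.
console.log(centroid([[0, 0], [2, 0], [2, 2], [0, 2]])); // [ 1, 1 ]
```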

Optimization Problem


Input:

  • data points $a_1, a_2, \ldots, a_n$
  • parameter $k$

Output:

  • means (points) $m_1, m_2, \ldots, m_k$
  • associated clusters $C_1, C_2, \ldots, C_k$

Objective:

  • cost of $a$ in cluster $C_j$ is $\mathrm{dist}^2(a, m_j)$
  • minimize sum of all costs
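
To make the objective concrete, the sketch below computes the total cost of a given set of means, assuming each point is charged the squared distance to its closest mean (which is the cluster it belongs to under the arg-min rule). The name `clusteringCost` is hypothetical.

```javascript
// Sum over all data points of the squared distance to the closest mean.
function clusteringCost(points, means) {
  let total = 0;
  for (const p of points) {
    let best = Infinity;
    for (const m of means) {
      let d2 = 0;
      for (let i = 0; i < p.length; i++) {
        d2 += (p[i] - m[i]) ** 2;
      }
      best = Math.min(best, d2);
    }
    total += best;
  }
  return total;
}

// Two points at distance 1 from the first mean, one point on the second mean.
console.log(clusteringCost([[0, 0], [2, 0], [10, 0]], [[1, 0], [10, 0]])); // 2
```

Moving the first mean away from $(1, 0)$ can only increase this value for the same clusters, which is one way to see why the centroid is a natural representative.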

A Problem with Optimization

Unfortunately. Finding optimal $m_1, m_2, \ldots, m_k$ is computationally hard

  • no known efficient algorithm
  • widely believed that no efficient procedure exists
    • problem is NP-hard

A Heuristic Solution

  • start by selecting $m_1, m_2, \ldots, m_k$ randomly

  • compute associated clusters $C_1, C_2, \ldots, C_k$

    • then what?
  • then update means according to the clusters

    • then what?
  • then update the clusters

“Naive” $k$-Means Algorithm

  1. Choose $m_1, m_2, \ldots, m_k$ randomly
  2. Compute $C_1, C_2, \ldots, C_k$
    • $a \in C_i$ if $i = \arg\min_j \{\mathrm{dist}(a, m_j)\}$
  3. Update $m_1, m_2, \ldots, m_k$ to be the centroids of clusters
  4. Repeat steps 2 and 3 until the clusters (and hence the means) no longer change
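
The four steps can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the code in the demo: the names `kMeans` and `pickRandomPoints` are hypothetical, the initial means are drawn at random from the data points (the slides only say "randomly"), an empty cluster keeps its previous mean, and the `initialMeans` and `maxIterations` parameters are added here for reproducibility and safety. It assumes $k$ is at most the number of points.

```javascript
// Pick k distinct data points at random (partial Fisher-Yates shuffle).
function pickRandomPoints(points, k) {
  const idx = points.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(Math.random() * (idx.length - i));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  return idx.slice(0, k).map((i) => points[i].slice());
}

function kMeans(points, k, initialMeans = null, maxIterations = 100) {
  // Step 1: choose the initial means.
  const means = initialMeans
    ? initialMeans.map((m) => m.slice())
    : pickRandomPoints(points, k);
  const labels = new Array(points.length).fill(-1);
  const dim = points[0].length;

  for (let iter = 0; iter < maxIterations; iter++) {
    // Step 2: assign each point to its closest mean.
    let changed = false;
    for (let p = 0; p < points.length; p++) {
      let best = 0;
      let bestDist = Infinity;
      for (let j = 0; j < k; j++) {
        let d2 = 0;
        for (let i = 0; i < dim; i++) d2 += (points[p][i] - means[j][i]) ** 2;
        if (d2 < bestDist) { bestDist = d2; best = j; }
      }
      if (labels[p] !== best) { labels[p] = best; changed = true; }
    }
    // Step 4: stop once no point changes cluster.
    if (!changed) break;

    // Step 3: move each mean to the centroid of its cluster.
    const sums = Array.from({ length: k }, () => new Array(dim).fill(0));
    const counts = new Array(k).fill(0);
    for (let p = 0; p < points.length; p++) {
      counts[labels[p]]++;
      for (let i = 0; i < dim; i++) sums[labels[p]][i] += points[p][i];
    }
    for (let j = 0; j < k; j++) {
      if (counts[j] > 0) means[j] = sums[j].map((s) => s / counts[j]);
      // Assumption: an empty cluster keeps its previous mean.
    }
  }
  return { means, labels };
}

const data = [[0, 0], [0, 1], [10, 10], [10, 11]];
const result = kMeans(data, 2, [[0, 0], [10, 10]]);
console.log(result.labels); // [ 0, 0, 1, 1 ]
console.log(result.means);  // [ [ 0, 0.5 ], [ 10, 10.5 ] ]
```

Because the procedure is a heuristic, different random starts can end at different clusterings; passing `initialMeans` explicitly makes a run repeatable.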


  • k-means-clustering.zip

Implementation Details: Structure

  • KMeans represents an instance of the $k$-means clustering problem
    • stores x- and y-coordinates of data points and means
    • stores associated clusters
    • methods to initialize and update means, update clusters
  • KMeansVisualizer
    • stores SVG element and KMeans instance
    • SVG elements for each point, mean
    • methods to add point, update groups, draw means
    • handles animation
    • responds to clicks

SVG Canvas That Fills Screen

#k-means {
    position: fixed;
    z-index: 0;
    top: 0px;
    left: 0px;
    width: 100svw;
    height: 100svh;
    background-color: rgb(220, 220, 255);
}

Box Sticks to Bottom

#root {
    width: 700px;
    height: 100svh;
    margin: 0px auto;
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    overflow: clip;
}

Rounded Corners on Boxes

#root {
    overflow: clip;
}

.head, .foot {
    border-radius: 10px;
    position: relative;
}
.head {
    top: -10px;
    padding: 30px 20px 20px 20px;
}
.foot {
    top: 10px;
    padding: 20px 20px 30px 20px;
}

Generating Cluster Colors

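// Give each cluster its own color: hues spread around the color wheel,
// with lightness decreasing from 50% as i grows.
for (let i = 0; i < this.k; i++) {
  let group = document.createElementNS(SVG_NS, "g");
  let hue = 60 + 360 * i / this.k;
  let lightness = 50 - 20 * i / this.k;
  group.setAttributeNS(null, "fill", `hsl(${hue}, 90%, ${lightness}%)`);
}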

Coloring Clusters

Group element for each cluster, set fill attribute

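this.clusterGroups = [];   // SVG groups for each cluster
for (let i = 0; i < this.k; i++) {
  let group = document.createElementNS(SVG_NS, "g");
  group.setAttributeNS(null, "fill", ...);   // color expression elided on the slide
  this.clusterGroups.push(group);            // assumed: keep the group for later lookup
}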

Adding elements

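this.updateGroups = function () {
  for (let i = 0; i < this.pointElts.length; i++) {
    let pt = this.pointElts[i];
    let group = this.kmeans.clusters[i];   // cluster index of point i (assumed)
    // Assumed completion (slide is cut off): re-parent the point so it inherits the group's fill.
    this.clusterGroups[group].appendChild(pt);
  }
};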

Show/Hide Boxes


.hidden {
    visibility: hidden;
}


const btnHideBoxes = document.querySelector("#btn-hide-boxes");
btnHideBoxes.addEventListener("click", () => {
    for (let box of document.querySelectorAll(".bounding-box")) {
        box.classList.toggle("hidden");   // assumed completion: the slide is cut off here
    }
});

Animation was a Pain

let elt = document.createElementNS(SVG_NS, "polygon");
elt.setAttributeNS(null, "points", "10,0 0,10 -10,0 0,-10");
let animate = document.createElementNS(SVG_NS, "animateTransform");
animate.setAttributeNS(null, "attributeName", "transform");
animate.setAttributeNS(null, "attributeType", "XML");
animate.setAttributeNS(null, "type", "translate");
animate.setAttributeNS(null, "from", "0 0");
animate.setAttributeNS(null, "to", "0 0");
animate.setAttributeNS(null, "dur", "1s");
animate.setAttributeNS(null, "repeatCount", "1");
animate.setAttributeNS(null, "fill", "freeze");
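animate.setAttributeNS(null, "begin", "indefinite");   // only starts when triggered from script
elt.appendChild(animate);   // the animation must be a child of the element it moves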

Starting Mean Animation

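this.updateMeans = function () {
  for (let i = 0; i < this.k; i++) {
    let elt = this.meanElts[i];
    let x = this.kmeans.xMeans[i];
    let y = this.kmeans.yMeans[i];
    let animate = elt.firstChild;            // the <animateTransform> child
    let from = animate.getAttribute("to");   // previous target becomes the new start
    from = (from === "0 0") ? `${x} ${y}` : from;   // first update: appear in place
    animate.setAttributeNS(null, "from", from);
    animate.setAttributeNS(null, "to", `${x} ${y}`);
    // Assumed completion (the slide is cut off here): an animation with
    // begin="indefinite" only runs when started from script.
    animate.beginElement();
  }
};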

Would Have Been Easier


.mean-point {
  transition-property: transform;
  transition-duration: 1s;
}

Unfortunately, this applies the transition to the CSS transform property (style="transform: ...;") and not to the transform SVG attribute.

  • maybe there is a simpler way?
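
One possible answer, offered as a sketch rather than something verified against the demo: since the CSS transition does apply to the style property, the mean could be moved by setting `style.transform` instead of the `transform` attribute. The helper names `translateCss` and `moveMean` are hypothetical, and the approach assumes a browser that supports CSS transforms on SVG elements and that the element carries the `.mean-point` class.

```javascript
// Build a CSS transform value. Unlike the SVG attribute ("translate(10 20)"),
// the CSS property requires units on lengths.
function translateCss(x, y) {
  return `translate(${x}px, ${y}px)`;
}

// Move a mean's element by updating its inline style. With a CSS rule such as
// `.mean-point { transition: transform 1s; }`, the browser animates the change.
function moveMean(elt, x, y) {
  elt.style.transform = translateCss(x, y);
}

// Usage in the browser (meanElts as in the visualizer):
//   moveMean(this.meanElts[i], this.kmeans.xMeans[i], this.kmeans.yMeans[i]);
```

The trade-off is that the position then lives in the element's style rather than in its SVG attributes, so any code that reads the `transform` attribute would need to change as well.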


Next Week

Something a bit different

  • build a visualization collaboratively in class!
  • details to follow