Lecture 20: Pattern Matching, Part I

COSC 225: Algorithms and Visualization

Spring, 2023

Announcement

Final Project, Preliminary Presentations

  • Monday, May 1 in class
  • includes a substantial component of your project
  • demos in small groups
  • give/receive constructive feedback on projects

Fundamental Problem

Input.

  • a large text, TEXT
  • a smaller text, PATTERN

Output.

  • “yes” if TEXT contains PATTERN as a substring
  • or, starting index of first instance of PATTERN in TEXT
    • -1 if PATTERN does not appear

Example

  • TEXT = "C C T G T C T C T G T A A T C A T G A A"

  • PATTERN1 = "T A A T"
  • PATTERN2 = "T A C T"
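
As a quick check of the expected output for this example, JavaScript's built-in `String.prototype.indexOf` already returns the starting index of the first occurrence, or -1 if there is none. The sketch below assumes the spaces in the slide's strings are only there for readability and strips them first; the helper `stripSpaces` and the result names are illustrative.

```javascript
// Assumption: the spaces in the slide's strings are for readability only.
const stripSpaces = (s) => s.split(" ").join("");

const TEXT = stripSpaces("C C T G T C T C T G T A A T C A T G A A");
const PATTERN1 = stripSpaces("T A A T");
const PATTERN2 = stripSpaces("T A C T");

// indexOf returns the index of the first occurrence, or -1 if absent.
const first1 = TEXT.indexOf(PATTERN1); // 10
const first2 = TEXT.indexOf(PATTERN2); // -1
console.log(first1, first2);
```

So PATTERN1 first appears at index 10 (counting from 0), and PATTERN2 does not appear at all.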

An Algorithm?

Question. How to solve the problem in general?

Naive String Search Code

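function naiveSearch(TEXT, PATTERN) {
  let idx = 0;      // current alignment: PATTERN[0] is lined up with TEXT[idx]
  let matches = 0;  // number of characters matched at this alignment
  // <= so that the last possible alignment is also checked
  while (idx <= TEXT.length - PATTERN.length) {
    if (matches == PATTERN.length) return idx;
    if (TEXT[idx + matches] == PATTERN[matches]) {
      matches++;
    } else {
      idx++;
      matches = 0;
    }
  }
  // every alignment was tried without a full match
  return -1;
}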

Running Time?

Question. If TEXT has length $n$ and PATTERN has length $m$, what is the worst-case running time (big-O) of the procedure as a function of $n$ and $m$?

Worst-Case Example?

Question. What example TEXTs and PATTERNs are especially inefficient for our procedure?
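
One way to explore this question is to count character comparisons directly. The sketch below is an instrumented copy of the naive loop (with `<=` in the loop bound so the last alignment is also checked); the function name `countComparisons` and the test strings are assumptions chosen for illustration. A TEXT of all a's with a PATTERN of a's ending in b forces nearly a full scan of PATTERN at every alignment.

```javascript
// Naive search, instrumented to count character comparisons.
function countComparisons(text, pattern) {
  let idx = 0;
  let matches = 0;
  let comparisons = 0;
  while (idx <= text.length - pattern.length) {
    if (matches === pattern.length) return { index: idx, comparisons };
    comparisons++; // one TEXT-vs-PATTERN character comparison
    if (text[idx + matches] === pattern[matches]) {
      matches++;
    } else {
      idx++;
      matches = 0;
    }
  }
  return { index: -1, comparisons };
}

// n = 20, m = 4: each of the 17 alignments costs 4 comparisons.
const worst = countComparisons("a".repeat(20), "aaab");
console.log(worst); // { index: -1, comparisons: 68 }
```

On inputs of this shape the count is $(n - m + 1) \cdot m$, which suggests a worst-case running time of $O(nm)$.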

Code, Again

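function naiveSearch(TEXT, PATTERN) {
  let idx = 0;      // current alignment: PATTERN[0] is lined up with TEXT[idx]
  let matches = 0;  // number of characters matched at this alignment
  // <= so that the last possible alignment is also checked
  while (idx <= TEXT.length - PATTERN.length) {
    if (matches == PATTERN.length) return idx;
    if (TEXT[idx + matches] == PATTERN[matches]) {
      matches++;
    } else {
      idx++;
      matches = 0;
    }
  }
  // every alignment was tried without a full match
  return -1;
}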

Picturing an Execution

  • TEXT = "C C T G T C T C T G T A A T C A T G A A"

  • PATTERN = "T C A T"

Picture with Boxes

  • TEXT = "C C T G T C T C T G T A A T C A T G A A"
  • PATTERN = "T C A T"
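
Before drawing boxes in the browser, the same picture can be approximated in the console. The sketch below is a text-only stand-in, not the visualization itself: it records, for each alignment tried, the value of idx, how many characters matched before the first mismatch, and PATTERN shifted right by idx spaces. The function name `traceAlignments` and the row shape are assumptions, and the spaces in the slide's strings are assumed to be for readability only.

```javascript
// For each alignment, record idx, the number of matched characters,
// and a one-line picture of PATTERN shifted under TEXT.
function traceAlignments(text, pattern) {
  const rows = [];
  for (let idx = 0; idx <= text.length - pattern.length; idx++) {
    let matches = 0;
    while (matches < pattern.length && text[idx + matches] === pattern[matches]) {
      matches++;
    }
    rows.push({ idx, matches, picture: " ".repeat(idx) + pattern });
    if (matches === pattern.length) break; // full match: stop here
  }
  return rows;
}

const text = "CCTGTCTCTGTAATCATGAA";
const rows = traceAlignments(text, "TCAT");
for (const row of rows) {
  console.log(text);
  console.log(row.picture + "   idx=" + row.idx + " matches=" + row.matches);
}
```

For this TEXT and PATTERN the trace ends at idx = 13, where all four characters match.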

This Week

Today

  • Collaborate to make a visualization of the naive string search algorithm

Wednesday

  • A more sophisticated string search algorithm

User Interactions

  • user specifies a TEXT
  • user specifies a PATTERN
  • user steps through algorithm

Visualization

  • Display TEXT and PATTERN as horizontal rows of boxes

    • one letter per box
  • TEXT and PATTERN aligned to show current value of idx

  • TEXT and PATTERN colored to indicate (mis)matched characters at the current alignment

  • Each step does

    • update aligned character (mis)matches (matches incremented)
    • shift PATTERN text (idx incremented)
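
To let the user step through the algorithm, one option is to turn the body of the search loop into a function that performs exactly one step and returns the new state. The sketch below is one possible shape, not the structure of the provided code: the function name `step` and the state object `{ idx, matches, result }` are assumptions, with `result` staying `null` until the search finishes.

```javascript
// One step of the naive search. result is null while the search is
// running, then the index of the match, or -1 if there is none.
function step(text, pattern, state) {
  if (state.result !== null) return state; // already finished
  const { idx, matches } = state;
  if (idx > text.length - pattern.length) {
    return { idx, matches, result: -1 }; // no alignments left
  }
  if (matches === pattern.length) {
    return { idx, matches, result: idx }; // full match at idx
  }
  if (text[idx + matches] === pattern[matches]) {
    return { idx, matches: matches + 1, result: null }; // extend the match
  }
  return { idx: idx + 1, matches: 0, result: null }; // shift PATTERN
}

// Stand-in for a user clicking a "step" button until the search ends.
let state = { idx: 0, matches: 0, result: null };
while (state.result === null) {
  state = step("CCTGTCTCTGTAATCATGAA", "TCAT", state);
}
console.log(state.result); // 13
```

Each call changes either matches or idx, which corresponds to the two kinds of update listed above.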

Given Program Structure

index.html:

  • elements for 'text-container', 'pattern-container'
    • each will contain letter-box elements
  • <input> elements for user input
  • <button> elements to update the text and step

style.css:

  • classes for .container, .letter-box
  • a few other styling things

pattern-matching.js Structure

  • Arrays for
    • text characters, pattern characters
    • text and pattern elements (divs with class letter-box)
  • Elements for
    • text and pattern containers
    • input fields and buttons
  • Incomplete methods for
    • shifting the pattern
    • un/marking matches and mismatches
    • pattern matching algorithm step
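
The marking logic can be worked out separately from the DOM. The sketch below computes one label per PATTERN box for a given alignment; the function name `labelPatternBoxes` and the labels "match" and "mismatch" are hypothetical and do not come from the provided style.css. In the page, such labels could be applied to the corresponding letter-box elements, for example with `classList.add`.

```javascript
// Hypothetical helper: one label per PATTERN character for the
// alignment idx, given that the first `matches` characters matched.
function labelPatternBoxes(text, pattern, idx, matches) {
  return Array.from(pattern, (ch, j) => {
    if (j < matches) return "match";
    // the box right after the matched prefix is a mismatch only if
    // it has been compared and differs from the aligned TEXT character
    if (j === matches && text[idx + j] !== ch) return "mismatch";
    return ""; // not compared yet
  });
}

const labels = labelPatternBoxes("CCTGTCTCTGTAATCATGAA", "TCAT", 4, 2);
console.log(labels); // [ 'match', 'match', 'mismatch', '' ]
```

The same labels apply to the TEXT boxes at positions idx through idx + PATTERN.length - 1.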

The Code

  • Download lec20-pattern-matching.zip

Your Task

  1. Partition yourselves into groups of up to 3
  2. Get a group number
  3. Group number $\implies$ a role
    • Display Group
    • Input Group
    • Algo Group
  4. Follow the instructions on the Google Doc
    • link on course Moodle site
  5. Paste your code into the Google Doc when you’re done

Bringing it Together

  • Display group
  • Input group
  • Algo group