# Lecture 20: Pattern Matching, Part I

## Announcement

Final Project, Preliminary Presentations

• Monday, May 1 in class
• includes a substantial component of your project
• demos in small groups
• give/receive constructive feedback on projects

## Fundamental Problem

Input.

• a large text, TEXT
• a smaller text, PATTERN

Output.

• “yes” if TEXT contains PATTERN as a substring
• or, starting index of first instance of PATTERN in TEXT
• -1 if PATTERN does not appear

## Example

• TEXT = "C C T G T C T C T G T A A T C A T G A A"

• PATTERN1 = "T A A T"
• PATTERN2 = "T A C T"
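
As a reference point before writing our own search, JavaScript's built-in string methods already follow this input/output contract. The sketch below runs them on the example, assuming the spaces in TEXT above are only visual separators and are not part of the string:

```javascript
// Assumption: spaces on the slide are for readability, not part of TEXT.
const TEXT = "CCTGTCTCTGTAATCATGAA";
const PATTERN1 = "TAAT";
const PATTERN2 = "TACT";

// includes() answers the yes/no version of the problem.
console.log(TEXT.includes(PATTERN1)); // true
console.log(TEXT.includes(PATTERN2)); // false

// indexOf() answers the index version: first occurrence, or -1 if absent.
console.log(TEXT.indexOf(PATTERN1)); // 10
console.log(TEXT.indexOf(PATTERN2)); // -1
```

These built-ins are handy for checking a hand-written search against known answers.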

## An Algorithm?

Question. How to solve the problem in general?

## Naive String Search Code

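```javascript
// Returns the index of the first occurrence of PATTERN in TEXT, or -1.
function naiveSearch(TEXT, PATTERN) {
  let idx = 0;      // current alignment: PATTERN[0] sits under TEXT[idx]
  let matches = 0;  // number of leading PATTERN characters matched at idx
  // <= so that an occurrence at the very end of TEXT is also checked
  while (idx <= TEXT.length - PATTERN.length) {
    if (matches == PATTERN.length) return idx;
    if (TEXT[idx + matches] == PATTERN[matches]) {
      matches++;
    } else {
      idx++;
      matches = 0;
    }
  }
  return -1;  // every alignment was tried without a full match
}
```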


## Running Time?

Question. If TEXT has length $n$ and PATTERN has length $m$, what is the worst-case running time (big-O) of the procedure as a function of $n$ and $m$?

## Worst-Case Example?

Question. What example TEXTs and PATTERNs are especially inefficient for our procedure?
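
One way to explore this question is to count character comparisons directly. The sketch below is an instrumented variant of the naive search, written with nested loops but performing the same comparisons; `countComparisons` is a hypothetical helper, and the repeated-letter input is just one candidate to experiment with:

```javascript
// Hypothetical helper: naive search that also counts character comparisons.
function countComparisons(text, pattern) {
  let comparisons = 0;
  for (let idx = 0; idx + pattern.length <= text.length; idx++) {
    let matches = 0;
    while (matches < pattern.length) {
      comparisons++;
      if (text[idx + matches] !== pattern[matches]) break; // mismatch: shift
      matches++;
    }
    if (matches === pattern.length) return { index: idx, comparisons };
  }
  return { index: -1, comparisons };
}

const n = 20;
const m = 4;
const text = "A".repeat(n);               // "AAAAAAAAAAAAAAAAAAAA"
const pattern = "A".repeat(m - 1) + "B";  // "AAAB"
const result = countComparisons(text, pattern);
console.log(result.index);        // -1
console.log(result.comparisons);  // 68, which equals m * (n - m + 1)
```

On this input every alignment matches $m - 1$ characters before failing on the last one, so each of the $n - m + 1$ alignments costs $m$ comparisons.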

## Code, Again

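```javascript
// Returns the index of the first occurrence of PATTERN in TEXT, or -1.
function naiveSearch(TEXT, PATTERN) {
  let idx = 0;      // current alignment: PATTERN[0] sits under TEXT[idx]
  let matches = 0;  // number of leading PATTERN characters matched at idx
  // <= so that an occurrence at the very end of TEXT is also checked
  while (idx <= TEXT.length - PATTERN.length) {
    if (matches == PATTERN.length) return idx;
    if (TEXT[idx + matches] == PATTERN[matches]) {
      matches++;
    } else {
      idx++;
      matches = 0;
    }
  }
  return -1;  // every alignment was tried without a full match
}
```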


## Picturing an Execution

• TEXT = "C C T G T C T C T G T A A T C A T G A A"

• PATTERN = "T C A T"

## Picture with Boxes

• TEXT = "C C T G T C T C T G T A A T C A T G A A"
• PATTERN = "T C A T"
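
A text-only version of the box picture can be produced by printing PATTERN shifted under TEXT once per alignment. This is a rough sketch, with spaces removed from TEXT; `traceAlignments` is a hypothetical helper name:

```javascript
// Hypothetical helper: one output line per alignment tried by naive search.
function traceAlignments(text, pattern) {
  const lines = [text];
  for (let idx = 0; idx + pattern.length <= text.length; idx++) {
    lines.push(" ".repeat(idx) + pattern);     // pattern shifted right by idx
    if (text.startsWith(pattern, idx)) break;  // stop at the first full match
  }
  return lines;
}

const trace = traceAlignments("CCTGTCTCTGTAATCATGAA", "TCAT");
console.log(trace.join("\n"));
```

The last printed line shows PATTERN at the alignment where the search stops, which for this example is idx = 13.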

## This Week

Today

• Collaborate to make a visualization of the naive string search algorithm

Wednesday

• A more sophisticated string search algorithm

## User Interactions

• user specifies a TEXT
• user specifies a PATTERN
• user steps through algorithm

## Visualization

• Display TEXT and PATTERN as horizontal rows of boxes
  • one letter per box
  • TEXT and PATTERN aligned to show current value of idx
• TEXT and PATTERN colored to indicate (mis)matched characters with current alignment
• Each step does one of the following
  • update aligned character (mis)matches (matches incremented)
  • shift PATTERN text (idx incremented)
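
To let the user step through the algorithm, the loop body of the naive search can be pulled out into a function that advances a small state object by one step. This is a minimal sketch, not the structure of the provided code; the state shape (`idx`, `matches`, `status`) and the name `step` are assumptions made for illustration:

```javascript
// Hypothetical state shape: { idx, matches, status },
// where status is "running", "found", or "notFound".
function step(state, text, pattern) {
  if (state.status !== "running") return state;  // already finished
  const { idx, matches } = state;
  if (matches === pattern.length) {
    return { idx, matches, status: "found" };
  }
  if (idx + pattern.length > text.length) {
    return { idx, matches, status: "notFound" };  // pattern no longer fits
  }
  if (text[idx + matches] === pattern[matches]) {
    return { idx, matches: matches + 1, status: "running" };  // extend match
  }
  return { idx: idx + 1, matches: 0, status: "running" };     // shift pattern
}

// Drive the stepper to completion; a button handler would call step() once.
function runToEnd(text, pattern) {
  let state = { idx: 0, matches: 0, status: "running" };
  while (state.status === "running") state = step(state, text, pattern);
  return state;
}

console.log(runToEnd("CCTGTCTCTGTAATCATGAA", "TCAT"));
```

Keeping the step logic free of DOM calls means it can be tested on its own, with the display code reading `idx` and `matches` after each step.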

## Given Program Structure

index.html:

• elements for 'text-container', 'pattern-container'
  • each will contain letter-box divs
• <input> elements for user input
• <button> elements to update the text and step

style.css:

• classes for .container, .letter-box
• a few other styling things

## pattern-matching.js Structure

• Arrays for
  • text characters, pattern characters
  • text and pattern elements (divs with class letter-box)
• Elements for
  • text and pattern containers
  • input fields and buttons
• Incomplete methods for
  • shifting the pattern
  • un/marking matches and mismatches
  • pattern matching algorithm step
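
One possible way to approach the marking method is to first compute, without touching the DOM, which class each pattern box should carry for the current `idx` and `matches`. In the sketch below, `patternBoxClasses` and the class names `match` and `mismatch` are assumptions for illustration; only `letter-box` comes from the provided stylesheet:

```javascript
// Hypothetical helper: class string for each pattern box at one alignment.
// "match" and "mismatch" are assumed class names, not from the given CSS.
function patternBoxClasses(text, pattern, idx, matches) {
  return pattern.split("").map((ch, j) => {
    if (j < matches) return "letter-box match";  // already matched
    if (j === matches && text[idx + j] !== ch) {
      return "letter-box mismatch";              // the character that failed
    }
    return "letter-box";                         // not yet compared
  });
}

// Pattern "TCAT" aligned at idx = 4 after two characters have matched.
const classes = patternBoxClasses("CCTGTCTCTGTAATCATGAA", "TCAT", 4, 2);
console.log(classes);
```

Each resulting string could then be assigned to the `className` of the corresponding pattern div.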

## The Code

• Download lec20-pattern-matching.zip

1. Partition yourselves into groups of up to 3
2. Get a group number
3. Group number $\implies$ a role
   • Display Group
   • Input Group
   • Algo Group