Final Project, Preliminary Presentations
Input. Strings TEXT and PATTERN.
Output. The index at which TEXT contains PATTERN as a substring (the first occurrence of PATTERN in TEXT), or -1 if PATTERN does not appear.
idx = 0;
matches = 0;
while (idx <= TEXT.length - PATTERN.length) {
    // all of PATTERN has been matched starting at idx
    if (matches == PATTERN.length) return idx;
    if (TEXT[idx + matches] == PATTERN[matches]) {
        matches++;
    } else {
        // mismatch: shift PATTERN one position right and start over
        idx++;
        matches = 0;
    }
}
return -1;
lec21-naive-pattern-matching.zip
Resetting the state of the algorithm
The resetState method should be called by the update buttons (Input Group), but its implementation should be handled by the algorithm (Algo Group).
Throughout: TEXT has length $n$ and PATTERN has length $m$.
Last time, we showed that the worst-case running time of naive pattern matching is $\Theta(n \cdot m)$, for example on:
TEXT = 'aaaaaaaaaaaaa...a'
PATTERN = 'aaaa....ab'
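To make the worst case concrete, here is a minimal Python sketch that counts character comparisons for naive matching on an input of this shape. The function name `naive_search` and the sizes `n = 1000`, `m = 100` are illustrative choices, not part of the lecture code.

```python
def naive_search(text, pattern):
    """Return (index, comparisons): the first index where pattern occurs
    in text (or -1), plus the number of character comparisons made."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for idx in range(n - m + 1):          # try every possible alignment
        matches = 0
        while matches < m:
            comparisons += 1
            if text[idx + matches] != pattern[matches]:
                break                     # mismatch: give up on this alignment
            matches += 1
        if matches == m:
            return idx, comparisons
    return -1, comparisons


# Illustrative sizes (assumed, not from the lecture).
n, m = 1000, 100
text = "a" * n
pattern = "a" * (m - 1) + "b"
index, comparisons = naive_search(text, pattern)
print(index, comparisons)
```

Every one of the `n - m + 1` alignments matches `m - 1` characters before failing on the final `b`, so the comparison count is `(n - m + 1) * m`.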
Question. What redundant/unnecessary work is being done by the algorithm?
What does naive pattern matching do?
Suppose the following string is matched up to index i = 3, and mismatched at index i = 4. What should our next comparison be?
Suppose the following string is matched up to index i = 4, and mismatched at index i = 5. What should our next comparison be?
Question. Can we perform pattern matching search in such a way that the textIndex never decreases?
If we have already matched some characters of TEXT, then we know they are the same as the previous characters in PATTERN.
Question. Suppose we’ve matched $P_{k}$ (the first $k$ characters of $P$) with our text, but $P[k+1]$ is a mismatch. Under what condition can we match $P_i$, for $i < k$, with our text?
Answer. We can match $P_i$ with the text just before the mismatch if $P_i$ is a suffix of $P_k$.
Definition. Given a pattern $P$ of length $m$, the associated prefix function is an array $\pi$ of length $m$ defined as follows:
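The formula itself did not survive in these notes. The following is a reconstruction based on the Answer above and on the usual textbook convention, so the indexing is an assumption: $P_i$ denotes the first $i$ characters of $P$, and $P_0$ is the empty string.

```latex
\[
  \pi[k] \;=\; \max\{\, i : 0 \le i < k \ \text{and}\ P_i \text{ is a suffix of } P_k \,\},
  \qquad k = 1, \dots, m .
\]
```

In words, $\pi[k]$ is the length of the longest proper prefix of $P_k$ that is also a suffix of $P_k$.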
Write the prefix function of this pattern:
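The pattern for this exercise is not reproduced in these notes, so the sketch below uses a hypothetical pattern, `"ababaca"`. It computes the prefix function by brute force, reading the definition as: the value for the first $k$ characters is the largest $i < k$ such that $P_i$ is a suffix of $P_k$. The function name and the 0-based storage (entry `k - 1` holds the value for $P_k$) are assumptions made for the example.

```python
def prefix_function_bruteforce(pattern):
    """Prefix function straight from the definition (slow, for checking by hand).

    pi[k - 1] holds the value for the first k characters of the pattern.
    """
    m = len(pattern)
    pi = []
    for k in range(1, m + 1):
        prefix_k = pattern[:k]
        best = 0
        for i in range(1, k):                  # proper prefixes only: i < k
            if prefix_k.endswith(pattern[:i]):
                best = i                       # keep the longest one found
        pi.append(best)
    return pi


print(prefix_function_bruteforce("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]
```

For example, the first five characters `ababa` have `aba` as both a prefix and a suffix, which gives the value 3.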
Question. Given the prefix function $\pi$, how can we compute matches faster?
Idea. Use matches and $\pi$ to do more efficient shifts:
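In pseudo-code! Here T is a text of length n, P is a pattern of length m, and pi is the prefix function of P. T and P are indexed from 0, and pi[k] is the prefix function value for the first k characters of P.
let matched = 0
for (i from 0 to n - 1):
    // P[matched] is the next character of P to compare
    while matched > 0 and P[matched] != T[i]
        matched = pi[matched]
    if P[matched] == T[i]
        matched++
    if matched == m
        // the match ends at i, so it starts at i - m + 1
        return i - m + 1
return -1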
lec21-kmp-pattern-matching
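As a runnable counterpart to the pseudo-code, here is a minimal Python sketch. It is not the code from the lecture archive. The names `kmp_search` and `prefix_function_bruteforce` are hypothetical, the prefix function is stored 0-based (so the fallback reads `pi[matched - 1]`), and the helper is deliberately the slow definition-based version so that the search logic stays the focus.

```python
def prefix_function_bruteforce(pattern):
    """pi[k - 1] = length of the longest proper prefix of pattern[:k]
    that is also a suffix of pattern[:k]. Slow; for illustration only."""
    pi = []
    for k in range(1, len(pattern) + 1):
        best = 0
        for i in range(1, k):
            if pattern[:k].endswith(pattern[:i]):
                best = i
        pi.append(best)
    return pi


def kmp_search(text, pattern, pi):
    """Return the start index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0                               # empty pattern matches at index 0
    matched = 0                                # characters of pattern matched so far
    for i in range(n):                         # i never decreases
        while matched > 0 and pattern[matched] != text[i]:
            matched = pi[matched - 1]          # fall back to a shorter prefix
        if pattern[matched] == text[i]:
            matched += 1
        if matched == m:
            return i - m + 1                   # match ends at i
    return -1


pattern = "ababaca"
pi = prefix_function_bruteforce(pattern)
print(kmp_search("abababacab", pattern, pi))   # 2
```

Note that the text index `i` only moves forward; on a mismatch only `matched` changes.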
Question. What is the running time of the method?
let matched = 0
for (i from 0 to n - 1):
    while matched > 0 and P[matched] != T[i]
        matched = pi[matched]
    if P[matched] == T[i]
        matched++
    if matched == m
        return i - m + 1
return -1
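Observations.
Each execution of the while loop does at most matched iterations, because every iteration decreases matched.
For the while loop to do k iterations, matched must previously have been incremented k times.
Each iteration of the for loop increments matched at most once.
So the total number of while loop iterations is $\leq$ the number of for loop iterations.
Hence, given $\pi$, the search takes $O(n)$ time.
Computing $\pi$ of $P$ efficiently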
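One common way to compute $\pi$ in $O(m)$ time is to run the same fallback loop with the pattern matched against itself. The sketch below shows that approach. It may differ in detail from the construction developed in lecture, and the function name and 0-based storage (entry `k` holds the value for the first `k + 1` characters) are assumptions.

```python
def prefix_function(pattern):
    """Compute the prefix function in O(m) time.

    pi[k] = length of the longest proper prefix of pattern[:k + 1]
    that is also a suffix of pattern[:k + 1].
    """
    m = len(pattern)
    pi = [0] * m
    matched = 0                                # longest border of the prefix seen so far
    for k in range(1, m):
        while matched > 0 and pattern[matched] != pattern[k]:
            matched = pi[matched - 1]          # fall back to the next shorter border
        if pattern[matched] == pattern[k]:
            matched += 1
        pi[k] = matched
    return pi


print(prefix_function("ababaca"))              # [0, 0, 1, 2, 3, 0, 1]
```

The same counting argument as in the observations applies: `matched` increases at most once per `for` iteration, so the `while` loop runs at most `m` times in total.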