Substring Search with KMP (C)
Topics: strings, algorithms, the failure function
Problem
Find the first occurrence of a pattern in a text using the Knuth-Morris-Pratt
algorithm, which runs in O(text + pattern) and never backtracks in the text.
long kmp_search(const char *text, size_t tlen, const char *pat, size_t plen);
// index of first match, or -1; empty pattern matches at 0
Why naive search is slow, and KMP is not
The naive approach, on a mismatch, slides the pattern forward by one and restarts the
comparison from scratch, re-examining text it already looked at. For a pattern like
aaaab against aaaaaaaab, that re-scanning is quadratic.
KMP avoids it by precomputing, for the pattern alone, a failure function (the LPS
table): for each prefix, the length of the longest proper prefix that is also a
suffix. When a mismatch happens after matching pi characters, you do not restart;
you set pi = lps[pi-1], which slides the pattern forward by the most it can move
while keeping the part you already matched aligned. The text index never moves
backward.
// search loop
if (text[ti] == pat[pi]) { ti++; pi++; if (pi == plen) return ti - plen; }
else if (pi != 0) { pi = lps[pi - 1]; } // shift pattern, keep ti
else { ti++; }
Key concepts
- The LPS table is the whole algorithm. Building it is itself a little KMP-style
self-match of the pattern against itself.
- No backtracking in the text.
ti only ever increases, which is what gives the
linear bound.
- This is the canonical "precompute to avoid rework" string algorithm, and the LPS
idea reappears in other pattern-matching and periodicity problems.
Run
cc -std=c11 -O2 -Wall tests.c -o /tmp/kmp && /tmp/kmp
Sign in to submit your solution.