Window-Accumulated Subsequence matching Problem is linear
L. Boasson, Patrick Cegielski
I. Guessarian and Y. Matiyasevich
Given two strings, text $t$ of length $n$,
and pattern $p=p_1\ldots p_k$ of length $k$, and given a natural number $w$,
the subsequence matching problem consists in
finding the number of size $w$ windows of text $t$ which
contain pattern $p$ as a subsequence, i.e. the letters $p_1,\ldots
,p_k$ occur in the window, in the same order as in $p$, but not
necessarily consecutively (they may be interleaved with other
letters). Subsequence matching
is used for finding frequent
patterns and association rules in databases.
We generalize the Knuth-Morris-Pratt (KMP) pattern
matching algorithm; we define a non-conventional kind of RAM, the MP--RAMs which
model more closely the microprocessor operations; we design an $O(n)$ on-line algorithm for
solving the subsequence matching problem on MP--RAMs.