Window-Accumulated Subsequence matching Problem is linear

L. Boasson, Patrick Cegielski I. Guessarian and Y. Matiyasevich


Given two strings, text $t$ of length $n$, and pattern $p=p_1\ldots p_k$ of length $k$, and given a natural number $w$, the subsequence matching problem consists in finding the number of size $w$ windows of text $t$ which contain pattern $p$ as a subsequence, i.e. the letters $p_1,\ldots ,p_k$ occur in the window, in the same order as in $p$, but not necessarily consecutively (they may be interleaved with other letters). Subsequence matching is used for finding frequent patterns and association rules in databases. We generalize the Knuth-Morris-Pratt (KMP) pattern matching algorithm; we define a non-conventional kind of RAM, the MP--RAMs which model more closely the microprocessor operations; we design an $O(n)$ on-line algorithm for solving the subsequence matching problem on MP--RAMs.