One of the most fascinating problem in automata theory is the so-called
"dot-depth 2" problem. It admits several equivalent formulations
Combinatorial version. Is there an
algorithm to decide whether a given rational language is a Boolean
combination of languages of the form
A^{*}_{0}a_{1}A^{*}_{1} ...
a_{n}A^{*}_{n}, where the
a_{i}'s are letters and the A_{i}'s are
subsets of the alphabet ?
Algebraic version. Is there an algorithm to
decide whether a given finite monoid divides, for some n, a monoid
of n by n upper-triangular Boolean matrices ?
Logical version. Is there an algorithm to
decide whether a first order formula of Büchi's sequential calculus
is equivalent (on finite words) with a Boolean combination of
Σ_{2}-formulas ?
Let A be an alphabet. The definition of star-free subsets of A^{*}
makes use of two different types of operations: Boolean operations and
concatenation product. By alternating the use of these two operations, one
gets a hierarchy, called the concatenation hierarchy, defined as
follows.
The sets of level 0 are the empty set and A^{*},
For every integer n > 0, the sets of level n+1/2 are the finite
unions of sets of the form
L_{0}a_{1}L_{1}a_{2} ... a_{k}L_{k}
where L_{0}, L_{1}, ..., L_{k} are sets of
level n and a_{1}, ..., a_{k} are letters.
For every integer n > 0, the sets of level n+1 are the finite
Boolean combinations of sets of level n+1/2.
Note that a set of level m is also a set of level n for every n > m. The
following results are known (Brzozowski and
Knast, Perrin and Pin).
For every n> 0, the sets of level n are closed under union,
intersection, and complement.
For every n> 0, the sets of level n+1/2 are closed under union,
intersection, and concatenation product.
The hierarchy is strict: if A contains at least two letters, then for
every n, there exist some sets of level n+1 that are not of level n + 1/2
and some sets of level n + 1/2 that are not of level n.
2. Algebraic version.
A monoid M divides a monoid N if M is a quotient of a submonoid of N.
The Boolean semiring is B = {0, 1}. Addition and multiplication are
given in the following tables
+
0
1
0
0
1
1
1
1
x
0
1
0
0
0
1
0
1
Matrix multiplication is defined in the usual way. A Boolean upper triangular
matrix is a matrix whith 0 entries in position (i, j) if i
> j. For instance
1
0
1
1
0
0
0
1
0
0
1
1
0
0
0
1
is a 4 x 4 upper triangular Boolean matrix. The set T_{n} of n x n
upper triangular matrices is a monoid under the multiplication of Boolean
matrices. The problem is to decide whether a given finite monoid divides
one of the monoids T_{n}, for some n > 0.
The corresponding problem for the monoids U_{n} of unitriangular
matrices has been solved: a monoid divides a monoid U_{n} for some
n > 0, if and only if it is J-trivial.
3. Logical version.
Büchi's sequential calculus is a logical formalism to specify some
combinatorial properties of a finite word, for instance "the factor bba
occurs three times in the word, but the factor bbb does not occur". Thus,
each logical sentence of this calculus defines a language, namely the set
of all words that satisfy the property expressed by the formula.
More precisely, to each word u ∈ A^{+} is associated a structure
M_{u} = ({1, 2, ..., ¦u¦}, <, (a)_{a ∈
A} )
where < denotes the usual order on {1, 2, ..., ¦u¦}
and a is the set of all i such that the i-th letter of u is
an a. For instance, if A = {a, b} and u = abaab, then a =
{1, 3, 4} and b = {2, 5}. The logical language appropriate to
such models has < and the a's as non logical symbols, and
formulas are built in the standard way by using these non-logical symbols,
variables, Boolean connectives, equality between elements (positions) and
quantifiers.
Given a sentence φ, we denote by L(φ) the set of all words
which satisfy φ, when words are considered as models. For instance,
the formula
∃ i ∃ j i < j ∧
a i ∧ b j
defines the language A^{*}aA^{*}bA^{*}.
Two formulas are said to be (elementary) equivalent if they define
the same languages.
It is well known that every logical formula can be written in prenex formal
form, that is to say, of the form φ = Q(x_{1},
...,x_{k})ψ where Q(x_{1}, ...,x_{k}) is a
sequence of existential or universal quantifiers on the variables
x_{1}, ...,x_{k} and where ψ is quantifier free. If
Q(x_{1}, ...,x_{k}) is formed of n blocks of quantifiers
such that the first block contains only existential quantifiers (note that
this first block may be empty), the second block universal quantifiers,
etc., we say that φ is a Σ_{n}-formula. We denote
by Σ_{n} the set of the Σ_{n}-formulae
and by BΣ_{n} the set of Boolean combinations (Boolean
operations on formulae comprise conjunction, disjunction and negation) of
Σ_{n}-formulae.
The connection between the two hierarchies was established by Thomas (see also Perrin and
Pin): a subset of A^{*} is of level n (resp. n+1/2) if and only
if it is BΣ_{n}-definable (resp.
Σ_{n+1}-definable).
J.-E. Pin, Logic On Words, Bulletin
of the European Association of Theoretical Computer Science
54 (1994), 145--165. Current Trends in Theoretical Computer Science, Entering the 21st
Century, eds. G. Paun, G. Rozenberg and A. Salomaa, Word Scientific,
(2001), 254--273. Abstract