The star-height problem

### The star-height problem. Is there an algorithm to compute the star-height of a given rational language ? The star-height 2 problem. Is there a language of star-height 2 ? History and known results References "Man Before The Infinite" (El Hombre ante el Infinito) by Rufino Tamayo, 1950

## The star-height problem.

The determination of the star-height of a rational language is an old problem of formal language theory (See [Brzozowski, 1980], for an historical survey). The restricted star-height problem has been solved by Hashiguchi (1983), but here we are interested in that aspect of the problem concerning generalized star-height, in which complementation is considered as a basic operator.

The extended rational expressions over an alphabet A are defined recursively as follows.

1. Ø, 1 and a (for every letter a) are extended rational expressions.
2. If E and F are extended rational expressions, so are (E U F) , (EF), Ec and E*.
The (generalized) star-height h(E) of an extended rational expression E is defined recursively by
1. h(Ø) = 0, h(1) = 0 and, for every letter a, h(a) = 0.
2. h(E U F) = h(EF) = max (h(E), h(F)), h(Ec) = h(E) and h(E*) = h(E)+1.
The value v(E) of an extended rational expression E is the language represented by E (Ec stands for the complement of E). More formally, v is recursively defined by
1. v(Ø) = Ø, v(1) = {1} and, for every letter a, v(a) = {a}.
2. v(E U F) = v(E) U v(F), v(EF) = v(E)v(F), v(Ec) = A* \ v(E) and v(E*) = (v(E))*.
The (generalized) star-height h(L) of a rational language L is the minimum of the star-height of all rational expressions representing L.

## The star-height 2 problem.

It is still unknown whether or not there exists a language of star-height 2 !

## History and known results.

The most important result is the characterization of the languages of star-height 0 (or star-free languages), obtained by Schützenberger.

Theorem 1 (Schützenberger 1965) A rational language is star-free if and only if its syntactic monoid is aperiodic.

Corollary 2 There is an algorithm to decide if a given rational language has star-height 0.

The complexity of this algorithm was analyzed by Stern (1985) and later by Cho and Huynh (1991). Given a finite deterministic automaton A, deciding whether A recognize a star-free set can be solved in polynomial space. It is also the complement of an NP-hard problem.

Schützenberger's theorem shows that there exist some languages of star-height 1, for instance (aa)* (it is not difficult to verify that the syntactic monoid of this language is not aperiodic). Schützenberger's Theorem also suggested to study the star-height problem through properties of the syntactic monoid. In this direction, Henneman has studied the languages whose syntactic monoids are groups. The case of commutative groups is especially interesting.

Theorem 3 (Henneman 1971) A language recognized by a finite commutative group is of star-height ≤ 1.

It is then easy to deduce the following consequence.

Corollary 4 A language recognized by a finite commutative monoid is of star-height ≤ 1.

For instance, if a is a letter of A, if B = A \ {a} and if 0 ≤ k ≤ n, then the language (B*a)k((B*aB*)n)* is star-free.

In the quest for a language of star-height > 1, the next level of complexity was nilpotent groups of class 2, and such candidates were actually proposed in Brzozowski (1980). However

Theorem 5 (Pin, Straubing, Thérien 1989) A language recognized by a finite nilpotent group of class ≤ 2 is of star-height ≤ 1.

The combinatorial structure of the languages recognized by nilpotent groups is related to the generalized binomial coefficients introduced by Eilenberg (1976), which count the number of times that a word u appears as a subword (in the sense of subsequence) of a word v. A precise description, involving the class of nilpotency of the group, is given in Thérien (1983). Let L(u,k,n) denote the set of words w in which the number of appearances of u as a subword is congruent to k mod n. Then a language is recognized by a nilpotent group of class c if and only if it is a boolean combination of languages L(u,k,n) where |u| ≤ c. Thus, the problem of finding the star-height of languages recognized by nilpotent groups reduces to finding the star-height of the languages L(u,k,n). Henneman's result mentionned above is a consequence of the fact that the star-height of L(u,k,n) is 1 if u is a word of length 1. Here we show that the star-height of L(u,k,n) is at most 1 if u is a word of length ≤ 2; this corresponds to the case of nilpotent groups of class 2. We were not able to treat completely the case of words of length 3. However, we prove that the star-height of L(u,k,n) is at most 1 if u is a word of length ≤ 3 and n is a square-free integer. This covers in particular the case of L(abc,0,2), which was proposed as a candidate for having star-height 2 in Brzozowski (1980). Other more recent candidates have been eliminated in Robson.

Non-trivial examples of languages of star-height 1 were given by Thomas. Let A = {a,b} and, for each n ≥ 0, put xn = anb. Then the set X = { xn | n ≥ 0 } is a prefix code such that X* = A*b U {1}. In particular, every word of X* admits a unique factorization as a product of words of X. Now, let W(h,k,r,m) be the set of words w of X* such that, in the factorization of w, the number of factors xn with n congruent to r mod m is congruent to h mod k. Then we have

Theorem 6 (Thomas 1981) For every h, k, r, m, the languages W(h,k,r,m) are of star-height at most 1.

It is also shown in (Pin, Straubing, Thérien 1989) that the groups that divide a semidirect product of a commutative group by (\Z/2\Z)n, and the monoids that divide a wreath product of the form M o (G o N), where M and N are aperiodic monoids, and G is a commutative group. In particular,

Theorem 7 Every language recognized by a group of order less than 12 is of star-height at most 1.

It is also known (Pin, Straubing, Thérien 1989) that the languages of star height ≤ n, for a given n are closed under boolean operations, concatenation, left and right quotients, star-free injective substitutions and inverses of length-preserving morphisms. On the other hand, every rational language is the inverse image, under some morphism between free monoids, of a language of (restricted) star-height 1. In particular, if the languages of star-height ≤ 1 are closed under inverse morphisms, every rational language is of star-height ≤ 1.

## References.

1. J. A. Brzozowski, Open problems about regular languages, Formal language theory, perspectives and open problems (R.V. Book editor), Academic Press, New York, 1980, 23-47.

2. S. Cho and D.T. Huynh, Finite automaton aperiodicity is PSPACE-complete, Theoret. Comput. Sci. 88, (1991), 99--116.

3. S. Eilenberg, Automata , Languages and Machines, Academic Press, New York, Vol. A, 1974; Vol B, 1976.

4. K. Hashiguchi, Representation theorems on regular languages, J. Comput. System Sci. 27 (1983) 101-115.

5. W. H. Henneman, Algebraic theory of automata, Ph. D. Dissertation, MIT (1971).

6. G. Lallement, Semigroups and Combinatorial Applications, Wiley, New York, 1979.

7. R. McNaughton and S. Papert, Counter-free Automata, MIT-Press, Cambridge, Mass., 1971.

8. J.-E. Pin, Varieties of formal languages, North Oxford Academic, London and Plenum, New-York, 1986.

9. J.-E. Pin, H. Straubing and D. Thérien, New results on the generalized star-height problem, STACS 89, Lecture Notes in Computer Science 349, (1989), 458-467.

10. J.-E. Pin, H. Straubing et D. Thérien, Some results on the generalized star-height problem, Information and Computation 101 (1992), 219--250. Abstract

11. J. M. Robson, More languages of generalised star height 1. Theoretical Computer Science 106, 327-335, (1992). Note.

12. M. P. Schützenberger, On finite monoids having only trivial subgroups, Information and Control 8 (1965), 190-194.

13. J. Stern, Complexity of some problems from the theory of automata, Information and Computation 66 (1985) 163-176.

14. H. Straubing, Aperiodic homomorphisms and the concatenation product of recognizable sets, J. Pure Appl. Algebra 15 (1979) 319-327.

15. D. Thérien, Classification of regular languages by congruences, Ph. D. Thesis, Waterloo, 1980.

16. D. Thérien, Subwords counting and nilpotent groups, in Combinatorics on Words, Progress and Perspectives, L. Cummings ed., Academic Press (1983), 293-306.

17. W. Thomas, Remark on the Star-Height Problem, Theoretical Computer Science 13, 1981, 231-237.