```mermaid
graph BT
SL["<b>SL</b><br/>Strictly Local"] --> LT["<b>LT</b><br/>Locally Testable"]
SP["<b>SP</b><br/>Strictly Piecewise"] --> PT["<b>PT</b><br/>Piecewise Testable"]
LT --> SF["<b>Star-Free</b>"]
PT --> SF
SF --> REG["<b>Regular</b>"]
SL -.-> TSL["<b>TSL</b><br/>Tier-based<br/>Strictly Local"]
LT -.-> TSL
style SL fill:#d4edda,stroke:#28a745,stroke-width:2px
style SP fill:#d4edda,stroke:#28a745,stroke-width:2px
style TSL fill:#fff3cd,stroke:#ffc107,stroke-width:2px,stroke-dasharray: 5 5
style LT fill:#d1ecf1,stroke:#17a2b8
style PT fill:#d1ecf1,stroke:#17a2b8
style SF fill:#e2e3e5,stroke:#6c757d
style REG fill:#f8d7da,stroke:#dc3545
```
## Overview
We’ve now seen that the regular languages on an alphabet \(\Sigma\) are closed under union, concatenation, Kleene star, complement, and intersection. This means that if we start with any finite collection of regular languages and combine them using these operations, we always get another regular language. It’s a powerful class—and as I mentioned in the section on generation, it seems likely that all phonological grammars of natural languages fall within it.
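To make the closure claims concrete, here is a minimal sketch of one of them: closure under intersection, via the standard product construction. The DFA representation (a tuple of states, start state, accepting states, and a transition table) is my own illustrative encoding, not anything from the text.

```python
from itertools import product

# A DFA is encoded as (states, start, accepting, delta), where delta maps
# (state, symbol) -> state. This encoding is illustrative, not from the text.

def intersect(dfa1, dfa2, alphabet):
    """Product construction: run both machines in parallel and accept
    only when both accept. This witnesses closure under intersection."""
    states1, start1, accept1, delta1 = dfa1
    states2, start2, accept2, delta2 = dfa2
    states = set(product(states1, states2))
    start = (start1, start2)
    accept = {(q1, q2) for q1 in accept1 for q2 in accept2}
    delta = {((q1, q2), a): (delta1[(q1, a)], delta2[(q2, a)])
             for (q1, q2) in states for a in alphabet}
    return states, start, accept, delta

def accepts(dfa, s):
    """Run the DFA on string s and report acceptance."""
    _, q, accept, delta = dfa
    for a in s:
        q = delta[(q, a)]
    return q in accept

# Two toy regular languages over {a, b}:
# even_a accepts strings with an even number of a's; ends_b accepts strings ending in b.
even_a = ({0, 1}, 0, {0}, {(0, 'a'): 1, (1, 'a'): 0, (0, 'b'): 0, (1, 'b'): 1})
ends_b = ({0, 1}, 0, {1}, {(0, 'a'): 0, (1, 'a'): 0, (0, 'b'): 1, (1, 'b'): 1})
both = intersect(even_a, ends_b, 'ab')
```

Here `accepts(both, "aab")` holds, while `"ab"` (odd number of a's) and `"aa"` (does not end in b) are rejected; the same parallel-running idea underlies the other closure constructions.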
But recall the point I made in the section on the Generativist Conceit: Gold’s theorem tells us that the full class of regular languages is not learnable in the limit from positive data, because it contains infinite chains of languages \(L_1 \subset L_2 \subset \ldots\) that allow the learner to be tricked. If phonological grammars are regular, and if children learn phonological grammars from positive data, then the class of phonological grammars cannot be the entire class of regular languages. It must be a proper subclass—one that avoids these infinite containment chains.
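The trick behind Gold's theorem can be seen in miniature with the classic chain \(L_n = \{a^k : k \le n\}\) together with \(L_\infty = a^*\). A sketch (the conservative learner and the names below are my own illustration, not Gold's formalism): a learner that always conjectures the smallest \(L_n\) consistent with the data succeeds on every finite \(L_n\), but on a text for \(a^*\) it revises its guess forever and never converges.

```python
# Illustrative sketch of the Gold-style chain L_n = { a^k : k <= n }, L_inf = a*.
# A conservative learner guesses the smallest L_n consistent with the data so far.

def smallest_consistent_guess(data):
    """Conjecture L_n where n is the length of the longest string seen."""
    return max((len(s) for s in data), default=0)

def guesses_on_text(text):
    """The learner's successive conjectures as a text (a positive
    presentation, one string at a time) is revealed."""
    data, out = [], []
    for s in text:
        data.append(s)
        out.append(smallest_consistent_guess(data))
    return out

# On a text for L_3, the learner converges: its guess stabilizes at 3.
# On a text for L_inf = a* (a, aa, aaa, ...), it changes its mind at
# every step and never converges -- the chain has tricked it.
text_for_L3 = ['a', 'aaa', 'aa', 'aaa']
text_for_a_star = ['a' * k for k in range(1, 8)]
```

Running it, `guesses_on_text(text_for_L3)` stabilizes at `3`, while `guesses_on_text(text_for_a_star)` keeps growing: `[1, 2, 3, 4, 5, 6, 7]`. Gold's theorem shows that no learner, however clever, can avoid this fate on a class containing such a chain.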
This observation raises a natural question: which subclass? Is there a principled way to identify the region of the regular languages where phonological patterns live?
Over the past several decades, a body of work in formal language theory and computational phonology has identified a collection of subclasses of the regular languages—known collectively as the subregular hierarchy—that have two properties relevant to this question: (i) they are learnable in the limit from positive data; and (ii) the phonological patterns of natural languages overwhelmingly fall into them (Heinz 2018; Rogers and Pullum 2011; Rogers et al. 2013; Lambert et al. 2021).
The hierarchy is shown in the diagram above. Solid arrows indicate strict containment (\(\subset\)); the dashed lines indicate that TSL crosscuts the local branch: it properly contains SL but is incomparable with LT, since some TSL languages are not LT and some LT languages are not TSL. The green classes at the bottom are where most attested phonological patterns live.
- SL (Strictly Local) and SP (Strictly Piecewise) are the smallest classes: SL grammars ban a finite set of substrings, capturing constraints on adjacent segments, while SP grammars ban a finite set of subsequences, capturing long-distance dependencies.
- LT (Locally Testable) and PT (Piecewise Testable) are somewhat larger: membership in an LT (resp. PT) language depends only on which substrings (resp. subsequences) of a fixed length occur in the string, not on where or how often they occur.
- The Star-Free languages sit just below the full regular languages: they are exactly the languages definable without the Kleene star (using only union, concatenation, and complement). For example, over \(\Sigma = \{a, b\}\), \(a^*\) is star-free despite appearances, since \(a^* = \overline{\Sigma^* b \Sigma^*}\), the complement of the set of strings containing a \(b\).
- TSL (Tier-based Strictly Local) crosscuts this picture: it projects the string onto a tier (a subset of the alphabet) and then applies SL constraints on the projected string. This allows it to handle long-distance consonant harmony and similar patterns that are not SL on the full alphabet but become SL once irrelevant segments are ignored (Heinz et al. 2011; Jardine and Heinz 2016).
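The three green classes can be made concrete with toy membership checkers. Below is a sketch in Python; the particular grammars (the banned bigrams, the banned subsequence, and the sibilant tier, with `S` standing in for the postalveolar) are my own illustrative examples, not from the text.

```python
def sl_ok(word, banned_bigrams):
    """Strictly Local (SL-2): no banned pair of adjacent symbols.
    Word boundaries are marked with '#'."""
    w = '#' + word + '#'
    return not any(w[i:i+2] in banned_bigrams for i in range(len(w) - 1))

def sp_ok(word, banned_subseqs):
    """Strictly Piecewise (SP-2): no banned pair occurring as a
    (possibly non-adjacent) subsequence."""
    return not any(a + b in banned_subseqs
                   for i, a in enumerate(word) for b in word[i+1:])

def tsl_ok(word, tier, banned_bigrams):
    """Tier-based Strictly Local: project the word onto the tier
    (deleting non-tier symbols), then apply the SL-2 check."""
    return sl_ok(''.join(c for c in word if c in tier), banned_bigrams)

# Illustrative grammars (my own examples):
# SL:  ban word-final 'b'  ->  banned bigram 'b#'.
# SP:  directional sibilant harmony  ->  ban the subsequence 'sS'.
# TSL: the same harmony stated as an SL-2 constraint on the sibilant tier.
```

With these definitions, `sl_ok('aba', {'b#'})` holds while `sl_ok('ab', {'b#'})` fails; a form like `satoSi` violates both the SP grammar (`sp_ok('satoSi', {'sS'})` is false) and the TSL grammar (`tsl_ok('satoSi', {'s', 'S'}, {'sS'})` is false), because the intervening vowels and `t` are invisible to the subsequence check and are deleted by the tier projection.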
Attested phonological patterns cluster at the bottom of this hierarchy, primarily in SL, SP, and TSL. Very few phonological patterns require even LT or PT, and essentially none require the full power of the regular languages. The convergence between the formal hierarchy and the typological distribution of phonological patterns suggests that the learnability constraint is doing real work in shaping what phonological systems look like.
In the following sections, I’ll define each of these classes, ground them in phonological examples, and show how they relate to each other and to the finite state automata we’ve been studying.