Overview

We’ve now seen that the regular languages on an alphabet \(\Sigma\) are closed under union, concatenation, Kleene star, complement, and intersection. This means that if we start with any finite collection of regular languages and combine them using these operations, we always get another regular language. It’s a powerful class—and as I mentioned in the section on generation, it seems likely that all phonological grammars of natural languages fall within it.

But recall the point I made in the section on the Generativist Conceit: Gold’s theorem tells us that the full class of regular languages is not learnable in the limit from positive data, because it contains infinite chains of languages \(L_1 \subset L_2 \subset \ldots\) that allow the learner to be tricked. If phonological grammars are regular, and if children learn phonological grammars from positive data, then the class of phonological grammars cannot be the entire class of regular languages. It must be a proper subclass—one that avoids these infinite containment chains.

This observation raises a natural question: which subclass? Is there a principled way to identify the region of the regular languages where phonological patterns live?

Over the past several decades, a body of work in formal language theory and computational phonology has identified a collection of subclasses of the regular languages—known collectively as the subregular hierarchy—that have two properties relevant to this question: (i) they are learnable in the limit from positive data; and (ii) the phonological patterns of natural languages overwhelmingly fall into them (Heinz 2018; Rogers and Pullum 2011; Rogers et al. 2013; Lambert et al. 2021).

The hierarchy looks roughly like this:

graph BT
    SL["<b>SL</b><br/>Strictly Local"] --> LT["<b>LT</b><br/>Locally Testable"]
    SP["<b>SP</b><br/>Strictly Piecewise"] --> PT["<b>PT</b><br/>Piecewise Testable"]
    LT --> SF["<b>Star-Free</b>"]
    PT --> SF
    SF --> REG["<b>Regular</b>"]
    SL -.-> TSL["<b>TSL</b><br/>Tier-based<br/>Strictly Local"]
    LT -.-> TSL

    style SL fill:#d4edda,stroke:#28a745,stroke-width:2px
    style SP fill:#d4edda,stroke:#28a745,stroke-width:2px
    style TSL fill:#fff3cd,stroke:#ffc107,stroke-width:2px,stroke-dasharray: 5 5
    style LT fill:#d1ecf1,stroke:#17a2b8
    style PT fill:#d1ecf1,stroke:#17a2b8
    style SF fill:#e2e3e5,stroke:#6c757d
    style REG fill:#f8d7da,stroke:#dc3545

Solid arrows indicate strict containment (\(\subset\)); the dashed lines to TSL indicate that it crosscuts the local branch, containing languages that are not SL but are contained in LT. The green classes at the bottom are where most attested phonological patterns live.

Attested phonological patterns cluster at the bottom of this hierarchy, primarily in SL, SP, and TSL. Very few phonological patterns require even LT or PT, and essentially none require the full power of the regular languages. The convergence between the formal hierarchy and the typological distribution of phonological patterns suggests that the learnability constraint is doing real work in shaping what phonological systems look like.

In the following sections, I’ll define each of these classes, ground them in phonological examples, and show how they relate to each other and to the finite state automata we’ve been studying.

References

Heinz, Jeffrey. 2018. “The Computational Nature of Phonological Generalizations.” In Phonological Typology, edited by Larry M. Hyman and Frans Plank. De Gruyter Mouton. https://doi.org/doi:10.1515/9783110451931-005.
Heinz, Jeffrey, Chetan Rawal, and Herbert G. Tanner. 2011. “Tier-Based Strictly Local Constraints for Phonology.” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 58–64.
Jardine, Adam, and Jeffrey Heinz. 2016. “Learning Tier-Based Strictly 2-Local Languages.” Transactions of the Association for Computational Linguistics 4: 87–98. https://doi.org/10.1162/tacl\_a\_00085.
Lambert, Dakotah, Jonathan Rawski, and Jeffrey Heinz. 2021. “Typology Emerges from Simplicity in Representations and Learning.” J. Language Modelling 9 (1): 151–94. https://doi.org/10.15398/jlm.v9i1.262.
Rogers, James, Jeffrey Heinz, Margaret Fero, Jeremy Hurst, Dakotah Lambert, and Sean Wibel. 2013. “Cognitive and Sub-Regular Complexity.” Formal Grammar, Lecture notes in computer science, vol. 8036: 90–108. https://doi.org/10.1007/978-3-642-39998-5\_6.
Rogers, James, and Geoffrey K. Pullum. 2011. “Aural Pattern Recognition Experiments and the Subregular Hierarchy.” J. Logic, Language and Information 20 (3): 329–42. https://doi.org/10.1007/s10849-011-9140-2.