The mildly context-sensitive hypothesis
In the previous section, I showed that certain syntactic phenomena—cross-serial dependencies in Swiss German, for instance—cannot be generated by context-free grammars. Shieber (1985) established this conclusively. But the Chomsky hierarchy offers only one step up from context-free: the context-sensitive grammars, which are far more powerful than we need—and, importantly, have parsing problems that are PSPACE-complete, making them computationally intractable.
Joshi (1985) proposed that natural languages fall in a narrow band between context-free and context-sensitive—what he called mildly context-sensitive. This is an empirical hypothesis about the computational complexity of natural language, and it has held up well over four decades.
Properties of mildly context-sensitive languages
Joshi characterized the mildly context-sensitive (MCS) languages by four properties:
Limited cross-serial dependencies. The formalism can generate languages with cross-serial dependencies (unlike CFGs), but only a bounded number of them at a time.
Constant growth. When the strings of the language are sorted by length, the gaps between consecutive lengths are bounded by a constant. Joshi's formulation is usually strengthened to semilinearity: the Parikh image (the set of letter-count vectors) of the language is a semilinear set, which implies constant growth. Either version rules out languages like \(\{a^{2^n} \mid n \geq 0\}\).
Polynomial-time parsing. There exists a parsing algorithm that runs in time polynomial in the length of the input string. (For TAGs, this is \(O(n^6)\); for binary MCFGs of fan-out \(k\), it's \(O(n^{3k})\).)
The language \(\{a^n b^n c^n d^n\}\) is generable. This language is not context-free (it requires cross-serial dependencies between the \(a\)’s and \(c\)’s and between the \(b\)’s and \(d\)’s), so any formalism that generates it is strictly more powerful than CFGs.
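To make the constant-growth property concrete, here is a small sketch (the helper names are mine, not from the text) comparing the sorted string lengths of \(\{a^n b^n c^n d^n\}\) with those of \(\{a^{2^n}\}\):

```python
# Constant growth, illustrated: compare the gaps between consecutive
# string lengths in a semilinear language and in {a^(2^n)}.

def lengths_anbncndn(max_n):
    """Lengths of a^n b^n c^n d^n for n = 1..max_n."""
    return [4 * n for n in range(1, max_n + 1)]

def lengths_a2n(max_n):
    """Lengths of a^(2^n) for n = 0..max_n."""
    return [2 ** n for n in range(max_n + 1)]

def gaps(lengths):
    """Differences between consecutive (sorted) lengths."""
    return [b - a for a, b in zip(lengths, lengths[1:])]

# {a^n b^n c^n d^n}: every gap is 4, so the language has constant growth.
print(gaps(lengths_anbncndn(5)))   # [4, 4, 4, 4]

# {a^(2^n)}: the gaps themselves grow without bound, violating constant growth.
print(gaps(lengths_a2n(5)))        # [1, 2, 4, 8, 16]
```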
Convergence of formalisms
Multiple independently developed formalisms turn out to characterize the same class of languages. Vijay-Shanker and Weir (1994) showed that Tree-Adjoining Grammars (TAGs), Head Grammars, Linear Indexed Grammars, and Combinatory Categorial Grammars (CCGs) all generate the same string languages. Seki et al. (1991) situated this class within the broader family of Multiple Context-Free Grammars (MCFGs): MCFGs of fan-out 1 are just CFGs, while the TAG-equivalent languages correspond to well-nested MCFGs of fan-out 2. And Michaelis (2001) showed that Minimalist Grammars (MGs)—a formalization of Chomsky's Minimalist Program—are weakly equivalent to MCFGs in general: a strictly larger class than the TAG languages, but one that still satisfies Joshi's criteria.
These formalisms were developed from very different starting points—TAGs from the study of locality in syntax (Joshi et al. 1975), CCGs from the logical tradition in categorial grammar (Steedman 2000), MCFGs from the formal study of tuples of string components (Seki et al. 1991), and MGs from the Chomskyan generative tradition (Stabler 1997)—and yet they converge on the same narrow band of languages. This convergence is what gives the MCS hypothesis its force: the boundary between what natural language does and does not require is apparently not an artifact of any particular choice of grammatical notation.
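As a concrete illustration of the MCFG idea, here is a sketch (the grammar presentation and function names are mine, not a standard implementation) of a fan-out-2 MCFG for \(\{a^n b^n c^n d^n\}\) together with a naive recursive recognizer:

```python
# A fan-out-2 MCFG for {a^n b^n c^n d^n}:
#
#   S(x y)          <- A(x, y)     (concatenate A's two components)
#   A(a x b, c y d) <- A(x, y)     (grow both components in lockstep)
#   A(eps, eps)                    (base case)
#
# The nonterminal A derives the pair (a^n b^n, c^n d^n); because the two
# components grow together, concatenating them yields the cross-serial
# counting that a single CFG stack cannot do.

def derives_A(x, y):
    """True iff A(x, y) is derivable, i.e. x = a^n b^n and y = c^n d^n."""
    if x == "" and y == "":
        return True
    if len(x) >= 2 and len(y) >= 2 and \
       x[0] == "a" and x[-1] == "b" and y[0] == "c" and y[-1] == "d":
        return derives_A(x[1:-1], y[1:-1])
    return False

def derives_S(w):
    """True iff w = a^n b^n c^n d^n for some n >= 0."""
    return any(derives_A(w[:i], w[i:]) for i in range(len(w) + 1))

print(derives_S("aabbccdd"))  # True
print(derives_S("aabbccd"))   # False
```

(A real MCFG parser would tabulate pairs of spans rather than recurse over substrings, which is where the \(O(n^{3k})\) bound comes from; the naive version above is just to show the rule shapes.)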
In the hierarchy we’ve been building throughout the course:
\[\text{SL} \subset \text{Regular} \subset \text{Context-Free} \subset \text{Mildly Context-Sensitive} \subset \text{Context-Sensitive} \subset \text{Recursively Enumerable}\]
The subregular hierarchy characterizes the fine structure within the regular languages. The MCS class characterizes the fine structure just above the context-free languages. Both are cases where natural language is more constrained than the worst case that the formalism allows—and in both cases, the constraint appears to be connected to learnability.
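One way to make the inclusions tangible is with a toy recognizer for a sample language at each of the lower rungs; the example languages below are illustrative choices of mine, not from the text:

```python
# Sample languages at successive rungs of the hierarchy, each with a
# recognizer that needs strictly more machinery than the rung below.

def sl_no_bb(w):
    """Strictly local (SL): strings over {a, b} with no 'bb' substring.
    Forbidding a fixed set of bigrams is exactly strict 2-locality."""
    return "bb" not in w

def reg_even_as(w):
    """Regular but not strictly local: an even number of a's needs a
    two-state automaton, not just a fixed window over the string."""
    return w.count("a") % 2 == 0

def cf_anbn(w):
    """Context-free but not regular: a^n b^n needs unbounded counting."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n

def mcs_anbncndn(w):
    """Mildly context-sensitive but not context-free: a^n b^n c^n d^n,
    with cross-serial a-c and b-d dependencies."""
    n = len(w) // 4
    return len(w) % 4 == 0 and w == "a" * n + "b" * n + "c" * n + "d" * n

print(sl_no_bb("abab"), reg_even_as("aa"),
      cf_anbn("aabb"), mcs_anbncndn("aabbccdd"))  # True True True True
```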
In the following sections, I’ll define each of the four main MCS formalisms, show how they handle the cross-serial dependency problem, and discuss their relationship to each other.