The mildly context-sensitive hypothesis
In the previous section, I showed that certain syntactic phenomena—cross-serial dependencies in Swiss German, for instance—cannot be generated by context-free grammars. Shieber (1985) established this conclusively. But the Chomsky hierarchy offers only one step up from context-free: the context-sensitive grammars, which are far more powerful than we need—and, importantly, have parsing problems that are PSPACE-complete, making them computationally intractable.
Joshi (1985) proposed that natural languages fall in a narrow band between context-free and context-sensitive—what he called mildly context-sensitive. This is an empirical hypothesis about the computational complexity of natural language, and it has held up well over four decades.
Properties of mildly context-sensitive languages
Joshi characterized the mildly context-sensitive (MCS) languages by four properties:
Limited cross-serial dependencies. The formalism can generate languages with cross-serial dependencies (unlike CFGs), but only a bounded number of them at a time.
Constant growth. When the strings of the language are sorted by length, the gaps between consecutive lengths are bounded by a constant. Joshi's formulation is usually strengthened to semilinearity: the Parikh image (the set of letter-count vectors) of the language is a semilinear set, which implies constant growth. Either version rules out languages like \(\{a^{2^n} \mid n \geq 0\}\).
Polynomial-time parsing. There exists a parsing algorithm that runs in time polynomial in the length of the input string. (For TAGs, this is \(O(n^6)\); for binary MCFGs of fan-out \(k\), it's \(O(n^{3k})\).)
The language \(\{a^n b^n c^n d^n\}\) is generable. This language is not context-free (it requires cross-serial dependencies between the \(a\)’s and \(c\)’s and between the \(b\)’s and \(d\)’s), so any formalism that generates it is strictly more powerful than CFGs.
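To make the constant-growth property concrete, here is a small sketch (the helper names are mine, not from the text) comparing the sorted string lengths of \(\{a^n b^n c^n d^n\}\) with those of \(\{a^{2^n}\}\):

```python
# Constant growth, illustrated: compare the gaps between consecutive
# string lengths in a semilinear language and in {a^(2^n)}.

def lengths_anbncndn(max_n):
    """Lengths of a^n b^n c^n d^n for n = 1..max_n."""
    return [4 * n for n in range(1, max_n + 1)]

def lengths_a2n(max_n):
    """Lengths of a^(2^n) for n = 0..max_n."""
    return [2 ** n for n in range(max_n + 1)]

def gaps(lengths):
    """Differences between consecutive (sorted) lengths."""
    return [b - a for a, b in zip(lengths, lengths[1:])]

# {a^n b^n c^n d^n}: every gap is 4, so the language has constant growth.
print(gaps(lengths_anbncndn(5)))   # [4, 4, 4, 4]

# {a^(2^n)}: the gaps themselves grow without bound, violating constant growth.
print(gaps(lengths_a2n(5)))        # [1, 2, 4, 8, 16]
```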
Convergence of formalisms
Multiple independently developed formalisms turn out to characterize the same class of languages. Vijay-Shanker and Weir (1994) showed that Tree-Adjoining Grammars (TAGs), Head Grammars, Linear Indexed Grammars, and Combinatory Categorial Grammars (CCGs) all generate the same string languages. Seki et al. (1991) situated this class within the broader family of Multiple Context-Free Grammars (MCFGs): MCFGs of fan-out 1 are just CFGs, while the TAG-equivalent languages correspond to well-nested MCFGs of fan-out 2. And Michaelis (2001) showed that Minimalist Grammars (MGs)—a formalization of Chomsky's Minimalist Program—are weakly equivalent to MCFGs in general: a strictly larger class than the TAG languages, but one that still satisfies Joshi's criteria.
These formalisms were developed from very different starting points—TAGs from the study of locality in syntax (Joshi et al. 1975), CCGs from the logical tradition in categorial grammar (Steedman 2000), MCFGs from the formal study of tuples of string components (Seki et al. 1991), and MGs from the Chomskyan generative tradition (Stabler 1997)—and yet they converge on the same narrow band of languages. This convergence is what gives the MCS hypothesis its force: the boundary between what natural language does and does not require is apparently not an artifact of any particular choice of grammatical notation.
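As a concrete illustration of the MCFG idea, here is a sketch (the grammar presentation and function names are mine, not a standard implementation) of a fan-out-2 MCFG for \(\{a^n b^n c^n d^n\}\) together with a naive recursive recognizer:

```python
# A fan-out-2 MCFG for {a^n b^n c^n d^n}:
#
#   S(x y)          <- A(x, y)     (concatenate A's two components)
#   A(a x b, c y d) <- A(x, y)     (grow both components in lockstep)
#   A(eps, eps)                    (base case)
#
# The nonterminal A derives the pair (a^n b^n, c^n d^n); because the two
# components grow together, concatenating them yields the cross-serial
# counting that a single CFG stack cannot do.

def derives_A(x, y):
    """True iff A(x, y) is derivable, i.e. x = a^n b^n and y = c^n d^n."""
    if x == "" and y == "":
        return True
    if len(x) >= 2 and len(y) >= 2 and \
       x[0] == "a" and x[-1] == "b" and y[0] == "c" and y[-1] == "d":
        return derives_A(x[1:-1], y[1:-1])
    return False

def derives_S(w):
    """True iff w = a^n b^n c^n d^n for some n >= 0."""
    return any(derives_A(w[:i], w[i:]) for i in range(len(w) + 1))

print(derives_S("aabbccdd"))  # True
print(derives_S("aabbccd"))   # False
```

(A real MCFG parser would tabulate pairs of spans rather than recurse over substrings, which is where the \(O(n^{3k})\) bound comes from; the naive version above is just to show the rule shapes.)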
In the hierarchy we’ve been building throughout the course:
\[\text{SL} \subset \text{Regular} \subset \text{Context-Free} \subset \text{Mildly Context-Sensitive} \subset \text{Context-Sensitive} \subset \text{Recursively Enumerable}\]
The subregular hierarchy characterizes the fine structure within the regular languages. The MCS class characterizes the fine structure just above the context-free languages. Both are cases where natural language is more constrained than the worst case that the formalism allows—and in both cases, the constraint appears to be connected to learnability.
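One way to make the inclusions tangible is with a toy recognizer for a sample language at each of the lower rungs; the example languages below are illustrative choices of mine, not from the text:

```python
# Sample languages at successive rungs of the hierarchy, each with a
# recognizer that needs strictly more machinery than the rung below.

def sl_no_bb(w):
    """Strictly local (SL): strings over {a, b} with no 'bb' substring.
    Forbidding a fixed set of bigrams is exactly strict 2-locality."""
    return "bb" not in w

def reg_even_as(w):
    """Regular but not strictly local: an even number of a's needs a
    two-state automaton, not just a fixed window over the string."""
    return w.count("a") % 2 == 0

def cf_anbn(w):
    """Context-free but not regular: a^n b^n needs unbounded counting."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n

def mcs_anbncndn(w):
    """Mildly context-sensitive but not context-free: a^n b^n c^n d^n,
    with cross-serial a-c and b-d dependencies."""
    n = len(w) // 4
    return len(w) % 4 == 0 and w == "a" * n + "b" * n + "c" * n + "d" * n

print(sl_no_bb("abab"), reg_even_as("aa"),
      cf_anbn("aabb"), mcs_anbncndn("aabbccdd"))  # True True True True
```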
In the following sections, I’ll define each of the four main MCS formalisms, show how they handle the cross-serial dependency problem, and discuss their relationship to each other.