Minimalist Grammars
The formalisms we’ve seen so far—TAGs, type-logical grammars, CCGs, and MCFGs—were developed primarily from computational and mathematical perspectives. Minimalist Grammars (MGs), introduced by Stabler (1997), come from a different direction: they are a formalization of the core operations proposed in Chomsky’s Minimalist Program, which has been the dominant framework in theoretical syntax since the mid-1990s. The reason I’m including them here is a formal result due to Michaelis (2001): the string languages generated by MGs are exactly the same as those generated by MCFGs. A formalism whose design was motivated by linguistic theory turns out to have the same generative capacity as formalisms designed from computational and mathematical considerations—which suggests that the MCS boundary is not an artifact of any particular approach to grammar but a property of natural language itself.
Merge and Move
An MG has two structure-building operations. Merge combines two expressions: one expression (the selector) takes the other (the selectee) as an argument. This is analogous to function application in CCG or to substitution in TAGs. Move displaces a subexpression from one position to another—it handles phenomena like wh-movement (*what did you see __?*), where the wh-phrase *what* appears at the front of the sentence but is interpreted as the object of *see*. Move is analogous to adjunction in TAGs, and it is the operation that takes MGs beyond context-free power.
Formally, an MG is a tuple \(G = \langle \Sigma, \text{sel}, \text{lic}, \text{Lex} \rangle\) where \(\Sigma\) is the terminal alphabet, \(\text{sel}\) is a finite set of selectional features (syntactic categories like \(v\), \(d\), \(t\)), \(\text{lic}\) is a finite set of licensing features (triggers for movement, like \(\text{wh}\) or \(\text{case}\)), and \(\text{Lex}\) is a finite set of lexical items—each a string from \(\Sigma^*\) paired with a sequence of features drawn from \(\text{sel} \cup \overline{\text{sel}} \cup \text{lic} \cup \overline{\text{lic}}\).
Features come in matched positive–negative pairs: for selection, the pair is \(=x\) (“I select an \(x\)”) and \(x\) (“I am an \(x\)”); for licensing, it is \(+y\) (“I attract a \(y\)-mover”) and \(-y\) (“I must move to a \(+y\) position”). Merge is triggered when a positive selectional feature \(=x\) on one expression matches a negative selectional feature \(x\) on another; Move is triggered when a positive licensing feature \(+y\) matches a negative licensing feature \(-y\) somewhere within the same expression.
To make this concrete, a transitive verb like *sees* might be the lexical item \(\text{sees} :: =d\;=d\;v\), meaning it selects two \(d\)-expressions (two DPs) and then projects as a \(v\) (verb phrase). A determiner like *the* might be \(\text{the} :: =n\;d\;{-\text{case}}\), meaning it selects an \(n\) (noun), becomes a \(d\) (DP), and carries an unchecked case feature. When the verb’s \(=d\) matches the DP’s \(d\) via Merge, the DP becomes the verb’s argument. Later, a functional head with \(+\text{case}\) can trigger Move to check the DP’s \(-\text{case}\) feature, displacing the DP to its surface position.
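As a toy illustration, the feature-checking logic can be sketched in Python, loosely following Stabler’s chain-based presentation. The triple representation and the function names here are conveniences of the sketch, not the official definition: an expression is (string, features, movers), where movers holds subexpressions whose licensee features are still unchecked. Head movement and the finer points of linearization are ignored.

```python
# Toy sketch of MG feature checking (a simplification: no head movement,
# and linearization is cruder than in Stabler's definition).
# An expression is (string, features, movers).

def merge(selector, selectee):
    """Check =x on the selector against x on the selectee."""
    s1, f1, m1 = selector
    s2, f2, m2 = selectee
    if not (f1 and f2 and f1[0] == "=" + f2[0]):
        return None
    if len(f2) == 1:
        # Argument fully checked: its string joins the head's (to the right).
        return (s1 + " " + s2, f1[1:], m1 + m2)
    # Argument still carries licensee features (e.g. -case): store it as a
    # mover, pronounced only once Move finds its landing site.
    return (s1, f1[1:], m1 + m2 + [(s2, f2[1:])])

def move(expr):
    """Check +y on the head against -y on one of its movers."""
    s, f, movers = expr
    if not (f and f[0].startswith("+")):
        return None
    for i, (ms, mf) in enumerate(movers):
        if mf[0] == "-" + f[0][1:]:
            rest = movers[:i] + movers[i + 1:]
            if len(mf) == 1:
                # Mover fully checked: it lands to the left of the head.
                return (ms + " " + s, f[1:], rest)
            return (s, f[1:], rest + [(ms, mf[1:])])
    return None

# sees :: =d =d v     the :: =n d -case     cat :: n
sees = ("sees", ["=d", "=d", "v"], [])
the = ("the", ["=n", "d", "-case"], [])
cat = ("cat", ["n"], [])

dp = merge(the, cat)   # ("the cat", ["d", "-case"], [])
vp = merge(sees, dp)   # the DP becomes a mover awaiting +case
print(vp)
print(move(("will see", ["+case", "t"], [("the cat", ["-case"])])))
```

Note how Merge with an unchecked \(-\text{case}\) feature stores the DP rather than pronouncing it in place; the final `move` call shows the DP surfacing to the left of the head once \(+\text{case}\) checks \(-\text{case}\).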
The equivalence with MCFGs
Michaelis (2001) proved that MGs generate exactly the MCS languages by showing that every MG can be converted to an equivalent MCFG and vice versa. The fan-out of the corresponding MCFG is bounded by the number of licensing features in the MG: each subexpression that can be moving at once contributes one extra string component that the MCFG nonterminal must track. This equivalence has practical consequences: the MCFG parsing algorithms we discussed—and the one you’ll implement in the final project—can be used to parse MGs by first compiling them into MCFGs, as Kobele et al. (2007) showed.
Copying and the MCS boundary
One way to understand why MGs go beyond context-free power is through the lens of copying, as discussed by Kobele (2006). Many syntactic constructions involve a relationship between two positions in a sentence—a “filler” and a “gap”—where the same structural content must be tracked at both, even though only one position is phonologically realized. In a CFG, you cannot enforce such a dependency when the two positions are arbitrarily far apart—this is the same limitation that prevents CFGs from generating \(\{ww \mid w \in \Sigma^+\}\), which we proved with the pumping lemma. Move in an MG provides exactly the right amount of copying power to handle these dependencies without exceeding the MCS boundary; the corresponding mechanisms in MCFGs (multi-component yields) and TAGs (adjunction) do the same.
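To see the copying mechanism concretely, here is the textbook fan-out-2 MCFG for \(\{ww \mid w \in \{a,b\}^+\}\), rendered as a short Python enumerator. The function names and the length-indexed interface are conveniences of this sketch; the rules themselves are standard.

```python
# A fan-out-2 MCFG for {ww | w in {a,b}+}, written out as an enumerator.
# The nonterminal W carries a PAIR of string components built in lockstep
# -- exactly the bookkeeping a fan-out-1 CFG nonterminal cannot do:
#
#   W(a, a)    W(b, b)                  (base rules)
#   W(xc, yc) <- W(x, y)  for c in {a, b}
#   S(xy)     <- W(x, y)

def derive_W(n):
    """All pairs W derives with components of length n; the halves stay equal."""
    if n == 1:
        return [("a", "a"), ("b", "b")]
    return [(x + c, y + c) for x, y in derive_W(n - 1) for c in "ab"]

def derive_S(n):
    """S(xy) <- W(x, y): concatenate the two components."""
    return [x + y for x, y in derive_W(n)]

print(sorted(derive_S(2)))  # ['aaaa', 'abab', 'baba', 'bbbb']
```

Because the recursive rule extends both components of W identically, every string S derives is of the form \(ww\)—the pair plays the role of the filler and the gap tracked in tandem.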
Subregular constraints on derivations
Graf (2013) showed that the constraints on MG derivations—which movements are allowed, how features are checked—can often be characterized in terms of the subregular hierarchy from the phonological module. Just as phonological constraints on strings tend to be Strictly Local or Tier-based Strictly Local, syntactic constraints on derivation trees tend to fall into analogous subregular classes defined over trees rather than strings. This is an active area of research, and it suggests a unified view of the computational complexity of phonology and syntax: both are characterized by constraints that fall in the lower regions of a complexity hierarchy defined over the appropriate kind of structure (strings for phonology, trees for syntax).
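A toy illustration of what “subregular over trees” means (a drastic simplification of the classes Graf works with): just as a Strictly 2-Local string constraint bans certain bigrams, a Strictly 2-Local tree constraint bans certain parent–child label pairs in a derivation tree. The checker and the example constraint below are hypothetical stand-ins of my own, not a constraint from Graf (2013).

```python
# Toy SL-2 constraint over trees: forbid certain parent-child label pairs,
# the tree analogue of banning bigrams in a string.

def sl2_ok(tree, forbidden):
    """tree = (label, [subtrees]); forbidden = set of (parent, child) pairs."""
    label, children = tree
    for child in children:
        if (label, child[0]) in forbidden:
            return False
        if not sl2_ok(child, forbidden):
            return False
    return True

# Hypothetical constraint: ban a Move node immediately dominating another
# Move node (a toy stand-in for a locality condition on movement).
forbidden = {("Move", "Move")}
good = ("Merge", [("Move", [("Merge", [])]), ("Merge", [])])
bad = ("Move", [("Move", [("Merge", [])])])
print(sl2_ok(good, forbidden), sl2_ok(bad, forbidden))  # True False
```

The point of the analogy is that checking such a constraint needs only a bounded local window on the derivation tree, just as SL constraints on strings need only a bounded window on the string.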
The final project asks you to implement a parser for MCFGs, which by the equivalence results can also be seen as a parser for the languages generated by TAGs, CCGs, and MGs. The formalisms differ in how they describe the grammar—elementary trees, categories, string-tuple functions, or feature-checked lexical items—but the languages they generate, and the parsing algorithms that process them, converge.