Multiple Context-Free Grammars

Multiple Context-Free Grammars (MCFGs) generalize context-free grammars by allowing a nonterminal to yield a tuple of strings rather than a single contiguous string (Seki et al. 1991). The components of the tuple may be discontinuous—separated by material contributed by other nonterminals—and it is this discontinuity that lets MCFGs handle the cross-serial dependencies and discontinuous constituents that CFGs cannot.

MCFGs were introduced by Seki et al. (1991). A closely related formalism, Linear Context-Free Rewriting Systems (LCFRS), was introduced independently by Vijay-Shanker et al. (1987). The two formalisms are not identical—LCFRS is defined in terms of a rewriting system that operates on tuples of strings, while MCFGs use an algebraic approach based on linear functions—but Seki et al. (1991) proved that they are equivalent in weak generative capacity: they generate exactly the same class of string languages. This is why the literature often treats “MCFG” and “LCFRS” as interchangeable, though strictly speaking they are different characterizations of the same class. I’ll use the MCFG notation here, since it’s what the final project uses, but the parsing inference rules from Kallmeyer (2013) (which you’ll implement in the final project) are stated in the LCFRS framework.

Formal definition

An MCFG $G = \langle N, T, F, P, S \rangle$ consists of:

$N$: a finite set of nonterminals, each with an associated fan-out $\text{dim}(A) \in \mathbb{N}^+$
$T$: a finite terminal alphabet
$F$: a set of linear, non-erasing functions used in rules
$P$: a finite set of production rules
$S \in N$: the start symbol, with $\text{dim}(S) = 1$

The fan-out $\text{dim}(A)$ of a nonterminal $A$ is the number of string components it yields. A CFG nonterminal has fan-out 1—it yields a single contiguous string. An MCFG nonterminal with fan-out 2 yields a pair of strings, which may be separated by material from other nonterminals.

A rule has the form:

\[A(\alpha_1, \ldots, \alpha_{\text{dim}(A)}) \rightarrow B_1(x_1^{(1)}, \ldots, x_{\text{dim}(B_1)}^{(1)}) \cdots B_k(x_1^{(k)}, \ldots, x_{\text{dim}(B_k)}^{(k)})\]

where each $\alpha_i$ is a concatenation of terminal strings and variables drawn from the right-hand side nonterminals. The constraints are:

Linearity: each variable from the right-hand side appears exactly once on the left-hand side.
Non-erasure: every variable from the right-hand side appears somewhere on the left-hand side.

These constraints ensure that the rule is a function from the yields of the right-hand side nonterminals to the yield of the left-hand side nonterminal, and that this function is both deterministic and non-erasing.

Example: Swiss German cross-serial dependencies

Recall the Swiss German pattern that is not context-free: sequences of NPs followed by matching sequences of verbs. In an MCFG, we can handle this using nonterminals with fan-out 2.

Define the nonterminals: $S$ with $\text{dim}(S) = 1$, and $\text{VP}$ with $\text{dim}(\text{VP}) = 2$—yielding a pair of strings, one for the NP part and one for the verb part.

\[\begin{align*} S(uv) &\rightarrow \text{VP}(u, v) \\ \text{VP}(u_1 \cdot \text{NP}_i \cdot u_2,\; v_1 \cdot \text{V}_i \cdot v_2) &\rightarrow \text{VP}(u_1 \cdot u_2,\; v_1 \cdot v_2) \end{align*}\]

The second rule says: to build a VP that yields a pair of strings, take an existing VP pair and insert an NP into the first component and a matching verb into the second component. The first rule combines the two components of VP into a single string for $S$. Because each VP contributes to both the NP-part and the verb-part, the cross-serial dependencies are maintained.

Example: the final project grammar

The final project provides an MCFG grammar for a fragment of English that includes wh-movement. For instance, the rule:

\[\text{VPwhmain}(u, v) \rightarrow \text{NPwh}(u)\;\text{Vroot}(v)\]

says that a $\text{VPwhmain}$ yields two string components: $u$ (the wh-phrase, which may have been extracted) and $v$ (the verb). The fan-out of $\text{VPwhmain}$ is 2, reflecting the discontinuity introduced by extraction.

This is the grammar you’ll implement a parser for. The notation in the final project uses MCFGRule and MCFGRuleElement classes; the mathematical content is exactly what I’ve described here.

MCFG parsing

Parsing MCFGs uses the deductive parsing framework with items that track tuples of spans rather than single spans. An item $[A, \langle (i_1, j_1), \ldots, (i_m, j_m) \rangle]$ records that nonterminal $A$ (with fan-out $m$) yields the string components spanning positions $(i_1, j_1), \ldots, (i_m, j_m)$.

The inference rules, as presented by Kallmeyer (2013), generalize the CKY rules in a natural way. The key rule is:

\[\frac{[B_1, \vec{s}_1] \quad \cdots \quad [B_k, \vec{s}_k]}{[A, f(\vec{s}_1, \ldots, \vec{s}_k)]}\]

where $f$ is the function specified by the MCFG rule $A \rightarrow B_1 \cdots B_k$ and $\vec{s}_i$ is the tuple of spans for $B_i$. The function $f$ concatenates and rearranges the spans according to the rule.

The parsing complexity is $O(n^{3m})$ where $m$ is the maximum fan-out in the grammar. For fan-out 1 (CFGs), this is $O(n^3)$; for fan-out 2, $O(n^6)$; and so on. The exponent grows with fan-out, but for any fixed fan-out, parsing is polynomial—satisfying the MCS requirement.

The fan-out hierarchy

MCFGs with different maximum fan-outs form a strict hierarchy:

\[\text{MCFG}(1) \subset \text{MCFG}(2) \subset \text{MCFG}(3) \subset \cdots\]

$\text{MCFG}(1)$ generates exactly the same string languages as TAGs and CCGs. Higher fan-outs capture more phenomena. The full class $\text{MCFG} = \bigcup_k \text{MCFG}(k)$ is exactly the class of languages generated by LCFRS, and it properly contains the context-free languages while remaining within the class of languages parsable in polynomial time.

This hierarchy mirrors the subregular hierarchy from the phonological module: both stratify a larger class of languages into increasingly powerful subclasses, and in both cases natural language phenomena cluster in the lower levels.

References

Kallmeyer, Laura. 2013. “Linear Context-Free Rewriting Systems.” Language and Linguistics Compass 7 (1): 22–38. https://doi.org/10.1002/lnc3.359.

Seki, Hiroyuki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. 1991. “On Multiple Context-Free Grammars.” Theoretical Computer Science 88 (2): 191–229.

Vijay-Shanker, K., David J. Weir, and Aravind K. Joshi. 1987. “Characterizing Structural Descriptions Produced by Various Grammatical Formalisms.” Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, 104–11.

--- title: Multiple Context-Free Grammars bibliography: ../../references.bib --- Multiple Context-Free Grammars (MCFGs) generalize context-free grammars by allowing a nonterminal to yield a *tuple* of strings rather than a single contiguous string [@seki1991multiple]. The components of the tuple may be discontinuous—separated by material contributed by other nonterminals—and it is this discontinuity that lets MCFGs handle the cross-serial dependencies and discontinuous constituents that CFGs cannot. MCFGs were introduced by @seki1991multiple. A closely related formalism, Linear Context-Free Rewriting Systems (LCFRS), was introduced independently by @vijayshanker1987characterizing. The two formalisms are not identical—LCFRS is defined in terms of a rewriting system that operates on tuples of strings, while MCFGs use an algebraic approach based on linear functions—but @seki1991multiple proved that they are equivalent in weak generative capacity: they generate exactly the same class of string languages. This is why the literature often treats "MCFG" and "LCFRS" as interchangeable, though strictly speaking they are different characterizations of the same class. I'll use the MCFG notation here, since it's what the [final project](../../assignments/final-project.ipynb) uses, but the parsing inference rules from @kallmeyer_linear_2013 (which you'll implement in the final project) are stated in the LCFRS framework. ## Formal definition An MCFG $G = \langle N, T, F, P, S \rangle$ consists of: - $N$: a finite set of nonterminals, each with an associated *fan-out* $\text{dim}(A) \in \mathbb{N}^+$ - $T$: a finite terminal alphabet - $F$: a set of *linear, non-erasing* functions used in rules - $P$: a finite set of production rules - $S \in N$: the start symbol, with $\text{dim}(S) = 1$ The fan-out $\text{dim}(A)$ of a nonterminal $A$ is the number of string components it yields. A CFG nonterminal has fan-out 1—it yields a single contiguous string. An MCFG nonterminal with fan-out 2 yields a *pair* of strings, which may be separated by material from other nonterminals. A rule has the form: $$A(\alpha_1, \ldots, \alpha_{\text{dim}(A)}) \rightarrow B_1(x_1^{(1)}, \ldots, x_{\text{dim}(B_1)}^{(1)}) \cdots B_k(x_1^{(k)}, \ldots, x_{\text{dim}(B_k)}^{(k)})$$ where each $\alpha_i$ is a concatenation of terminal strings and variables drawn from the right-hand side nonterminals. The constraints are: - **Linearity**: each variable from the right-hand side appears exactly once on the left-hand side. - **Non-erasure**: every variable from the right-hand side appears somewhere on the left-hand side. These constraints ensure that the rule is a *function* from the yields of the right-hand side nonterminals to the yield of the left-hand side nonterminal, and that this function is both deterministic and non-erasing. ## Example: Swiss German cross-serial dependencies Recall the [Swiss German pattern](../cfgs-for-syntax.qmd) that is not context-free: sequences of NPs followed by matching sequences of verbs. In an MCFG, we can handle this using nonterminals with fan-out 2. Define the nonterminals: $S$ with $\text{dim}(S) = 1$, and $\text{VP}$ with $\text{dim}(\text{VP}) = 2$—yielding a pair of strings, one for the NP part and one for the verb part. $$\begin{align*} S(uv) &\rightarrow \text{VP}(u, v) \\ \text{VP}(u_1 \cdot \text{NP}_i \cdot u_2,\; v_1 \cdot \text{V}_i \cdot v_2) &\rightarrow \text{VP}(u_1 \cdot u_2,\; v_1 \cdot v_2) \end{align*}$$ The second rule says: to build a VP that yields a pair of strings, take an existing VP pair and insert an NP into the first component and a matching verb into the second component. The first rule combines the two components of VP into a single string for $S$. Because each VP contributes to both the NP-part and the verb-part, the cross-serial dependencies are maintained. ## Example: the final project grammar The [final project](../../assignments/final-project.ipynb) provides an MCFG grammar for a fragment of English that includes wh-movement. For instance, the rule: $$\text{VPwhmain}(u, v) \rightarrow \text{NPwh}(u)\;\text{Vroot}(v)$$ says that a $\text{VPwhmain}$ yields *two* string components: $u$ (the wh-phrase, which may have been extracted) and $v$ (the verb). The fan-out of $\text{VPwhmain}$ is 2, reflecting the discontinuity introduced by extraction. This is the grammar you'll implement a parser for. The notation in the final project uses `MCFGRule` and `MCFGRuleElement` classes; the mathematical content is exactly what I've described here. ## MCFG parsing Parsing MCFGs uses the [deductive parsing framework](../deductive-parsing.qmd) with items that track *tuples of spans* rather than single spans. An item $[A, \langle (i_1, j_1), \ldots, (i_m, j_m) \rangle]$ records that nonterminal $A$ (with fan-out $m$) yields the string components spanning positions $(i_1, j_1), \ldots, (i_m, j_m)$. The inference rules, as presented by @kallmeyer_linear_2013, generalize the CKY rules in a natural way. The key rule is: $$\frac{[B_1, \vec{s}_1] \quad \cdots \quad [B_k, \vec{s}_k]}{[A, f(\vec{s}_1, \ldots, \vec{s}_k)]}$$ where $f$ is the function specified by the MCFG rule $A \rightarrow B_1 \cdots B_k$ and $\vec{s}_i$ is the tuple of spans for $B_i$. The function $f$ concatenates and rearranges the spans according to the rule. The parsing complexity is $O(n^{3m})$ where $m$ is the maximum fan-out in the grammar. For fan-out 1 (CFGs), this is $O(n^3)$; for fan-out 2, $O(n^6)$; and so on. The exponent grows with fan-out, but for any fixed fan-out, parsing is polynomial—satisfying the MCS requirement. ## The fan-out hierarchy MCFGs with different maximum fan-outs form a strict hierarchy: $$\text{MCFG}(1) \subset \text{MCFG}(2) \subset \text{MCFG}(3) \subset \cdots$$ $\text{MCFG}(1)$ generates exactly the same string languages as [TAGs](tree-adjoining-grammars.qmd) and [CCGs](combinatory-categorial-grammars.qmd). Higher fan-outs capture more phenomena. The full class $\text{MCFG} = \bigcup_k \text{MCFG}(k)$ is exactly the class of languages generated by LCFRS, and it properly contains the context-free languages while remaining within the class of languages parsable in polynomial time. This hierarchy mirrors the [subregular hierarchy](../../phonological-patterns/subregular-hierarchy/index.qmd) from the phonological module: both stratify a larger class of languages into increasingly powerful subclasses, and in both cases natural language phenomena cluster in the lower levels.