Generation

We will characterize the relationship between grammars and languages as one of generation: a grammars \(G\) generates a language \(L\) if \(G\) is a description of that language. To express that \(G\) generates \(L\), we will say that \(\mathbb{L}(G) = L\), where \(\mathbb{L}\) is some function from grammars to languages. (It is important to note that generation sounds like a procedural concept, but it is really declarative. Know that \(G\) generates a language \(L\) does not require us to know how to build \(L\) from \(G\).)

Example: regular expressions

We’ve already seen one kind of grammar under this definition: regulars expressions. Remember that regular expressions \(R(\Sigma)\) on \(\Sigma\) themselves are strings of a language on \(\Sigma \cup\{\epsilon, \emptyset, \cup, \circ, (, ), *\}\). We formally define these strings recursively.

\(\rho\) is a regular expression if and only if:

  1. \(\rho \in \Sigma \cup \{\epsilon, \emptyset\}\)
  2. \(\rho\) is \((\rho_1 \cup \rho_2)\) for some regular expressions \(\rho_1\) and \(\rho_2\)
  3. \(\rho\) is \((\rho_1 \circ \rho_2)\) for some regular expressions \(\rho_1\) and \(\rho_2\)
  4. \(\rho\) is \(\rho_1^*\) for some regular expression \(\rho_1\)

We can generate all the regular expressions given an alphabet.

Regular expressions (so defined) evaluate to sets of strings on \(\Sigma\)–i.e. languages on \(\Sigma\). Another way of thinking about this is that a regular expression on \(\Sigma\) describes a language on \(\Sigma\).

We can define this evaluation procedure formally as a function \(\text{eval}: R(\Sigma) \rightarrow 2^{\Sigma^*}\), where \(R(\Sigma)\) is the set of regular expressions on \(\Sigma\).

\(\text{eval}(\rho) = \begin{cases}\{\} & \text{if } \rho = \emptyset \\\{\_\} & \text{if } \rho = \epsilon \\ \{\rho\} & \text{if } \rho \in \Sigma\\ \text{eval}(\rho_1) \times \text{eval}(\rho_2) & \text{if } \rho = (\rho_1 \circ \rho_2) \\ \text{eval}(\rho_1) \cup \text{eval}(\rho_2) & \text{if } \rho = (\rho_1 \cup \rho_2)\\ \bigcup_{i = 0}^\infty \text{eval}(\rho_1)^i & \text{if } \rho = \rho_1^*\\ \end{cases}\)

Each regular expression is thus a grammar; the set of all regular expressions is a class of grammars; and \(\text{eval}\) is an implementation of \(\mathbb{L}\). We will call the class of languages \(\mathcal{R}\) generated by the regular expressions the regular languages: \[\mathcal{R} \equiv \mathbb{L}(R(\Sigma)) = \{\mathbb{L}(r) \mid r \in R(\Sigma)\} = \{\text{eval}(r) \mid r \in R(\Sigma)\}\]

We will be studying these languages in depth, because as I’ve mentiond before and as you will read about in Heinz 2018, it seems very likely that all phonological grammars of natural languages are a subset of this class. To study these languages, it will be useful to introduce a class of grammars that are equivalent to the regular expressions in that they generate exactly the same set of languages–i.e. the regular languages. These grammars are known as finite state automata.

We will refer to this sort of equivalence–i.e. that \(\mathbb{L}(\mathcal{G}_1) = \mathbb{L}(\mathcal{G}_2)\) for classes \(\mathcal{G}_1\) and \(\mathcal{G}_2\)–as weak equivalence or equivalence in weak generative capacity. We will discuss another notion–strong equivalence–later. First, though we should discuss the broader context.