Overview

In the previous sections, we defined finite state automata and showed how to construct them using the regular operations. We also showed how to translate between FSAs and regular expressions. What we have not yet addressed is the computational problem of determining whether a given string is in the language generated by a given FSA—recognition—and the related problem of recovering the path through the automaton that accepts the string—parsing.

These are the problems we turn to now. It is worth noting that there are three fundamental computational problems one can ask about a grammar \(G\):

  1. Recognition: Is \(\boldsymbol\sigma \in \mathbb{L}(G)\)? A yes-or-no question.
  2. Parsing: How does \(G\) accept \(\boldsymbol\sigma\)? This recovers the structural analysis—the path through an FSA, or (later) the parse tree for a CFG.
  3. Generation: What strings does \(G\) produce? We already explored this in the section on generation, where we enumerated the strings in \(\mathbb{L}(G)\).

These three problems are faces of the same coin. Recognition tests membership, parsing explains it, and generation enumerates it. Every grammar formalism we encounter in this course—FSAs, CFGs, PCFGs, and the mildly context-sensitive formalisms—has its own algorithms for each of these problems, and the computational complexity of those algorithms is one of the main ways we compare formalisms. For FSAs, all three problems are efficient (linear or near-linear in the input size). For CFGs, parsing takes \(O(n^3)\) time (CKY, Earley). For mildly context-sensitive grammars, the polynomial bound on parsing is one of the defining properties of the class.

The distinction between recognition and parsing matters because the parse carries structural information that recognition alone does not provide.