The Pumping Lemma for Regular Languages

The pumping lemma essentially states that all sufficiently long strings in a regular language can be “pumped” - meaning a portion of the string can be repeated any number of times while still producing a string in the language. This property follows directly from the finite nature of FSAs.

Formal Definition

Let’s state the pumping lemma formally:

Pumping Lemma for Regular Languages: For any regular language \(L\), there exists a constant \(p > 0\) (called the “pumping length”) such that for any string \(w \in L\) with \(|w| \geq p\), there exists a decomposition \(w = xyz\) such that:

  1. \(|y| > 0\) (y is non-empty)
  2. \(|xy| \leq p\) (x and y together are no longer than p)
  3. For all \(i \geq 0\), \(xy^iz \in L\) (the string with y repeated i times is in the language)

Here, \(y\) is the substring that can be “pumped” (repeated any number of times), while \(x\) and \(z\) remain fixed.

Intuitive Explanation

To understand why this lemma holds, consider what happens when a DFA processes a long string. Since a DFA has a finite number of states (let’s say \(p\) states), if we input a string longer than \(p\), the DFA must visit at least one state more than once by the pigeonhole principle.

This creates a cycle in the computation path. If we let: - \(x\) be the prefix that gets us to the repeated state for the first time - \(y\) be the substring that takes us from the repeated state back to itself - \(z\) be the remainder of the string

Then we can repeat the cycle (the \(y\) part) any number of times and still end up in an accepting state, because we’ll still follow the same path through the remainder of the string.

Using the Pumping Lemma to Prove Non-Regularity

To prove a language \(L\) is not regular using the pumping lemma, we use proof by contradiction:

  1. Assume \(L\) is regular
  2. Then the pumping lemma must hold for \(L\) with some pumping length \(p\)
  3. Find a string \(w \in L\) with \(|w| \geq p\)
  4. Show that for any decomposition \(w = xyz\) satisfying conditions 1 and 2, there exists some \(i\) where \(xy^iz \not\in L\)
  5. This contradicts the pumping lemma, so \(L\) cannot be regular

Classic Example: \(a^nb^n\)

The canonical example of a non-regular language is \(L = \{a^nb^n \mid n \geq 0\}\), which consists of strings with an equal number of \(a\)’s followed by an equal number of \(b\)’s.

Let’s prove that \(L\) is not regular using the pumping lemma:

  1. Assume \(L\) is regular
  2. Then there exists a pumping length \(p > 0\)
  3. Consider the string \(w = a^pb^p \in L\)
  4. By the pumping lemma, \(w\) can be written as \(w = xyz\) where:
    • \(|y| > 0\)
    • \(|xy| \leq p\)
    • For all \(i \geq 0\), \(xy^iz \in L\)
  5. Since \(|xy| \leq p\), both \(x\) and \(y\) consist only of \(a\)’s (they’re within the first \(p\) characters)
  6. Let’s say \(y = a^k\) for some \(k > 0\)
  7. Now consider \(xy^2z\) (pumping \(y\) once more)
  8. This gives us \(a^{p+k}b^p\), which has more \(a\)’s than \(b\)’s
  9. Therefore, \(xy^2z \not\in L\), which contradicts the pumping lemma
  10. Thus, \(L\) is not regular

This example demonstrates why a language that requires “counting” or “matching” cannot be modeled by a finite state machine - the machine would need to remember exactly how many \(a\)’s it has seen to ensure an equal number of \(b\)’s follow, but FSAs have limited memory (determined by their finite number of states).

Why This Matters

Understanding the limitations of regular languages is crucial for computational linguistics. While many patterns in natural language can be modeled using finite state machines, others require more computational power.

In the next section, we’ll explore specific examples from morphology and phonology that demonstrate these computational boundaries, particularly focusing on reduplication processes that have been shown to exceed the power of regular languages.