Top-down parsing with Earley
In the previous section, we explored the CKY algorithm, a bottom-up parsing approach for context-free grammars. Now, we’ll examine the Earley algorithm (Earley 1970), which is a top-down parsing method that combines the advantages of both top-down and bottom-up approaches.
Unlike CKY, which requires the grammar to be in Chomsky Normal Form, the Earley algorithm can work with any context-free grammar. This flexibility makes it particularly useful for morphological analysis, where the natural expression of morphological rules might not conform to CNF.
Top-Down vs. Bottom-Up Parsing
Before diving into the Earley algorithm, let’s briefly compare top-down and bottom-up parsing approaches:
Bottom-Up Parsing (e.g., CKY): - Starts with the input and builds larger constituents - Works well with grammars in normal forms - Typically more efficient for highly ambiguous grammars - May do unnecessary work on strings not in the language
Top-Down Parsing: - Starts with the start symbol and tries to derive the input - More intuitive for tracing the derivation process - Can incorporate contextual constraints more easily - May get stuck in loops with left-recursive grammars
The Earley algorithm combines the strengths of both approaches by using a top-down prediction strategy but avoiding redundant work through dynamic programming techniques similar to those used in CKY. Like CKY, Earley runs in \(O(n^3)\) time in the worst case (for arbitrary CFGs). But for unambiguous grammars—grammars where each string has at most one parse tree—it runs in \(O(n^2)\), and for many practically useful grammar classes it is effectively linear. This makes it a good choice when the grammar is not in CNF and ambiguity is moderate.
Dotted Rules
The key concept in the Earley algorithm is the “dotted rule” (also called an “Earley item”). A dotted rule is a context-free rule with a dot (•) inserted somewhere on the right-hand side to indicate how much of the rule has been recognized so far.
For example, if we have the rule N -> Adj Suffix, we can have the following dotted rules: - N -> • Adj Suffix (nothing recognized yet) - N -> Adj • Suffix (Adj recognized, Suffix expected) - N -> Adj Suffix • (everything recognized)
Dotted rules also track their span (the portion of the input they cover) and any backpointers needed for reconstructing the parse tree.
Earley Algorithm Pseudocode
function Earley(grammar, input):
n = length(input)
# Initialize the chart (n+1 sets of dotted rules)
chart = new Array[n+1]
for i = 0 to n:
chart[i] = new Set()
# Initialize chart[0] with dotted rules for the start symbol
for each rule S -> γ in grammar (where S is the start symbol):
add (S -> • γ, 0) to chart[0]
# Fill in the chart
for i = 0 to n:
for each state (A -> α • β, j) in chart[i]:
if β is not empty:
if β starts with a terminal symbol a:
if i < n and input[i] = a: # Scan
add (A -> α a • β', j) to chart[i+1]
else if β starts with a non-terminal B: # Predict
for each rule B -> γ in grammar:
add (B -> • γ, i) to chart[i]
else: # Complete
for each state (C -> δ • A ζ, k) in chart[j]:
add (C -> δ A • ζ, k) to chart[i]
# Continue processing chart[i] until no more items can be added
repeat until chart[i] doesn't change:
apply Predict and Complete operations to all items in chart[i]
# Check if the start symbol can derive the entire input
return any item (S -> γ •, 0) in chart[n] where S is the start symbol
Implementation Classes
Let’s define the key classes for our Earley parser:
class DottedRule(Rule):
"""A rule with a dot indicating parsing progress (an Earley item).
Parameters
----------
rule : Rule
The underlying grammar rule.
span : SpanIndices
The (start, end) positions in the input.
dot : int, default 0
The position of the dot in the right-hand side.
Attributes
----------
span : SpanIndices
The (start, end) positions.
dot : int
The dot position.
is_complete : bool
Whether the dot is at the end.
left_side : str
The left-hand side of the underlying rule.
"""
def __init__(self, rule: Rule, span: SpanIndices, dot: int = 0):
self._rule = rule
self._left_side = rule.left_side
self._right_side = rule.right_side
self._span = span
self._dot = dot
def to_tuple(self) -> tuple[Rule, SpanIndices, int]:
"""Return a hashable tuple representation.
Returns
-------
tuple[Rule, SpanIndices, int]
The rule, span, and dot position.
"""
return self._rule, self._span, self._dot
def __hash__(self) -> int:
return hash(self.to_tuple())
def __eq__(self, other: 'DottedRule') -> bool:
return self.to_tuple() == other.to_tuple()
def __repr__(self) -> str:
"""Return a string showing the dot position and span."""
return (self._left_side + ' -> ' +
' '.join(self._right_side[:self._dot]) +
' . ' +
' '.join(self._right_side[self._dot:]) +
' [' + str(self._span[0]) + ', ' + str(self._span[1]) + ']')
def complete(self, new_left_edge: int) -> 'DottedRule':
"""Advance the dot past the next symbol.
Parameters
----------
new_left_edge : int
The new right edge of the span.
Returns
-------
DottedRule
A new dotted rule with the dot advanced.
"""
dot = self._dot + 1
span = (self._span[0], new_left_edge)
return DottedRule(self._rule, span, dot)
@property
def next_symbol(self) -> str:
"""The symbol immediately after the dot.
Raises
------
AttributeError
If the dotted rule is already complete.
"""
if self.is_complete:
raise AttributeError('dotted rule is already complete')
else:
return self._right_side[self._dot]
@property
def dot(self) -> int:
"""The dot position in the right-hand side."""
return self._dot
@property
def span(self) -> SpanIndices:
"""The (start, end) positions in the input."""
return self._span
@property
def is_complete(self) -> bool:
"""Whether the dot has reached the end of the right-hand side."""
return self._dot == len(self._right_side)
@property
def left_side(self) -> str:
"""The left-hand side of the underlying rule."""
return self._rule.left_sidetype EarleyBackPointer = tuple[str, int]
class EarleyChartEntry(ChartEntry):
"""A chart entry for an Earley chart.
Parameters
----------
dotted_rule : DottedRule
The dotted rule for this entry.
*backpointers : EarleyBackPointer
Pairs of (symbol, position) for reconstructing parses.
Attributes
----------
dotted_rule : DottedRule
The dotted rule.
backpointers : tuple[EarleyBackPointer, ...]
The backpointers.
span : SpanIndices
The span of input covered.
is_complete : bool
Whether the dotted rule is complete.
"""
def __init__(self, dotted_rule: DottedRule, *backpointers: EarleyBackPointer):
self._dotted_rule = dotted_rule
self._backpointers = backpointers
def to_tuple(self) -> tuple[DottedRule, tuple[EarleyBackPointer, ...]]:
"""Return a hashable tuple representation.
Returns
-------
tuple[DottedRule, tuple[EarleyBackPointer, ...]]
The dotted rule and backpointers.
"""
return self._dotted_rule, self._backpointers
def __hash__(self) -> int:
return hash(self.to_tuple())
def __eq__(self, other: 'EarleyChartEntry') -> bool:
return self.to_tuple() == other.to_tuple()
def __repr__(self) -> str:
return self._dotted_rule.__repr__()
def __str__(self) -> str:
return self.__repr__()
@property
def backpointers(self) -> tuple[EarleyBackPointer, ...]:
"""The backpointers for parse reconstruction."""
return self._backpointers
@property
def dotted_rule(self) -> DottedRule:
"""The dotted rule for this entry."""
return self._dotted_rule
@property
def next_symbol(self) -> str:
"""The next expected symbol after the dot."""
return self._dotted_rule.next_symbol
@property
def span(self) -> SpanIndices:
"""The span of input covered by this entry."""
return self._dotted_rule.span
@property
def is_complete(self) -> bool:
"""Whether the dotted rule is complete."""
return self._dotted_rule.is_complete
def complete(self, entry: 'EarleyChartEntry', new_left_edge: int) -> 'EarleyChartEntry':
"""Advance the dot by incorporating a completed constituent.
Parameters
----------
entry : EarleyChartEntry
The completed entry.
new_left_edge : int
The new right edge of the span.
Returns
-------
EarleyChartEntry
A new entry with the dot advanced.
"""
new_dotted_rule = self._dotted_rule.complete(new_left_edge)
bp = (self._dotted_rule.next_symbol, self._dotted_rule.span[1])
backpointers = self._backpointers + (bp,)
return EarleyChartEntry(new_dotted_rule, *backpointers)
def is_completion_of(self, other: 'EarleyChartEntry') -> bool:
"""Check whether this completed entry satisfies another entry's expectation.
Parameters
----------
other : EarleyChartEntry
The entry waiting for a constituent.
Returns
-------
bool
True if this entry's left side matches the other's next symbol.
"""
return self._dotted_rule.left_side == other.dotted_rule.next_symbolclass EarleyChart(Chart):
"""A chart for an Earley parser.
Parameters
----------
input_size : int
The number of tokens in the input.
start_variable : str, default 'S'
The start symbol of the grammar.
Attributes
----------
parses : set[Tree]
The parse trees extracted from the chart.
input_size : int
The number of tokens in the input.
"""
def __init__(self, input_size: int, start_variable: str = 'S'):
self._start_variable = start_variable
self._chart: list[set[EarleyChartEntry]] = [
set() for _ in range(input_size + 1)
]
def __getitem__(self, index: int) -> set[EarleyChartEntry]:
"""Retrieve the chart set at the given position."""
return self._chart[index]
def __setitem__(self, key: int, item: set[EarleyChartEntry]) -> None:
"""Set the chart set at the given position."""
self._chart[key] = item
@property
def parses(self) -> set:
"""The parse trees for completed derivations spanning the full input."""
try:
return self._parses
except AttributeError:
self._parses = set()
for entry in self._chart[-1]:
is_start = entry.dotted_rule.left_side == self._start_variable
covers_string = entry.dotted_rule.span == (0, self.input_size)
is_complete = entry.dotted_rule.is_complete
if is_start and covers_string and is_complete:
self._parses.add(self._construct_parses(entry))
return self._parses
@property
def input_size(self) -> int:
"""The number of tokens in the input."""
return len(self._chart) - 1class EarleyParser(ContextFreeGrammarParser):
"""An Earley parser for arbitrary context-free grammars.
Parameters
----------
grammar : ContextFreeGrammar
The grammar to parse with.
"""
normal_form = None
def _predict(self, chart: EarleyChart, entry: EarleyChartEntry, position: int) -> None:
"""Add new dotted rules for each expansion of the expected nonterminal.
Parameters
----------
chart : EarleyChart
The chart to update.
entry : EarleyChartEntry
The entry whose next symbol triggers prediction.
position : int
The current position in the input.
"""
for rule in self._grammar.rules(entry.next_symbol):
span = (position, position)
dotted_rule = DottedRule(rule, span)
new_entry = EarleyChartEntry(dotted_rule)
chart[position].add(new_entry)
def _scan(self, chart: EarleyChart, entry: EarleyChartEntry, position: int) -> None:
"""Advance the dot by matching the next terminal in the input.
Parameters
----------
chart : EarleyChart
The chart to update.
entry : EarleyChartEntry
The entry expecting a terminal.
position : int
The current position in the input.
"""
word = self._string[position]
expected = entry.next_symbol
if word == expected:
# Create a rule for the terminal
rule = Rule(expected, word)
span = (position, position+1)
dotted_rule = DottedRule(rule, span, dot=1)
# Add a completed entry for the terminal
terminal_entry = EarleyChartEntry(dotted_rule)
chart[position+1].add(terminal_entry)
# Add a completed entry for the entry that was waiting for this terminal
completed_entry = entry.complete(terminal_entry, position+1)
chart[position+1].add(completed_entry)
def _complete(self, chart: EarleyChart, entry: EarleyChartEntry, position: int) -> None:
"""Advance the dot in entries that were waiting for this completed constituent.
Parameters
----------
chart : EarleyChart
The chart to update.
entry : EarleyChartEntry
The completed entry.
position : int
The current position in the input.
"""
start, end = entry.span
for prev_entry in chart[start]:
if not prev_entry.is_complete and entry.is_completion_of(prev_entry):
completed_entry = prev_entry.complete(entry, end)
chart[position].add(completed_entry)
def _parse(self, string: str | list[str]) -> set:
"""Parse a string to find all possible parse trees.
Parameters
----------
string : str | list[str]
The string to parse.
Returns
-------
set[Tree]
The set of all possible parse trees.
"""
# Ensure the input is a list of tokens
if isinstance(string, str):
string = string.split()
# Fill the chart
chart = self._fill_chart(string)
# Return all possible parses
return chart.parses
def _recognize(self, string: str | list[str]) -> bool:
"""Recognize whether a string is in the language.
Parameters
----------
string : str | list[str]
The string to recognize.
Returns
-------
bool
True if the string is in the language.
"""
# Ensure the input is a list of tokens
if isinstance(string, str):
string = string.split()
# Fill the chart
chart = self._fill_chart(string)
# Check if the start symbol appears in the final position with a complete rule
# that spans the entire input
for entry in chart[-1]:
is_start = entry.dotted_rule.left_side == self._grammar.start_variable
covers_string = entry.dotted_rule.span == (0, len(string))
is_complete = entry.dotted_rule.is_complete
if is_start and covers_string and is_complete:
return True
return FalseImplementations to Complete (Assignment)
The following three methods need to be implemented as part of your assignment:
- The
_fill_chartmethod:
function _fill_chart(string):
# Store the string for use in the scan operation
self._string = string
# Initialize the chart
chart = new EarleyChart(length(string), self._grammar.start_variable)
# Initialize chart[0] with dotted rules for the start symbol
for each rule S → γ where S is the start symbol:
add a dotted rule S → • γ at position 0 to chart[0]
# Fill each position in the chart
for i = 0 to length(string):
# Process all entries at this position until no new entries are added
repeat until chart[i] doesn't change:
for each entry in chart[i]:
if entry is not complete:
if next symbol is a non-terminal:
apply Predict operation
else if next symbol is a terminal and i < length(string):
apply Scan operation
else:
apply Complete operation
return chart
- The
_construct_parsesmethod:
function _construct_parses(entry):
# Create a node for this entry
node = new Tree(entry.dotted_rule.left_side)
# If the entry has backpointers, recursively construct the children
if entry has backpointers:
for each backpointer (symbol, position) in entry.backpointers:
# Find the completed entry in the chart at position that matches the symbol
for each potential_child in chart[position]:
if potential_child.left_side == symbol and potential_child.is_complete:
# Recursively construct the child's parse tree
child = _construct_parses(potential_child)
add child to node.children
break
return node
- The
_predict_next_wordmethod:
function _predict_next_word(prefix):
# Fill the chart for the prefix
chart = _fill_chart(prefix)
# Find all incomplete entries in the final position
incomplete_entries = {all entries in chart[length(prefix)] where entry is not complete}
# Group by next expected symbol
next_symbols = {}
for each entry in incomplete_entries:
next_symbol = entry.next_symbol
# If the next symbol is a terminal, add it directly
if next_symbol is a terminal:
add next_symbol to next_symbols[next_symbol]
# If the next symbol is a non-terminal, find terminals it can derive
else if next_symbol is a non-terminal:
for each rule with left_side = next_symbol:
if rule directly derives a terminal:
add the terminal to next_symbols[next_symbol]
return next_symbols
Detailed Example: Parsing “unhappiness”
Let’s walk through a detailed example of the Earley algorithm parsing the word “unhappiness” with the segmentation ["un", "happy", "ness"] using our simplified grammar:
grammar = ContextFreeGrammar(
alphabet={'un', 'happy', 'ness'},
variables={'Word', 'Adj', 'N', 'Prefix', 'Suffix'},
rules={
Rule('Adj', 'happy'),
Rule('Prefix', 'un'),
Rule('Suffix', 'ness'),
Rule('Adj', 'Prefix', 'Adj'),
Rule('N', 'Adj', 'Suffix'),
Rule('Word', 'N')
},
start_variable='Word'
)
# Set the parser class
ContextFreeGrammar.parser_class = EarleyParser
# Parse the word
parses = grammar(['un', 'happy', 'ness'], mode="parse")Step-by-Step Chart Filling
Let’s trace the Earley algorithm as it fills the chart for “unhappiness”:
Chart[0]: Initial State
First, we initialize chart[0] with a dotted rule for the start symbol Word:
- Add
Word -> • N [0,0](from start symbol rule) - Apply predict:
- For
Word -> • N:- Add
N -> • Adj Suffix [0,0]
- Add
- For
- Apply predict again:
- For
N -> • Adj Suffix:- Add
Adj -> • happy [0,0] - Add
Adj -> • Prefix Adj [0,0]
- Add
- For
- Apply predict again:
- For
Adj -> • Prefix Adj:- Add
Prefix -> • un [0,0]
- Add
- For
Chart[0] now looks like:
Word -> • N [0,0]
N -> • Adj Suffix [0,0]
Adj -> • happy [0,0]
Adj -> • Prefix Adj [0,0]
Prefix -> • un [0,0]
Chart[1]: After Processing “un”
Now we process the first input symbol “un”:
- Apply scan:
- For
Prefix -> • un [0,0]:- Add
Prefix -> un • [0,1](complete)
- Add
- For
- Apply complete:
- For
Prefix -> un • [0,1](complete):- Find
Adj -> • Prefix Adj [0,0]waiting for Prefix - Add
Adj -> Prefix • Adj [0,1]
- Find
- For
- Apply predict:
- For
Adj -> Prefix • Adj [0,1]:- Add
Adj -> • happy [1,1] - Add
Adj -> • Prefix Adj [1,1]
- Add
- For
- Apply predict again:
- For
Adj -> • Prefix Adj [1,1]:- Add
Prefix -> • un [1,1]
- Add
- For
Chart[1] now looks like:
Prefix -> un • [0,1]
Adj -> Prefix • Adj [0,1]
Adj -> • happy [1,1]
Adj -> • Prefix Adj [1,1]
Prefix -> • un [1,1]
Chart[2]: After Processing “happy”
Now we process the second input symbol “happy”:
- Apply scan:
- For
Adj -> • happy [1,1]:- Add
Adj -> happy • [1,2](complete)
- Add
- For
- Apply complete:
- For
Adj -> happy • [1,2](complete):- Find
Adj -> Prefix • Adj [0,1]waiting for Adj - Add
Adj -> Prefix Adj • [0,2](complete)
- Find
- For
- Apply complete again:
- For
Adj -> Prefix Adj • [0,2](complete):- Find
N -> • Adj Suffix [0,0]waiting for Adj - Add
N -> Adj • Suffix [0,2]
- Find
- For
- Apply predict:
- For
N -> Adj • Suffix [0,2]:- Add
Suffix -> • ness [2,2]
- Add
- For
Chart[2] now looks like:
Adj -> happy • [1,2]
Adj -> Prefix Adj • [0,2]
N -> Adj • Suffix [0,2]
Suffix -> • ness [2,2]
Chart[3]: After Processing “ness”
Finally, we process the third input symbol “ness”:
- Apply scan:
- For
Suffix -> • ness [2,2]:- Add
Suffix -> ness • [2,3](complete)
- Add
- For
- Apply complete:
- For
Suffix -> ness • [2,3](complete):- Find
N -> Adj • Suffix [0,2]waiting for Suffix - Add
N -> Adj Suffix • [0,3](complete)
- Find
- For
- Apply complete again:
- For
N -> Adj Suffix • [0,3](complete):- Find
Word -> • N [0,0]waiting for N - Add
Word -> N • [0,3](complete)
- Find
- For
Chart[3] now looks like:
Suffix -> ness • [2,3]
N -> Adj Suffix • [0,3]
Word -> N • [0,3]
Since there’s a completed dotted rule for the start symbol Word that spans the entire input (0 to 3), the word “unhappiness” is in the language defined by our grammar.
Visualizing the Chart State
We can visualize the entire chart as follows:
|-----------------------|--------------------------|--------------------------|--------------------------|
| Chart[0] | Chart[1] | Chart[2] | Chart[3] |
|-----------------------|--------------------------|--------------------------|--------------------------|
| Word -> • N [0,0] | Prefix -> un • [0,1] | Adj -> happy • [1,2] | Suffix -> ness • [2,3] |
| N -> • Adj Suffix [0,0]| Adj -> Prefix • Adj [0,1]| Adj -> Prefix Adj • [0,2]| N -> Adj Suffix • [0,3] |
| Adj -> • happy [0,0] | Adj -> • happy [1,1] | N -> Adj • Suffix [0,2] | Word -> N • [0,3] |
| Adj -> • Prefix Adj [0,0]| Adj -> • Prefix Adj [1,1]| Suffix -> • ness [2,2] | |
| Prefix -> • un [0,0] | Prefix -> • un [1,1] | | |
|-----------------------|--------------------------|--------------------------|--------------------------|
Reconstructing the Parse Tree
From the completed dotted rule Word -> N • [0,3] in Chart[3], we can reconstruct the parse tree by following the backpointers:
- Start with
Word -> N • [0,3]in Chart[3] - Follow the backpointer to
N -> Adj Suffix • [0,3]in Chart[3] - Follow the backpointer to
Adj -> Prefix Adj • [0,2]in Chart[2] - Follow the backpointer to
Prefix -> un • [0,1]in Chart[1] - Follow the backpointer to
Adj -> happy • [1,2]in Chart[2] - Follow the backpointer to
Suffix -> ness • [2,3]in Chart[3]
This gives us the parse tree:
Word
|
N
/ \
Adj Suffix
/ \ |
Prefix Adj ness
| |
un happy
This tree correctly captures the structure ((un+happy)+ness).
Handling Ambiguity: The “unlockable” Example
Let’s revisit the ambiguous word “unlockable”, which has two possible interpretations:
un + (lock + able)= “not able to be locked”(un + lock) + able= “able to be unlocked”
With the Earley algorithm, we can handle this ambiguity naturally. Let’s create a grammar that captures both interpretations:
ambiguous_grammar = ContextFreeGrammar(
alphabet={'un', 'lock', 'able'},
variables={'Word', 'V', 'Adj', 'Prefix', 'Suffix'},
rules={
Rule('V', 'lock'),
Rule('Prefix', 'un'),
Rule('Suffix', 'able'),
Rule('V', 'Prefix', 'V'), # un + lock -> unlock
Rule('Adj', 'V', 'Suffix'), # lock + able -> lockable OR unlock + able -> unlockable
Rule('Adj', 'Prefix', 'Adj'), # un + lockable -> unlockable
Rule('Word', 'Adj')
},
start_variable='Word'
)When we parse “unlockable” with Earley, we’ll end up with two completed dotted rules in the final chart:
Word -> Adj • [0,3]where Adj comes fromAdj -> Prefix Adj • [0,3](un + lockable)Word -> Adj • [0,3]where Adj comes fromAdj -> V Suffix • [0,3](unlock + able)
Both of these analyses derive Word, resulting in two different parse trees:
Parse Tree 1 (un + lockable):
Word
|
Adj
/ \
Prefix Adj
| / \
un V Suffix
| |
lock able
Parse Tree 2 (unlock + able):
Word
|
Adj
/ \
V Suffix
/ \ |
Prefix V able
| |
un lock
The Earley algorithm finds all possible parses, which is essential for accurately analyzing structurally ambiguous words.
The Three Operations of Earley
The Earley algorithm consists of three main operations:
Predict: For each dotted rule where the dot precedes a non-terminal, add new dotted rules for all possible expansions of that non-terminal. This operation looks ahead to determine what might come next.
Scan: For each dotted rule where the dot precedes a terminal that matches the current input symbol, advance the dot past that terminal. This operation represents recognizing an input symbol.
Complete: For each dotted rule where the dot is at the end (indicating a completed constituent), go back to the chart position where the constituent began and advance the dot in all dotted rules that were waiting for this constituent.
These operations work together to efficiently parse the input while avoiding redundant work.
Advantages of Earley for Morphological Analysis
The Earley algorithm offers several advantages for morphological analysis:
Any CFG: Unlike CKY, Earley works with any context-free grammar, including those with ε-rules (empty productions), unary rules, and arbitrary rule lengths. This allows for more natural expression of morphological patterns.
Incremental processing: Earley processes the input left-to-right, which can be useful for interactive applications or streaming inputs.
Left recursion: Earley handles left-recursive rules without modification, which is important for certain morphological patterns.
Explicit predictions: The predict operation explicitly lists all the possible continuations at each position, which can be useful for generating suggestions in applications like spelling correction or auto-completion.
Efficiency: While the worst-case complexity is O(n³), for many practical grammars Earley performs better, especially for unambiguous or nearly unambiguous grammars.
Real-World Example: CELEX Data
Let’s revisit our “unbelievably” example from CELEX:
((((un)[Prefix] (believe)[V])[V] (able)[Suffix])[Adj] (ly)[Suffix])[Adv]
With the Earley algorithm, we can parse this complex structure without converting to CNF:
# Extract grammar from CELEX
celex_grammar = ContextFreeGrammar.from_treebank(celex_parses)
# Parse a specific word
parses = celex_grammar(['un', 'believe', 'able', 'ly'], mode="parse")
for parse in parses:
print(parse)We can also predict what might follow partial words:
# Predict what could follow "un believe"
predictions = celex_grammar.parser._predict_next_word(['un', 'believe'])
print(predictions) # Might output {'Suffix': {'able', 'er', 'ing', ...}}