Assignments 3 and 4

Like Assignments 1 and 2, Assignments 3 and 4 are bundled together. You only need to do Tasks 1 and 2 for Assignment 3 and Task 3 for Assignment 4.

These assignments focus on implementing a natural language inference system. In natural language inference, we receive a premise sentence and a hypothesis sentence, and we must say whether we can infer the hypothesis from the premise. For instance, if (1) were our premise and (2) were our hypothesis, our system should respond yes.

  1. Every firm polled saw costs grow more than expected, even after adjusting for inflation.
  2. Every big company in the poll reported cost increases.

In MacCartney & Manning 2009 (henceforth, M&M), you read about one sort of system for doing this: a natural logic system. This system works by (i) obtaining an edit path from the premise and the hypothesis; (ii) mapping that edit path into an inference path; (iii) computing the join of the inferences in this path to obtain a relation between the premise and the hypothesis; and (iv) checking whether there is a forward entailment relation between the premise and the hypothesis.

The definition of the relations is given in Table 2 of the paper.

| Symbol | Name | Example | Set-theoretic definition |
|---|---|---|---|
| \(x \equiv y\) | equivalence | couch \(\equiv\) sofa | \(x = y\) |
| \(x \sqsubset y\) | forward entailment | crow \(\sqsubset\) bird | \(x \subset y\) |
| \(x \sqsupset y\) | reverse entailment | European \(\sqsupset\) French | \(x \supset y\) |
| \(x \land y\) | negation | human \(\land\) nonhuman | \(x \cap y = \emptyset\) & \(x \cup y = U\) |
| \(x \mid y\) | alternation | cat \(\mid\) dog | \(x \cap y = \emptyset\) & \(x \cup y \neq U\) |
| \(x \smile y\) | cover | animal \(\smile\) nonhuman | \(x \cap y \neq \emptyset\) & \(x \cup y = U\) |
| \(x\;\#\;y\) | independence | hungry \(\;\#\;\) hippo | otherwise |
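To make the set-theoretic definitions concrete, we can spot-check a few rows of the table against a toy universe (a sketch; the universe and the particular sets are invented for illustration):

```python
# toy universe for spot-checking the relation definitions (illustrative only)
U = {'crow', 'sparrow', 'cat', 'dog', 'rock'}

crow = {'crow'}
bird = {'crow', 'sparrow'}
animal = {'crow', 'sparrow', 'cat', 'dog'}
nonanimal = U - animal   # {'rock'}
nonbird = U - bird       # {'cat', 'dog', 'rock'}

# forward entailment (crow ⊏ bird): x ⊂ y
assert crow < bird

# negation (animal ^ nonanimal): x ∩ y = ∅ and x ∪ y = U
assert not (animal & nonanimal) and animal | nonanimal == U

# alternation (cat | dog): x ∩ y = ∅ and x ∪ y ≠ U
assert not ({'cat'} & {'dog'}) and {'cat'} | {'dog'} != U

# cover (animal ⌣ nonbird): x ∩ y ≠ ∅ and x ∪ y = U
assert animal & nonbird and animal | nonbird == U
```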

The table of joins is given below.

relations = ['≡', '[', ']', '^', '|', 'u', '#']

join_table = {('≡', '≡'): {'≡'},
              ('≡', '['): {'['},
              ('≡', ']'): {']'},
              ('≡', '^'): {'^'},
              ('≡', '|'): {'|'},
              ('≡', 'u'): {'u'},
              ('≡', '#'): {'#'},
              ('[', '≡'): {'['},
              ('[', '['): {'['},
              ('[', ']'): {'#', '|', '≡', '[', ']'},
              ('[', '^'): {'|'},
              ('[', '|'): {'|'},
              ('[', 'u'): {'#', '^', 'u', '|', '['},
              ('[', '#'): {'#', '|', '['},
              (']', '≡'): {']'},
              (']', '['): {'#', 'u', '≡', '[', ']'},
              (']', ']'): {']'},
              (']', '^'): {'u'},
              (']', '|'): {'#', '^', 'u', '|', ']'},
              (']', 'u'): {'u'},
              (']', '#'): {'#', 'u', ']'},
              ('^', '≡'): {'^'},
              ('^', '['): {'u'},
              ('^', ']'): {'|'},
              ('^', '^'): {'≡'},
              ('^', '|'): {']'},
              ('^', 'u'): {'['},
              ('^', '#'): {'#'},
              ('|', '≡'): {'|'},
              ('|', '['): {'[', '^', '|', 'u', '#'},
              ('|', ']'): {'|'},
              ('|', '^'): {'['},
              ('|', '|'): {'#', '|', '≡', '[', ']'},
              ('|', 'u'): {'['},
              ('|', '#'): {'#', '|', '['},
              ('u', '≡'): {'u'},
              ('u', '['): {'u'},
              ('u', ']'): {'#', '^', 'u', '|', ']'},
              ('u', '^'): {']'},
              ('u', '|'): {']'},
              ('u', 'u'): {'#', 'u', '≡', '[', ']'},
              ('u', '#'): {'#', 'u', ']'},
              ('#', '≡'): {'#'},
              ('#', '['): {'#', 'u', '['},
              ('#', ']'): {']', '|', '#'},
              ('#', '^'): {'#'},
              ('#', '|'): {'#', '|', ']'},
              ('#', 'u'): {'#', 'u', '['},
              ('#', '#'): set()}

print('\t'.join(['']+relations))
for r1 in relations:
    row = '\t'.join(''.join([r3 for r3 in relations 
                             if r3 in join_table[r1, r2]]) 
                    for r2 in relations)
    print(f'{r1}\t{row}\n')
    ≡   [   ]   ^   |   u   #
≡   ≡   [   ]   ^   |   u   #

[   [   [   ≡[]|#   |   |   [^|u#   [|#

]   ]   ≡[]u#   ]   u   ]^|u#   u   ]u#

^   ^   u   |   ≡   ]   [   #

|   |   [^|u#   |   [   ≡[]|#   [   [|#

u   u   u   ]^|u#   ]   ]   ≡[]u#   ]u#

#   #   [u# ]|# #   ]|# [u# 
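Entries in the join table can be sanity-checked by brute force: enumerate triples of sets over a small universe and record which relations between \(x\) and \(z\) actually arise when \(x\,R_1\,y\) and \(y\,R_2\,z\). The sketch below is not part of the assignment code; it assumes M&M's non-vacuity convention (no set is empty or the whole universe) and checks two singleton entries plus one indeterminate one:

```python
from itertools import combinations

U = frozenset(range(4))

def rel(x, y):
    """Classify the relation between two sets using the definitions above."""
    if x == y:
        return '≡'
    if x < y:
        return '['
    if x > y:
        return ']'
    if not (x & y):
        return '^' if x | y == U else '|'
    return 'u' if x | y == U else '#'

# nonempty, non-universal subsets (M&M's non-vacuity assumption)
sets = [frozenset(c) for r in range(1, len(U)) for c in combinations(U, r)]

def brute_force_join(r1, r2):
    return {rel(x, z)
            for x in sets for y in sets for z in sets
            if rel(x, y) == r1 and rel(y, z) == r2}

print(brute_force_join('[', '^'))  # {'|'}: matches join_table['[', '^']
print(brute_force_join('^', '['))  # {'u'}: matches join_table['^', '[']
print(brute_force_join(']', '['))  # the five-way indeterminate entry ≡, [, ], u, #
```

Note that the two checks in the middle also confirm that join is not commutative: \(\sqsubset \bowtie \wedge\) and \(\wedge \bowtie \sqsubset\) give different results.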

In Tasks 1 and 2, you will be developing the core of this system, using the minimum edit distance-based edit paths we developed in class and assuming default inference relations associated with each atomic edit operation (as discussed in Section 4 of M&M). In Tasks 3 and 4, you will enrich this system with lexical relation information from WordNet (Task 3) and with more intelligent handling of inferences associated with certain environments (as discussed in Section 5 of M&M). You will then test the system on the classic FraCaS dataset.

Task 1

Lines: 4

Define the __add__ magic method for the Inference class below. This method should use join_table (defined above) to produce a set of Inferences by joining two inferences—e.g. animal \(\sqsupset\) dog \(\bowtie\) dog \(\sqsupset\) greyhound = {animal \(\sqsupset\) greyhound}. __add__ must return a set because, as M&M discuss, the result of joining two relations can be indeterminate. (In their implementation, M&M actually treat all such indeterminate joins as #. We will not do that here, since it is useful to see why they do that.)

Importantly, note that __add__ should not be symmetric for the same reason joins are not: animal \(\sqsupset\) dog \(\bowtie\) dog \(\sqsubset\) mammal = {animal \(\equiv\) mammal, animal \(\sqsupset\) mammal, animal \(\sqsubset\) mammal, animal \(\smile\) mammal, animal \(\#\) mammal}, but dog \(\sqsubset\) mammal \(\bowtie\) animal \(\sqsupset\) dog isn’t even a licit join.

class Inference:
    """An inference from one linguistic expression to another.

    Parameters
    ----------
    premise : list[str]
        The premise in the relation.
    hypothesis : list[str]
        The hypothesis in the relation.
    relation : str
        The relation.
    """

    def __init__(self, premise: list[str], hypothesis: list[str], relation: str):
        if relation not in relations:
            raise ValueError(f'relation must be in {relations}')

        self.premise = premise
        self.hypothesis = hypothesis
        self.relation = relation

    def __repr__(self):
        return ' '.join(self.premise) + ' ' + self.relation + ' ' + ' '.join(self.hypothesis)

    def __hash__(self):
        return hash((tuple(self.premise), tuple(self.hypothesis), self.relation))

    def __add__(self, other: 'Inference') -> set['Inference']:
        """Compose this inference with another using the join table.

        Parameters
        ----------
        other : Inference
            The inference to compose with.

        Returns
        -------
        set[Inference]
            The set of inferences resulting from composition.
        """
        raise NotImplementedError

    def __eq__(self, other: 'Inference') -> bool:
        return (self.premise == other.premise) and \
               (self.hypothesis == other.hypothesis) and \
               (self.relation == other.relation)

Test your implementation of Inference.__add__ using the Editor subclasses below.

from abc import ABC


class Editor(ABC):
    """Abstract base class for text edit operations.

    Parameters
    ----------
    args
        Arguments passed to the editor subclass.
    """

    def __init__(self, *args):
        raise NotImplementedError

    def __call__(self, input: list[str], idx: int) -> Inference:
        """Apply this edit operation at the given index.

        Parameters
        ----------
        input : list[str]
            The input text as a list of tokens.
        idx : int
            The index at which to apply the edit.

        Returns
        -------
        Inference
            The resulting inference after applying the edit.
        """
        raise NotImplementedError

    @property
    def input(self):
        """The input string this editor operates on."""
        return self._input

    @property
    def output(self):
        """The output string this editor produces."""
        return self._output


class Substitution(Editor):
    """A substitution editor.

    Parameters
    ----------
    input : str
        The string in the input to replace.
    output : str
        The string to replace the input string with.
    relation : str
        The inference relation that results.
    """

    default_relation = None

    def __init__(self, input: str, output: str, relation: str):
        self._input = input
        self._output = output
        self._relation = relation

    def __repr__(self):
        return f'<SUB "{self._output}" for "{self._input}" resulting in {self._relation}>'

    def __call__(self, input: list[str], idx: int) -> Inference:
        """Substitute input for output at the given location.

        Parameters
        ----------
        input : list[str]
            The input text as a list of tokens.
        idx : int
            The position at which to substitute.

        Returns
        -------
        Inference
            The resulting inference after substitution.
        """
        if input[idx] != self._input:
            raise ValueError(f'SUB "{self._input}" -> "{self._output}" at {idx} '
                             f'cannot be applied to {input}')

        output = input[:idx] + [self._output] + input[(idx+1):]

        return Inference(input, output, self._relation)


class Deletion(Editor):
    """A deletion editor.

    Parameters
    ----------
    input : str
        The string in the input to delete.
    relation : str
        The inference relation that results.
    """

    def __init__(self, input: str, relation: str = '['):
        self._input = input
        self._relation = relation

    def __repr__(self):
        return f'<DEL "{self._input}" resulting in {self._relation}>'

    def __call__(self, input: list[str], idx: int) -> Inference:
        """Delete the element at the given location.

        Parameters
        ----------
        input : list[str]
            The input text as a list of tokens.
        idx : int
            The position at which to delete.

        Returns
        -------
        Inference
            The resulting inference after deletion.
        """
        if input[idx] != self._input:
            raise ValueError(f'DEL "{self._input}" at {idx} '
                             f'cannot be applied to {input}')

        output = input[:idx] + input[(idx+1):]

        return Inference(input, output, self._relation)


class Insertion(Editor):
    """An insertion editor.

    Parameters
    ----------
    output : str
        The string to insert into the output.
    relation : str
        The inference relation that results.
    """

    def __init__(self, output: str, relation: str = ']'):
        self._output = output
        self._relation = relation

    def __repr__(self):
        return f'<INS "{self._output}" resulting in {self._relation}>'

    def __call__(self, input: list[str], idx: int) -> Inference:
        """Insert an element at the given location.

        Parameters
        ----------
        input : list[str]
            The input text as a list of tokens.
        idx : int
            The position at which to insert.

        Returns
        -------
        Inference
            The resulting inference after insertion.
        """
        output = input[:idx] + [self._output] + input[idx:]

        return Inference(input, output, self._relation)

These subclasses are initialized with input and/or output strings and a relation. For instance, “brindle” and “fawn” are two different colorings of greyhounds—no greyhound is both brindle and fawn—and so they are in the | relation. Each is at least a subsective modifier (all brindle greyhounds are greyhounds), so if we delete one, we obtain a \(\sqsubset\) relation, and if we insert one, we get a \(\sqsupset\) relation (the default relations for deletion and insertion, as discussed in M&M).

substitute_fawn_for_brindle = Substitution('brindle', 'fawn', '|')
delete_brindle = Deletion('brindle')
insert_brindle = Insertion('brindle')

substitute_fawn_for_brindle, delete_brindle, insert_brindle
(<SUB "fawn" for "brindle" resulting in |>,
 <DEL "brindle" resulting in [>,
 <INS "brindle" resulting in ]>)

Note that not all insertions or deletions of adjectives will be associated with \(\sqsubset\) or \(\sqsupset\): privative adjectives like “fake” will introduce a \(|\): fake greyhounds are not greyhounds (fake greyhound \(|\) greyhound) and greyhounds are not fake greyhounds (greyhound \(|\) fake greyhound).

delete_fake = Deletion('fake', relation='|')
insert_fake = Insertion('fake', relation='|')

Indeed, most substitutions involving “fake” will also yield a \(|\) relation.

substitute_fake_for_virtuosic = Substitution('virtuosic', 'fake', '|')
substitute_virtuosic_for_fake = Substitution('fake', 'virtuosic', '|')

But insertion and deletion edits involving “virtuosic” should act like “brindle”.

delete_virtuosic = Deletion('virtuosic')
insert_virtuosic = Insertion('virtuosic')

Use the following four sentences to write your tests. These tests should involve applying an edit \(e_1\) to sentence \(s_i\) to yield sentence \(e_1(s_i)\), then applying an edit \(e_2\) to \(e_1(s_i)\) to yield sentence \(e_2(e_1(s_i))\). You should then combine the inferences associated with \(e_1\) and \(e_2\) using your Inference.__add__ and check that the result is correct. Make sure to test at least one case where the result should be a non-singleton set of inferences.

test_sentence1 = ['a', 'virtuosic', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
test_sentence2 = ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
test_sentence3 = ['a', 'fake', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
test_sentence4 = ['a', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
# write tests here  

Editor Libraries

We’ll need a way to store collections of editors and, crucially, to create default ones on demand. The EditorLibrary class provides a convenient way to store and retrieve editors (substitutions, deletions, and insertions). When we look up an editor that isn’t in the library, it automatically creates a default one: substitutions get a # relation (since there is no general default for them), while deletions and insertions get the default relations (\(\sqsubset\) and \(\sqsupset\)) defined by MacCartney & Manning.

from typing import Literal

type EditorType = Literal['substitute', 'delete', 'insert']
type EditorInputOutput = str | tuple[str, str]


class EditorLibrary:
    """A library of editors indexed by type and input/output parameters.

    Parameters
    ----------
    library : dict[EditorType, dict[EditorInputOutput, Editor]] | None
        Mapping from editor types to dictionaries of editors. If None, an
        empty library is created.
    """

    def __init__(self, library: dict[EditorType, dict[EditorInputOutput, Editor]] | None = None):
        # build a fresh empty library per instance rather than sharing a
        # mutable default dict across all instances
        if library is None:
            library = {'substitute': {}, 'delete': {}, 'insert': {}}

        self._library = library

    def __getitem__(self, key: tuple[EditorType, EditorInputOutput]) -> Editor:
        """Look up an editor by type and input/output parameters.

        Parameters
        ----------
        key : tuple[EditorType, EditorInputOutput]
            A tuple of the editor type and the edit parameters.

        Returns
        -------
        Editor
            The editor for the given type and parameters.
        """
        editor_type, edit = key

        if edit not in self._library[editor_type]:
            self._add_default_editor(editor_type, edit)

        return self._library[editor_type][edit]

    def add_editor(self, editor: Editor):
        """Register an editor in the library.

        Parameters
        ----------
        editor : Editor
            The editor to add.
        """
        if isinstance(editor, Substitution):
            self._library['substitute'][(editor.input, editor.output)] = editor

        if isinstance(editor, Insertion):
            self._library['insert'][editor.output] = editor

        if isinstance(editor, Deletion):
            self._library['delete'][editor.input] = editor

    def _add_default_editor(self, editor_type: str, edit: EditorInputOutput):
        if editor_type == 'substitute':
            self.add_editor(Substitution(input=edit[0], output=edit[1], relation='#'))

        elif editor_type == 'insert':
            self.add_editor(Insertion(output=edit))

        elif editor_type == 'delete':
            self.add_editor(Deletion(input=edit))

        else:
            raise ValueError(f'{editor_type} is not a recognized edit type')

Task 2

Lines: 20

We don’t want to have to hand-compute the edits that are required to convert one sentence into another. Instead, we will use a modified form of the StringEdit class we developed in class. What we need in particular are the edit paths that that class produces.

First, we’ll define a class for representing and manipulating edit paths. One important thing we want this class to do is convert an edit path into a list of editors; for that, we need a way to look up the editor for a given edit type and parameters in an EditorLibrary.

type EditLocation = int
type EditorParameters = tuple[EditorInputOutput, EditLocation]


class EditPath:
    """Class for representing and manipulating sequences of text edits.

    Parameters
    ----------
    edits : list[tuple[EditorType, EditorParameters]]
        List of tuples containing edit type and parameters for each edit operation.
    """

    def __init__(self, edits: list[tuple[EditorType, EditorParameters]]):
        self.edits_unshifted = edits
        self.edits = self._shift_indices(edits)

    def __repr__(self) -> str:
        return str(self.edits)

    def __call__(self, input_text: list[str]) -> list[str]:
        """Apply the edit path to transform the input text.

        Parameters
        ----------
        input_text : list[str]
            The input text to transform.

        Returns
        -------
        list[str]
            The transformed text after applying all edits.
        """
        current_text = input_text.copy()

        for edit_type, (edit, idx) in self.edits:
            if edit_type == 'substitute':
                old_word, new_word = edit

                if old_word != current_text[idx]:
                    raise ValueError(
                        f'Substitution {old_word} -> {new_word} at {idx} '
                        f'cannot be applied to {current_text}'
                    )

                current_text[idx] = new_word

            elif edit_type == 'delete':
                current_text.pop(idx)

            elif edit_type == 'insert':
                current_text.insert(idx, edit)

        return current_text

    def _shift_indices(self, edit_path: list[tuple[EditorType, EditorParameters]]) -> list[tuple[EditorType, EditorParameters]]:
        """Adjust indices of edits to account for previous insertions and deletions.

        The edit sequence output by the ``StringEdit`` class is always relativized to
        the original string. But if we are applying the edits in sequence, the string
        that we apply subsequent edits to will differ from the original string, so we
        need to shift the indices of the subsequent edits to account for the previous
        edits.

        Parameters
        ----------
        edit_path : list[tuple[EditorType, EditorParameters]]
            Original list of edits with unadjusted indices.

        Returns
        -------
        list[tuple[EditorType, EditorParameters]]
            List of edits with indices shifted to account for previous edits.
        """
        edit_path_shifted = []

        # track cumulative index shifts from insertions/deletions;
        # always shift back by 1 because the original edit sequence
        # is 1-indexed (due to the sentinel)
        shifts = [(0, -1)]

        for i, (edit_type, (edit, idx)) in enumerate(edit_path):
            original_idx = idx

            # apply all previous shifts to current index
            for j, s in shifts:
                idx = idx + s if idx >= j else idx

            if edit_type == 'substitute':
                # identical substitutions are no-ops
                if edit[0] == edit[1]:
                    continue
                else:
                    # substitution does not shift because we
                    # replace one element with another
                    shifts.append((idx, 0))

            elif edit_type == 'delete':
                # deletion shifts back by 1
                shifts.append((idx, -1))

            elif edit_type == 'insert':
                # insertion shifts forward by 1
                shifts.append((idx, 1))

                # guard against the insertion being the final edit in the path
                if i + 1 < len(edit_path):
                    next_edit_type, (_, next_idx) = edit_path[i+1]

                    # keep inserted element in order with an
                    # immediately following substitution
                    if next_edit_type != 'substitute' or next_idx != original_idx:
                        idx += 1
                else:
                    idx += 1

            edit_path_shifted.append((edit_type, (edit, idx)))

        return edit_path_shifted

    def to_editors(self, library: EditorLibrary) -> list[tuple[EditLocation, Editor]]:
        """Convert edit path to a sequence of editors using the provided library.

        Parameters
        ----------
        library : EditorLibrary
            Library mapping edit types and parameters to Editor instances.

        Returns
        -------
        list[tuple[EditLocation, Editor]]
            List of tuples containing the edit location and the editor for each
            edit in the path.
        """
        return [
            (edit_idx, library[edit_type, edit])
            for edit_type, (edit, edit_idx) in self.edits
        ]

Next, we’ll define a class for computing edit distances, alignments, and edit paths between strings we used in class, with some slight updates for the current assignment.

import numpy as np

type Alignment = list[tuple[int, int]]


class StringEdit:
    """Class for computing edit distances, alignments, and edit paths between strings.

    This class implements the Wagner-Fischer algorithm for computing minimum edit
    distance between sequences, along with the corresponding alignments and edit paths.

    Parameters
    ----------
    insertion_cost : float, default=1.0
        Cost of inserting a character.
    deletion_cost : float, default=1.0
        Cost of deleting a character.
    substitution_cost : float or None, default=None
        Cost of substituting a character. If None, defaults to insertion_cost + deletion_cost.
    """

    def __init__(self, insertion_cost: float = 1., deletion_cost: float = 1., substitution_cost: float | None = None):
        self._insertion_cost = insertion_cost
        self._deletion_cost = deletion_cost

        if substitution_cost is None:
            self._substitution_cost = insertion_cost + deletion_cost
        else:
            self._substitution_cost = substitution_cost

    def __call__(self, source: list[str], target: list[str], only_distance: bool = False) -> float | tuple[float, Alignment, list[EditPath]]:
        """Compute the edit distance between source and target.

        Parameters
        ----------
        source : list[str]
            Source sequence.
        target : list[str]
            Target sequence.
        only_distance : bool
            If True, return only the edit distance.

        Returns
        -------
        float | tuple[float, Alignment, list[EditPath]]
            If only_distance is True, returns just the edit distance.
            Otherwise returns a tuple of (distance, alignment, edit_paths).
        """
        return self._wagner_fisher(source, target, only_distance)

    def _wagner_fisher(self, source: list[str], target: list[str], only_distance: bool) -> float | tuple[float, Alignment, list[EditPath]]:
        """Compute minimum edit distance, alignment, and edit sequence using the Wagner-Fischer algorithm.

        Parameters
        ----------
        source : list[str]
            Source sequence.
        target : list[str]
            Target sequence.
        only_distance : bool
            If True, return only the edit distance.

        Returns
        -------
        float | tuple[float, Alignment, list[EditPath]]
            If only_distance is True, returns just the edit distance.
            Otherwise returns a tuple of (distance, alignment, edit_paths).
        """
        n, m = len(source), len(target)
        source, target = self._add_sentinel(source, target)

        # initialize matrices for dynamic programming
        distance = np.zeros([n+1, m+1], dtype=float)
        pointers = np.zeros([n+1, m+1], dtype=list)
        edits = np.zeros([n+1, m+1], dtype=list)

        pointers[0,0] = []
        edits[0,0] = []

        # initialize first column (deletions)
        for i in range(1,n+1):
            distance[i,0] = distance[i-1,0]+self._deletion_cost
            pointers[i,0] = [(i-1,0)]
            edits[i,0] = [('delete', (source[i], i))]

        # initialize first row (insertions)
        for j in range(1,m+1):
            distance[0,j] = distance[0,j-1]+self._insertion_cost
            pointers[0,j] = [(0,j-1)]
            edits[0,j] = [('insert', (target[j], j))]

        # fill in the rest of the matrices
        for i in range(1,n+1):
            for j in range(1,m+1):
                if source[i] == target[j]:
                    substitution_cost = 0.
                else:
                    substitution_cost = self._substitution_cost

                costs = np.array([distance[i-1,j]+self._deletion_cost,
                                  distance[i-1,j-1]+substitution_cost,
                                  distance[i,j-1]+self._insertion_cost])

                distance[i,j] = costs.min()

                best_edits = np.where(costs==distance[i,j])[0]

                indices = [(i-1,j), (i-1,j-1), (i,j-1)]
                pointers[i,j] = [indices[k] for k in best_edits]

                edit_types = list(zip(["delete", "substitute", "insert"],
                                      [(source[i], i),
                                       ((source[i], target[j]), i),
                                       (target[j], i)]))
                edits[i,j] = [edit_types[k] for k in best_edits]

        if only_distance:
            return distance[n,m]

        pointer_backtrace, edit_backtrace = self._construct_backtrace(pointers, edits, n, m)

        edit_paths = [EditPath(bt) for bt in edit_backtrace]

        return distance[n,m], pointer_backtrace, edit_paths

    def _construct_backtrace(self, pointers: np.ndarray, edits: np.ndarray, n: int, m: int) -> tuple[list[list[tuple[int, int]]], list[list[tuple[str, tuple[str, int]]]]]:
        """Construct all possible backtraces through the dynamic programming matrix.

        Parameters
        ----------
        pointers : np.ndarray
            Matrix of pointers to previous cells.
        edits : np.ndarray
            Matrix of edit operations.
        n : int
            Length of source sequence.
        m : int
            Length of target sequence.

        Returns
        -------
        tuple[list[list[tuple[int, int]]], list[list[tuple[str, tuple[str, int]]]]]
            Returns (pointer_backtraces, edit_backtraces).
        """
        stack = [([(n,m)], [])]

        complete_pointer_backtraces = []
        complete_edit_backtraces = []

        while stack:
            current_pointer_path, current_edit_path = stack.pop()
            current_pos = current_pointer_path[-1]

            if current_pos == (0,0):
                complete_pointer_backtraces.append(current_pointer_path[::-1])
                complete_edit_backtraces.append(current_edit_path[::-1])
                continue

            for next_pos, edit in zip(pointers[current_pos], edits[current_pos]):
                new_pointer_path = current_pointer_path + [next_pos]
                new_edit_path = current_edit_path + [edit]
                stack.append((new_pointer_path, new_edit_path))

        return complete_pointer_backtraces, complete_edit_backtraces

    def _add_sentinel(
        self,
        source: str | list | tuple,
        target: str | list | tuple
    ) -> tuple[str | list | tuple, str | list | tuple]:
        """Add sentinel symbols to beginning of sequences.

        Parameters
        ----------
        source
            Source sequence.
        target
            Target sequence.

        Returns
        -------
        tuple[str | list | tuple, str | list | tuple]
            Source and target with added sentinels.

        Raises
        ------
        ValueError
            If source or target are not str, list, or tuple.
        """
        if isinstance(source, str):
            source = '#' + source
        elif isinstance(source, list):
            source = ['#'] + source
        elif isinstance(source, tuple):
            source = ('#',) + source
        else:
            raise ValueError('source must be str, list, or tuple')

        if isinstance(target, str):
            target = '#' + target
        elif isinstance(target, list):
            target = ['#'] + target
        elif isinstance(target, tuple):
            target = ('#',) + target
        else:
            raise ValueError('target must be str, list, or tuple')

        return source, target

In the original implementation, the edit path indexed into the source string. This made sense at the time because we wanted to know which words, relative to their original positions in the string, are operated on by an edit. It’s problematic for current purposes because, once we apply insertions and deletions, the positions of the words targeted by later edits change. The implementation above corrects for this, but make sure you take into account that the order of edits matters for exactly this reason.
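Here is a minimal, standalone illustration of the problem (independent of the classes above): applying two deletions whose indices were both computed on the original string.

```python
words = ['a', 'virtuosic', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
target = ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']

# indices relative to the ORIGINAL string: 'virtuosic' at 1, 'brindle' at 6
naive = words.copy()
naive.pop(1)
naive.pop(6)       # after the first deletion, index 6 is 'greyhound', not 'brindle'!

# shifting the second index back by 1 gives the intended result
shifted = words.copy()
shifted.pop(1)
shifted.pop(5)

print(naive == target)    # False
print(shifted == target)  # True
```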

editdist = StringEdit(1, 1, 1)

dist, align, edits = editdist(test_sentence1, test_sentence2)

edit_path_is_correct = all(
    e(test_sentence1) == test_sentence2 for e in edits
)

print('Source:   ', test_sentence1)
print('Target:   ', test_sentence2)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edited:", [e(test_sentence1) for e in edits])
print("Edit path is correct:", edit_path_is_correct)
Source:    ['a', 'virtuosic', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
Target:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Pointer path: [[(0, 0), (1, 1), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 5), (8, 6)]]
Edit path: [[('delete', ('virtuosic', 1)), ('delete', ('brindle', 5))]]
Edited: [['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']]
Edit path is correct: True
dist, align, edits = editdist(test_sentence2, test_sentence1)

edit_path_is_correct = all(
    e(test_sentence2) == test_sentence1 for e in edits
)

print('Source:   ', test_sentence2)
print('Target:   ', test_sentence1)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edited:", [e(test_sentence2) for e in edits])
print("Edit path is correct:", edit_path_is_correct)
Source:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Target:    ['a', 'virtuosic', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
Pointer path: [[(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8)]]
Edit path: [[('insert', ('virtuosic', 1)), ('insert', ('brindle', 6))]]
Edited: [['a', 'virtuosic', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']]
Edit path is correct: True
test_sentence1_prime = ["some"] + test_sentence1[1:]

dist, align, edits = editdist(test_sentence2, test_sentence1_prime)

edit_path_is_correct = all(
    e(test_sentence2) == test_sentence1_prime for e in edits
)

print('Source:   ', test_sentence2)
print('Target:   ', test_sentence1_prime)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edit path is correct:", edit_path_is_correct)
Source:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Target:    ['some', 'virtuosic', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
Pointer path: [[(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8)], [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8)]]
Edit path: [[('substitute', (('a', 'some'), 0)), ('insert', ('virtuosic', 1)), ('insert', ('brindle', 6))], [('insert', ('some', 0)), ('substitute', (('a', 'virtuosic'), 1)), ('insert', ('brindle', 6))]]
Edit path is correct: True
dist, align, edits = editdist(test_sentence1_prime, test_sentence2)

edit_path_is_correct = all(
    e(test_sentence1_prime) == test_sentence2 for e in edits
)

print('Source:   ', test_sentence1_prime)
print('Target:   ', test_sentence2)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edit path is correct:", edit_path_is_correct)
Source:    ['some', 'virtuosic', 'synthesist', 'loves', 'a', 'happy', 'brindle', 'greyhound']
Target:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Pointer path: [[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 5), (8, 6)], [(0, 0), (1, 1), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 5), (8, 6)]]
Edit path: [[('delete', ('some', 0)), ('substitute', (('virtuosic', 'a'), 0)), ('delete', ('brindle', 5))], [('substitute', (('some', 'a'), 0)), ('delete', ('virtuosic', 1)), ('delete', ('brindle', 5))]]
Edit path is correct: True
test_sentence1_prime2 = test_sentence1[:4] + ["some"] + test_sentence1[5:]

dist, align, edits = editdist(
    test_sentence2, 
    test_sentence1_prime2
)

edit_path_is_correct = all(
    e(test_sentence2) == test_sentence1_prime2 for e in edits
)

print('Source:   ', test_sentence2)
print('Target:   ', test_sentence1_prime2)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edit path is correct:", edit_path_is_correct)
Source:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Target:    ['a', 'virtuosic', 'synthesist', 'loves', 'some', 'happy', 'brindle', 'greyhound']
Pointer path: [[(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8)]]
Edit path: [[('insert', ('virtuosic', 1)), ('substitute', (('a', 'some'), 4)), ('insert', ('brindle', 6))]]
Edit path is correct: True
dist, align, edits = editdist(test_sentence1_prime2, test_sentence2)

edit_path_is_correct = all(
    e(test_sentence1_prime2) == test_sentence2 for e in edits
)

print('Source:   ', test_sentence1_prime2)
print('Target:   ', test_sentence2)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edit path is correct:", edit_path_is_correct)
Source:    ['a', 'virtuosic', 'synthesist', 'loves', 'some', 'happy', 'brindle', 'greyhound']
Target:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Pointer path: [[(0, 0), (1, 1), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 5), (8, 6)]]
Edit path: [[('delete', ('virtuosic', 1)), ('substitute', (('some', 'a'), 3)), ('delete', ('brindle', 5))]]
Edit path is correct: True

test_sentence1_prime3 = ["some"] + test_sentence1[1:4] + ["some"] + test_sentence1[5:]
dist, align, edits = editdist(test_sentence1_prime3, test_sentence2)

edit_path_is_correct = all(
    e(test_sentence1_prime3) == test_sentence2 for e in edits
)

print('Source:   ', test_sentence1_prime3)
print('Target:   ', test_sentence2)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edit path is correct:", edit_path_is_correct)
Source:    ['some', 'virtuosic', 'synthesist', 'loves', 'some', 'happy', 'brindle', 'greyhound']
Target:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Pointer path: [[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 5), (8, 6)], [(0, 0), (1, 1), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 5), (8, 6)]]
Edit path: [[('delete', ('some', 0)), ('substitute', (('virtuosic', 'a'), 0)), ('substitute', (('some', 'a'), 3)), ('delete', ('brindle', 5))], [('substitute', (('some', 'a'), 0)), ('delete', ('virtuosic', 1)), ('substitute', (('some', 'a'), 3)), ('delete', ('brindle', 5))]]
Edit path is correct: True

dist, align, edits = editdist(test_sentence2, test_sentence1_prime3)

edit_path_is_correct = all(
    e(test_sentence2) == test_sentence1_prime3 for e in edits
)

print('Source:   ', test_sentence2)
print('Target:   ', test_sentence1_prime3)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edit path is correct:", edit_path_is_correct)
Source:    ['a', 'synthesist', 'loves', 'a', 'happy', 'greyhound']
Target:    ['some', 'virtuosic', 'synthesist', 'loves', 'some', 'happy', 'brindle', 'greyhound']
Pointer path: [[(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8)], [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8)]]
Edit path: [[('substitute', (('a', 'some'), 0)), ('insert', ('virtuosic', 1)), ('substitute', (('a', 'some'), 4)), ('insert', ('brindle', 6))], [('insert', ('some', 0)), ('substitute', (('a', 'virtuosic'), 1)), ('substitute', (('a', 'some'), 4)), ('insert', ('brindle', 6))]]
Edit path is correct: True

Implement the __call__ method for the NaturalLogic class. This should take a premise sentence and a hypothesis sentence, and it should produce the paths of inferences (computed from the paths of edits) that take you from premise to hypothesis.

Each path should be a list of inferences that result from cumulatively composing the inferences associated with each edit in the path. It should not be a path of local inferences: that is, it should not be a list of the inferences that result from each individual edit in an edit path, but rather a list of the inferences that result from composing those local inferences using Inference.__add__.

You will not be using EditPath.__call__ in any way. That method is implemented to demonstrate how we should apply edit paths to strings. You should instead be using EditPath.to_editors to get the list of editors that result from the edit path, and then you should use those editors to compute the local inferences (again, the inferences that result from applying each editor in sequence).
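To illustrate what "cumulatively composing" means, the sketch below composes a path of local relations using a toy join table (just a few entries of the full table given earlier). Plain relation symbols and a `join` function stand in here for `Inference` and `Inference.__add__`; your implementation should use those instead:

```python
from itertools import accumulate

# toy join table: a small subset of the full table given earlier
toy_join_table = {('≡', '≡'): {'≡'},
                  ('≡', '['): {'['},
                  ('[', '≡'): {'['},
                  ('[', '['): {'['}}

def join(rels1, rels2, table):
    """Join two sets of candidate relations pointwise."""
    return {r for r1 in rels1 for r2 in rels2 for r in table[(r1, r2)]}

def cumulative_path(local_relations, table):
    """Cumulatively compose local relations left to right.

    Returns the running composition, so the last element is the
    relation (set) between premise and hypothesis.
    """
    return list(accumulate(({r} for r in local_relations),
                           lambda acc, nxt: join(acc, nxt, table)))
```

For example, `cumulative_path(['[', '[', '≡'], toy_join_table)` yields `[{'['}, {'['}, {'['}]`: the running composition, not the list of local relations.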

type InferencePath = tuple[Inference, ...]

class NaturalLogic:
    """Class for performing natural logic inference between sentences.

    This class implements natural logic inference by finding edit paths
    between sentences and composing the inferences associated with each
    edit. It uses an editor library that maps edit operations
    (substitutions, deletions, insertions) to Editor instances that
    specify the inference relations for those edits.

    Parameters
    ----------
    editor_library : EditorLibrary, optional
        Library mapping edit types and parameters to Editor instances.
        Defaults to an empty EditorLibrary.

    Attributes
    ----------
    EDIT : StringEdit
        StringEdit instance used for computing edit distances and paths
        between strings, with default costs of 1 for substitution,
        deletion, and insertion.
    """

    EDIT = StringEdit(1, 1, 1)

    def __init__(self, editor_library: EditorLibrary = EditorLibrary()):
        self._editor_library = editor_library

    def __getitem__(self, key: tuple[EditorType, EditorInputOutput]):
        """Look up an editor in the library.

        Parameters
        ----------
        key : tuple[EditorType, EditorInputOutput]
            A tuple of the editor type and the edit parameters.

        Returns
        -------
        Editor
            The editor for the given type and parameters.
        """
        return self._editor_library[key]

    def __call__(self, premise: list[str], hypothesis: list[str]) -> set[InferencePath]:
        """Perform natural logic inference between a premise and hypothesis sentence.

        This method computes the possible edit paths between the premise
        and hypothesis, and then composes the inferences associated with
        each edit to yield all possible inference paths implied by the
        edit paths.

        Parameters
        ----------
        premise : list[str]
            The premise sentence to infer from.
        hypothesis : list[str]
            The hypothesis sentence to infer to.

        Returns
        -------
        set[InferencePath]
            A set of inference paths representing the paths of
            cumulatively composed inferences from the premise to the
            hypothesis implied by the edit path.
        """
        raise NotImplementedError

Implement tests using the four test sentences above. (Ignore my modified versions of these sentences.) For now, you can just assume that the editor library contains the editors defined for Task 1. (We don’t need to explicitly specify any insertions that result in \(\sqsupset\) or deletions that result in \(\sqsubset\), since those are added by default by NaturalLogic.add_editor.) In Task 3, we will expand the library using WordNet.

library = {'substitute': {('virtuosic', 'fake'): substitute_fake_for_virtuosic,
                          ('fake', 'virtuosic'): substitute_virtuosic_for_fake,
                          ('brindle', 'fawn'): substitute_fawn_for_brindle}, 
           'delete': {"fake": delete_fake}, 
           'insert': {"fake": insert_fake}}
# write tests here

Evaluating against FraCaS

For the remainder of the assignment (Tasks 3 and 4), we will evaluate our NaturalLogic implementation using the FraCaS textual inference test suite. FraCaS is shipped as XML.

%%bash

wget https://nlp.stanford.edu/~wcmac/downloads/fracas.xml
cat fracas.xml

I’ve included a simple corpus reader below.

!pip install beautifulsoup4
!pip install lxml
from bs4 import BeautifulSoup, Tag

class Fracas:
    """Corpus reader for the FraCaS textual inference problem set"""
    
    def __init__(self, root: str="fracas.xml"):
        with open(root) as fp:
            self._data = BeautifulSoup(fp, 'lxml')
            
        self._problem_generator = self._construct_problem_generator()
            
    def __iter__(self):
        return self
    
    def __next__(self):
        return next(self._problem_generator)
    
    def __repr__(self):
        return self._data.comment.string
     
    def _construct_problem_generator(self):
        for problem in self.problems:
            yield problem
    
    @property
    def problems(self):
        return [FracasProblem(problem) 
                for problem in self._data.find_all('problem')]

class FracasProblem:
    """A FraCaS problem"""
    
    problem_type_map = {'001'}
    
    def __init__(self, problem: Tag):
        self.id = problem.get('id')
        self.answer = problem.get('fracas_answer')
        
        self.premise = problem.p.string.strip()
        self.question = problem.q.string.strip()
        self.hypothesis = problem.h.string.strip()
        
    def __repr__(self):
        return (f"id: {self.id}"
                f"\n\npremise: {self.premise}"
                f"\nquestion: {self.question}"
                f"\nhypothesis: {self.hypothesis}"
                f"\n\nanswer: {self.answer}")
fracas = Fracas()

fracas

Since the sentences are just raw strings, to get them in the form of a list of strings, you will need a tokenizer. I would suggest using the one available in the stanza package. For our purposes, it is also simpler to use the lemma, rather than the token itself, because your WordNet editor library won’t handle inflectional morphology (unless you explicitly engineered it to).

!pip install stanza

import stanza

stanza.download('en')
lemmatizer = stanza.Pipeline('en', processors='tokenize,mwt,pos,lemma')

lemmatizer('Every virtuosic synthesist loves some greyhounds.')
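Once you have a stanza `Document` in hand, one way to pull out the lemma list (assuming the standard `doc.sentences` / `sentence.words` / `word.lemma` attributes) is a small helper like this:

```python
def lemmas(doc) -> list[str]:
    """Flatten a parsed document into a flat list of lemmas."""
    return [word.lemma
            for sentence in doc.sentences
            for word in sentence.words]
```

So `lemmas(lemmatizer('Every virtuosic synthesist loves some greyhounds.'))` should give you a list of lemmas suitable as input to `NaturalLogic.__call__` (note that the plural `greyhounds` comes back as its lemma).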

To use this dataset to test your NaturalLogic implementation, you will need to convert the inference produced by __call__ into a “yes”, “no”, or “don’t know” answer. (Don’t worry about any items not labeled with one of these three.) This will require you to define a mapping from inference types to answers. You should then compute the accuracy, precision, recall, and F1 of your system.
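As a starting point for that mapping, one natural assumption (treat it as an assumption to refine, not the only defensible choice) is that equivalence and forward entailment support “yes”, negation and alternation support “no”, and everything else leaves the answer undetermined:

```python
# hypothetical mapping from relation symbols to FraCaS-style answers:
#   '≡' equivalence, '[' forward entailment -> hypothesis follows
#   '^' negation, '|' alternation           -> hypothesis contradicted
#   ']' reverse entailment, 'u' cover, '#' independence -> undetermined
answer_map = {'≡': 'yes', '[': 'yes',
              '^': 'no', '|': 'no',
              ']': 'unknown', 'u': 'unknown', '#': 'unknown'}

def relation_to_answer(relation: str) -> str:
    """Map a final premise-hypothesis relation to an answer string."""
    return answer_map[relation]
```

Here 'unknown' stands in for FraCaS's “don’t know”; you will also need a policy for inference paths that end in a *set* of candidate relations rather than a single one.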

Each of these metrics can be defined in terms of…

  1. The true positive count for class \(c\): \[\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i = \hat{y}^\mathrm{test}_i = c\}|\]
  2. The true negative count for class \(c\): \[\mathrm{tn}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i \neq c \land \hat{y}^\mathrm{test}_i \neq c\}|\]
  3. The false positive count for class \(c\): \[\mathrm{fp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i \neq c \land \hat{y}^\mathrm{test}_i = c\}|\]
  4. The false negative count for class \(c\): \[\mathrm{fn}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i = c \land \hat{y}^\mathrm{test}_i \neq c\}|\]

…where the class is “yes”, “no”, or “unknown”; \(y^\mathrm{test}_i\) is the true label for item \(i\) (found in FraCaS) and \(\hat{y}^\mathrm{test}_i\) is your system’s prediction for the class of item \(i\). (Ignore cases where the class is not one of these three.)
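These counts are straightforward to compute directly; here is one sketch, with `y_pred` and `y_true` as parallel lists of answer labels:

```python
def tp(y_pred, y_true, c):
    """True positives for class c."""
    return sum(1 for yhat, y in zip(y_pred, y_true) if y == yhat == c)

def tn(y_pred, y_true, c):
    """True negatives for class c."""
    return sum(1 for yhat, y in zip(y_pred, y_true) if y != c and yhat != c)

def fp(y_pred, y_true, c):
    """False positives for class c."""
    return sum(1 for yhat, y in zip(y_pred, y_true) if y != c and yhat == c)

def fn(y_pred, y_true, c):
    """False negatives for class c."""
    return sum(1 for yhat, y in zip(y_pred, y_true) if y == c and yhat != c)
```

For any class, the four counts partition the test items, so they sum to \(N\).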

Accuracy

For what proportion of the test data \(\{(x^\mathrm{test}_{1}, y^\mathrm{test}_1), ..., (x^\mathrm{test}_N, y^\mathrm{test}_N)\}\) does the model’s predicted class \(f(x^\mathrm{test}_i) = \hat{y}^\mathrm{test}_i\) for an item match the ground truth class for that item?

\[\mathrm{accuracy}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}\right) = \frac{\sum_{c \in \mathcal{Y}}\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}{N}\]

(Each correctly classified item is a true positive for exactly one class, so summing \(\mathrm{tp}\) over classes counts each correct prediction exactly once.)

sklearn.metrics technically provides an accuracy_score function, but generally it’s just as straightforward to compute it yourself.
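For instance, a minimal hand-rolled version, written as a per-item comparison (which gives the same value as the formula above):

```python
def accuracy(y_pred, y_true):
    """Proportion of items whose predicted label matches the true label."""
    return sum(1 for yhat, y in zip(y_pred, y_true) if yhat == y) / len(y_true)
```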

!pip install scikit-learn
from sklearn.metrics import accuracy_score

Precision

For a particular class \(c\), what proportion of the test items that the model said have that class actually have that class?

\[\mathrm{precision}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right) = \frac{\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}{\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) + \mathrm{fp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}\]

To report an aggregate precision across classes, it is common to distinguish micro-average precision from macro-average precision.

\[\mathrm{microprecision}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}\right) = \frac{\sum_{c \in \mathcal{Y}} \mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}{\sum_{c \in \mathcal{Y}} \mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) + \mathrm{fp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}\]

\[\mathrm{macroprecision}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}\right) = \frac{1}{|\mathcal{Y}|}\sum_{c \in \mathcal{Y}} \mathrm{precision}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right)\]
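A sketch of both averages (here choosing the convention that an undefined precision, with no predictions for a class, counts as 0.0; sklearn makes the same choice by default, but it is a choice):

```python
def precision(y_pred, y_true, c):
    """Per-class precision: tp / (tp + fp)."""
    tp = sum(1 for yhat, y in zip(y_pred, y_true) if y == yhat == c)
    predicted = sum(1 for yhat in y_pred if yhat == c)  # tp + fp
    return tp / predicted if predicted else 0.0

def macro_precision(y_pred, y_true, classes):
    """Unweighted mean of per-class precisions."""
    return sum(precision(y_pred, y_true, c) for c in classes) / len(classes)

def micro_precision(y_pred, y_true, classes):
    """Pool tp and fp across classes before dividing."""
    tp = sum(1 for yhat, y in zip(y_pred, y_true) if y == yhat and y in classes)
    predicted = sum(1 for yhat in y_pred if yhat in classes)
    return tp / predicted if predicted else 0.0
```

Note that when every item is predicted with a label in \(\mathcal{Y}\), micro-average precision coincides with accuracy.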

from sklearn.metrics import precision_score

Recall

For a particular class \(c\), what proportion of the test items that have that class did the model correctly predict to have that class?

\[\mathrm{recall}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right) = \frac{\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}{\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) + \mathrm{fn}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}\]

Similar definitions for micro- and macro-average recall can be given.

from sklearn.metrics import recall_score

F1

For a class \(c\), what is the harmonic mean of precision and recall?

\[F_1\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right) = \frac{2}{\frac{1}{\mathrm{precision}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right)} + \frac{1}{\mathrm{recall}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right)}} = 2\frac{\mathrm{precision}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right)\;\cdot\;\mathrm{recall}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right)}{\mathrm{precision}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right) + \mathrm{recall}\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right)}\]

To define micro- and macro-average \(F_1\) it can be useful to reexpress it.

\[F_1\left(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c\right) = \frac{2\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}{2\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) + \mathrm{fp}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c) + \mathrm{fn}(\hat{\mathbf{y}}^\mathrm{test}_i, \mathbf{y}^\mathrm{test}, c)}\]

Definitions similar to those for precision can be given for micro- and macro-average \(F_1\).
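The count-based reexpression above translates directly into code; a sketch of per-class \(F_1\) (again adopting the convention that the degenerate 0/0 case counts as 0.0):

```python
def f1(y_pred, y_true, c):
    """Per-class F1 via the count-based formula 2tp / (2tp + fp + fn)."""
    tp = sum(1 for yhat, y in zip(y_pred, y_true) if y == yhat == c)
    fp = sum(1 for yhat, y in zip(y_pred, y_true) if y != c and yhat == c)
    fn = sum(1 for yhat, y in zip(y_pred, y_true) if y == c and yhat != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Micro-average \(F_1\) pools tp, fp, and fn across classes before applying the formula; macro-average \(F_1\) averages the per-class values.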

from sklearn.metrics import f1_score

Task 3

Define an instance method NaturalLogic.load_wordnet that constructs an editor library from WordNet hypernymy, hyponymy, and antonymy relations.

!pip install nltk

import nltk

nltk.download('wordnet')

from nltk.corpus import wordnet
class NaturalLogic(NaturalLogic):

    def load_wordnet(self):
        raise NotImplementedError
    
    @classmethod
    def from_wordnet(cls):
        natlog = cls()
        natlog.load_wordnet()
        
        return natlog
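Before writing load_wordnet, it helps to fix how each WordNet relation should map onto a natural logic relation for a substitution edit. One reasonable assumption (treat it as exactly that): substituting a word with one of its hypernyms yields forward entailment, with a hyponym reverse entailment, and with an antonym alternation. A sketch of that mapping as a plain table:

```python
# hypothetical mapping from the WordNet relation linking the old word to
# the new word (in a substitution) to a natural logic relation symbol;
# an assumption to build on, not the only defensible choice: exhaustive
# antonym pairs (human/nonhuman) would warrant '^' rather than '|'
wordnet_relation_map = {'hypernym': '[',   # crow -> bird: forward entailment
                        'hyponym': ']',    # bird -> crow: reverse entailment
                        'antonym': '|'}    # hot -> cold: alternation

def substitution_relation(wordnet_relation: str) -> str:
    """Relation contributed by a substitution along a WordNet edge."""
    return wordnet_relation_map[wordnet_relation]
```

Your load_wordnet can then walk WordNet (e.g. via synset.hypernyms() and lemma.antonyms() in nltk) and register one substitution editor per word pair, with the relation chosen by a table like this one.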

Test your new library by writing examples that require knowledge of hypernymy, hyponymy, and antonymy to correctly handle.

# write tests here

Evaluate your new library on FraCaS by computing precision, recall, and F1 for the items labeled “yes”, “no”, or “don’t know”. Remember that this will require you to define a way of mapping inference types to answers.

# write evaluation here

These numbers will be bad. The point is to see that handling even the apparently simple cases in FraCaS is very difficult, even with a fairly extensive edit library.

Find at least three examples you get wrong. For each example, identify where in the edit sequence the problem occurs and explain how this issue might be fixed on the basis of what you read in MacCartney and Manning. (Hint: look at how they model quantifiers and negation.) If the MacCartney and Manning approach is not fully sufficient to fix the error, identify what you would need to do to extend it to handle the problem. (Hint: this will often involve changes to your editor library. What would those changes need to look like?)

# write explanation here