Like Assignments 1 and 2, Assignments 3 and 4 are bundled together. You only need to do Tasks 1 and 2 for Assignment 3 and Tasks 3 and 4 for Assignment 4.
These assignments focus on implementing a natural language inference system. In natural language inference, we receive a premise sentence and a hypothesis sentence, and we must say whether we can infer the hypothesis from the premise. For instance, if (1) were our premise and (2) were our hypothesis, our system should respond yes.
(1) Every firm polled saw costs grow more than expected, even after adjusting for inflation.
(2) Every big company in the poll reported cost increases.
In MacCartney & Manning 2009 (henceforth, M&M), you read about one sort of system for doing this: a natural logic system. This system works by (i) obtaining an edit path from the premise to the hypothesis; (ii) mapping that edit path into an inference path; (iii) computing the join of the inferences in this path to obtain a relation between the premise and the hypothesis; and (iv) checking whether there is a forward entailment relation between the premise and the hypothesis.
The definitions of the relations are given in Table 2 of the paper. The join table for these relations, which your implementation will rely on, is reproduced below:
| ⋈ | ≡ | [ | ] | ^ | \| | u | # |
|---|---|---|---|---|---|---|---|
| ≡ | ≡ | [ | ] | ^ | \| | u | # |
| [ | [ | [ | ≡[]\|# | \| | \| | [^\|u# | [\|# |
| ] | ] | ≡[]u# | ] | u | ]^\|u# | u | ]u# |
| ^ | ^ | u | \| | ≡ | ] | [ | # |
| \| | \| | [^\|u# | \| | [ | ≡[]\|# | [ | [\|# |
| u | u | u | ]^\|u# | ] | ] | ≡[]u# | ]u# |
| # | # | [u# | ]\|# | # | ]\|# | [u# | (any) |

A cell with several symbols, e.g. ≡[]\|#, abbreviates the set of possible joins {≡, [, ], |, #}; the join of # with # is unconstrained, so any of the seven relations is possible.
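For reference, if you are reconstructing the notebook: the relation inventory and join table might be encoded along the following lines. This encoding is hypothetical (the actual definitions appear in the notebook above Task 1), and only a few cells are spelled out here; the rest are read off the table above.

relations = {'≡', '[', ']', '^', '|', 'u', '#'}

join_table: dict[tuple[str, str], set[str]] = {
    ('≡', '≡'): {'≡'},
    ('≡', '['): {'['},
    ('[', '['): {'['},
    ('[', ']'): {'≡', '[', ']', '|', '#'},
    (']', '['): {'≡', '[', ']', 'u', '#'},
    ('[', '^'): {'|'},
    ('^', '^'): {'≡'},
    ('#', '#'): {'≡', '[', ']', '^', '|', 'u', '#'},
    # ... the remaining cells are read off the table above
}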
In Tasks 1 and 2, you will be developing the core of this system, using the minimum edit distance-based edit paths we developed in class and assuming default inference relations associated with each atomic edit operation (as discussed in Section 4 of M&M). In Tasks 3 and 4, you will enrich this system with lexical relation information from WordNet (Task 3) and with more intelligent handling of inferences associated with certain environments (as discussed in Section 5 of M&M). You will then test the system on the classic FraCaS dataset.
Task 1
Lines: 4
Define the __add__ magic method for the Inference class below. This method should use join_table (defined above) to produce a set of Inferences by joining two inferences: e.g. animal \(\sqsupset\) dog \(\bowtie\) dog \(\sqsupset\) greyhound = {animal \(\sqsupset\) greyhound}. __add__ must return a set because, as M&M discuss, the result of joining two relations can be indeterminate. (In their implementation, M&M actually treat all such indeterminate joins as #. We will not do that here, since it is useful to see why they do that.)
Importantly, note that __add__ should not be symmetric for the same reason joins are not: animal \(\sqsupset\) dog \(\bowtie\) dog \(\sqsubset\) mammal = {animal \(\equiv\) mammal, animal \(\sqsupset\) mammal, animal \(\sqsubset\) mammal, animal \(\smile\) mammal, animal \(\#\) mammal}, but dog \(\sqsubset\) mammal \(\bowtie\) animal \(\sqsupset\) dog isn’t even a licit join.
class Inference:
    '''An inference from one linguistic expression to another

    Parameters
    ----------
    premise
        The premise in the relation
    hypothesis
        The hypothesis in the relation
    relation
        The relation
    '''
    def __init__(self, premise: list[str], hypothesis: list[str],
                 relation: str):
        if relation not in relations:
            raise ValueError(f'relation must be in {relations}')

        self.premise = premise
        self.hypothesis = hypothesis
        self.relation = relation

    def __repr__(self):
        return (' '.join(self.premise) + ' ' + self.relation + ' ' +
                ' '.join(self.hypothesis))

    def __hash__(self):
        return hash((tuple(self.premise), tuple(self.hypothesis),
                     self.relation))

    def __add__(self, other: 'Inference') -> set['Inference']:
        raise NotImplementedError

    def __eq__(self, other: 'Inference') -> bool:
        return (self.premise == other.premise) &\
               (self.hypothesis == other.hypothesis) &\
               (self.relation == other.relation)
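A minimal sketch of one possible implementation, assuming the hypothetical join_table encoding above (a dict from a pair of relation symbols to a set of relation symbols). It is patched onto the class here only for illustration; in the notebook it would replace the stub above.

# sketch of Inference.__add__; assumes join_table as encoded above
def _add(self, other: 'Inference') -> set['Inference']:
    # joins are only defined when self's hypothesis is other's premise
    if self.hypothesis != other.premise:
        raise ValueError(f'cannot join {self} with {other}')
    return {Inference(self.premise, other.hypothesis, relation)
            for relation in join_table[self.relation, other.relation]}

Inference.__add__ = _add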
Test your implementation of Inference.__add__ using the Editor subclasses below.
from abc import ABC

class Editor(ABC):
    def __init__(self, *args):
        raise NotImplementedError

    def __call__(self, input: list[str], idx: int) -> Inference:
        raise NotImplementedError

    @property
    def input(self):
        return self._input

    @property
    def output(self):
        return self._output

class Substitution(Editor):
    """A substitution editor

    Parameters
    ----------
    input
        The string in the input to replace
    output
        The string to replace the input string with
    relation
        The inference relation that results
    """
    default_relation = None

    def __init__(self, input: str, output: str, relation: str):
        self._input = input
        self._output = output
        self._relation = relation

    def __repr__(self):
        return (f'<SUB "{self._output}" for "{self._input}" '
                f'resulting in {self._relation}>')

    def __call__(self, input: list[str], idx: int) -> Inference:
        """Substitute output for input at location idx"""
        if input[idx] != self._input:
            raise ValueError(f'SUB "{self._input}" -> "{self._output}" at {idx} '
                             f'cannot be applied to {input}')

        output = input[:idx] + [self._output] + input[(idx+1):]

        return Inference(input, output, self._relation)

class Deletion(Editor):
    """A deletion editor

    Parameters
    ----------
    input
        The string in the input to delete
    relation
        The inference relation that results
    """
    def __init__(self, input: str, relation: str = '['):
        self._input = input
        self._relation = relation

    def __repr__(self):
        return f'<DEL "{self._input}" resulting in {self._relation}>'

    def __call__(self, input: list[str], idx: int) -> Inference:
        """Delete input at location idx"""
        if input[idx] != self._input:
            raise ValueError(f'DEL "{self._input}" at {idx} '
                             f'cannot be applied to {input}')

        output = input[:idx] + input[(idx+1):]

        return Inference(input, output, self._relation)

class Insertion(Editor):
    """An insertion editor

    Parameters
    ----------
    output
        The string to insert into the output
    relation
        The inference relation that results
    """
    def __init__(self, output: str, relation: str = ']'):
        self._output = output
        self._relation = relation

    def __repr__(self):
        return f'<INS "{self._output}" resulting in {self._relation}>'

    def __call__(self, input: list[str], idx: int) -> Inference:
        """Insert output at location idx"""
        output = input[:idx] + [self._output] + input[idx:]

        return Inference(input, output, self._relation)
These subclasses are initialized with input and/or output strings and a relation. For instance, “brindle” and “fawn” are two different colorings of greyhounds—no greyhound is both brindle and fawn—and so they are in the | relation. Each is at least a subsective modifier (all brindle greyhounds are greyhounds), so if we delete one, we obtain a \(\sqsubset\) relation, and if we insert one, we get a \(\sqsupset\) relation (the default relations for deletion and insertion, as discussed in M&M).
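For instance, the editors whose reprs are shown below could be constructed as follows (the variable names are illustrative):

sub_brindle_fawn = Substitution('brindle', 'fawn', '|')
del_brindle = Deletion('brindle')    # default relation '['
ins_brindle = Insertion('brindle')   # default relation ']'

sub_brindle_fawn, del_brindle, ins_brindle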
(<SUB "fawn" for "brindle" resulting in |>,
<DEL "brindle" resulting in [>,
<INS "brindle" resulting in ]>)
Note that not all insertions or deletions of adjectives will be associated with \(\sqsubset\) or \(\sqsupset\): privative adjectives like “fake” will introduce a \(|\): fake greyhounds are not greyhounds (fake greyhound \(|\) greyhound) and greyhounds are not fake greyhounds (greyhound \(|\) fake greyhound).
Use the following four sentences to write your tests. These tests should involve applying an edit \(e_1\) to sentence \(s_i\) to yield sentence \(e_1(s_i)\), then applying an edit \(e_2\) to \(e_1(s_i)\) to yield sentence \(e_2(e_1(s_i))\). You should then combine the inferences associated with \(e_1\) and \(e_2\) using your Inference.__add__ and check that the result is correct. Make sure to test at least one case where the result should be a non-singleton set of inferences.
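As a template, a test along these lines (with a hypothetical sentence standing in for one of the four test sentences) exercises an indeterminate join:

# a sketch of one test; your four test sentences may differ
s1 = ['the', 'greyhound', 'ran']

ins = Insertion('brindle')     # default ']'
dele = Deletion('brindle')     # default '['

inf1 = ins(s1, 1)                 # "the brindle greyhound ran"; s1 ] e1(s1)
inf2 = dele(inf1.hypothesis, 1)   # back to "the greyhound ran"; e1(s1) [ s1

# the join of ] with [ is indeterminate: {≡, [, ], u, #}
joined = inf1 + inf2
assert len(joined) == 5
assert Inference(s1, s1, '≡') in joined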
We'll need a way to store collections of editors and, crucially, make new default ones when needed. The EditorLibrary class provides a convenient way to store and retrieve editors (like substitutions, deletions, and insertions). When we try to get an editor that isn't in the library, it automatically creates a default one: substitutions get a # relation (since we don't have a default for them), while deletions and insertions get the default relations defined by MacCartney & Manning.
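The EditorLibrary class itself is not reproduced here. A minimal sketch consistent with how it is used below, including the EditorType and EditorInputOutput aliases that the type hints rely on, might look like this:

EditorType = str                           # 'substitute' | 'delete' | 'insert'
EditorInputOutput = str | tuple[str, str]  # DEL/INS string or SUB (input, output)

class EditorLibrary:
    """A sketch of the editor library: the outer dict maps edit types to
    inner dicts mapping edit parameters to Editor instances; missing
    entries get M&M's default editors on demand."""
    def __init__(self):
        self._editors: dict[EditorType, dict[EditorInputOutput, Editor]] = {
            'substitute': {}, 'delete': {}, 'insert': {}
        }

    def __setitem__(self, key: tuple[EditorType, EditorInputOutput],
                    editor: Editor):
        edit_type, edit = key
        self._editors[edit_type][edit] = editor

    def __getitem__(self, key: tuple[EditorType, EditorInputOutput]) -> Editor:
        edit_type, edit = key
        if edit not in self._editors[edit_type]:
            if edit_type == 'substitute':
                # no default relation for substitutions, so fall back to #
                self._editors[edit_type][edit] = Substitution(edit[0], edit[1], '#')
            elif edit_type == 'delete':
                self._editors[edit_type][edit] = Deletion(edit)    # default '['
            else:
                self._editors[edit_type][edit] = Insertion(edit)   # default ']'
        return self._editors[edit_type][edit]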
We don’t want to have to hand-compute the edits that are required to convert one sentence into another. Instead, we will use a modified form of the StringEdit class we developed in class. What we need in particular are the edit paths that that class produces.
First, we’ll define a class for representing and manipulating edit paths. One important thing we want this class to do is to convert the edit path into a list of editors, for which we need to have a way to look up the editor for a given edit type and parameters given an EditorLibrary.
EditLocation = int
EditorParameters = tuple[EditorInputOutput, EditLocation]

class EditPath:
    """Class for representing and manipulating sequences of text edits.

    Parameters
    ----------
    edits
        List of tuples containing edit type and parameters for each edit
        operation.
    """
    def __init__(self, edits: list[tuple[EditorType, EditorParameters]]):
        self.edits_unshifted = edits
        self.edits = self._shift_indices(edits)

    def __repr__(self) -> str:
        return str(self.edits)

    def __call__(self, input_text: list[str]) -> list[str]:
        """Apply the edit path to transform the input text.

        Parameters
        ----------
        input_text : list[str]
            The input text to transform

        Returns
        -------
        list[str]
            The transformed text after applying all edits
        """
        current_text = input_text.copy()

        for edit_type, (edit, idx) in self.edits:
            if edit_type == 'substitute':
                old_word, new_word = edit
                if old_word != current_text[idx]:
                    raise ValueError(
                        f'Substitution {old_word} -> {new_word} at {idx} '
                        f'cannot be applied to {current_text}'
                    )
                current_text[idx] = new_word
            elif edit_type == 'delete':
                current_text.pop(idx)
            elif edit_type == 'insert':
                current_text.insert(idx, edit)

        return current_text

    def _shift_indices(
        self, edit_path: list[tuple[EditorType, EditorParameters]]
    ) -> list[tuple[EditorType, EditorParameters]]:
        """Adjust indices of edits to account for previous insertions and
        deletions.

        The edit sequence output by the `StringEdit` class is always
        relativized to the original string. But if we are applying the
        edits in sequence, the string that we apply subsequent edits to
        will differ from the original string, so we need to shift the
        indices of the subsequent edits to account for the previous edits.

        Parameters
        ----------
        edit_path : list[tuple[EditorType, EditorParameters]]
            Original list of edits with unadjusted indices.

        Returns
        -------
        list[tuple[EditorType, EditorParameters]]
            List of edits with indices shifted to account for previous
            edits.
        """
        edit_path_shifted = []

        # track cumulative index shifts from insertions/deletions;
        # we always need to shift back by 1 to account for the fact that
        # the original edit sequence is 1-indexed (due to the sentinel)
        shifts = [(0, -1)]

        for i, (edit_type, (edit, idx)) in enumerate(edit_path):
            original_idx = idx

            # apply all previous shifts to the current index
            for j, s in shifts:
                idx = idx + s if idx >= j else idx

            if edit_type == 'substitute':
                # if the substitution is the same element, we don't need to
                # record the edit at all because it doesn't change the string
                if edit[0] == edit[1]:
                    continue
                else:
                    # substitutions do not shift the index because we're
                    # simply replacing one element with another
                    shifts.append((idx, 0))
            elif edit_type == 'delete':
                # deletions shift the index back by 1 because they remove
                # an element
                shifts.append((idx, -1))
            elif edit_type == 'insert':
                # insertions shift the index forward by 1 because they add
                # an element
                shifts.append((idx, 1))

                # ensure that the inserted element does not get inserted
                # out of order with an immediately following substitution
                # (the bounds check guards the final edit in the path)
                if i + 1 < len(edit_path):
                    next_edit_type, (_, next_idx) = edit_path[i+1]
                    if next_edit_type != 'substitute' or next_idx != original_idx:
                        idx += 1
                else:
                    idx += 1

            edit_path_shifted.append((edit_type, (edit, idx)))

        return edit_path_shifted

    def to_editors(
        self, library: EditorLibrary
    ) -> list[tuple[EditLocation, Editor]]:
        """Convert edit path to a sequence of editors using the provided
        library.

        Parameters
        ----------
        library : EditorLibrary
            Dictionary mapping edit types and parameters to Editor
            instances. The outer dictionary maps edit types (e.g.
            'substitute', 'delete', 'insert') to inner dictionaries that
            map edit parameters to Editor instances.

        Returns
        -------
        list[tuple[EditLocation, Editor]]
            List of tuples containing the edit location and the editor for
            each edit in the path.
        """
        return [
            (edit_idx, library[edit_type, edit])
            for edit_type, (edit, edit_idx) in self.edits
        ]
Next, we'll define the class for computing edit distances, alignments, and edit paths between strings that we used in class, with some slight updates for the current assignment.
import numpy as np

Alignment = list[tuple[int, int]]

class StringEdit:
    """Class for computing edit distances, alignments, and edit paths
    between strings.

    This class implements the Wagner-Fisher algorithm for computing minimum
    edit distance between sequences, along with the corresponding
    alignments and edit paths.

    Parameters
    ----------
    insertion_cost : float, default=1.0
        Cost of inserting a character
    deletion_cost : float, default=1.0
        Cost of deleting a character
    substitution_cost : float or None, default=None
        Cost of substituting a character. If None, defaults to
        insertion_cost + deletion_cost
    """
    def __init__(self, insertion_cost: float = 1.,
                 deletion_cost: float = 1.,
                 substitution_cost: float | None = None):
        self._insertion_cost = insertion_cost
        self._deletion_cost = deletion_cost

        if substitution_cost is None:
            self._substitution_cost = insertion_cost + deletion_cost
        else:
            self._substitution_cost = substitution_cost

    def __call__(self, source: list[str], target: list[str],
                 only_distance: bool = False
                 ) -> float | tuple[float, Alignment, list[EditPath]]:
        return self._wagner_fisher(source, target, only_distance)

    def _wagner_fisher(self, source: list[str], target: list[str],
                       only_distance: bool
                       ) -> float | tuple[float, Alignment, list[EditPath]]:
        """Compute minimum edit distance, alignment, and edit sequence
        using the Wagner-Fisher algorithm.

        Parameters
        ----------
        source : list[str]
            Source sequence
        target : list[str]
            Target sequence
        only_distance : bool
            If True, return only the edit distance

        Returns
        -------
        float | tuple[float, Alignment, list[EditPath]]
            If only_distance is True, returns just the edit distance.
            Otherwise returns a tuple of (distance, alignment, edit_paths)
        """
        n, m = len(source), len(target)
        source, target = self._add_sentinel(source, target)

        # initialize matrices for dynamic programming
        distance = np.zeros([n+1, m+1], dtype=float)
        pointers = np.zeros([n+1, m+1], dtype=list)
        edits = np.zeros([n+1, m+1], dtype=list)

        pointers[0,0] = []
        edits[0,0] = []

        # initialize first column (deletions)
        for i in range(1, n+1):
            distance[i,0] = distance[i-1,0] + self._deletion_cost
            pointers[i,0] = [(i-1,0)]
            edits[i,0] = [('delete', (source[i], i))]

        # initialize first row (insertions)
        for j in range(1, m+1):
            distance[0,j] = distance[0,j-1] + self._insertion_cost
            pointers[0,j] = [(0,j-1)]
            edits[0,j] = [('insert', (target[j], j))]

        # fill in the rest of the matrices
        for i in range(1, n+1):
            for j in range(1, m+1):
                if source[i] == target[j]:
                    substitution_cost = 0.
                else:
                    substitution_cost = self._substitution_cost

                costs = np.array([distance[i-1,j] + self._deletion_cost,
                                  distance[i-1,j-1] + substitution_cost,
                                  distance[i,j-1] + self._insertion_cost])

                distance[i,j] = costs.min()

                best_edits = np.where(costs == distance[i,j])[0]

                indices = [(i-1,j), (i-1,j-1), (i,j-1)]
                pointers[i,j] = [indices[k] for k in best_edits]

                edit_types = list(zip(["delete", "substitute", "insert"],
                                      [(source[i], i),
                                       ((source[i], target[j]), i),
                                       (target[j], i)]))
                edits[i,j] = [edit_types[k] for k in best_edits]

        if only_distance:
            return distance[n,m]

        pointer_backtrace, edit_backtrace = self._construct_backtrace(
            pointers, edits, n, m
        )

        edit_paths = [EditPath(bt) for bt in edit_backtrace]

        return distance[n,m], pointer_backtrace, edit_paths

    def _construct_backtrace(self, pointers: np.ndarray, edits: np.ndarray,
                             n: int, m: int
                             ) -> tuple[list[list[tuple[int, int]]],
                                        list[list[tuple[str, tuple[str, int]]]]]:
        """Construct all possible backtraces through the dynamic
        programming matrix.

        Parameters
        ----------
        pointers : np.ndarray
            Matrix of pointers to previous cells
        edits : np.ndarray
            Matrix of edit operations
        n : int
            Length of source sequence
        m : int
            Length of target sequence

        Returns
        -------
        tuple[list[list[tuple[int, int]]], list[list[tuple[str, tuple[str, int]]]]]
            Returns (pointer_backtraces, edit_backtraces)
        """
        stack = [([(n,m)], [])]
        complete_pointer_backtraces = []
        complete_edit_backtraces = []

        while stack:
            current_pointer_path, current_edit_path = stack.pop()
            current_pos = current_pointer_path[-1]

            if current_pos == (0,0):
                complete_pointer_backtraces.append(current_pointer_path[::-1])
                complete_edit_backtraces.append(current_edit_path[::-1])
                continue

            for next_pos, edit in zip(pointers[current_pos], edits[current_pos]):
                new_pointer_path = current_pointer_path + [next_pos]
                new_edit_path = current_edit_path + [edit]
                stack.append((new_pointer_path, new_edit_path))

        return complete_pointer_backtraces, complete_edit_backtraces

    def _add_sentinel(self, source: str | list | tuple,
                      target: str | list | tuple
                      ) -> tuple[str | list | tuple, str | list | tuple]:
        """Add sentinel symbols to beginning of sequences.

        Parameters
        ----------
        source
            Source sequence
        target
            Target sequence

        Returns
        -------
        tuple[str | list | tuple, str | list | tuple]
            Source and target with added sentinels

        Raises
        ------
        ValueError
            If source or target are not str, list, or tuple
        """
        if isinstance(source, str):
            source = '#' + source
        elif isinstance(source, list):
            source = ['#'] + source
        elif isinstance(source, tuple):
            source = ('#',) + source
        else:
            raise ValueError('source must be str, list, or tuple')

        if isinstance(target, str):
            target = '#' + target
        elif isinstance(target, list):
            target = ['#'] + target
        elif isinstance(target, tuple):
            target = ('#',) + target
        else:
            raise ValueError('target must be str, list, or tuple')

        return source, target
In the original implementation, the edit path indexed into the source string. This made sense at the time because we wanted to know which words, relative to their original position in the string, are operated on by an edit. It's problematic for current purposes because, once we apply insertions and deletions, the positions targeted by later edits change. The implementation above corrects for this, but make sure you take into account that the order of edits matters for this reason.
editdist = StringEdit(1, 1, 1)

dist, align, edits = editdist(test_sentence1, test_sentence2)

edit_path_is_correct = all(
    e(test_sentence1) == test_sentence2 for e in edits
)

print('Source: ', test_sentence1)
print('Target: ', test_sentence2)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edited:", [e(test_sentence1) for e in edits])
print("Edit path is correct:", edit_path_is_correct)

dist, align, edits = editdist(test_sentence2, test_sentence1)

edit_path_is_correct = all(
    e(test_sentence2) == test_sentence1 for e in edits
)

print('Source: ', test_sentence2)
print('Target: ', test_sentence1)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edited:", [e(test_sentence2) for e in edits])
print("Edit path is correct:", edit_path_is_correct)

dist, align, edits = editdist(test_sentence1_prime2, test_sentence2)

edit_path_is_correct = all(
    e(test_sentence1_prime2) == test_sentence2 for e in edits
)

print('Source: ', test_sentence1_prime2)
print('Target: ', test_sentence2)
print('Pointer path:', align)
print('Edit path:', edits)
print("Edit path is correct:", edit_path_is_correct)
Implement the __call__ method for the NaturalLogic class. This should take a premise sentence and a hypothesis sentence, and it should produce the paths of inferences (computed from the paths of edits) that take you from premise to hypothesis.
Each path should be a list of inferences that result from cumulatively composing the inferences associated with each edit in the path. It should not be a path of local inferences: that is, it should not be a list of the inferences that result from each individual edit in an edit path, but rather a list of the inferences that result from composing those local inferences using Inference.__add__.
You will not be using EditPath.__call__ in any way. That method is implemented to demonstrate how we should apply edit paths to strings. You should instead be using EditPath.to_editors to get the list of editors that result from the edit path, and then you should use those editors to compute the local inferences (again, the inferences that result from applying each editor in sequence).
InferencePath = tuple[Inference, ...]

class NaturalLogic:
    """Class for performing natural logic inference between sentences.

    This class implements natural logic inference by finding edit paths
    between sentences and composing the inferences associated with each
    edit. It uses an editor library that maps edit operations
    (substitutions, deletions, insertions) to Editor instances that
    specify the inference relations for those edits.

    Parameters
    ----------
    editor_library : EditorLibrary, optional
        Dictionary mapping edit types and parameters to Editor instances.
        The outer dictionary maps edit types (e.g. 'substitute', 'delete',
        'insert') to inner dictionaries that map edit parameters to Editor
        instances. Defaults to an empty EditorLibrary.

    Attributes
    ----------
    EDIT : StringEdit
        StringEdit instance used for computing edit distances and paths
        between strings, with default costs of 1 for substitution,
        deletion, and insertion.
    _editor_library : EditorLibrary
        The editor library containing the inference rules for different
        edits.
    """
    EDIT = StringEdit(1, 1, 1)

    def __init__(self, editor_library: EditorLibrary | None = None):
        # avoid a mutable default argument: a single shared default
        # library would leak editors across instances
        self._editor_library = (editor_library if editor_library is not None
                                else EditorLibrary())

    def __getitem__(self, key: tuple[EditorType, EditorInputOutput]):
        return self._editor_library[key]

    def __call__(self, premise: list[str],
                 hypothesis: list[str]) -> set[InferencePath]:
        """Perform natural logic inference between a premise and
        hypothesis sentence.

        This method computes the possible edit paths between the premise
        and hypothesis, and then composes the inferences associated with
        each edit to yield all possible inference paths implied by the
        edit paths.

        Parameters
        ----------
        premise : list[str]
            The premise sentence to infer from.
        hypothesis : list[str]
            The hypothesis sentence to infer to.

        Returns
        -------
        global_inference_paths : set[InferencePath]
            A set of inference paths representing the paths of cumulatively
            composed inferences from the premise to the hypothesis implied
            by the edit paths.
        """
        raise NotImplementedError
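To make the expected behavior concrete, here is a sketch of one possible implementation. It is not the only reasonable design, and note that, unlike M&M, it keeps indeterminate joins as sets rather than collapsing them to #:

# A sketch of __call__; assumes the classes above. Each join can return
# several relations, so a single edit path can yield several inference
# paths.
class NaturalLogic(NaturalLogic):
    def __call__(self, premise: list[str],
                 hypothesis: list[str]) -> set[InferencePath]:
        _, _, edit_paths = self.EDIT(premise, hypothesis)

        global_inference_paths: set[InferencePath] = set()

        for edit_path in edit_paths:
            partial_paths: list[list[Inference]] = [[]]
            current = premise

            for idx, editor in edit_path.to_editors(self._editor_library):
                local = editor(current, idx)    # the local inference
                current = local.hypothesis

                new_paths = []
                for path in partial_paths:
                    if not path:                # first edit in the path
                        new_paths.append([local])
                    else:                       # join cumulative with local
                        for joined in path[-1] + local:
                            new_paths.append(path + [joined])
                partial_paths = new_paths

            global_inference_paths |= {tuple(p) for p in partial_paths}

        return global_inference_paths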
Implement tests using the four test sentences above. (Ignore my modified versions of these sentences.) For now, you can just assume that the editor library contains the editors defined for Task 1. (We don’t need to explicitly specify any insertions that result in \(\sqsupset\) or deletions that result in \(\sqsubset\), since those are added by default by NaturalLogic.add_editor.) In Task 3, we will expand the library using WordNet.
For the remainder of the assignment (Tasks 3 and 4), we will evaluate our NaturalLogic implementation using the FraCaS textual inference test suite. FraCaS is shipped as XML.
from bs4 import BeautifulSoup, Tag

class Fracas:
    """Corpus reader for the FraCaS textual inference problem set"""

    def __init__(self, root: str = "fracas.xml"):
        with open(root) as fp:
            self._data = BeautifulSoup(fp, 'lxml')

        # _construct_problem_generator is a generator function, so we
        # need to keep a handle on the iterator it returns
        self._problem_generator = self._construct_problem_generator()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._problem_generator)

    def __repr__(self):
        return self._data.comment.string

    def _construct_problem_generator(self):
        for problem in self.problems:
            yield problem

    @property
    def problems(self):
        return [FracasProblem(problem)
                for problem in self._data.find_all('problem')]

class FracasProblem:
    """A FraCaS problem"""

    problem_type_map = {'001'}

    def __init__(self, problem: Tag):
        self.id = problem.get('id')
        self.answer = problem.get('fracas_answer')
        self.premise = problem.p.string.strip()
        self.question = problem.q.string.strip()
        self.hypothesis = problem.h.string.strip()

    def __repr__(self):
        return (f"id: {self.id}"
                f"\n\npremise: {self.premise}"
                f"\nquestion: {self.question}"
                f"\nhypothesis: {self.hypothesis}"
                f"\n\nanswer: {self.answer}")
fracas = Fracas()
fracas
Since the sentences are just raw strings, to get them in the form of a list of strings, you will need a tokenizer. I would suggest using the one available in the stanza package. For our purposes, it is also simpler to use the lemma, rather than the token itself, because your WordNet editor library won’t handle inflectional morphology (unless you explicitly engineered it to).
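For instance, a lemma-based preprocessor with stanza might look like this (a sketch; it assumes you have already run stanza.download('en'), and your pipeline options may differ):

import stanza

nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma')

def lemmatize(sentence: str) -> list[str]:
    # return the lemma of every word in every sentence of the input
    doc = nlp(sentence)
    return [word.lemma for sent in doc.sentences for word in sent.words]

lemmatize('Every big company in the poll reported cost increases.')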
To use this dataset to test your NaturalLogic implementation, you will need to convert the inferences produced by __call__ into a "yes", "no", or "don't know" answer. (Don't worry about any items not labeled with one of these three.) This will require you to define a mapping from inference types to answers. You should then compute the accuracy, precision, recall, and F1 of your system.
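One plausible (hypothetical) mapping treats equivalence and forward entailment as "yes" and negation and alternation as "no"; you may want a different policy, particularly for ] and u:

def relation_to_answer(relation: str) -> str:
    # assumes the single-character relation symbols used above
    if relation in {'≡', '['}:    # equivalence or forward entailment
        return 'yes'
    if relation in {'^', '|'}:    # negation or alternation
        return 'no'
    return "don't know"           # ], u, and # are inconclusive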
Each of these metrics can be defined in terms of…
The true positive count for class \(c\): \[\mathrm{tp}(\hat{\mathbf{y}}^\mathrm{test}, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i = \hat{y}^\mathrm{test}_i = c\}|\]
The true negative count for class \(c\): \[\mathrm{tn}(\hat{\mathbf{y}}^\mathrm{test}, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i \neq c \land \hat{y}^\mathrm{test}_i \neq c\}|\]
The false positive count for class \(c\): \[\mathrm{fp}(\hat{\mathbf{y}}^\mathrm{test}, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i \neq c \land \hat{y}^\mathrm{test}_i = c\}|\]
The false negative count for class \(c\): \[\mathrm{fn}(\hat{\mathbf{y}}^\mathrm{test}, \mathbf{y}^\mathrm{test}, c) = |\{i\;:\;y^\mathrm{test}_i = c \land \hat{y}^\mathrm{test}_i \neq c\}|\]
…where the class is “yes”, “no”, or “unknown”; \(y^\mathrm{test}_i\) is the true label for item \(i\) (found in FraCaS) and \(\hat{y}^\mathrm{test}_i\) is your system’s prediction for the class of item \(i\). (Ignore cases where the class is not one of these three.)
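In code, these counts might be computed along the following lines (a sketch; the variable names are illustrative):

# per-class counts, assuming parallel lists of gold labels and predictions
def class_counts(y_true: list[str], y_pred: list[str],
                 c: str) -> tuple[int, int, int, int]:
    tp = sum(t == p == c for t, p in zip(y_true, y_pred))
    tn = sum(t != c and p != c for t, p in zip(y_true, y_pred))
    fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
    fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn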
Accuracy
For what proportion of the test data \(\{(x^\mathrm{test}_{1}, y^\mathrm{test}_1), ..., (x^\mathrm{test}_N, y^\mathrm{test}_N)\}\) does the model’s predicted class \(f(x^\mathrm{test}_i) = \hat{y}^\mathrm{test}_i\) for an item match the ground truth class for that item?
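In symbols, using the notation above: \[\mathrm{acc}(\hat{\mathbf{y}}^\mathrm{test}, \mathbf{y}^\mathrm{test}) = \frac{|\{i\;:\;\hat{y}^\mathrm{test}_i = y^\mathrm{test}_i\}|}{N}\]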
class NaturalLogic(NaturalLogic):
    def load_wordnet(self):
        raise NotImplementedError

    @classmethod
    def from_wordnet(cls):
        natlog = cls()
        natlog.load_wordnet()
        return natlog
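As a starting point, here is a sketch of how load_wordnet might populate the editor library using NLTK's WordNet interface. This is one possible design, not the required one; in particular, treating all antonyms as | is a simplification, since exhaustive pairs would be better treated as ^.

# assumes nltk.download('wordnet') has been run
from nltk.corpus import wordnet as wn

class NaturalLogic(NaturalLogic):
    def load_wordnet(self):
        for synset in wn.all_synsets():
            for lemma in synset.lemmas():
                word = lemma.name()

                # hyponym -> hypernym substitutions are [ (e.g. dog -> animal);
                # the reverse direction is ]
                for hyper in synset.hypernyms():
                    for hlemma in hyper.lemmas():
                        hword = hlemma.name()
                        self._editor_library['substitute', (word, hword)] = \
                            Substitution(word, hword, '[')
                        self._editor_library['substitute', (hword, word)] = \
                            Substitution(hword, word, ']')

                # antonym substitutions are exclusive, so use |
                for ant in lemma.antonyms():
                    aword = ant.name()
                    self._editor_library['substitute', (word, aword)] = \
                        Substitution(word, aword, '|')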
Test your new library by writing examples that require knowledge of hypernymy, hyponymy, and antonymy to correctly handle.
# write tests here
Evaluate your new library on FraCaS by computing precision, recall, and F1 for the items labeled "yes", "no", or "don't know". Remember that this is going to require the mapping from inference types to answers discussed above.
# write evaluation here
These numbers will be bad. The point is to see that handling even the apparently simple cases in FraCaS is very difficult, even with a fairly extensive edit library.
Find at least three examples you get wrong. For each example, identify where in the edit sequence the problem occurs and explain how this issue might be fixed on the basis of what you read in MacCartney and Manning. (Hint: look at how they model quantifiers and negation.) If the MacCartney and Manning approach is not fully sufficient to fix the error, identify what you would need to do to extend it to handle the problem. (Hint: this will often involve changes to your editor library. What would those changes need to look like?)