Overview

Reading

Marcus, Santorini, and Marcinkiewicz (1993) on building the first large-scale constituency treebank: the Penn Treebank.

In this submodule, we’ll cover the basic concepts needed to work with annotated corpora, treebanks in particular, by developing a system of classes for representing those corpora and their annotations from the ground up.
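To give a rough sense of what such a class might look like, here is a minimal sketch of a constituency tree that can be read from a bracketed string of the kind used in Penn Treebank-style annotation. The names `Tree`, `from_string`, and `leaves` are hypothetical choices made for illustration, not the classes developed later in the submodule, and the parser assumes well-formed input; real treebank files follow additional conventions that this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tree:
    """A constituency tree node: a label plus zero or more child subtrees."""

    label: str
    children: list[Tree] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        # A node with no children is a terminal (a word).
        return not self.children

    def leaves(self) -> list[str]:
        """Return the terminal labels (the words) in left-to-right order."""
        if self.is_leaf:
            return [self.label]
        return [word for child in self.children for word in child.leaves()]

    @classmethod
    def from_string(cls, text: str) -> Tree:
        """Parse a bracketed string such as "(NP (DT the) (NN dog))"."""
        # Pad the brackets with spaces so that a plain split() tokenizes them.
        tokens = text.replace("(", " ( ").replace(")", " ) ").split()
        tree, position = cls._parse(tokens, 0)
        if position != len(tokens):
            raise ValueError("unexpected tokens after the end of the tree")
        return tree

    @classmethod
    def _parse(cls, tokens: list[str], position: int) -> tuple[Tree, int]:
        """Parse one subtree starting at `position`; return it and the next position."""
        if tokens[position] != "(":
            # A bare token is a word, represented as a childless node.
            return cls(tokens[position]), position + 1
        label = tokens[position + 1]
        position += 2
        children = []
        while tokens[position] != ")":
            child, position = cls._parse(tokens, position)
            children.append(child)
        # Skip the closing bracket of this subtree.
        return cls(label, children), position + 1


tree = Tree.from_string("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")
print(tree.label)     # S
print(tree.leaves())  # ['the', 'dog', 'barked']
```

Representing words as childless nodes keeps the sketch to a single class; a fuller design might distinguish terminals from nonterminals or attach further annotations to each node.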

References

Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. “Building a Large Annotated Corpus of English: The Penn Treebank.” Edited by Julia Hirschberg. Computational Linguistics 19 (2): 313–30. https://aclanthology.org/J93-2004.