About

This site contains materials for a course on Representation Learning for Syntactic and Semantic Theory given by Aaron Steven White at the 2023 Linguistic Society of America Institute, held at the University of Massachusetts, Amherst from June 19–July 14, 2023.

About the course

Experimental methods and corpus annotation are becoming increasingly important tools in the development of syntactic and semantic theories. And while regression-based approaches to the analysis of experimental and corpus data are widely known, methods for inducing expressive syntactic and semantic representations from such data remain relatively underused. Such methods have only recently become feasible due to advances in machine learning and the availability of large-scale datasets of acceptability and inference judgments; and they hold promise because they allow theoreticians (i) to design analyses directly in terms of the theoretical constructs of interest and (ii) to synthesize multiple sources and types of data within a single model.

The broad area of machine learning that techniques for syntactic and semantic representation induction come from is known as representation learning; and while such techniques are now common in the natural language processing (NLP) literature, their use is largely confined either to models focused on particular NLP tasks, such as question answering or information extraction, or to ‘probing’ the representations of existing NLP models. As such, it remains difficult to see this literature’s relevance for theoreticians. This course aims to demonstrate that relevance by focusing on the use of representation learning for developing syntactic and semantic theories.

About the instructor

Aaron Steven White is an Associate Professor of Linguistics and Computer Science at the University of Rochester, where he directs the Formal and Computational Semantics lab (FACTS.lab). His research investigates the relationship between linguistic expressions and conceptual categories that undergird the human ability to convey information about possible past, present, and future configurations of things in the world.

In addition to being a principal investigator on numerous federally funded grants and contracts, White is the recipient of a National Science Foundation Faculty Early Career Development (CAREER) award. His work has appeared in a variety of linguistics, cognitive science, and natural language processing venues, including Semantics & Pragmatics, Glossa, Language Acquisition, Cognitive Science, Cognitive Psychology, Transactions of the Association for Computational Linguistics, and Empirical Methods in Natural Language Processing.

About the site

The site itself is built using Quarto. The source files for this site are available on GitHub at aaronstevenwhite/representation-learning-course. See Installation for information on how to run the code documented here.

Acknowledgments

The development of these course materials builds on collaborations between Aaron Steven White and a variety of other researchers:

  1. Module 1 of this course builds on unpublished collaborative research with Jon Sprouse. A version of this work was presented as a poster at WCCFL 34.
  2. Module 2 builds on unpublished collaborative research with Julian Grove, who led the development of the models covered in that module.
  3. Module 3 builds on collaborative research with Kyle Rawlins as well as the rest of the MegaAttitude Project team.
  4. Module 4 builds on work with Kyle Rawlins and Ben Van Durme as well as the rest of the Decompositional Semantics Initiative team, with specific acknowledgment of Will Gantt and Elias Stengel-Eskin for their work on the decomp toolkit.

It was additionally supported by multiple National Science Foundation grants:

  1. The MegaAttitude Project: Investigating selection and polysemy at the scale of the lexicon (BCS-1748969/BCS-1749025)
  2. Computational Modeling of the Internal Structure of Events (BCS-2040831/BCS-2040820)
  3. The typology of subordinate clauses: A case study (BCS-2214933)
  4. CAREER: Logical Form Induction (BCS/IIS-2237175)

License

Representation Learning for Syntactic and Semantic Theory by Aaron Steven White is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at https://github.com/aaronstevenwhite/representation-learning-course.