The human ability to convey information about possible past, present, and future configurations of things in the world is undergirded by systematic relationships between intricately structured linguistic expressions and the dizzying array of conceptual categories available to humans. My research investigates (a) which of these conceptual categories can be related to which sorts of linguistic expressions, (b) what those relationships look like, and (c) how they can be leveraged for building better natural language understanding (NLU) systems.
My research program has two components: the design, implementation, and deployment of (i) precision instruments for measuring the distributional and inferential properties of linguistic expressions; and (ii) scalable systems for synthesizing those data to provide both scientific insights into natural language semantics and modular components of NLU systems. My current approach consists of two complementary projects: the MegaAttitude Project and the Decompositional Semantics Initiative.
The MegaAttitude Project aims to exhaustively catalogue the distributional and inferential characteristics of all English predicates that combine with subordinate clauses in order to (a) test hypothesized generalizations about the relationship between these predicates' inferential characteristics and their morphosyntactic distributions and (b) develop computational models for discovering such generalizations and the underlying lexical semantic properties that drive them.
Key papers: White and Rawlins 2016, 2018a,b, 2020; White 2019, 2021; Moon and White 2020; An and White 2020; Kim and White 2021; Kane et al. 2022
The Decompositional Semantics Initiative (Decomp) aims to capture a rich array of theoretically motivated semantic properties in context by annotating genre-diverse corpora. These annotations have been used both for testing hypotheses about the nature of thematic roles and event structure classes and for developing state-of-the-art syntactic and semantic parsers and natural language inference systems. I gave an interview about Decomp on the NLP Highlights podcast.
Key papers: White et al. 2016, 2017b, 2018b, 2020; Vashishtha et al. 2019; Govindarajan et al. 2019; Stengel-Eskin et al. 2020, 2021; Gantt et al. 2022
It is becoming increasingly clear that integrating data from targeted behavioral experiments with evidence from attested language use is integral to the development of broad-coverage syntactic and semantic theories. My goal in the coming years is to build on the models of behavioral data developed under the MegaAttitude Project and the annotated corpus data developed under Decomp to construct unified computational models that synthesize these two forms of data within a theoretically informed framework. I recently laid out a blueprint for how this might be done at the sentence level (Kim and White 2021), which I have begun to extend to the discourse/document level (Gantt et al. 2022).