Methodological Approach

This course is methodologically opinionated: it takes a hypothesis-driven approach to representation learning, rather than the analysis-driven approach now common in work at the intersection of computational linguistics and natural language processing (see Baroni 2022; Pavlick 2023, and references therein). Hypothesis-driven approaches are distinguished from analysis-driven approaches in that they aim to finely delineate hypotheses about the nature of a phenomenon in terms of the constraints those hypotheses place on the representations to be learned. In contrast, analysis-driven approaches aim to learn highly expressive representations and then to extract generalizations about those representations post hoc.
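
To make the contrast concrete, the following minimal sketch (a toy task of my own devising, not drawn from the course materials) implements both approaches on the same data: the hypothesis-driven learner builds its hypothesis into the model class as a constraint on the representation, while the analysis-driven learner fits an expressive representation and then extracts the generalization with a post hoc probe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task (hypothetical): the label depends on a single linear
# direction of the input.
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(float)

# Hypothesis-driven: the hypothesis ("the phenomenon is carried by one
# linear dimension") is reified as a constraint on the representation,
# here a single learned projection X @ w, fit by logistic regression.
w = np.zeros(10)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)    # gradient step

# Analysis-driven: start from a highly expressive (here, random
# nonlinear) representation, then probe it post hoc for the
# generalization with a linear classifier.
H = np.tanh(X @ rng.normal(size=(10, 256)))
v = np.zeros(256)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H @ v)))   # post hoc linear probe
    v -= 0.1 * H.T @ (p - y) / len(y)

print("hypothesis-driven accuracy:", ((X @ w > 0) == y).mean())
print("analysis-driven probe accuracy:", ((H @ v > 0) == y).mean())
```

The point is not the accuracies but where the hypothesis lives: in the first case it is encoded in the model class before fitting; in the second it is recovered from the representation only after fitting.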

This methodological distinction is roughly analogous to one observed in the theoretical syntax literature, a distinction classically exemplified by work in transformational grammar in the 1970s and 1980s. For background: transformational grammars are extremely expressive, generating all of the recursively enumerable languages (Peters and Ritchie 1973). But it is relatively well accepted that natural languages fall within a much smaller class, the mildly context-sensitive languages, which is itself a strict subset of the context-sensitive languages (Joshi, Shanker, and Weir 1990). Insofar as one is merely interested in observational adequacy, there is no real reason not to use a highly expressive formalism like a transformational grammar; but insofar as one is interested in specifying “…the observed data…in terms of significant generalizations that express underlying regularities in the language” (Chomsky 1964, 63), i.e. in attaining descriptive adequacy, it is necessary to go beyond simply specifying an observationally adequate transformational grammar.
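
To spell out the hierarchy this paragraph appeals to, the following snippet states the relevant strict inclusions together with standard textbook witness languages separating the classes (my gloss, not notation from the cited works):

```latex
% Strict inclusions among the language classes discussed above.
\[
\mathrm{CFL} \subsetneq \mathrm{MCSL} \subsetneq \mathrm{CSL} \subsetneq \mathrm{RE}
\]
% Standard witnesses for the first two gaps:
\[
\{a^n b^n c^n \mid n \ge 0\} \in \mathrm{MCSL} \setminus \mathrm{CFL},
\qquad
\{a^{2^n} \mid n \ge 0\} \in \mathrm{CSL} \setminus \mathrm{MCSL}
\]
% The second separation follows from the constant-growth property that
% Joshi, Shanker, and Weir (1990) take as criterial for mild context
% sensitivity; transformational grammars, by contrast, reach all of RE
% (Peters and Ritchie 1973).
```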

On the one hand, one might implement this idea by stating meta-analytical generalizations about the observationally adequate analyses in the too-expressive formalism, with the ultimate goal of reifying those generalizations as constraints on the formalism (see Chomsky 1973 et seq.). This approach is similar to what I refer to above as analysis-driven representation learning.

On the other hand, one might take a more constrained formalism, e.g. a mildly context-sensitive formalism such as combinatory categorial grammar (Steedman 1996) or minimalist grammars (Stabler 1997), and ask how well that formalism covers the data. This approach is similar to what I refer to above as hypothesis-driven representation learning, and it is the approach taken in this course.
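
To give a concrete sense of what such a constrained formalism looks like in action, here is a standard toy CCG derivation (a textbook example, not taken from the course materials); the transitive verb's category itself encodes which arguments it may combine with, and every step must be licensed by one of a small set of combinatory rules:

```latex
% "John saw Mary": forward (>) then backward (<) application.
\[
\frac{\textit{John} \vdash \mathrm{NP}
      \qquad
      \dfrac{\textit{saw} \vdash (\mathrm{S}\backslash\mathrm{NP})/\mathrm{NP}
             \qquad
             \textit{Mary} \vdash \mathrm{NP}}
            {\textit{saw Mary} \vdash \mathrm{S}\backslash\mathrm{NP}}\;(>)}
     {\textit{John saw Mary} \vdash \mathrm{S}}\;(<)
\]
```

Asking how well the formalism covers the data then amounts to asking whether attested sentences have derivations of this kind while unattested ones do not.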

References

Baroni, Marco. 2022. “On the Proper Role of Linguistically Oriented Deep Net Analysis in Linguistic Theorising.” In Algebraic Structures in Natural Language. CRC Press.
Chomsky, Noam. 1964. “Current Issues in Linguistic Theory.” In The Structure of Language, edited by J. Fodor and J. Katz. New York: Prentice Hall.
———. 1973. “Conditions on Transformations.” In A Festschrift for Morris Halle, edited by S. Anderson and P. Kiparsky, 232–86. New York: Holt, Rinehart, & Winston.
Joshi, Aravind, Vijay K. Shanker, and David Weir. 1990. “The Convergence of Mildly Context-Sensitive Grammar Formalisms.” MS-CIS-90-01. Philadelphia: Department of Computer and Information Science, University of Pennsylvania. https://repository.upenn.edu/cis_reports/539.
Pavlick, Ellie. 2023. “Symbols and Grounding in Large Language Models.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 381 (2251). https://doi.org/10.1098/rsta.2022.0041.
Peters, P. Stanley, and R. W. Ritchie. 1973. “On the Generative Power of Transformational Grammars.” Information Sciences 6 (January): 49–83. https://doi.org/10.1016/0020-0255(73)90027-3.
Stabler, Edward. 1997. “Derivational Minimalism.” In Logical Aspects of Computational Linguistics, edited by Christian Retoré, 68–95. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/BFb0052152.
Steedman, Mark. 1996. Surface Structure and Interpretation. Cambridge, MA: MIT Press.