This course covers foundational concepts in computational linguistics and is designed for students with a strong background in formal linguistic methods but little or no programming experience. Topics include introductory formal language theory and probability theory, finite state phonological and morphological analysis, generative and discriminative approaches to shallow syntactic parsing (e.g. part-of-speech tagging, chunking) and shallow semantic parsing (e.g. semantic role labeling), and bottom-up and top-down algorithms for syntactic and semantic parsing. Major focus is placed on deploying techniques used in computational linguistics to advance linguistic theory and developing students’ ability to implement these techniques.
|Classroom||513 Lattimore Hall|
|Time||Monday & Wednesday 10:25-11:40am|
|Instructor||Aaron Steven White|
|Office||511A Lattimore Hall|
|Office hours||Tuesday 9:30-10:30am & 12:30-1:30pm (or by appointment)|
|Office||4107 Wegmans Hall|
There will be a substantial amount of reading per week. A tentative schedule is provided below. This tentative schedule will almost certainly change as the semester goes forward. Consult the table below for the most current list.
Homeworks will be assigned weekly or biweekly throughout the semester, except during the first three weeks, during which there will be a short programming homework due every class. Most homeworks after the first three weeks of the course will involve both a programming component and a written component, though the relative balance between the two will vary. A template to guide you through both components will be provided through Overleaf.
All written work must be produced in LaTeX (using the provided template), and it must be submitted through Overleaf (where templates will be hosted). Do not hand in handwritten work or email me electronic documents. You will not receive credit for that assignment unless you subsequently turn the assignment in through Overleaf (subject to late penalties).
All coding work must be done in Python, and like written work, it must also be submitted through Overleaf. You can upload coding assignments to Overleaf by right- or CTRL-clicking on a directory in the project sidebar and clicking Upload from Computer. Do not submit coding work by email.
There will be a take-home midterm, which will tentatively be made available on Wednesday, Oct. 24 and will be due one week later (Wednesday, Oct. 31).
For LIN224 students, there will be a take-home final exam, which will be made available on Wednesday, Dec. 12 and will be due one week later (Wednesday, Dec. 19).
For LIN424 students, there will be a final paper, which will be due on Wednesday, Dec. 14. A 500-word prospectus for this paper will be due Monday, Nov. 12. More information will be made available closer to that date.
LIN224 students may opt to write a final paper in place of the final exam, but they should do so with the knowledge that this paper will be graded relative to a rubric designed for graduate student papers.
One major aim of this class is to familiarize you with the computational linguist’s tools of the trade. The two we will be working with are LaTeX, for all written work, and Python, for programming assignments.
You will be expected to learn LaTeX on your own. There are plenty of tutorials online for doing this, including a few targeted specifically at linguists. Please do not ask me if you can use a word processor instead. The answer will be no.
You will not be expected to learn Python on your own; we will spend the first two full weeks of class working through a Python tutorial. But because there is a lot of content to cover in this course, this section of the course will be relatively brief and fast-paced. If it turns out that you need extra help on some basic programming concept, it is your responsibility to seek out help as soon as that becomes clear to you. This is especially important because programming assignments will build in complexity, and so if you get stuck early, you have trouble throughout the rest of the course. Please do not wait to get help.
Note for Windows 10 users: Some Python packages that we will use have known issues on Windows. As such, you must install either the Ubuntu Linux subsystem or Docker and then install Python within the virtual machine you chose. This will require you to gain a basic competence in using the command line. You are responsible for building this competence if you do not already possess it. Please do not try to use a Windows-based Python installation. You will have issues.
The grading breakdown is: (bi)weekly assignments (50%), midterm (15%), final (25%), participation (10%). (Percentages represent percent of total grade.)
Assignments should be submitted by 11:59pm the day they are due. An automatic 10% deduction will be applied after this time, and you must explicitly notify both Venkat and me by email when you turn the late assignment in or you will not receive any credit.
Starting from the end of class, assignments will lose 10% per day late according to the UTC time-stamp of submission that Overleaf reports. Late assignments may not be turned in for credit after a week unless explicit permission was sought before the due date.
Students will not be penalized because of important civic, ethnic, family or religious obligations, or university service. You will be have a chance, whenever feasible, to make up within a reasonable time any assignment that is missed for these reasons. Absences for these reasons will count as excused for the sake of the participation grade. But it is your job to inform me of any expected absences in advance, as soon as possible.
All assignments and activities associated with this course must be performed in accordance with the University of Rochester’s Academic Honesty Policy. More information is available at: http://www.rochester.edu/college/honesty/.
Any student who needs special accommodations due to a disability should let me know privately, at the start of the semester.
|Aug.||29||Introduction to Computational Linguistics||-||-|
|Sept.||05||First steps in Python||Downey 2015, Ch. 1-4||HW0|
|Sept.||10||Control flow in Python||Downey 2015, Ch. 5-9||HW1|
|Sept.||12||Collections in Python I||Downey 2015, Ch. 10-12, 14||HW2|
|Sept.||17||Collections in Python II||-||-|
|Sept.||19||Classes in Python||Downey 2015, Ch. 15-18||-|
|Sept.||24||Working with text in Python||Jurafsky & Martin 2018, Ch. 2||HW3|
|Sept.||26||Finite state automata I||Sipser 2013, Ch. 1||-|
|Oct.||01||Finite state automata II||-||HW4|
|Oct.||03||Finite state transducers||Jurafsky & Martin 2009, Ch. 3||-|
|Oct.||08||Context free grammars||Sipser 2013, Ch. 2.1-2.3||-|
|Oct.||10||The CKY algorithm||Jurafsky & Martin 2009, Ch. 13.1-13.4.1||HW5|
|Oct.||17||The Earley algorithm||Jurafsky & Martin 2009, Ch. 13.4.2||-|
|Oct.||22||Shift-reduce parsing||Jurafsky & Martin 2018, Ch. 14.1-14.4||-|
|Oct.||24||Mildly context sensitive formalisms||Clark 2014||HW6|
|Oct.||29||Parsing MCFG/LCFRS||Kallmeyer 2013||-|
|Oct.||31||Basic probability theory I||Goldwater 2016, Sec. 1-4||MT|
|Nov.||05||Basic probability theory II||Goldwater 2016, Sec. 5||-|
|Nov.||07||N-gram models||Jurafsky & Martin 2018, Ch. 4||-|
|Nov.||12||Basic information theory||Manning & Schütze 1999 Ch. 2.2||(P)|
|Nov.||14||Collocation measures||Manning & Schütze 1999 Ch. 5||HW7|
|Nov.||19||Naive Bayes||Jurafsky & Martin 2018, Ch. 6||-|
|Nov.||21||Probabilistic Topic Models||Steyvers & Griffiths 2007||-|
|Nov.||26||Hidden Markov Models||Jurafsky & Martin 2018, Ch. 9.1-9.4||-|
|Nov.||28||The Forward-Backward algorithm I||Jurafsky & Martin 2018, Ch. 9.5||HW8|
|Dec.||03||The Forward-Backward algorithm II||-||-|
|Dec.||05||Probabilistic context free grammars||Jurafsky & Martin 2018, Ch. 13.1-13.7||-|
|Dec.||07||The Inside-Outside algorithm I||Manning & Schütze 1999, Ch. 11||HW9|
|Dec.||12||The Inside-Outside algorithm II||-||-|