Instructor: Robin Jia
Note: As per University policy, class will be held remotely for the first two weeks of the semester. Please use the Zoom link on Blackboard.
In natural language processing (NLP), we set out to solve language-related tasks (e.g., machine translation, question answering) but often evaluate on narrow, in-distribution test datasets. With recent advances in deep learning, modern systems have achieved high accuracy on many canonical datasets, but still seem far from solving general tasks. In this class, we will survey recent research on robustness and generalization that studies this gap between in-distribution accuracy and task competency through out-of-distribution settings. We will learn about different settings in which NLP systems often fail to generalize well, including adversarial perturbations, settings that require compositional reasoning, and domain transfer. We will also learn about how average accuracy can mask disparate performance across subpopulations, and how this can lead to undesirable consequences. Across these topics, we will cover both methods for measuring these robustness and generalization issues and methods for improving model robustness and generalization.
Logistics
- Office hours: Tuesdays 4-5pm in SAL 236 or on Zoom (link on Blackboard/Slack), or by appointment.
- Assignments: Submit assignments on Blackboard. Feedback will also be provided on Blackboard.
- Discussion: Please use the official course Slack channel for general questions. Email me (please put “CSCI 699” in the subject line) or come to office hours to discuss individual matters, such as project ideas or grading.
Prerequisites
Familiarity with natural language processing and/or machine learning at the level of CSCI 544 (Applied Natural Language Processing) or CSCI 567 (Machine learning). Please email me if you want to enroll but are unsure if you meet the prerequisites.
For those without prior NLP experience, I recommend going through Lena Voita’s NLP Course For You, which provides a concise and interactive introduction to modern NLP. For a more extensive introduction to NLP, I recommend Jurafsky and Martin’s Speech and Language Processing, whose third edition is available online and is very current.
Schedule
Date | Topic | Reading(s) | Additional reading(s) | Assignments |
---|---|---|---|---|
Mon Jan 10 | Introduction | |||
Wed Jan 12 | The Turing Test: Lecture | Turing 1950, Shieber 2016 | Shieber et al. 2004 | |
Mon Jan 17 | No class (Martin Luther King Day) | |||
Wed Jan 19 | Adversarial examples I: Lecture | Goodfellow et al. 2014, Adversarial ML Tutorial | ||
Mon Jan 24 | Adversarial examples II: Adversarial Perturbations | Pruthi et al. 2019, Jones et al. 2020 | Ribeiro et al. 2018, Jia et al. 2019, Huang et al. 2019 | |
Wed Jan 26 | Adversarial examples III: Adversarial triggers | Wallace et al. 2019, Atanasova et al. 2020 | ||
Mon Jan 31 | Adversarial examples IV: Model stealing, data poisoning | Krishna et al. 2020, Wallace et al. 2021 | Wallace et al. 2020 | |
Wed Feb 2 | Domain adaptation I: Lecture | Ramponi and Plank 2020 | ||
Mon Feb 7 | Domain adaptation II: Unsupervised domain adaptation and pretraining | Blitzer et al. 2006, Han and Eisenstein 2019 | Gururangan et al. 2020 | |
Wed Feb 9 | Domain adaptation III: Fair generalization tasks, empirical trends | Geiger et al. 2019, Miller et al. 2020 | Fisch et al. 2019, Taori et al. 2021 | Project proposal due Feb 11 |
Mon Feb 14 | Spurious correlations I: Lecture | Imbens and Rubin 2015, Imbens 2020, Feder et al. 2021 | ||
Wed Feb 16 | Spurious correlations II: Dataset biases | Schwartz et al. 2017, Gururangan et al. 2018, Gardner et al. 2021 | Poliak et al. 2018, Kaushik et al. 2018, Schuster et al. 2019, Ribeiro et al. 2020 | |
Mon Feb 21 | No class (Presidents’ Day) | |||
Wed Feb 23 | Spurious correlations III: Training-time strategies | Clark et al. 2019, Utama et al. 2020 | Clark et al. 2020, Tu et al. 2020 | |
Mon Feb 28 | Spurious correlations IV: Counterfactual data augmentation | Kaushik et al. 2019, Joshi and He 2021 | Gardner et al. 2020, Ross et al. 2021, Sen et al. 2021 | |
Wed Mar 2 | Fairness I: Lecture | Barocas, Hardt, and Narayanan | ||
Mon Mar 7 | Fairness II: Gender and race bias in NLP systems | Zhao et al. 2018, Rudinger et al. 2018, Sap et al. 2019 | Blodgett et al. 2020, Field et al. 2021 | |
Wed Mar 9 | Fairness III: Bias in representations | Goldfarb-Tarrant et al. 2021, Vig et al. 2020 | Caliskan et al. 2017 | |
Mon Mar 14 | No class (Spring break) | |||
Wed Mar 16 | No class (Spring break) | |||
Mon Mar 21 | Fairness IV: Distributionally robust optimization | Hashimoto et al. 2018, Sagawa et al. 2020 | Oren et al. 2019, Michel et al. 2021 | |
Wed Mar 23 | Fairness V: Bias amplification | Zhao et al. 2017, Jia et al. 2020 | Wang et al. 2019 | Project progress report due Mar 25 |
Mon Mar 28 | Compositionality I: Lecture | Fodor and Pylyshyn 1988 | Coppock and Champollion, Szabó 2008 | |
Wed Mar 30 | Compositionality II: Measuring compositional behavior | Hupkes et al. 2020 | Lake and Baroni 2018, Kim and Linzen 2020, Dankers et al. 2021 | |
Mon Apr 4 | Compositionality III: Modeling choices | Herzig et al. 2021, Csordás et al. 2021 | Chen et al. 2020, Shaw et al. 2021, Furrer et al. 2021 | |
Wed Apr 6 | Dataset creation I: Adversarial data collection | Kaushik et al. 2021, Wallace et al. 2021 | Wallace et al. 2019, Kiela et al. 2021 | |
Mon Apr 11 | Dataset creation II: Adversarial filtering | Le Bras et al. 2020, Phang et al. 2021 | Swayamdipta et al. 2020 | |
Wed Apr 13 | Conclusion, Bonus topics | |||
Mon Apr 18 | Project presentations | |||
Wed Apr 20 | Project presentations | |||
Mon Apr 25 | Project presentations | |||
Wed Apr 27 | Project presentations | | | Project final report due May 6 |
Format
Class days marked as Introduction, Conclusion, or Lecture will be presentations by me. Other classes will be paper presentations and discussions led by 1-2 students. The expected format of these classes is:
- 60 minutes: Presentation on all papers
- 25 minutes: Small group discussion
- 25 minutes: Whole class discussion
Grading
Grades will be based on paper presentations (30%), discussion (10%), and a final project (60% total).
Paper presentations (30%). Students will be expected to present ~2 research papers (sometimes 3 short ones or 1 long one) and lead class discussion on these papers. The presentation should help everyone in the class understand these papers as well as relevant background material. The presenter should also prepare a few discussion questions to encourage discussion after the presentation. To help presenters prepare their presentations, each presentation day will also have an assigned proofreader. The presenter should send a draft of the presentation and discussion questions to the proofreader at least 48 hours in advance of the presentation, and the proofreader should give some feedback at least 24 hours in advance.
Paper discussion participation (10%). Students are expected to participate in class discussions. This includes asking questions during presentations as well as voicing opinions on discussion topics.
Final project (60% total). Students must complete a final research project on a topic related to the class. Projects may be conducted individually or in groups of two. The project is expected to include novel research on either (1) evaluation methodology for identifying problems with models related to robustness, generalization, or fairness, or (2) modeling innovations for improving robustness, generalization, fairness, or other related aspects of model behavior. Please come to office hours or email me if you have questions related to choosing a project direction.
Final project
The final project is worth 60% of the total grade. Points will be allocated as follows:
Project proposal (5%). Students should submit a ~2-page (minimum) proposal for their project by the end of Week 5 (February 11). The proposal should describe the goal of the project and include a survey of related work. When reading these proposals, I will be looking for the following:
- Do you have a clear plan for your final project?
  - Clearly state your problem statement or goal, and how this relates to previously studied problems in the literature.
  - Describe an idea for your method. It does not need to be guaranteed to work, but it should come with a clear plan of how you would carry it out.
  - Describe what resources you will need (compute, data, models) and whether you have access to these.
  - State why this project is relevant to the course themes (broadly construed), if this is not obvious.
- Is there a reasonable chance this plan will succeed?
  - Summarize what is known in the literature about this problem and about methods like the one you’ve proposed, and use this to argue that your method makes sense for this problem.
  - Optionally, include results of preliminary experiments (not at all expected, but helpful for judging the likelihood of success).
Project progress report (10%). Students should submit a ~5-page progress report for their project by the end of Week 10 (March 25). This should once again describe the project’s goals (which may have changed since the proposal), initial results, and a concrete plan of what will be done for the final report. While the initial results need not be positive, students are expected to have made non-trivial implementation progress by this point. For parts of the report describing project goals and plans, the expectations are largely the same as for the proposal. In addition, I will be looking for the following:
- Why did you choose to do the experiments you did? What hypotheses are you testing?
- Technical detail about what experiments were conducted. The level of description should be sufficient for someone else to be able to reproduce your experiments.
- Analysis of results. What conclusions can you draw from these results? Or if they are inconclusive, what further experiments are needed so that you can draw some conclusions?
Project final presentation (20%). This will be a 20-30 minute presentation during the last two weeks of class. Students should describe the motivation for their work, relevant background material, and results. I encourage students to present both positive and negative results. There will also be some time for audience questions.
Project final report (25%). Students should submit a ~8-page final report detailing all aspects of their project (due May 6). The report should be structured like a conference paper. Parts of the proposal and progress report may be reused for the final report.
- Regarding structure: If you’re not sure what to do, I recommend looking back at some of the papers we read this semester from ACL/EMNLP and using that paper’s structure as a template. Broadly speaking, every paper should have an abstract followed by sections for Introduction, Problem Statement, Approach, Experiments, and Discussion. Related work can go in a couple of places, usually either after the Introduction or mixed with the Discussion. (My rough rule of thumb: put Related Work after the Introduction if there are significant prerequisites to understanding the context of your paper that cannot be adequately summarized in the Introduction. Otherwise, mix Related Work with the Discussion, as the paper will flow better if it goes directly from the Introduction to the Problem Statement.) You should end with some sort of conclusion; it can be its own section or just the end of the Discussion, but it should wrap up the paper and provide some forward-looking thoughts.
- Similarly, use proper LaTeX formatting. Use the ACL LaTeX template linked below as your guide. One common thing I see is confusion between \citet{} and \citep{}. Use \citet{} whenever the work you are citing plays the role of a noun in your sentence, and \citep{} when the citation is parenthetical and the sentence reads correctly without it. If you’re saying something like, “As shown in Pruthi et al. (2019), BERT is vulnerable to adversarial typos,” that should be written with \citet{} (see the example after this list).
- It is of course important to report experimental results, but it is equally important to analyze them. What conclusions can be drawn from them? What have we learned by doing these experiments? Don’t expect the reader to infer everything you want them to from your results table—it’s your job to tell the reader what your results mean.
- Negative results will not be penalized, but should be accompanied with detailed analysis of why the proposed method did not work as anticipated. For example, did you have an underlying hypothesis about why your method would work? If your method did not work, was it because that hypothesis was not true? What do the negative results teach us about NLP models that you did not anticipate?
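For concreteness, here is a minimal sketch of the two citation commands as they behave with the natbib-style setup used by the *ACL template; the bibliography key pruthi2019 is a placeholder for whatever key appears in your .bib file:

```latex
% Minimal sketch of \citet vs. \citep (natbib-style commands provided by the *ACL template).
% The key "pruthi2019" is a placeholder for an entry in your .bib file.

% Citation acting as a noun in the sentence: use \citet.
% Renders roughly as: As shown in Pruthi et al. (2019), BERT is vulnerable to adversarial typos.
As shown in \citet{pruthi2019}, BERT is vulnerable to adversarial typos.

% Parenthetical citation (the sentence reads fine without it): use \citep.
% Renders roughly as: BERT is vulnerable to adversarial typos (Pruthi et al., 2019).
BERT is vulnerable to adversarial typos \citep{pruthi2019}.
```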
All written project-related assignments should use the standard *ACL paper submission template (Log in to Overleaf and go to Menu -> Copy Project). All due dates are 11:59pm Pacific time on Friday.
Late days
You are given 4 late days for the project proposal and progress report (no late days for the final report), to be used in integer amounts and distributed as you see fit. Each late day beyond these 4 will result in a 10% deduction on the corresponding assignment’s grade.
Project resources
Google Colab provides free computational resources, though there are limits (e.g., jobs can only run for 12 hours at a time). See their FAQ for details.