Large language models (LLMs) are modern engineering marvels that have revolutionized natural language processing. Despite this success, there are still many open questions surrounding how and why LLMs work. This class will cover current research that considers LLMs as scientific objects of study. We will consider three complementary perspectives on understanding LLMs. First, we will analyze the internal operations of LLMs to shed light on how their predictions are computed. Second, we will study LLMs as black boxes and aim to discover principles that govern their behavior. Finally, we will survey external data-related factors that shape the general tendencies of LLMs. By understanding these different perspectives, students will develop a fuller understanding of modern research on LLMs.

Course Staff

Robin Jia
Instructor
Office hours: Thursday 11am-12pm
Location: SAL 236

Johnny Wei
Teaching Assistant
Office hours: Monday and Friday 3-4pm
Location: MCB Lobby

Logistics

Prerequisites

This course is designed for students currently pursuing research on large language models. Students are expected to be comfortable reading and presenting NLP research papers. In terms of coursework, familiarity with natural language processing at the level of CSCI 544 (Applied Natural Language Processing) is expected.

The course’s recommended NLP textbook is Jurafsky and Martin’s Speech and Language Processing, whose third edition is available online and is very current.

Schedule

Date Topic Papers Deadlines
Mon Aug 26 Introduction. What can we learn from the sciences? (slides)    
  Unit 0: The nuts and bolts of LLMs    
Wed Aug 28 Transformers (lecture notes, in-class lecture) Vaswani et al., 2017. Attention is All You Need.  
Mon Sep 2 No class (Labor Day)    
Wed Sep 4 Deep dive on Llama 3, Part 1: Architecture and pre-training Llama Team, 2024. The Llama 3 Herd of Models.
Su et al., 2021. RoFormer: Enhanced Transformer with Rotary Position Embedding.
Shazeer, 2020. GLU Variants Improve Transformer.
 
Mon Sep 9 Deep dive on Llama 3, Part 2: Post-training and results Rafailov et al., 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model.
  Unit 1: Internals of LLMs    
Wed Sep 11 Do language models do state tracking? Main: Li et al., 2021. Implicit Representations of Meaning in Neural Language Models.
Main: Kim and Schuster, 2023. Entity Tracking in Language Models.
Background: Hewitt and Liang, 2019. Designing and Interpreting Probes with Control Tasks.
 
Mon Sep 16 Interpreting hidden states as vocabulary distributions Main: Geva et al., 2022. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space.
Main: Ghandeharioun et al., 2024. Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models.
 
Wed Sep 18 Localization of factual knowledge Main: Meng et al., 2022. Locating and Editing Factual Associations in GPT.
Main: Geva et al., 2023. Dissecting Recall of Factual Associations in Auto-Regressive Language Models.
Bonus: Jiang et al., 2024. On Large Language Models’ Hallucination with Regard to Known Facts.
 
Mon Sep 23 Finding the “truth” within language models Main: Burns et al., 2023. Discovering Latent Knowledge in Language Models Without Supervision.
Main: Li et al., 2023. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model.
 
Wed Sep 25 Sparse autoencoders Main: Cunningham et al., 2023. Sparse Autoencoders Find Highly Interpretable Features in Language Models.
Main: Templeton et al., 2024. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet.
Background: Lee et al., 2007. Sparse deep belief net model for visual area V2.
Project proposal due Fri Sep 27
Mon Sep 30 Circuit analysis and patching Main: Wang et al., 2022. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small.
Main: Merullo et al., 2023. Circuit Component Reuse Across Tasks in Transformer Language Models.
 
Wed Oct 2 Interpretability “illusions”: Can we trust patching? Main: Makelov et al., 2023. Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching.
Main: Wu et al., 2024. A Reply to Makelov et al. (2023)’s “Interpretability Illusion” Arguments.
 
  Unit 2: Black-box LLM behavior    
Mon Oct 7 Understanding in-context learning Main: Min et al., 2022. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Main: Yoo et al., 2022. Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations.
Main: Wei et al., 2023. Larger language models do in-context learning differently.
 
Wed Oct 9 Faithfulness of model-generated explanations Background: Jacovi and Goldberg, 2020. Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
Main: Turpin et al., 2023. Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.
Main: Lanham et al., 2023. Measuring Faithfulness in Chain-of-Thought Reasoning.
Bonus: Mercier and Sperber. Chapter 14: A Reason for Everything. From The Enigma of Reason.
 
Mon Oct 14 Extracting confidence estimates from LLMs Background: Gonzalez et al., 2021. Do Explanations Help Users Detect Errors in Open-Domain QA? An Evaluation of Spoken vs. Visual Explanations.
Main: Zhou et al., 2023. Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models.
Main: Feng et al., 2024. Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration.
Bonus: Mercier and Sperber. Chapter 15: The Bright Side of Reason. From The Enigma of Reason.
 
Wed Oct 16 Scaling laws for LLMs Main: Kaplan et al., 2020. Scaling Laws for Neural Language Models.
Main: Hoffmann et al., 2022. Training Compute-Optimal Large Language Models.
Main: Porian et al., 2024. Resolving Discrepancies in Compute-Optimal Scaling of Language Models.
 
Mon Oct 21 Emergent abilities of LLMs: Are they a mirage? Main: Wei et al., 2022. Emergent Abilities of Large Language Models.
Main: Schaeffer et al., 2023. Are Emergent Abilities of Large Language Models a Mirage?
Revisit: Scaling Laws from Llama 3.
 
  Unit 3: External forces on LLMs    
Wed Oct 23 Memorization of training data Main: Carlini et al., 2022. Quantifying Memorization Across Neural Language Models.
Main: Tirumala et al., 2022. Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models.
Main: Lesci et al., 2024. Causal Estimation of Memorisation Profiles.
 
Mon Oct 28 Deduplication and pre-training data quality Main: Lee et al., 2021. Deduplicating Training Data Makes Language Models Better.
Main: Longpre et al., 2023. A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity.
Bonus: Li et al., 2024. DataComp-LM: In search of the next generation of training sets for language models.
Revisit: Data filtering from Llama 3.
 
Wed Oct 30 Why does in-context learning emerge? Background: Liu et al., 2018. LSTMs Exploit Linguistic Attributes of Data.
Main: Chan et al., 2022. Data Distributional Properties Drive Emergent In-Context Learning in Transformers.
Main: Chen et al., 2024. Parallel Structures in Pre-training Data Yield In-Context Learning.
Project midterm report due Fri Nov 1
Mon Nov 4 The effects of tokenization Background: Sennrich et al., 2015. Neural Machine Translation of Rare Words with Subword Units.
Main: Feucht et al., 2024. Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs.
Main: Hayase et al., 2024. Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Revisit: Tokenization choices for Llama 3.
 
Wed Nov 6 What does fine-tuning accomplish? Main: Zhou et al., 2023. LIMA: Less Is More for Alignment.
Main: Kang et al., 2024. Unfamiliar Finetuning Examples Control How Language Models Hallucinate.
Main: Prakash et al., 2024. Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking.
 
Mon Nov 11 No class (Veterans Day)    
Wed Nov 13 No class (EMNLP)    
Mon Nov 18 Fine-tuning data and LLM “opinions” Background: Caliskan et al., 2016. Semantics derived automatically from language corpora contain human-like biases.
Main: Santurkar et al., 2023. Whose Opinions Do Language Models Reflect?
Main: Ryan et al., 2024. Unintended Impacts of LLM Alignment on Global Representation.
 
Start of project presentations      
Wed Nov 20 Final project presentations  
Mon Nov 25 Final project presentations    
Wed Nov 27 No class (Thanksgiving break)    
Mon Dec 2 Final project presentations  
Wed Dec 4 Final project presentations, Conclusion   Project final report due Fri Dec 6

Format of Classes

After the first four class meetings, the course will revolve around student presentations of papers. For each paper, students will be assigned different roles. We will go through the roles one by one, and the corresponding student will give a short presentation in that role.

Main paper roles

For each “main” paper, multiple students will play different roles. Below is the complete list of roles, in presentation order (click on each for details).

Proposer: Proposes the research in the paper to a funding agency.
  • Write-up: A 1-2 page report that answers each of the following questions. These questions are a subset of the questions from the Heilmeier Catechism, an often-used set of questions for evaluating research proposals, such as grant proposals or fellowship applications.
    • What are you trying to do? Articulate your objectives using absolutely no jargon.
    • How is it done today, and what are the limits of current practice?
    • What is new in your approach and why do you think it will be successful? (Keep this brief, at the level of a “pitch”; there is no need to describe the method in detail, as that will be done in the main presentation.)
    • Who cares? If you are successful, what difference will it make?
    • What are the risks? (I interpret this to mean, how might this proposed work not succeed? What will you do about it if that happens?)
    • What are the mid-term and final “exams” to check for success?
  • Presentation: ~3 minute presentation covering the first four points. To save time, we will skip over the “risks” and “exams” sections.
Main Presenter: Presents the paper's methods and results.
  • Write-up: Submit your slides; no separate write-up is required.
  • Presentation: 10 minute slide-based presentation of the paper’s methods and results. Answer the following questions:
    • What is the main method proposed by this paper?
    • What are the baselines (if applicable)?
    • What are the main experiments and results? You do not need to cover all experiments in the paper, just choose the most important ones.
    • What conclusions can be drawn from the results?
Archaeologist: Compares and contrasts the current paper with relevant prior work.
  • Write-up: 1-2 page report that discusses at least 3 related papers that were published before the current paper (excluding all papers that are scheduled to be discussed in class). At least one prior paper must not be cited by the current paper (indicate which one this is in your write-up). For each prior paper, describe:
    • What does the prior work do? Give a brief summary.
    • In what ways is this prior work similar to the current paper?
    • What are the key differences?
    • In what ways is the current paper “novel” compared to this previous paper?
    • Do the prior work and the current work come to similar or different conclusions?
  • Presentation: Choose one related paper and give a ~3 minute oral presentation about how it is related to the main paper.
Reviewer: Writes a review of the paper, identifying both its strengths and weaknesses.
  • Write-up: A 1-2 page review in the format of ACL Rolling Review (somewhat abridged for the purposes of the class). The review should answer each of the following questions in separate sections. Please refer to the ARR Review Form page for details about each section, and the ARR Reviewer Tutorial for more advice on how to write good reviews.
    • Paper Summary
    • Summary of Strengths
    • Summary of Weaknesses
    • Comments/Suggestions (no need to flag typos, though this would also be part of the normal ARR review form)
    • Soundness score (1-5)
    • Overall assessment (1-5)
  • Presentation: ~3 minute presentation of your review. You may optionally share your screen to show your review and/or parts of the paper related to your review.
Visionary: Brainstorms follow-up research and products based on the paper.
  • Write-up: 1 page report detailing at least one idea for each of the following two types of future work:
    • Research: What is a set of natural next research questions to ask? How could the authors of this paper answer these questions?
    • Product: How could this research be made into the basis for a new product? You could imagine it being useful in a corporate setting, in a non-profit, for government use, etc. You should envision a specific use case for this product, and describe how the research paper would help or enable that specific application.
  • Presentation: ~3 minute oral presentation of your future work ideas (visual aids are optional but encouraged).

Non-main paper roles

On some days, we will have papers that serve as background material or bonus material. For these papers, there are two special roles:

Summarizer: Presents a summary of a background or bonus paper.
  • Write-up: Submit your slides; no separate write-up is required.
  • Presentation: 5-10 minute slide-based summary of the paper. The summary should answer the following questions:
    • What is the goal of this paper?
    • At a high level, what is the paper’s methodology?
    • What are the main experiments and results of the paper?
    • What conclusions can be drawn from the results?
Connector: Draws connections between the background/bonus paper and the main papers.
  • Write-up: 1 page description of the connections between this paper and all main papers for that day (note that the Connector is expected to read both the background/bonus paper they are covering and all main papers for that day). Have a separate paragraph for each main paper. Within each paragraph, discuss:
    • What themes does this paper share with the main papers?
    • In what ways is this paper different from the main papers?
    • (For background papers) How does this paper provide context to understand the main papers?
    • (For bonus papers) How does this paper enhance our understanding of the main papers?
  • Presentation: ~3 minute oral presentation summarizing your written report.

We will also revisit parts of the Llama 3 paper on a couple of occasions. When we do, there will be another special role:

Re-examiner: Investigates how Llama 3 handles the challenges discussed in that day's readings.
  • Write-up: 1 page summary of the key challenges discussed in the day’s main papers (note that this requires reading all main papers). Then, discuss how Llama 3 deals with those challenges, and whether this seems to be a good choice given what is discussed in the main papers.
  • Presentation: ~3 minute oral presentation of the written document.

Concluder Role

Finally, at the end of each class, the Concluder is responsible for reading all main papers from that day and summarizing the connections between them.

Concluder: Summarizes the relationships between all main papers.
  • Write-up: 1-2 page report describing how the main papers are connected. For each paper, answer the following:
    • What themes does this paper share with the other papers?
    • In what ways does this paper support the other papers?
    • In what ways does this paper disagree with, or present a different narrative than, some of the other papers? Are these narratives mutually incompatible, and why?
    • If there are disagreements, which side do you find more convincing, and why?
  • Presentation: ~3 minute oral presentation of these connections. If there are disagreements between papers, stake out a clear position to seed further discussion.

Grading

Grades will be based on role-based written reports (25%) and presentations (25%), in-class discussion (10%), and a final project (40% total).

Roles (25% written reports, 25% presentations; 50% total). Students will play different roles (as described above) on different days. Over the course of the semester, each student will play six unique roles on six different days. One of these roles must be the Main Presenter role. All written reports are due by the time class starts (4:00pm). Each role’s grade will contribute equally to the overall grade.

Class discussion participation (10%). Students are expected to participate in class discussions even when they have no assigned role. This includes asking questions during presentations as well as voicing opinions on discussion topics.

Final project (40% total). Students must complete a final research project on a topic related to the class. Projects may be conducted individually or in groups of up to three.
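As a quick illustration of how these weights combine, here is a minimal Python sketch. The component names, function name, and example scores are hypothetical and purely for illustration; they are not part of any official grading infrastructure.

```python
# Sketch of how the grade components above combine.
# Weights are taken from the Grading section; example scores are hypothetical.

WEIGHTS = {
    "role_reports": 0.25,        # role-based written reports
    "role_presentations": 0.25,  # role-based presentations
    "discussion": 0.10,          # in-class discussion participation
    "final_project": 0.40,       # final project (all components combined)
}

def course_grade(scores: dict) -> float:
    """Combine per-component scores (each on a 0-100 scale) into a final grade."""
    assert set(scores) == set(WEIGHTS), "must supply every component"
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical example: strong roles and project, solid discussion.
example = {
    "role_reports": 90,
    "role_presentations": 85,
    "discussion": 80,
    "final_project": 92,
}
print(course_grade(example))  # 0.25*90 + 0.25*85 + 0.10*80 + 0.40*92 = 88.55
```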

Final project

Students must complete a final research project on a topic related to the class, either individually or in groups of up to three. This project is expected to include novel research that studies a scientific question about language models (which may or may not be “large,” depending on resource constraints). While projects may involve querying closed-source models like ChatGPT, all projects must also study some open-weight language models (i.e., weights are released, but full training details may not be). Please come to office hours or email me if you have questions related to choosing a project direction.

The final project is worth 40% of the total grade. Points will be allocated as follows:

Project proposal (5%). Students should submit a ~2-page proposal for their project by the end of Week 5 (September 27). The proposal should describe the goal of the project and include a survey of related work. When reading these proposals, I will be looking for the following:

Project midterm report (10%). Students should submit a 3-4 page progress report for their project by the end of Week 10 (November 1). This should describe the project’s goals (which may have changed since the proposal), initial results, and a concrete plan of what will be done for the final report. While the initial results need not be positive, students are expected to have made non-trivial implementation progress by this point. For parts of the report describing project goals and plans, the expectations are largely the same as for the proposal. In addition, I will be looking for the following:

Project final presentation (10%). This will be a ~20 minute presentation during one of the final few class periods. Students should describe the motivation for their work, relevant background material, and results. I encourage students to present both positive and negative results. There will also be some time for audience questions.

Project final report (15%). Students should submit a 5-6 page final report detailing all aspects of their project by the end of the last week of class (December 6). The report should be structured like a conference paper, including an abstract, introduction, related work, and experiments. Parts of the proposal and progress report may be reused for the final report. Negative results will not be penalized, but should be accompanied with detailed analysis of why the expected results did not materialize.

All written project-related assignments should use the standard *ACL paper submission template. All project due dates are 11:59pm Pacific time on Friday.
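For reference, a minimal document skeleton for the *ACL template might look like the sketch below. This assumes the acl.sty file from the official ACL style-files repository is in your working directory (check that repository's README for the current package options); the title, author, and email are placeholders.

```latex
% Minimal *ACL-template skeleton (assumes acl.sty from the official
% ACL style-files repository; all names below are placeholders).
\documentclass[11pt]{article}
\usepackage[final]{acl}   % the repository also documents a review/draft option
\usepackage{times}
\usepackage{latexsym}

\title{Project Title Here}
\author{Student Name \\ University of Southern California \\ \texttt{student@usc.edu}}

\begin{document}
\maketitle
\begin{abstract}
One-paragraph summary of the project.
\end{abstract}
% Introduction, related work, experiments, ...
\end{document}
```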

Late days

You are given 3 free late days that may be used in integer amounts on any role-related written report, the project proposal, and the project midterm report. Each late day extends your deadline by exactly 24 hours. You do not need to contact the course staff before using these late days. For role-related reports, you must still present in class on the scheduled day even if you submit your report late. If you are working in a group and want to submit the project proposal or midterm report late, every group member must spend a late day. No late days are allowed for the project final report (we need to grade them quickly to assign final grades).

Additional late days will result in a deduction of 10% of the grade on the corresponding assignment per day. For the project proposal and midterm report, no late submissions will be accepted more than 3 days after the stated due date, so that we can provide feedback in a timely manner.
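For illustration, the late-day arithmetic above can be sketched as follows. The function name and the 0-100 score scale are illustrative only; the 3-day hard cutoff for the proposal and midterm report is not modeled here.

```python
# Sketch of the late-day policy: free late days are spent first, then each
# additional day costs 10% of the assignment grade. Names are illustrative.

FREE_LATE_DAYS = 3       # shared budget across eligible assignments
PENALTY_PER_DAY = 0.10   # deduction per day after free late days run out

def adjusted_score(raw_score: float, days_late: int, free_days_remaining: int):
    """Return (score after any late penalty, free late days left)."""
    used_free = min(days_late, free_days_remaining)
    penalized_days = days_late - used_free
    score = raw_score * (1 - PENALTY_PER_DAY * penalized_days)
    return max(score, 0.0), free_days_remaining - used_free

# Hypothetical example: a report submitted 4 days late with 3 free days left
# burns all 3 free days and takes one 10% deduction.
score, remaining = adjusted_score(90.0, days_late=4, free_days_remaining=3)
print(score, remaining)
```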

Project resources