In winter 2021, we ran a seminar discussing interesting papers on interpretability in NLP. Here is the list of topics, with links to slides. This is the first iteration; I expect a refined version in the future.
Introduction (Jan 12, 2021) slides
- Why should we care about interpretable NLP?
- We want to build NLP systems with better performance.
- “Good performance” requires much more than just “high accuracy”.
- We want to build NLP systems that deploy well in society.
- What does interpretable NLP research include?
- Mainly the ACL/EMNLP/NAACL track “Interpretability and analysis of NLP models”; the BlackboxNLP workshop is also relevant.
- Connection to: FAccT, theory, psycholinguistics, ML4H
Background: language modeling, DNNs in NLP (Jan 19, 2021) slides
- A view of NLP: it is a window for understanding knowledge & intelligence.
- Many popular tasks and models (e.g., neural networks for language modeling) were developed toward this goal: LSA, the probabilistic neural LM, word2vec / GloVe, contextualized LMs, …
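As a toy illustration of the LSA end of this lineage, here is a minimal sketch in Python/NumPy (the corpus, vocabulary, and latent dimension `k` are all made up for the example): factor a term-document count matrix with a truncated SVD and use the left singular vectors as word embeddings.

```python
import numpy as np

# Toy corpus: each "document" is a short list of tokens.
docs = [
    ["cat", "purrs", "cat", "sleeps"],
    ["dog", "barks", "dog", "runs"],
    ["cat", "sleeps", "dog", "runs"],
]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        X[idx[w], j] += 1

# Truncated SVD: keep k latent dimensions as word embeddings.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * S[:k]  # one row per vocabulary word

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Words sharing documents end up closer in the latent space.
print(cos(word_vecs[idx["cat"]], word_vecs[idx["purrs"]]))
print(cos(word_vecs[idx["cat"]], word_vecs[idx["barks"]]))
```

With these counts, “cat” ends up closer to “purrs” than to “barks”, which is the basic distributional intuition the later neural models build on.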
Background: Interpretability, explainable AI (Feb 2, 2021) slides
- Some principles of model interpretability.
- Some early methods to interpret models, including:
- A local, (almost-) linear, post-hoc method: LIME
- A method based on Shapley values: SHAP
- Attention-based methods
- SVCCA
- What counts as “interpretable” to humans can itself be complicated.
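To make the LIME idea concrete, here is a hedged NumPy sketch (the `black_box` scorer, vocabulary size, and weighting kernel are invented for the example): sample perturbations around one instance by masking words, weight them by proximity to the original, and fit a weighted linear surrogate whose coefficients serve as local word importances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box sentiment scorer over bag-of-words vectors:
# it secretly relies only on word 0 (say, "great") and ignores the rest.
def black_box(x):
    return 1.0 / (1.0 + np.exp(-(3.0 * x[:, 0] - 1.0)))

n_words = 5
instance = np.ones(n_words)  # all words present in the sentence

# LIME-style neighborhood: randomly mask words out of the instance.
masks = rng.integers(0, 2, size=(500, n_words)).astype(float)
preds = black_box(masks)

# Proximity weights: perturbations keeping more words count more.
weights = np.exp(-(n_words - masks.sum(axis=1)))

# Weighted least-squares fit of a linear surrogate (with intercept).
A = np.hstack([masks, np.ones((len(masks), 1))])
sw = np.sqrt(weights)
coef = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)[0]

importances = coef[:-1]
print("word importances:", np.round(importances, 3))
# The surrogate should assign word 0 ("great") the largest weight.
```

The surrogate is only locally faithful: it explains the model’s behavior in the neighborhood of this one instance, not globally.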
Topic: The geometry of embeddings (Feb 9, 2021) slides
- Three viewpoints in interpreting the geometry:
- Linear Analogy
- Anisotropy
- Manifold
- How can we use this understanding to build better embeddings?
- New methods could benefit old models!
- Considering frequency & isotropy might help
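The anisotropy viewpoint can be demonstrated with a small simulation (a sketch, assuming Python/NumPy; the embeddings are synthetic, with a shared offset direction standing in for the common direction observed in real LM spaces): measure the mean cosine similarity between random embedding pairs, then show that simple mean-centering restores approximate isotropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "contextualized" embeddings with a large shared offset,
# mimicking the anisotropy observed in LM representation spaces.
n, d = 1000, 64
common = rng.normal(size=d) * 5.0  # shared direction
emb = common + rng.normal(size=(n, d))

def mean_cosine(X, pairs=2000):
    """Average cosine similarity over randomly drawn pairs of rows."""
    i = rng.integers(0, len(X), pairs)
    j = rng.integers(0, len(X), pairs)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return float(np.mean(np.sum(Xn[i] * Xn[j], axis=1)))

print("raw anisotropy:      ", round(mean_cosine(emb), 3))       # near 1
centered = emb - emb.mean(axis=0)  # a simple isotropy fix
print("after mean-centering:", round(mean_cosine(centered), 3))  # near 0
```

This is one concrete sense in which “new methods could benefit old models”: post-processing alone changes the geometry substantially.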
Topic: Probing (Feb 16, 2021) slides
- What is a probe, and how do we probe?
- Probe as a diagnostic classifier
- Probing for semantic evidence, syntax, or other aspects of the NLP pipeline
- An information theory framework
- Extending “probing” to, e.g., parameter-free settings
- What can probes do?
- Assess & remove bias
- Assess the utility of features
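A diagnostic-classifier probe, in miniature (a hedged sketch in Python/NumPy; the “frozen embeddings”, the encoded property, and all sizes are synthetic): train a linear logistic-regression probe on fixed representations to predict a property, and compare against a random-embedding control to check that the probe is reading structure rather than memorizing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic frozen "embeddings": dimension 3 linearly encodes a binary
# property (e.g., plurality); the remaining dimensions are noise.
n, d = 400, 16
emb = rng.normal(size=(n, d))
labels = (emb[:, 3] > 0).astype(float)
control = rng.normal(size=(n, d))  # random-vector control task

def train_probe(X, y, steps=500, lr=0.5):
    """Logistic-regression probe trained by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(X, y, w):
    return float(np.mean((X @ w > 0) == (y > 0.5)))

w_real = train_probe(emb, labels)
w_ctrl = train_probe(control, labels)
print("probe accuracy on embeddings:", accuracy(emb, labels, w_real))
print("probe accuracy on control:   ", accuracy(control, labels, w_ctrl))
```

The gap between the two accuracies, rather than the raw probe accuracy, is what licenses the claim that the property is encoded, which is the worry the information-theoretic framings above try to formalize.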
Topic: Behavioral tests on NLP models (Feb 23, 2021) slides
- Syntactic evaluation of LMs.
- Pragmatic, semantic, and commonsense evaluations.
- Specifically-designed tests (e.g., invariance tests).
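An invariance test in the CheckList style can be sketched in a few lines (assuming Python; `toy_sentiment` is a hypothetical stand-in for a real model, and the templates and name list are invented): swapping a person’s name into a template should not change the model’s prediction.

```python
# A CheckList-style invariance test (sketch): swapping a person's name
# should not change a sentiment model's prediction. "toy_sentiment" is a
# hypothetical stand-in for a real model.
POSITIVE = {"great", "loved", "fantastic"}
NEGATIVE = {"awful", "hated", "boring"}

def toy_sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"

def invariance_test(template, names):
    """Pass iff the prediction is identical for every name filled in."""
    preds = {toy_sentiment(template.format(name=n)) for n in names}
    return len(preds) == 1

names = ["Alice", "Bob", "Priya", "Chen"]
print(invariance_test("{name} loved this fantastic movie", names))  # True
print(invariance_test("{name} hated this boring movie", names))     # True
```

A real model that has picked up demographic correlations would fail such a test, which is exactly what these behavioral suites are designed to surface.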
Topic: Spurious correlations, shortcut learning (March 2, 2021) slides
- The “right for the wrong reasons” problem.
- Solving this problem:
- Changing dataset distributions.
- Letting models avoid the bias.
- Training LMs on larger data.
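The “right for the wrong reasons” failure mode can be simulated directly (a sketch, assuming Python/NumPy; the two-feature setup and noise scales are invented): give a classifier a weak true signal plus a near-perfect spurious cue that correlates with the label in training but flips at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature 0 is the weak true signal; feature 1 is a spurious artifact
# (e.g., a lexical cue) that tracks the label during training only.
def make_data(n, spurious_agrees):
    y = rng.integers(0, 2, n).astype(float)
    true_feat = y + rng.normal(scale=1.5, size=n)  # weak true signal
    cue = y if spurious_agrees else 1 - y          # clean shortcut
    X = np.stack([true_feat, cue + rng.normal(scale=0.1, size=n)], axis=1)
    return X, y

Xtr, ytr = make_data(1000, spurious_agrees=True)
Xte, yte = make_data(1000, spurious_agrees=False)

# Least-squares linear classifier with intercept.
A = np.hstack([Xtr, np.ones((len(Xtr), 1))])
w = np.linalg.lstsq(A, ytr, rcond=None)[0]

def acc(X, y):
    return float(np.mean((np.hstack([X, np.ones((len(X), 1))]) @ w > 0.5) == y))

print("train accuracy:", acc(Xtr, ytr))  # high: the shortcut works here
print("test accuracy: ", acc(Xte, yte))  # collapses once the cue flips
```

The classifier looks excellent in-distribution while relying almost entirely on the artifact, which is what dataset-redistribution and debiasing methods aim to prevent.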
Topic: Influence of samples, understanding the datasets (March 16, 2021) slides
- Perturbing the samples
- Influence functions
- Anchors, features, and adversarial examples
- Studying the datasets
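The quantity that influence functions approximate can be computed exactly on a tiny model (a sketch, assuming Python/NumPy; the regression setup and ridge penalty are invented): measure each training sample’s influence on a test prediction as the prediction change under leave-one-out retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression setup: influence of each training sample measured as
# the change in a test prediction when that sample is left out and the
# model is retrained (the exact quantity influence functions approximate).
n, d = 30, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
x_test = rng.normal(size=d)

def fit(Xa, ya, lam=1e-3):
    """Ridge regression in closed form."""
    return np.linalg.solve(Xa.T @ Xa + lam * np.eye(d), Xa.T @ ya)

base_pred = x_test @ fit(X, y)
influence = np.array([
    base_pred - x_test @ fit(np.delete(X, i, 0), np.delete(y, i, 0))
    for i in range(n)
])

# Samples with the largest |influence| matter most for this prediction.
top = np.argsort(-np.abs(influence))[:3]
print("most influential training indices:", top)
```

Exact retraining is only feasible at toy scale; influence functions approximate this leave-one-out effect with gradients and the inverse Hessian so it scales to deep models.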