In winter 2021, we ran a seminar discussing interesting papers on interpretability in NLP. Here is the list of topics, with links to slides. This is the first iteration; I expect a refined version in the future.
Introduction (Jan 12, 2021) slides
- Why should we care about interpretable NLP?
- We want to build NLP systems with better performance.
- “Good performance” requires much more than just “high accuracy”.
- We want to build NLP systems that deploy well in society.
- What does interpretable NLP research include?
- Mainly the ACL/EMNLP/NAACL track “Interpretability and analysis of NLP models”; the BlackboxNLP workshop is also relevant.
- Connection to: FAccT, theory, psycholinguistics, ML4H
Background: language modeling, DNNs in NLP (Jan 19, 2021) slides
- A view of NLP: it is a window for understanding knowledge & intelligence.
- Many popular tasks and models (e.g., neural networks for language modeling) were developed toward this goal: LSA, the probabilistic neural LM, word2vec / GloVe, contextualized LMs, …
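As a toy illustration of the LSA end of this lineage, here is a minimal sketch in Python/NumPy (the corpus, vocabulary, and latent dimension `k` are all made up for the example): factor a term-document count matrix with a truncated SVD and use the left singular vectors as word embeddings.

```python
import numpy as np

# Toy corpus: each "document" is a short list of tokens.
docs = [
    ["cat", "purrs", "cat", "sleeps"],
    ["dog", "barks", "dog", "runs"],
    ["cat", "sleeps", "dog", "runs"],
]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        X[idx[w], j] += 1

# Truncated SVD: keep k latent dimensions as word embeddings.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * S[:k]  # one row per vocabulary word

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Words sharing documents end up closer in the latent space.
print(cos(word_vecs[idx["cat"]], word_vecs[idx["purrs"]]))
print(cos(word_vecs[idx["cat"]], word_vecs[idx["barks"]]))
```

With these counts, “cat” ends up closer to “purrs” than to “barks”, which is the basic distributional intuition the later neural models build on.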
Background: Interpretability, explainable AI (Feb 2, 2021) slides
- Some principles of model interpretability.
- Some early methods to interpret models, including:
- A local, (almost-) linear, post-hoc method: LIME
- A method based on Shapley values: SHAP
- Attention-based methods
- SVCCA
- What counts as “interpretable” to humans can itself be complicated.
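To make the LIME idea concrete, here is a hedged NumPy sketch (the `black_box` scorer, vocabulary size, and weighting kernel are invented for the example): sample perturbations around one instance by masking words, weight them by proximity to the original, and fit a weighted linear surrogate whose coefficients serve as local word importances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box sentiment scorer over bag-of-words vectors:
# it secretly relies only on word 0 (say, "great") and ignores the rest.
def black_box(x):
    return 1.0 / (1.0 + np.exp(-(3.0 * x[:, 0] - 1.0)))

n_words = 5
instance = np.ones(n_words)  # all words present in the sentence

# LIME-style neighborhood: randomly mask words out of the instance.
masks = rng.integers(0, 2, size=(500, n_words)).astype(float)
preds = black_box(masks)

# Proximity weights: perturbations keeping more words count more.
weights = np.exp(-(n_words - masks.sum(axis=1)))

# Weighted least-squares fit of a linear surrogate (with intercept).
A = np.hstack([masks, np.ones((len(masks), 1))])
sw = np.sqrt(weights)
coef = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)[0]

importances = coef[:-1]
print("word importances:", np.round(importances, 3))
# The surrogate should assign word 0 ("great") the largest weight.
```

The surrogate is only locally faithful: it explains the model’s behavior in the neighborhood of this one instance, not globally.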
Topic: The geometry of embeddings (Feb 9, 2021) slides
- Three viewpoints in interpreting the geometry:
- Linear Analogy
- Anisotropy
- Manifold
- How can we use this understanding to build better embeddings?
- New methods could benefit old models!
- Considering frequency & isotropy might help
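The anisotropy viewpoint can be demonstrated with a small simulation (a sketch, assuming Python/NumPy; the embeddings are synthetic, with a shared offset direction standing in for the common direction observed in real LM spaces): measure the mean cosine similarity between random embedding pairs, then show that simple mean-centering restores approximate isotropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "contextualized" embeddings with a large shared offset,
# mimicking the anisotropy observed in LM representation spaces.
n, d = 1000, 64
common = rng.normal(size=d) * 5.0  # shared direction
emb = common + rng.normal(size=(n, d))

def mean_cosine(X, pairs=2000):
    """Average cosine similarity over randomly drawn pairs of rows."""
    i = rng.integers(0, len(X), pairs)
    j = rng.integers(0, len(X), pairs)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return float(np.mean(np.sum(Xn[i] * Xn[j], axis=1)))

print("raw anisotropy:      ", round(mean_cosine(emb), 3))       # near 1
centered = emb - emb.mean(axis=0)  # a simple isotropy fix
print("after mean-centering:", round(mean_cosine(centered), 3))  # near 0
```

This is one concrete sense in which “new methods could benefit old models”: post-processing alone changes the geometry substantially.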
Topic: Probing (Feb 16, 2021) slides
- What is a probe, and how do we probe?
- Probe as a diagnostic classifier
- Probing for semantic evidence, syntax, or other aspects of the NLP pipeline
- An information theory framework
- Extending “probing” to, e.g., parameter-free settings
- What can probes do?
- Assess & remove bias
- Assess the utility of features
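A diagnostic-classifier probe, in miniature (a hedged sketch in Python/NumPy; the “frozen embeddings”, the encoded property, and all sizes are synthetic): train a linear logistic-regression probe on fixed representations to predict a property, and compare against a random-embedding control to check that the probe is reading structure rather than memorizing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic frozen "embeddings": dimension 3 linearly encodes a binary
# property (e.g., plurality); the remaining dimensions are noise.
n, d = 400, 16
emb = rng.normal(size=(n, d))
labels = (emb[:, 3] > 0).astype(float)
control = rng.normal(size=(n, d))  # random-vector control task

def train_probe(X, y, steps=500, lr=0.5):
    """Logistic-regression probe trained by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(X, y, w):
    return float(np.mean((X @ w > 0) == (y > 0.5)))

w_real = train_probe(emb, labels)
w_ctrl = train_probe(control, labels)
print("probe accuracy on embeddings:", accuracy(emb, labels, w_real))
print("probe accuracy on control:   ", accuracy(control, labels, w_ctrl))
```

The gap between the two accuracies, rather than the raw probe accuracy, is what licenses the claim that the property is encoded, which is the worry the information-theoretic framings above try to formalize.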
Topic: Behavioral tests on NLP models (Feb 23, 2021) slides
- Syntactic evaluation of LMs.
- Pragmatic, semantic, and commonsense evaluations.
- Specifically-designed tests (e.g., invariance tests).
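An invariance test in the CheckList style can be sketched in a few lines (assuming Python; `toy_sentiment` is a hypothetical stand-in for a real model, and the templates and name list are invented): swapping a person’s name into a template should not change the model’s prediction.

```python
# A CheckList-style invariance test (sketch): swapping a person's name
# should not change a sentiment model's prediction. "toy_sentiment" is a
# hypothetical stand-in for a real model.
POSITIVE = {"great", "loved", "fantastic"}
NEGATIVE = {"awful", "hated", "boring"}

def toy_sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"

def invariance_test(template, names):
    """Pass iff the prediction is identical for every name filled in."""
    preds = {toy_sentiment(template.format(name=n)) for n in names}
    return len(preds) == 1

names = ["Alice", "Bob", "Priya", "Chen"]
print(invariance_test("{name} loved this fantastic movie", names))  # True
print(invariance_test("{name} hated this boring movie", names))     # True
```

A real model that has picked up demographic correlations would fail such a test, which is exactly what these behavioral suites are designed to surface.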
Topic: Spurious correlations, shortcut learning (March 2, 2021) slides
- The “right for the wrong reasons” problem.
- Solving this problem:
- Changing dataset distributions.
- Letting models avoid the bias.
- Training LMs on larger data.
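The “right for the wrong reasons” failure mode can be simulated directly (a sketch, assuming Python/NumPy; the two-feature setup and noise scales are invented): give a classifier a weak true signal plus a near-perfect spurious cue that correlates with the label in training but flips at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature 0 is the weak true signal; feature 1 is a spurious artifact
# (e.g., a lexical cue) that tracks the label during training only.
def make_data(n, spurious_agrees):
    y = rng.integers(0, 2, n).astype(float)
    true_feat = y + rng.normal(scale=1.5, size=n)  # weak true signal
    cue = y if spurious_agrees else 1 - y          # clean shortcut
    X = np.stack([true_feat, cue + rng.normal(scale=0.1, size=n)], axis=1)
    return X, y

Xtr, ytr = make_data(1000, spurious_agrees=True)
Xte, yte = make_data(1000, spurious_agrees=False)

# Least-squares linear classifier with intercept.
A = np.hstack([Xtr, np.ones((len(Xtr), 1))])
w = np.linalg.lstsq(A, ytr, rcond=None)[0]

def acc(X, y):
    return float(np.mean((np.hstack([X, np.ones((len(X), 1))]) @ w > 0.5) == y))

print("train accuracy:", acc(Xtr, ytr))  # high: the shortcut works here
print("test accuracy: ", acc(Xte, yte))  # collapses once the cue flips
```

The classifier looks excellent in-distribution while relying almost entirely on the artifact, which is what dataset-redistribution and debiasing methods aim to prevent.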
Topic: Influence of samples, understanding the datasets (March 16, 2021) slides
- Perturbing the samples
- Influence functions
- Anchors, features, and adversarial examples
- Studying the datasets
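The quantity that influence functions approximate can be computed exactly on a tiny model (a sketch, assuming Python/NumPy; the regression setup and ridge penalty are invented): measure each training sample’s influence on a test prediction as the prediction change under leave-one-out retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression setup: influence of each training sample measured as
# the change in a test prediction when that sample is left out and the
# model is retrained (the exact quantity influence functions approximate).
n, d = 30, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
x_test = rng.normal(size=d)

def fit(Xa, ya, lam=1e-3):
    """Ridge regression in closed form."""
    return np.linalg.solve(Xa.T @ Xa + lam * np.eye(d), Xa.T @ ya)

base_pred = x_test @ fit(X, y)
influence = np.array([
    base_pred - x_test @ fit(np.delete(X, i, 0), np.delete(y, i, 0))
    for i in range(n)
])

# Samples with the largest |influence| matter most for this prediction.
top = np.argsort(-np.abs(influence))[:3]
print("most influential training indices:", top)
```

Exact retraining is only feasible at toy scale; influence functions approximate this leave-one-out effect with gradients and the inverse Hessian so it scales to deep models.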