Research projects

We study the foundations and applications of approaches that make AI explainable and controllable.

Mechanistic Interpretability

We aim to understand the internal mechanisms of DNN models and to reveal how those mechanisms give rise to their behavior, especially behavior related to knowledge and model safety. We develop methods to probe DNNs, both language models and beyond. These methods probe the models at multiple levels of abstraction: layer-wise, module-wise, attention-wise, neuron-wise, etc. Recently, we have also been working on Sparse Autoencoder (SAE) features. How do the signals extracted at each of these components explain the model’s behavior? When we intervene on these components, can we steer the model’s behavior? In addition to developing “testing methods” that understand DNN models with high validity and reliability, we also construct “testing materials” that evaluate the crucial capabilities of DNN models. A minimal sketch of this probe-then-intervene workflow is shown below.
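As a concrete illustration only, the sketch below probes one transformer block of GPT-2 with a forward hook and then adds a steering vector to that block’s output to shift the next-token prediction. The choice of model (gpt2), the layer index, and the random steering vector are illustrative assumptions, not our actual methods or experimental setup.

```python
# Minimal sketch: layer-wise probing and activation steering on GPT-2
# (illustrative assumptions: model "gpt2", layer 6, random steering vector).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6      # which transformer block to probe / intervene on (assumption)
captured = {}  # hidden states captured by the probe hook

def probe_hook(module, inputs, output):
    # output[0] holds the block's hidden states: (batch, seq_len, hidden_dim)
    captured["hidden"] = output[0].detach()

ids = tokenizer("The capital of France is", return_tensors="pt")

# 1) Probe: read out one layer's activations without changing the model.
handle = model.transformer.h[LAYER].register_forward_hook(probe_hook)
with torch.no_grad():
    model(**ids)
handle.remove()
print("probed hidden states:", captured["hidden"].shape)

# 2) Intervene: add a (here random; in practice learned or SAE-derived)
#    steering vector to the same layer and observe the behavioral change.
steer = 0.1 * torch.randn(model.config.n_embd)

def steer_hook(module, inputs, output):
    return (output[0] + steer,) + tuple(output[1:])

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
with torch.no_grad():
    logits = model(**ids).logits
handle.remove()
print("steered next token:", tokenizer.decode([logits[0, -1].argmax().item()]))
```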
Related works include:

AI for Research and Knowledge Discovery

We develop AI tools for scientific research. We are interested in how AI can help researchers across the many steps involved in producing novel scientific knowledge: understanding scientific papers, creating scientific hypotheses, planning experiments, analyzing experimental results, generating scientific explanations, writing academic papers, and so on. Together, these steps lead to the discovery of novel knowledge.
Related works include:

Applications of AI Agents

We explore novel use cases of AI agents driven by strong, general-purpose foundation models (including but not limited to language models and vision-language models). We focus on problems with profound real-world impact, including finance, sports, education, and document processing. We explore innovative architectures for these agents and develop benchmarks that rigorously evaluate their performance.
Related works include: