User Guide¶
Welcome to the mi-crow user guide! This comprehensive guide will help you understand and use the mi-crow library for mechanistic interpretability research.
What is mi-crow?¶
mi-crow is a Python package for explaining and steering LLM behavior using Sparse Autoencoders (SAEs) and the concepts they learn. It provides a complete toolkit for:
- Activation Analysis: Save and analyze model activations from any layer
- SAE Training: Train sparse autoencoders to discover interpretable features
- Concept Discovery: Identify and name concepts learned by SAE neurons
- Model Steering: Manipulate model behavior through concept-based interventions
- Hooks System: A flexible framework for intercepting and modifying activations
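The activation-saving and hook capabilities above build on a mechanism standard in PyTorch: forward hooks attached to modules. The sketch below is illustrative only and does not show mi-crow's actual API; it captures one layer's activations from a toy model.

```python
# Sketch of the underlying mechanism: a PyTorch forward hook that saves
# a layer's output. The model and names here are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
captured = {}

def save_activation(module, inputs, output):
    # Detach so saved tensors don't keep the autograd graph alive
    captured["layer0"] = output.detach()

handle = model[0].register_forward_hook(save_activation)
_ = model(torch.randn(3, 4))
handle.remove()  # remove hooks when done to avoid stale callbacks
print(captured["layer0"].shape)  # one activation per input row
```

A library-level hooks system wraps this pattern with registration, lifecycle management, and typed detector/controller roles, as described in the Hooks System section below.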
What is Mechanistic Interpretability?¶
Mechanistic interpretability is the study of understanding how neural networks work by reverse-engineering their internal computations. In the context of language models, this means:
- Understanding what features the model learns at different layers
- Identifying how these features combine to produce outputs
- Discovering interpretable concepts that correspond to human-understandable ideas
- Using this understanding to control and improve model behavior
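The core premise behind SAE-based interpretability can be made concrete: an activation vector is modeled as a sparse combination of feature directions. The NumPy sketch below uses made-up numbers purely for illustration; an SAE's job is to learn both the directions and the per-activation sparse code from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# 6 hypothetical "feature" directions in a 4-dim activation space
# (an overcomplete dictionary, as an SAE would learn).
features = rng.normal(size=(6, 4))

# A sparse code: this activation expresses only features 1 and 4
code = np.zeros(6)
code[1], code[4] = 2.0, -1.5

# The observed activation is the sparse combination of those directions
activation = code @ features
print(np.count_nonzero(code), activation.shape)
```

If each direction corresponds to a human-understandable concept, reading off the nonzero code entries explains the activation, and editing them steers it.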
Library Capabilities¶
mi-crow provides a modular architecture for mechanistic interpretability research:
- Language Model Wrapper: Easy loading and inference with HuggingFace models
- Sparse Autoencoders: Train and use SAEs to discover interpretable features
- Hooks System: Powerful framework for observing and modifying activations
- Store: Hierarchical storage for activations, models, and metadata
- Datasets: Flexible data loading from HuggingFace or local files
Getting Started¶
- Installation - Set up mi-crow and its dependencies
- Quick Start - Run your first example in minutes
- Core Concepts - Understand the fundamental ideas
- Hooks System - Learn about the powerful hooks framework
- Workflows - Step-by-step guides for common tasks
Documentation Structure¶
Core Documentation¶
- Installation & Setup - Installation and environment configuration
- Quick Start - Get up and running quickly
- Core Concepts - Fundamental concepts and architecture
Hooks System¶
The hooks system is the foundation of mi-crow's interpretability capabilities:
- Hooks Overview - Introduction to the hooks system
- Hooks Fundamentals - Core concepts and lifecycle
- Detector Hooks - Observing activations without modification
- Controller Hooks - Modifying activations during inference
- Hook Registration - Managing hooks on layers
- Advanced Hooks - Advanced patterns and best practices
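The detector/controller split above can be sketched in a few lines of plain Python. The classes here are hypothetical stand-ins, not mi-crow's actual API: detectors observe an activation without touching it, while controllers may return a replacement that flows onward.

```python
# Minimal sketch of the detector/controller hook pattern
# (hypothetical classes -- not mi-crow's actual API).
class HookedLayer:
    def __init__(self, fn):
        self.fn = fn
        self.detectors = []    # observe activations, never modify
        self.controllers = []  # may return a replacement activation

    def __call__(self, x):
        out = self.fn(x)
        for detector in self.detectors:
            detector(out)          # read-only observation
        for controller in self.controllers:
            out = controller(out)  # replacement flows onward
        return out

captured = []
layer = HookedLayer(lambda x: x * 2)       # stand-in for a model layer
layer.detectors.append(captured.append)    # detector: save the activation
layer.controllers.append(lambda a: a + 1)  # controller: steer it
result = layer(3)
print(captured, result)  # detector saw 6; controller emitted 7
```

Keeping the two roles separate means observation never perturbs the computation, and interventions are explicit and ordered.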
Workflows¶
Step-by-step guides for common tasks:
- Workflows Overview - When to use each workflow
- Saving Activations - Collect activation data
- Training SAE Models - Train sparse autoencoders
- Concept Discovery - Find interpretable concepts
- Concept Manipulation - Control model behavior
- Activation Control - Direct activation manipulation
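To ground the SAE-training step in the workflow list above, here is a self-contained NumPy sketch of the idea (mi-crow's real trainer is not shown): encode saved activations into a wider sparse code with a ReLU encoder, decode back linearly, and minimize reconstruction error plus an L1 sparsity penalty by gradient descent. All numbers are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))          # stand-in "saved activations"
d, k, lr, lam = 8, 16, 0.01, 1e-3      # dims, features, step size, sparsity

We = rng.normal(size=(d, k)) * 0.1     # encoder weights
Wd = rng.normal(size=(k, d)) * 0.1     # decoder weights
be, bd = np.zeros(k), np.zeros(d)

losses = []
for _ in range(300):
    z = np.maximum(X @ We + be, 0.0)   # ReLU encoder -> sparse code
    Xhat = z @ Wd + bd                 # linear decoder
    err = Xhat - X
    losses.append(np.mean(np.sum(err ** 2, axis=1)))

    g = 2.0 * err / len(X)                                # dLoss/dXhat
    dz = (g @ Wd.T + lam * np.sign(z) / len(X)) * (z > 0) # back through ReLU
    We -= lr * (X.T @ dz);  be -= lr * dz.sum(axis=0)
    Wd -= lr * (z.T @ g);   bd -= lr * g.sum(axis=0)

print(round(losses[0], 3), "->", round(losses[-1], 3))  # loss should fall
```

After training, each column of the (hypothetical) decoder is a candidate feature direction; concept discovery names the features, and concept manipulation edits the code `z` before decoding to steer behavior.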
Additional Resources¶
- Best Practices - Tips for effective research
- Troubleshooting - Common issues and solutions
- Examples - Example notebooks and learning path
- Experiments - Detailed experiment walkthroughs
Next Steps¶
If you're new to mi-crow, we recommend following this path:
1. Start with Installation to set up your environment
2. Run through the Quick Start tutorial
3. Read Core Concepts to understand the fundamentals
4. Explore the Hooks System - it's central to everything
5. Try a Workflow that matches your research goals
6. Check out Examples for more detailed code
For API reference, see the API Documentation.