User Guide¶
Welcome to the mi-crow user guide! This comprehensive guide will help you understand and use the mi-crow library for mechanistic interpretability research.
What is mi-crow?¶
mi-crow is a Python package for explaining and steering LLM behavior using Sparse Autoencoders (SAEs) and the concepts they learn. It provides a complete toolkit for:
- Activation Analysis: Save and analyze model activations from any layer
- SAE Training: Train sparse autoencoders to discover interpretable features
- Concept Discovery: Identify and name concepts learned by SAE neurons
- Model Steering: Manipulate model behavior through concept-based interventions
- Hooks System: A flexible framework for intercepting and modifying activations
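The activation-saving and hook capabilities above build on a mechanism standard in PyTorch: forward hooks attached to modules. The sketch below is illustrative only and does not show mi-crow's actual API; it captures one layer's activations from a toy model.

```python
# Sketch of the underlying mechanism: a PyTorch forward hook that saves
# a layer's output. The model and names here are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
captured = {}

def save_activation(module, inputs, output):
    # Detach so saved tensors don't keep the autograd graph alive
    captured["layer0"] = output.detach()

handle = model[0].register_forward_hook(save_activation)
_ = model(torch.randn(3, 4))
handle.remove()  # remove hooks when done to avoid stale callbacks
print(captured["layer0"].shape)  # one activation per input row
```

A library-level hooks system wraps this pattern with registration, lifecycle management, and typed detector/controller roles, as described in the Hooks System section below.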
What is Mechanistic Interpretability?¶
Mechanistic interpretability is the study of understanding how neural networks work by reverse-engineering their internal computations. In the context of language models, this means:
- Understanding what features the model learns at different layers
- Identifying how these features combine to produce outputs
- Discovering interpretable concepts that correspond to human-understandable ideas
- Using this understanding to control and improve model behavior
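The core premise behind SAE-based interpretability can be made concrete: an activation vector is modeled as a sparse combination of feature directions. The NumPy sketch below uses made-up numbers purely for illustration; an SAE's job is to learn both the directions and the per-activation sparse code from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# 6 hypothetical "feature" directions in a 4-dim activation space
# (an overcomplete dictionary, as an SAE would learn).
features = rng.normal(size=(6, 4))

# A sparse code: this activation expresses only features 1 and 4
code = np.zeros(6)
code[1], code[4] = 2.0, -1.5

# The observed activation is the sparse combination of those directions
activation = code @ features
print(np.count_nonzero(code), activation.shape)
```

If each direction corresponds to a human-understandable concept, reading off the nonzero code entries explains the activation, and editing them steers it.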
Library Capabilities¶
mi-crow provides a modular architecture for mechanistic interpretability research:
- Language Model Wrapper: Easy loading and inference with HuggingFace models
- Sparse Autoencoders: Train and use SAEs to discover interpretable features
- Hooks System: Powerful framework for observing and modifying activations
- Store: Hierarchical storage for activations, models, and metadata
- Datasets: Flexible data loading from HuggingFace or local files
Getting Started¶
- Installation - Set up mi-crow and its dependencies
- Quick Start - Run your first example in minutes
- Core Concepts - Understand the fundamental ideas
- Hooks System - Learn about the powerful hooks framework
- Workflows - Step-by-step guides for common tasks
Documentation Structure¶
Core Documentation¶
- Installation & Setup - Installation and environment configuration
- Quick Start - Get up and running quickly
- Core Concepts - Fundamental concepts and architecture
Hooks System¶
The hooks system is the foundation of mi-crow's interpretability capabilities:
- Hooks Overview - Introduction to the hooks system
- Hooks Fundamentals - Core concepts and lifecycle
- Detector Hooks - Observing activations without modification
- Controller Hooks - Modifying activations during inference
- Hook Registration - Managing hooks on layers
- Advanced Hooks - Advanced patterns and best practices
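The detector/controller split above can be sketched in a few lines of plain Python. The classes here are hypothetical stand-ins, not mi-crow's actual API: detectors observe an activation without touching it, while controllers may return a replacement that flows onward.

```python
# Minimal sketch of the detector/controller hook pattern
# (hypothetical classes -- not mi-crow's actual API).
class HookedLayer:
    def __init__(self, fn):
        self.fn = fn
        self.detectors = []    # observe activations, never modify
        self.controllers = []  # may return a replacement activation

    def __call__(self, x):
        out = self.fn(x)
        for detector in self.detectors:
            detector(out)          # read-only observation
        for controller in self.controllers:
            out = controller(out)  # replacement flows onward
        return out

captured = []
layer = HookedLayer(lambda x: x * 2)       # stand-in for a model layer
layer.detectors.append(captured.append)    # detector: save the activation
layer.controllers.append(lambda a: a + 1)  # controller: steer it
result = layer(3)
print(captured, result)  # detector saw 6; controller emitted 7
```

Keeping the two roles separate means observation never perturbs the computation, and interventions are explicit and ordered.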
Workflows¶
Step-by-step guides for common tasks:
- Workflows Overview - When to use each workflow
- Saving Activations - Collect activation data
- Training SAE Models - Train sparse autoencoders
- Concept Discovery - Find interpretable concepts
- Concept Manipulation - Control model behavior
- Activation Control - Direct activation manipulation
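To ground the SAE-training step in the workflow list above, here is a self-contained NumPy sketch of the idea (mi-crow's real trainer is not shown): encode saved activations into a wider sparse code with a ReLU encoder, decode back linearly, and minimize reconstruction error plus an L1 sparsity penalty by gradient descent. All numbers are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))          # stand-in "saved activations"
d, k, lr, lam = 8, 16, 0.01, 1e-3      # dims, features, step size, sparsity

We = rng.normal(size=(d, k)) * 0.1     # encoder weights
Wd = rng.normal(size=(k, d)) * 0.1     # decoder weights
be, bd = np.zeros(k), np.zeros(d)

losses = []
for _ in range(300):
    z = np.maximum(X @ We + be, 0.0)   # ReLU encoder -> sparse code
    Xhat = z @ Wd + bd                 # linear decoder
    err = Xhat - X
    losses.append(np.mean(np.sum(err ** 2, axis=1)))

    g = 2.0 * err / len(X)                                # dLoss/dXhat
    dz = (g @ Wd.T + lam * np.sign(z) / len(X)) * (z > 0) # back through ReLU
    We -= lr * (X.T @ dz);  be -= lr * dz.sum(axis=0)
    Wd -= lr * (z.T @ g);   bd -= lr * g.sum(axis=0)

print(round(losses[0], 3), "->", round(losses[-1], 3))  # loss should fall
```

After training, each column of the (hypothetical) decoder is a candidate feature direction; concept discovery names the features, and concept manipulation edits the code `z` before decoding to steer behavior.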
Additional Resources¶
- Best Practices - Tips for effective research
- Troubleshooting - Common issues and solutions
- Examples - Example notebooks and learning path
- Experiments - Detailed experiment walkthroughs
Next Steps¶
If you're new to mi-crow, we recommend following this path:
1. Start with Installation to set up your environment
2. Run through the Quick Start tutorial
3. Read Core Concepts to understand the fundamentals
4. Explore the Hooks System - it's central to everything
5. Try a Workflow that matches your research goals
6. Check out Examples for more detailed code
For API reference, see the API Documentation.