User Guide

Welcome to the mi-crow user guide! This guide explains how to use the mi-crow library for mechanistic interpretability research.

What is mi-crow?

mi-crow is a Python package for explaining and steering LLM behavior using sparse autoencoders (SAEs) and concepts. It provides tools for:

  • Activation Analysis: Save and analyze model activations from any layer
  • SAE Training: Train sparse autoencoders to discover interpretable features
  • Concept Discovery: Identify and name concepts learned by SAE neurons
  • Model Steering: Manipulate model behavior through concept-based interventions
  • Hook System: Flexible system for intercepting and modifying activations
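
To make the SAE Training and Model Steering items above more concrete, here is a toy sketch in plain Python of what a sparse autoencoder computes and how a learned feature direction can be added to an activation. It is only an illustration under stated assumptions: `TinySAE`, `steer`, and every parameter name are hypothetical and are not mi-crow's API, the weights are random rather than trained, and the loss (reconstruction error plus an L1 sparsity penalty) is one common SAE objective, not necessarily the one mi-crow uses.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinySAE:
    """Hypothetical toy sparse autoencoder; NOT mi-crow's SAE class."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_model -> d_hidden, decoder maps d_hidden -> d_model.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_hidden)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        """Feature activations: ReLU(W_enc x + b_enc), so all are >= 0."""
        pre = matvec(self.w_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, features):
        """Reconstruct the original activation from feature activations."""
        out = matvec(self.w_dec, features)
        return [o + b for o, b in zip(out, self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        """Mean squared reconstruction error plus an L1 sparsity penalty."""
        features = self.encode(x)
        x_hat = self.decode(features)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(f) for f in features)


def steer(activation, sae, feature_idx, strength):
    """Add a scaled copy of one feature's decoder direction to an activation."""
    direction = [row[feature_idx] for row in sae.w_dec]
    return [a + strength * d for a, d in zip(activation, direction)]


sae = TinySAE(d_model=4, d_hidden=8)
x = [1.0, -0.5, 0.25, 0.0]          # stand-in for one model activation
features = sae.encode(x)
reconstruction = sae.decode(features)
steered = steer(x, sae, feature_idx=3, strength=2.0)
print("active features:", sum(1 for f in features if f > 0))
print("loss:", sae.loss(x))
```

Training would adjust the weights to minimize `loss` over many saved activations; after that, the individual hidden units are the candidates for named concepts, and `steer` shows the basic idea behind a concept-based intervention.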

What is Mechanistic Interpretability?

Mechanistic interpretability is the study of how neural networks work, pursued by reverse-engineering their internal computations. In the context of language models, this means:

  • Understanding what features the model learns at different layers
  • Identifying how these features combine to produce outputs
  • Discovering interpretable concepts that correspond to human-understandable ideas
  • Using this understanding to control and improve model behavior

Library Capabilities

mi-crow provides a modular architecture for mechanistic interpretability research:

  • Language Model Wrapper: Easy loading and inference with HuggingFace models
  • Sparse Autoencoders: Train and use SAEs to discover interpretable features
  • Hooks System: Powerful framework for observing and modifying activations
  • Store: Hierarchical storage for activations, models, and metadata
  • Datasets: Flexible data loading from HuggingFace or local files
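
The hook idea listed above (observing and modifying activations) can be sketched without any deep learning framework. The example below is a minimal plain-Python illustration, not mi-crow's hooks API: `HookedPipeline`, `register_hook`, and the two example hooks are hypothetical names. It assumes a convention similar to PyTorch forward hooks, where a hook that returns `None` only observes and a hook that returns a value replaces the layer's output.

```python
from typing import Callable, Dict, List, Optional, Tuple

Activation = List[float]
Layer = Callable[[Activation], Activation]
Hook = Callable[[str, Activation], Optional[Activation]]


class HookedPipeline:
    """Hypothetical stand-in for a model: an ordered list of named layers."""

    def __init__(self, layers: List[Tuple[str, Layer]]):
        self.layers = layers
        self.hooks: Dict[str, List[Hook]] = {}

    def register_hook(self, layer_name: str, hook: Hook) -> None:
        """Attach a hook that runs right after the named layer."""
        if layer_name not in {name for name, _ in self.layers}:
            raise KeyError(f"unknown layer: {layer_name}")
        self.hooks.setdefault(layer_name, []).append(hook)

    def forward(self, x: Activation) -> Activation:
        for name, layer in self.layers:
            x = layer(x)
            for hook in self.hooks.get(name, []):
                result = hook(name, x)
                if result is not None:   # a returned value replaces the output
                    x = result
        return x


captured: Dict[str, Activation] = {}


def save_hook(name: str, act: Activation) -> None:
    """Observer: store a copy of the activation, leave it unchanged."""
    captured[name] = list(act)


def zero_first_hook(name: str, act: Activation) -> Activation:
    """Modifier: ablate the first component of the activation."""
    return [0.0] + act[1:]


model = HookedPipeline([
    ("double", lambda v: [2 * x for x in v]),
    ("shift", lambda v: [x + 1 for x in v]),
])
model.register_hook("double", save_hook)
model.register_hook("double", zero_first_hook)

out = model.forward([1.0, 2.0])
print(captured)  # {'double': [2.0, 4.0]}
print(out)       # [1.0, 5.0]
```

Hooks run in registration order, so `save_hook` records the activation before `zero_first_hook` changes it. Swapping the two registrations would make the saved activation reflect the modification instead.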

Getting Started

  1. Installation - Set up mi-crow and its dependencies
  2. Quick Start - Run your first example in minutes
  3. Core Concepts - Understand the fundamental ideas
  4. Hooks System - Learn about the powerful hooks framework
  5. Workflows - Step-by-step guides for common tasks

Documentation Structure

Core Documentation

Hooks System

The hooks system is the foundation of mi-crow's interpretability capabilities.

Workflows

The workflow guides give step-by-step instructions for common tasks.

Additional Resources

Next Steps

If you're new to mi-crow, we recommend following this path:

  1. Start with Installation to set up your environment
  2. Run through the Quick Start tutorial
  3. Read Core Concepts to understand the fundamentals
  4. Explore the Hooks System, which the rest of the library builds on
  5. Try a Workflow that matches your research goals
  6. Check out Examples for more detailed code

For API reference, see the API Documentation.