Mi-Crow¶
Python library for mechanistic interpretability research on Large Language Models
What is Mi-Crow?¶
Mi-Crow is a Python library for researchers working on mechanistic interpretability of Large Language Models (LLMs). It provides a unified interface for analyzing and controlling model behavior, making it easier to understand what is happening inside neural networks.
Key Capabilities¶
- :robot: Activation Analysis - Save and analyze model activations from any layer with minimal performance overhead
- :brain: SAE Training - Train sparse autoencoders to discover interpretable features and concepts
- :bulb: Concept Discovery - Identify and name concepts learned by SAE neurons through automated analysis
- :video_game: Model Steering - Manipulate model behavior through concept-based interventions and activation control
- :hook: Hook System - Flexible framework for intercepting and modifying activations at any layer
- :floppy_disk: Data Persistence - Efficient hierarchical storage for managing large-scale experiment data
Quick Start¶
Installation¶
pip install mi-crow
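To try the latest development version instead, installing straight from the GitHub repository should also work (this assumes the package metadata sits at the repository root; adjust the URL if the layout differs):
pip install git+https://github.com/AdamKaniasty/Inzynierka.git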
Basic Usage¶
from mi_crow.language_model import LanguageModel
# Initialize a language model
lm = LanguageModel(model_id="bielik")
# Run inference
outputs = lm.forwards(["Hello, world!"])
# Access activations and outputs
print(outputs.logits)
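Because forwards takes a list of strings, passing several prompts in one call should batch them together (an assumption based on the signature shown above, not verified here):
# Hypothetical batched call, following the list-based signature above
outputs = lm.forwards(["Hello, world!", "Mechanistic interpretability is"])
print(outputs.logits.shape)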
Training an SAE¶
from mi_crow.language_model import LanguageModel
from mi_crow.mechanistic.sae import SaeTrainer
from mi_crow.mechanistic.sae.modules import TopKSae
# Load model and collect activations
lm = LanguageModel(model_id="bielik")
activations = lm.save_activations(
    dataset=["Your text data here"],
    layers=["transformer_h_0_attn_c_attn"]
)
# Train SAE
trainer = SaeTrainer(
    model=lm,
    layer="transformer_h_0_attn_c_attn",
    sae_class=TopKSae,
    hyperparams={"epochs": 10, "batch_size": 256}
)
sae = trainer.train(activations)
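The TopKSae class name suggests a top-k sparse autoencoder, which keeps only the k largest latent activations for each input and zeroes the rest. The snippet below is a generic PyTorch sketch of that idea, not Mi-Crow's implementation; the class name, dimensions, and parameters are illustrative assumptions.
import torch
import torch.nn as nn

class TopKSaeSketch(nn.Module):
    """Minimal top-k sparse autoencoder (illustrative sketch only)."""

    def __init__(self, d_model: int, d_latent: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encode, then keep only the k largest latent activations per example
        latent = torch.relu(self.encoder(x))
        top = torch.topk(latent, self.k, dim=-1)
        sparse = torch.zeros_like(latent).scatter_(-1, top.indices, top.values)
        # Reconstruct the original activation from the sparse code
        return self.decoder(sparse)

# Example: reconstruct 768-dimensional activations through a 16x wider latent space
sae_sketch = TopKSaeSketch(d_model=768, d_latent=12288, k=32)
reconstruction = sae_sketch(torch.randn(4, 768))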
Documentation Structure¶
Getting Started¶
New to Mi-Crow? Start here:
- Installation Guide - Set up your environment
- Quick Start Tutorial - Run your first example in minutes
- Core Concepts - Understand the fundamentals
User Guide¶
Comprehensive guides for all features:
- Hooks System - Complete guide to the powerful hooks framework
    - Fundamentals - Core concepts
    - Detectors - Observing activations
    - Controllers - Modifying behavior
    - Registration - Hook management
    - Advanced Usage - Advanced patterns
- Workflows - Step-by-step guides for common tasks
    - Saving Activations
    - Training SAE Models
    - Concept Discovery
    - Concept Manipulation
- Best Practices - Tips for effective research
- Troubleshooting - Common issues and solutions
- Examples - Example notebooks overview
Experiments¶
Real-world experiments demonstrating Mi-Crow usage:
- Experiments Overview - Available experiments
- Verify SAE Training - Complete SAE training workflow
- SLURM Pipeline - Distributed training setup
API Reference¶
Complete API documentation:
- API Overview - API structure and organization
- Language Model - Model loading and inference
- SAE - Sparse autoencoder APIs
- Datasets - Dataset loading and processing
- Store - Persistence layer
- Hooks - Hook system APIs
Features¶
Unified Model Interface¶
Work with any HuggingFace language model through a consistent API. No need to handle model-specific initialization details.
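For instance, the constructor used in Quick Start should work the same way for other models; assuming model_id also accepts a HuggingFace Hub repository id (an assumption, not confirmed on this page), swapping models is a one-line change:
from mi_crow.language_model import LanguageModel

# Hypothetical: pass a HuggingFace Hub repository id instead of the "bielik" alias
lm = LanguageModel(model_id="gpt2")
outputs = lm.forwards(["The quick brown fox"])
print(outputs.logits.shape)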
Research-Focused Design¶
Built specifically for interpretability research workflows:
- Comprehensive Testing: 85%+ code coverage requirement
- Type Safety: Extensive use of Python type annotations
- Documentation: Complete API reference and user guides
- CI/CD: Automated testing and deployment
- Minimal Overhead: Hook system introduces negligible latency during inference
Flexible Architecture¶
Five core modules that work independently or together:
- Language Model - Unified interface for any HuggingFace model
- Hooks - Flexible activation interception system (see the sketch after this list)
- Mechanistic - SAE training and concept manipulation
- Store - Hierarchical data persistence
- Datasets - Dataset loading and processing
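To ground the Hooks module conceptually: at the PyTorch level, intercepting a layer's activations comes down to registering a forward hook on that module. The sketch below shows this underlying mechanism with raw PyTorch and a HuggingFace GPT-2 model; it illustrates the general technique, not Mi-Crow's own hook API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: capture one layer's output with a plain PyTorch forward hook
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
captured = {}

def save_activation(module, inputs, output):
    # Forward hooks receive the module, its inputs, and its output
    tensor = output[0] if isinstance(output, tuple) else output
    captured["h0_attn"] = tensor.detach()

handle = model.transformer.h[0].attn.register_forward_hook(save_activation)
with torch.no_grad():
    model(**tokenizer("Hello, world!", return_tensors="pt"))
handle.remove()  # always remove hooks when you are done
print(captured["h0_attn"].shape)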
Repository & Links¶
- GitHub: AdamKaniasty/Inzynierka
- PyPI: mi-crow
- Documentation: This site
Citation¶
If you use Mi-Crow in your research, please cite:
@thesis{kaniasty2025microw,
  title={Mechanistic Interpretability for Large Language Models: A Production-Ready Framework},
  author={Kaniasty, Adam and Kowalski, Hubert},
  year={2025},
  school={Warsaw University of Technology},
  note={Engineering Thesis}
}
Next Steps¶
- :rocket: Quick Start - Get up and running in minutes
- :book: User Guide - Comprehensive documentation
- :wrench: Examples - Explore example notebooks
- :test_tube: Experiments - Real-world use cases