# Store API

Persistence layer for activations, models, and runs.

Module: `mi_crow.store`

## LocalStore

```python
LocalStore(base_path='', runs_prefix='runs', dataset_prefix='datasets', model_prefix='models')
```

Bases: `Store`

Local filesystem implementation of the `Store` interface.

Initialize LocalStore.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base_path` | `Path \| str` | Base directory path for the store | `''` |
| `runs_prefix` | `str` | Prefix for runs directory | `'runs'` |
| `dataset_prefix` | `str` | Prefix for datasets directory | `'datasets'` |
| `model_prefix` | `str` | Prefix for models directory | `'models'` |
Source code in `src/mi_crow/store/local_store.py`:

```python
def __init__(
        self,
        base_path: Path | str = '',
        runs_prefix: str = "runs",
        dataset_prefix: str = "datasets",
        model_prefix: str = "models",
):
    """Initialize LocalStore.

    Args:
        base_path: Base directory path for the store
        runs_prefix: Prefix for runs directory
        dataset_prefix: Prefix for datasets directory
        model_prefix: Prefix for models directory
    """
    super().__init__(base_path, runs_prefix, dataset_prefix, model_prefix)
```

## Store

```python
Store(base_path='', runs_prefix='runs', dataset_prefix='datasets', model_prefix='models')
```

Bases: `ABC`

Abstract store optimized for tensor batches grouped by run_id.

This interface intentionally excludes generic bytes/JSON APIs. Implementations should focus on efficient safetensors-backed IO.

The store organizes data hierarchically:

- Runs: top-level grouping by `run_id`
- Batches: within each run, data is organized by `batch_index`
- Layers: within each batch, tensors are organized by `layer_signature`
- Keys: within each layer, tensors are identified by key (e.g., "activations")
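On a filesystem-backed implementation such as `LocalStore`, this hierarchy plausibly maps onto nested paths. The sketch below illustrates the Runs/Batches/Layers/Keys nesting described above; the concrete directory and file names (e.g. `batch_0`, `activations.safetensors`) are assumptions for illustration, not part of the documented interface:

```python
from pathlib import Path

# Hypothetical on-disk layout for one stored tensor. Only the
# Runs -> Batches -> Layers -> Keys nesting comes from the docs;
# the exact names are illustrative assumptions.
base = Path("store")
run_id, batch_index = "run-001", 0
layer_signature, key = "model.layers.0.mlp", "activations"

batch_dir = base / "runs" / run_id / f"batch_{batch_index}"
tensor_file = batch_dir / layer_signature / f"{key}.safetensors"
print(tensor_file.as_posix())
# store/runs/run-001/batch_0/model.layers.0.mlp/activations.safetensors
```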

Initialize Store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base_path` | `Path \| str` | Base directory path for the store | `''` |
| `runs_prefix` | `str` | Prefix for runs directory | `'runs'` |
| `dataset_prefix` | `str` | Prefix for datasets directory | `'datasets'` |
| `model_prefix` | `str` | Prefix for models directory | `'models'` |
Source code in `src/mi_crow/store/store.py`:

```python
def __init__(
        self,
        base_path: Path | str = "",
        runs_prefix: str = "runs",
        dataset_prefix: str = "datasets",
        model_prefix: str = "models",
):
    """Initialize Store.

    Args:
        base_path: Base directory path for the store
        runs_prefix: Prefix for runs directory (default: "runs")
        dataset_prefix: Prefix for datasets directory (default: "datasets")
        model_prefix: Prefix for models directory (default: "models")
    """
    self.runs_prefix = runs_prefix
    self.dataset_prefix = dataset_prefix
    self.model_prefix = model_prefix
    self.base_path = Path(base_path)
```

### get_detector_metadata (abstractmethod)

```python
get_detector_metadata(run_id, batch_index)
```

Load detector metadata with separate JSON and tensor store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `batch_index` | `int` | Batch index | required |

Returns:

| Type | Description |
| --- | --- |
| `tuple[Dict[str, Any], TensorMetadata]` | Tuple of (metadata dict, tensor_metadata dict). Returns empty dicts if not found. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid or metadata format is invalid |
| `json.JSONDecodeError` | If metadata file exists but contains invalid JSON |
| `OSError` | If tensor files exist but cannot be loaded |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def get_detector_metadata(
        self,
        run_id: str,
        batch_index: int
) -> tuple[Dict[str, Any], TensorMetadata]:
    """Load detector metadata with separate JSON and tensor store.

    Args:
        run_id: Run identifier
        batch_index: Batch index

    Returns:
        Tuple of (metadata dict, tensor_metadata dict). Returns empty dicts if not found.

    Raises:
        ValueError: If parameters are invalid or metadata format is invalid
        json.JSONDecodeError: If metadata file exists but contains invalid JSON
        OSError: If tensor files exist but cannot be loaded
    """
    raise NotImplementedError
```
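The contract (empty dicts when nothing is stored, `json.JSONDecodeError` propagating from a corrupt file) can be sketched with a minimal filesystem stand-in. The file name `detector_meta.json` and the use of a plain dict in place of safetensors-backed tensors are assumptions for illustration only:

```python
import json
from pathlib import Path

def get_detector_metadata_sketch(base, run_id, batch_index):
    """Illustrative sketch of the documented contract, not the library's code.

    Returns (metadata, tensor_metadata), or two empty dicts if nothing
    has been stored for this batch. A real implementation would also
    load safetensors files; that side is elided here.
    """
    if not run_id or batch_index < 0:
        raise ValueError("run_id must be non-empty and batch_index non-negative")
    meta_path = Path(base) / "runs" / run_id / f"batch_{batch_index}" / "detector_meta.json"
    if not meta_path.exists():
        return {}, {}
    metadata = json.loads(meta_path.read_text())  # may raise json.JSONDecodeError
    return metadata, {}
```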

### get_detector_metadata_by_layer_by_key (abstractmethod)

```python
get_detector_metadata_by_layer_by_key(run_id, batch_index, layer, key)
```

Get a specific tensor from detector metadata by layer and key.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `batch_index` | `int` | Batch index | required |
| `layer` | `str` | Layer signature | required |
| `key` | `str` | Tensor key (e.g., "activations") | required |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | The requested tensor |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid |
| `FileNotFoundError` | If the tensor doesn't exist |
| `OSError` | If tensor file exists but cannot be loaded |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def get_detector_metadata_by_layer_by_key(
        self,
        run_id: str,
        batch_index: int,
        layer: str,
        key: str
) -> torch.Tensor:
    """Get a specific tensor from detector metadata by layer and key.

    Args:
        run_id: Run identifier
        batch_index: Batch index
        layer: Layer signature
        key: Tensor key (e.g., "activations")

    Returns:
        The requested tensor

    Raises:
        ValueError: If parameters are invalid
        FileNotFoundError: If the tensor doesn't exist
        OSError: If tensor file exists but cannot be loaded
    """
    raise NotImplementedError
```
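The lookup semantics can be illustrated with an in-memory stand-in, where nested dicts play the role of the on-disk run/batch/layer/key layout and a missing entry surfaces as `FileNotFoundError`, as the docs specify. This is a sketch of the contract, not the library's implementation:

```python
def get_tensor_sketch(store, run_id, batch_index, layer, key):
    """store: {run_id: {batch_index: {layer: {key: tensor}}}} — an in-memory
    stand-in for the safetensors files a real implementation would read."""
    if not run_id or batch_index < 0 or not layer or not key:
        raise ValueError("invalid parameters")
    try:
        return store[run_id][batch_index][layer][key]
    except KeyError as exc:
        # Missing tensors are reported as FileNotFoundError, per the contract.
        raise FileNotFoundError(
            f"no tensor at {run_id}/batch_{batch_index}/{layer}/{key}"
        ) from exc

data = {"r1": {0: {"layer.0": {"activations": [1.0, 2.0]}}}}
print(get_tensor_sketch(data, "r1", 0, "layer.0", "activations"))  # [1.0, 2.0]
```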

### get_run_metadata (abstractmethod)

```python
get_run_metadata(run_id)
```

Load metadata for a run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |

Returns:

| Type | Description |
| --- | --- |
| `Dict[str, Any]` | Metadata dictionary, or empty dict if not found |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If run_id is invalid |
| `json.JSONDecodeError` | If metadata file exists but contains invalid JSON |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def get_run_metadata(self, run_id: str) -> Dict[str, Any]:
    """Load metadata for a run.

    Args:
        run_id: Run identifier

    Returns:
        Metadata dictionary, or empty dict if not found

    Raises:
        ValueError: If run_id is invalid
        json.JSONDecodeError: If metadata file exists but contains invalid JSON
    """
    raise NotImplementedError
```

### iter_run_batch_range

```python
iter_run_batch_range(run_id, *, start=0, stop=None, step=1)
```

Iterate run batches for indices in `range(start, stop, step)`.

If `stop` is None, it will be set to `max(list_run_batches(run_id)) + 1` (or 0 if none). Raises `ValueError` if `step == 0` or `start < 0`.

Source code in `src/mi_crow/store/store.py`:

```python
def iter_run_batch_range(
        self,
        run_id: str,
        *,
        start: int = 0,
        stop: int | None = None,
        step: int = 1,
) -> Iterator[List[torch.Tensor] | Dict[str, torch.Tensor]]:
    """Iterate run batches for indices in range(start, stop, step).

    If stop is None, it will be set to max(list_run_batches(run_id)) + 1 (or 0 if none).
    Raises ValueError if step == 0 or start < 0.
    """
    if step == 0:
        raise ValueError("step must not be 0")
    if start < 0:
        raise ValueError("start must be >= 0")
    indices = self.list_run_batches(run_id)
    if not indices:
        return
    max_idx = max(indices)
    if stop is None:
        stop = max_idx + 1
    for idx in range(start, stop, step):
        try:
            yield self.get_run_batch(run_id, idx)
        except FileNotFoundError:
            continue
```
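The range semantics are easy to demonstrate with a pure-Python stand-in, where a dict of `batch_index -> batch` replaces the stored run. Note how missing indices are skipped rather than raising, mirroring the `FileNotFoundError` handling in the source:

```python
def iter_batch_range_demo(batches, *, start=0, stop=None, step=1):
    """Stand-in for iter_run_batch_range: `batches` maps batch_index -> batch."""
    if step == 0:
        raise ValueError("step must not be 0")
    if start < 0:
        raise ValueError("start must be >= 0")
    if not batches:
        return
    if stop is None:
        stop = max(batches) + 1  # same default as the real method
    for idx in range(start, stop, step):
        if idx in batches:  # missing batches are skipped, not an error
            yield idx, batches[idx]

batches = {0: "a", 2: "c", 3: "d"}  # batch 1 was never written
print(list(iter_batch_range_demo(batches)))          # [(0, 'a'), (2, 'c'), (3, 'd')]
print(list(iter_batch_range_demo(batches, step=2)))  # [(0, 'a'), (2, 'c')]
```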

### put_detector_metadata (abstractmethod)

```python
put_detector_metadata(run_id, batch_index, metadata, tensor_metadata)
```

Save detector metadata with separate JSON and tensor store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `batch_index` | `int` | Batch index (must be non-negative) | required |
| `metadata` | `Dict[str, Any]` | JSON-serializable metadata dictionary (aggregated from all detectors) | required |
| `tensor_metadata` | `TensorMetadata` | Dictionary mapping layer_signature to dict of tensor_key -> tensor (from all detectors) | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | Full path key used for store (e.g., "runs/{run_id}/batch_{batch_index}") |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid or metadata is not JSON-serializable |
| `OSError` | If file system operations fail |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def put_detector_metadata(
        self,
        run_id: str,
        batch_index: int,
        metadata: Dict[str, Any],
        tensor_metadata: TensorMetadata
) -> str:
    """Save detector metadata with separate JSON and tensor store.

    Args:
        run_id: Run identifier
        batch_index: Batch index (must be non-negative)
        metadata: JSON-serializable metadata dictionary (aggregated from all detectors)
        tensor_metadata: Dictionary mapping layer_signature to dict of tensor_key -> tensor
                       (from all detectors)

    Returns:
        Full path key used for store (e.g., "runs/{run_id}/batch_{batch_index}")

    Raises:
        ValueError: If parameters are invalid or metadata is not JSON-serializable
        OSError: If file system operations fail
    """
    raise NotImplementedError
```

### put_run_detector_metadata (abstractmethod)

```python
put_run_detector_metadata(run_id, metadata, tensor_metadata)
```

Save detector metadata for a whole run in a unified location.

This differs from `put_detector_metadata`, which organises data per-batch under `runs/{run_id}/batch_{batch_index}`. `put_run_detector_metadata` instead stores everything under `runs/{run_id}/detectors`. Implementations are expected to support being called multiple times for the same `run_id` and append/aggregate new metadata rather than overwrite it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `metadata` | `Dict[str, Any]` | JSON-serialisable metadata dictionary aggregated from all detectors for the current chunk / batch | required |
| `tensor_metadata` | `TensorMetadata` | Dictionary mapping layer_signature to dict of tensor_key -> tensor (from all detectors) | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | String path/key where metadata was stored (e.g., `runs/{run_id}/detectors`) |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid or metadata is not JSON-serialisable |
| `OSError` | If file system operations fail |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def put_run_detector_metadata(
        self,
        run_id: str,
        metadata: Dict[str, Any],
        tensor_metadata: TensorMetadata,
) -> str:
    """
    Save detector metadata for a whole run in a unified location.

    This differs from ``put_detector_metadata`` which organises data
    per-batch under ``runs/{run_id}/batch_{batch_index}``.

    ``put_run_detector_metadata`` instead stores everything under
    ``runs/{run_id}/detectors``. Implementations are expected to
    support being called multiple times for the same ``run_id`` and
    append / aggregate new metadata rather than overwrite it.

    Args:
        run_id: Run identifier
        metadata: JSON-serialisable metadata dictionary aggregated
            from all detectors for the current chunk / batch.
        tensor_metadata: Dictionary mapping layer_signature to dict
            of tensor_key -> tensor (from all detectors).

    Returns:
        String path/key where metadata was stored
        (e.g. ``runs/{run_id}/detectors``).

    Raises:
        ValueError: If parameters are invalid or metadata is not
            JSON-serialisable.
        OSError: If file system operations fail.
    """
    raise NotImplementedError
```
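The append/aggregate requirement can be sketched as follows. The file name `meta.json` and the shallow `dict.update` merge policy are illustrative assumptions (the docs only require that repeated calls aggregate rather than overwrite); tensor handling is omitted:

```python
import json
from pathlib import Path

def put_run_detector_metadata_sketch(base, run_id, metadata):
    """Illustrative sketch: repeated calls for the same run_id merge new
    metadata into what is already stored under runs/{run_id}/detectors."""
    json.dumps(metadata)  # fail early (TypeError) if not JSON-serialisable
    det_dir = Path(base) / "runs" / run_id / "detectors"
    det_dir.mkdir(parents=True, exist_ok=True)
    meta_path = det_dir / "meta.json"
    # Load whatever a previous call stored, then merge the new chunk in.
    merged = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    merged.update(metadata)
    meta_path.write_text(json.dumps(merged))
    return f"runs/{run_id}/detectors"
```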

### put_run_metadata (abstractmethod)

```python
put_run_metadata(run_id, meta)
```

Persist metadata for a run (e.g., dataset/model identifiers).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `meta` | `Dict[str, Any]` | Metadata dictionary to save (must be JSON-serializable) | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | String path/key where metadata was stored (e.g., "runs/{run_id}/meta.json") |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If run_id is invalid or meta is not JSON-serializable |
| `OSError` | If file system operations fail |

Note: Implementations should store JSON at a stable location, e.g., `runs/{run_id}/meta.json`.

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def put_run_metadata(self, run_id: str, meta: Dict[str, Any]) -> str:
    """Persist metadata for a run (e.g., dataset/model identifiers).

    Args:
        run_id: Run identifier
        meta: Metadata dictionary to save (must be JSON-serializable)

    Returns:
        String path/key where metadata was stored (e.g., "runs/{run_id}/meta.json")

    Raises:
        ValueError: If run_id is invalid or meta is not JSON-serializable
        OSError: If file system operations fail

    Note:
        Implementations should store JSON at a stable location, e.g., runs/{run_id}/meta.json.
    """
    raise NotImplementedError
```
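A minimal sketch of this contract, writing to the stable location the note suggests. This is an illustration of the documented behaviour, not the library's actual implementation:

```python
import json
from pathlib import Path

def put_run_metadata_sketch(base, run_id, meta):
    """Illustrative sketch: persist run metadata at runs/{run_id}/meta.json."""
    if not run_id:
        raise ValueError("run_id must be non-empty")
    payload = json.dumps(meta)  # TypeError here if meta is not JSON-serializable
    run_dir = Path(base) / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "meta.json").write_text(payload)
    return f"runs/{run_id}/meta.json"
```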