# Store API

Persistence layer for activations, models, and runs.

Module: `mi_crow.store`

## LocalStore

```python
LocalStore(base_path='', runs_prefix='runs', dataset_prefix='datasets', model_prefix='models')
```

Bases: `Store`

Local filesystem implementation of the `Store` interface.

Initialize LocalStore.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base_path` | `Path \| str` | Base directory path for the store | `''` |
| `runs_prefix` | `str` | Prefix for runs directory | `'runs'` |
| `dataset_prefix` | `str` | Prefix for datasets directory | `'datasets'` |
| `model_prefix` | `str` | Prefix for models directory | `'models'` |
Source code in `src/mi_crow/store/local_store.py`:

```python
def __init__(
        self,
        base_path: Path | str = '',
        runs_prefix: str = "runs",
        dataset_prefix: str = "datasets",
        model_prefix: str = "models",
):
    """Initialize LocalStore.

    Args:
        base_path: Base directory path for the store
        runs_prefix: Prefix for runs directory
        dataset_prefix: Prefix for datasets directory
        model_prefix: Prefix for models directory
    """
    super().__init__(base_path, runs_prefix, dataset_prefix, model_prefix)
```

## Store

```python
Store(base_path='', runs_prefix='runs', dataset_prefix='datasets', model_prefix='models')
```

Bases: `ABC`

Abstract store optimized for tensor batches grouped by run_id.

This interface intentionally excludes generic bytes/JSON APIs. Implementations should focus on efficient safetensors-backed IO.

The store organizes data hierarchically:

- Runs: top-level grouping by `run_id`
- Batches: within each run, data is organized by `batch_index`
- Layers: within each batch, tensors are organized by `layer_signature`
- Keys: within each layer, tensors are identified by key (e.g., "activations")
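On a filesystem-backed implementation such as `LocalStore`, this hierarchy plausibly maps onto nested paths. The sketch below illustrates the Runs/Batches/Layers/Keys nesting described above; the concrete directory and file names (e.g. `batch_0`, `activations.safetensors`) are assumptions for illustration, not part of the documented interface:

```python
from pathlib import Path

# Hypothetical on-disk layout for one stored tensor. Only the
# Runs -> Batches -> Layers -> Keys nesting comes from the docs;
# the exact names are illustrative assumptions.
base = Path("store")
run_id, batch_index = "run-001", 0
layer_signature, key = "model.layers.0.mlp", "activations"

batch_dir = base / "runs" / run_id / f"batch_{batch_index}"
tensor_file = batch_dir / layer_signature / f"{key}.safetensors"
print(tensor_file.as_posix())
# store/runs/run-001/batch_0/model.layers.0.mlp/activations.safetensors
```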

Initialize Store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base_path` | `Path \| str` | Base directory path for the store | `''` |
| `runs_prefix` | `str` | Prefix for runs directory | `'runs'` |
| `dataset_prefix` | `str` | Prefix for datasets directory | `'datasets'` |
| `model_prefix` | `str` | Prefix for models directory | `'models'` |
Source code in `src/mi_crow/store/store.py`:

```python
def __init__(
        self,
        base_path: Path | str = "",
        runs_prefix: str = "runs",
        dataset_prefix: str = "datasets",
        model_prefix: str = "models",
):
    """Initialize Store.

    Args:
        base_path: Base directory path for the store
        runs_prefix: Prefix for runs directory (default: "runs")
        dataset_prefix: Prefix for datasets directory (default: "datasets")
        model_prefix: Prefix for models directory (default: "models")
    """
    self.runs_prefix = runs_prefix
    self.dataset_prefix = dataset_prefix
    self.model_prefix = model_prefix
    self.base_path = Path(base_path)
```

### get_detector_metadata (abstractmethod)

```python
get_detector_metadata(run_id, batch_index)
```

Load detector metadata with separate JSON and tensor store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `batch_index` | `int` | Batch index | required |

Returns:

| Type | Description |
| --- | --- |
| `tuple[Dict[str, Any], TensorMetadata]` | Tuple of (metadata dict, tensor_metadata dict). Returns empty dicts if not found. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid or metadata format is invalid |
| `json.JSONDecodeError` | If metadata file exists but contains invalid JSON |
| `OSError` | If tensor files exist but cannot be loaded |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def get_detector_metadata(
        self,
        run_id: str,
        batch_index: int
) -> tuple[Dict[str, Any], TensorMetadata]:
    """Load detector metadata with separate JSON and tensor store.

    Args:
        run_id: Run identifier
        batch_index: Batch index

    Returns:
        Tuple of (metadata dict, tensor_metadata dict). Returns empty dicts if not found.

    Raises:
        ValueError: If parameters are invalid or metadata format is invalid
        json.JSONDecodeError: If metadata file exists but contains invalid JSON
        OSError: If tensor files exist but cannot be loaded
    """
    raise NotImplementedError
```
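The contract (empty dicts when nothing is stored, `json.JSONDecodeError` propagating from a corrupt file) can be sketched with a minimal filesystem stand-in. The file name `detector_meta.json` and the use of a plain dict in place of safetensors-backed tensors are assumptions for illustration only:

```python
import json
from pathlib import Path

def get_detector_metadata_sketch(base, run_id, batch_index):
    """Illustrative sketch of the documented contract, not the library's code.

    Returns (metadata, tensor_metadata), or two empty dicts if nothing
    has been stored for this batch. A real implementation would also
    load safetensors files; that side is elided here.
    """
    if not run_id or batch_index < 0:
        raise ValueError("run_id must be non-empty and batch_index non-negative")
    meta_path = Path(base) / "runs" / run_id / f"batch_{batch_index}" / "detector_meta.json"
    if not meta_path.exists():
        return {}, {}
    metadata = json.loads(meta_path.read_text())  # may raise json.JSONDecodeError
    return metadata, {}
```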

### get_detector_metadata_by_layer_by_key (abstractmethod)

```python
get_detector_metadata_by_layer_by_key(run_id, batch_index, layer, key)
```

Get a specific tensor from detector metadata by layer and key.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `batch_index` | `int` | Batch index | required |
| `layer` | `str` | Layer signature | required |
| `key` | `str` | Tensor key (e.g., "activations") | required |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | The requested tensor |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid |
| `FileNotFoundError` | If the tensor doesn't exist |
| `OSError` | If tensor file exists but cannot be loaded |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def get_detector_metadata_by_layer_by_key(
        self,
        run_id: str,
        batch_index: int,
        layer: str,
        key: str
) -> torch.Tensor:
    """Get a specific tensor from detector metadata by layer and key.

    Args:
        run_id: Run identifier
        batch_index: Batch index
        layer: Layer signature
        key: Tensor key (e.g., "activations")

    Returns:
        The requested tensor

    Raises:
        ValueError: If parameters are invalid
        FileNotFoundError: If the tensor doesn't exist
        OSError: If tensor file exists but cannot be loaded
    """
    raise NotImplementedError
```
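The lookup semantics can be illustrated with an in-memory stand-in, where nested dicts play the role of the on-disk run/batch/layer/key layout and a missing entry surfaces as `FileNotFoundError`, as the docs specify. This is a sketch of the contract, not the library's implementation:

```python
def get_tensor_sketch(store, run_id, batch_index, layer, key):
    """store: {run_id: {batch_index: {layer: {key: tensor}}}} — an in-memory
    stand-in for the safetensors files a real implementation would read."""
    if not run_id or batch_index < 0 or not layer or not key:
        raise ValueError("invalid parameters")
    try:
        return store[run_id][batch_index][layer][key]
    except KeyError as exc:
        # Missing tensors are reported as FileNotFoundError, per the contract.
        raise FileNotFoundError(
            f"no tensor at {run_id}/batch_{batch_index}/{layer}/{key}"
        ) from exc

data = {"r1": {0: {"layer.0": {"activations": [1.0, 2.0]}}}}
print(get_tensor_sketch(data, "r1", 0, "layer.0", "activations"))  # [1.0, 2.0]
```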

### get_run_metadata (abstractmethod)

```python
get_run_metadata(run_id)
```

Load metadata for a run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |

Returns:

| Type | Description |
| --- | --- |
| `Dict[str, Any]` | Metadata dictionary, or empty dict if not found |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If run_id is invalid |
| `json.JSONDecodeError` | If metadata file exists but contains invalid JSON |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def get_run_metadata(self, run_id: str) -> Dict[str, Any]:
    """Load metadata for a run.

    Args:
        run_id: Run identifier

    Returns:
        Metadata dictionary, or empty dict if not found

    Raises:
        ValueError: If run_id is invalid
        json.JSONDecodeError: If metadata file exists but contains invalid JSON
    """
    raise NotImplementedError
```

### iter_run_batch_range

```python
iter_run_batch_range(run_id, *, start=0, stop=None, step=1)
```

Iterate run batches for indices in `range(start, stop, step)`.

If `stop` is None, it will be set to `max(list_run_batches(run_id)) + 1` (or 0 if none). Raises `ValueError` if `step == 0` or `start < 0`.

Source code in `src/mi_crow/store/store.py`:

```python
def iter_run_batch_range(
        self,
        run_id: str,
        *,
        start: int = 0,
        stop: int | None = None,
        step: int = 1,
) -> Iterator[List[torch.Tensor] | Dict[str, torch.Tensor]]:
    """Iterate run batches for indices in range(start, stop, step).

    If stop is None, it will be set to max(list_run_batches(run_id)) + 1 (or 0 if none).
    Raises ValueError if step == 0 or start < 0.
    """
    if step == 0:
        raise ValueError("step must not be 0")
    if start < 0:
        raise ValueError("start must be >= 0")
    indices = self.list_run_batches(run_id)
    if not indices:
        return
    max_idx = max(indices)
    if stop is None:
        stop = max_idx + 1
    for idx in range(start, stop, step):
        try:
            yield self.get_run_batch(run_id, idx)
        except FileNotFoundError:
            continue
```
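The range semantics are easy to demonstrate with a pure-Python stand-in, where a dict of `batch_index -> batch` replaces the stored run. Note how missing indices are skipped rather than raising, mirroring the `FileNotFoundError` handling in the source:

```python
def iter_batch_range_demo(batches, *, start=0, stop=None, step=1):
    """Stand-in for iter_run_batch_range: `batches` maps batch_index -> batch."""
    if step == 0:
        raise ValueError("step must not be 0")
    if start < 0:
        raise ValueError("start must be >= 0")
    if not batches:
        return
    if stop is None:
        stop = max(batches) + 1  # same default as the real method
    for idx in range(start, stop, step):
        if idx in batches:  # missing batches are skipped, not an error
            yield idx, batches[idx]

batches = {0: "a", 2: "c", 3: "d"}  # batch 1 was never written
print(list(iter_batch_range_demo(batches)))          # [(0, 'a'), (2, 'c'), (3, 'd')]
print(list(iter_batch_range_demo(batches, step=2)))  # [(0, 'a'), (2, 'c')]
```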

### put_detector_metadata (abstractmethod)

```python
put_detector_metadata(run_id, batch_index, metadata, tensor_metadata)
```

Save detector metadata with separate JSON and tensor store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `batch_index` | `int` | Batch index (must be non-negative) | required |
| `metadata` | `Dict[str, Any]` | JSON-serializable metadata dictionary (aggregated from all detectors) | required |
| `tensor_metadata` | `TensorMetadata` | Dictionary mapping layer_signature to dict of tensor_key -> tensor (from all detectors) | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | Full path key used for store (e.g., "runs/{run_id}/batch_{batch_index}") |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid or metadata is not JSON-serializable |
| `OSError` | If file system operations fail |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def put_detector_metadata(
        self,
        run_id: str,
        batch_index: int,
        metadata: Dict[str, Any],
        tensor_metadata: TensorMetadata
) -> str:
    """Save detector metadata with separate JSON and tensor store.

    Args:
        run_id: Run identifier
        batch_index: Batch index (must be non-negative)
        metadata: JSON-serializable metadata dictionary (aggregated from all detectors)
        tensor_metadata: Dictionary mapping layer_signature to dict of tensor_key -> tensor
                       (from all detectors)

    Returns:
        Full path key used for store (e.g., "runs/{run_id}/batch_{batch_index}")

    Raises:
        ValueError: If parameters are invalid or metadata is not JSON-serializable
        OSError: If file system operations fail
    """
    raise NotImplementedError
```

### put_run_detector_metadata (abstractmethod)

```python
put_run_detector_metadata(run_id, metadata, tensor_metadata)
```

Save detector metadata for a whole run in a unified location.

This differs from `put_detector_metadata`, which organises data per-batch under `runs/{run_id}/batch_{batch_index}`. `put_run_detector_metadata` instead stores everything under `runs/{run_id}/detectors`. Implementations are expected to support being called multiple times for the same `run_id` and append/aggregate new metadata rather than overwrite it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `metadata` | `Dict[str, Any]` | JSON-serialisable metadata dictionary aggregated from all detectors for the current chunk / batch | required |
| `tensor_metadata` | `TensorMetadata` | Dictionary mapping layer_signature to dict of tensor_key -> tensor (from all detectors) | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | String path/key where metadata was stored (e.g., `runs/{run_id}/detectors`) |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If parameters are invalid or metadata is not JSON-serialisable |
| `OSError` | If file system operations fail |

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def put_run_detector_metadata(
        self,
        run_id: str,
        metadata: Dict[str, Any],
        tensor_metadata: TensorMetadata,
) -> str:
    """
    Save detector metadata for a whole run in a unified location.

    This differs from ``put_detector_metadata`` which organises data
    per-batch under ``runs/{run_id}/batch_{batch_index}``.

    ``put_run_detector_metadata`` instead stores everything under
    ``runs/{run_id}/detectors``. Implementations are expected to
    support being called multiple times for the same ``run_id`` and
    append / aggregate new metadata rather than overwrite it.

    Args:
        run_id: Run identifier
        metadata: JSON-serialisable metadata dictionary aggregated
            from all detectors for the current chunk / batch.
        tensor_metadata: Dictionary mapping layer_signature to dict
            of tensor_key -> tensor (from all detectors).

    Returns:
        String path/key where metadata was stored
        (e.g. ``runs/{run_id}/detectors``).

    Raises:
        ValueError: If parameters are invalid or metadata is not
            JSON-serialisable.
        OSError: If file system operations fail.
    """
    raise NotImplementedError
```
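The append/aggregate requirement can be sketched as follows. The file name `meta.json` and the shallow `dict.update` merge policy are illustrative assumptions (the docs only require that repeated calls aggregate rather than overwrite); tensor handling is omitted:

```python
import json
from pathlib import Path

def put_run_detector_metadata_sketch(base, run_id, metadata):
    """Illustrative sketch: repeated calls for the same run_id merge new
    metadata into what is already stored under runs/{run_id}/detectors."""
    json.dumps(metadata)  # fail early (TypeError) if not JSON-serialisable
    det_dir = Path(base) / "runs" / run_id / "detectors"
    det_dir.mkdir(parents=True, exist_ok=True)
    meta_path = det_dir / "meta.json"
    # Load whatever a previous call stored, then merge the new chunk in.
    merged = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    merged.update(metadata)
    meta_path.write_text(json.dumps(merged))
    return f"runs/{run_id}/detectors"
```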

### put_run_metadata (abstractmethod)

```python
put_run_metadata(run_id, meta)
```

Persist metadata for a run (e.g., dataset/model identifiers).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_id` | `str` | Run identifier | required |
| `meta` | `Dict[str, Any]` | Metadata dictionary to save (must be JSON-serializable) | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | String path/key where metadata was stored (e.g., "runs/{run_id}/meta.json") |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If run_id is invalid or meta is not JSON-serializable |
| `OSError` | If file system operations fail |

Note: Implementations should store JSON at a stable location, e.g., `runs/{run_id}/meta.json`.

Source code in `src/mi_crow/store/store.py`:

```python
@abc.abstractmethod
def put_run_metadata(self, run_id: str, meta: Dict[str, Any]) -> str:
    """Persist metadata for a run (e.g., dataset/model identifiers).

    Args:
        run_id: Run identifier
        meta: Metadata dictionary to save (must be JSON-serializable)

    Returns:
        String path/key where metadata was stored (e.g., "runs/{run_id}/meta.json")

    Raises:
        ValueError: If run_id is invalid or meta is not JSON-serializable
        OSError: If file system operations fail

    Note:
        Implementations should store JSON at a stable location, e.g., runs/{run_id}/meta.json.
    """
    raise NotImplementedError
```
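A minimal sketch of this contract, writing to the stable location the note suggests. This is an illustration of the documented behaviour, not the library's actual implementation:

```python
import json
from pathlib import Path

def put_run_metadata_sketch(base, run_id, meta):
    """Illustrative sketch: persist run metadata at runs/{run_id}/meta.json."""
    if not run_id:
        raise ValueError("run_id must be non-empty")
    payload = json.dumps(meta)  # TypeError here if meta is not JSON-serializable
    run_dir = Path(base) / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "meta.json").write_text(payload)
    return f"runs/{run_id}/meta.json"
```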