Dataset Classes

The datamint.dataset module provides specialised PyTorch-compatible dataset classes for different medical imaging modalities. Import them directly:

from datamint.dataset import ImageDataset, VolumeDataset, VideoDataset

Dataset Classes Overview

Split Modes

All dataset classes inherit split(), which supports three split modes:

  • Local random splitting with ratio kwargs such as train=0.7.

  • Project-scoped split assignments resolved through api.projects.get_splits().

  • Legacy split:* resource tags, which remain available for backwards compatibility but are deprecated.

When you call split() without an explicit mode, the client chooses the mode automatically:

  • If ratio kwargs are provided, a local random split is used.

  • If no ratios are provided and the dataset was loaded from a project, project-scoped splits are used.

  • Otherwise, legacy split:* resource tags are used.

from datamint.dataset import ImageDataset

dataset = ImageDataset(project="my-project", include_unannotated=True)

# Project-backed datasets prefer project-scoped assignments.
project_parts = dataset.split()

# Persist and replay the exact historical snapshot later.
snapshot = project_parts["train"].split_as_of_timestamp
replayed_parts = dataset.split(as_of_timestamp=snapshot)

# Force an ad hoc local split instead.
local_parts = dataset.split(train=0.8, val=0.2, seed=42)

To override the automatic selection, pass use_project_splits=True or use_server_splits=True explicitly. use_server_splits is deprecated and exists only for compatibility with older tag-based workflows.

Project-scoped splits require the dataset to be loaded from a project and must not be combined with ratio kwargs. Each resolved subset records split_name, split_source, and, when applicable, split_as_of_timestamp so downstream training and MLflow lineage can reuse the same split snapshot.
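
The selection rules above can be summarised as a small decision function. This is a minimal sketch, not the library's implementation: choose_split_mode, its loaded_from_project argument, and the returned mode names are hypothetical and only mirror the rules stated in this section.

```python
from typing import Optional


def choose_split_mode(
    ratios: dict,
    loaded_from_project: bool,
    use_project_splits: Optional[bool] = None,
    use_server_splits: Optional[bool] = None,
) -> str:
    """Return 'local', 'project', or 'legacy_tags' (hypothetical mode names)."""
    # Explicit server-side modes must not be combined with ratio kwargs.
    if (use_project_splits or use_server_splits) and ratios:
        raise ValueError("Ratio kwargs cannot be combined with an explicit split mode.")
    if use_project_splits:
        if not loaded_from_project:
            raise ValueError("Project-scoped splits require a project-backed dataset.")
        return "project"
    if use_server_splits:
        return "legacy_tags"
    # Automatic selection, in the order described above.
    if ratios:
        return "local"
    if loaded_from_project:
        return "project"
    return "legacy_tags"
```

Under these assumptions, choose_split_mode({}, loaded_from_project=True) selects the project-scoped mode, while passing any ratio selects the local mode.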

Base Classes

DatamintBaseDataset - Abstract base class for all Datamint datasets.

Provides the PyTorch Dataset interface with transform support and annotation filtering, while delegating data management to DatamintProjectManager.

class datamint.dataset.base.DatamintBaseDataset(project=None, resources=None, auto_update=True, api_key=None, server_url=None, return_metainfo=True, return_segmentations=True, return_as_semantic_segmentation=False, semantic_seg_merge_strategy=None, alb_transform=None, include_unannotated=True, include_annotators=None, exclude_annotators=None, include_segmentation_names=None, exclude_segmentation_names=None, include_image_label_names=None, exclude_image_label_names=None, include_frame_label_names=None, exclude_frame_label_names=None, allow_external_annotations=False, image_labels_merge_strategy=None, image_categories_merge_strategy=None, worklists='all')

Bases: ABC, Dataset

Abstract base class for Datamint datasets.

This class provides the PyTorch Dataset interface with:

  • Transform hooks (albumentations)

  • Annotation filtering

  • Data loading utilities

Subclasses must implement _get_raw_item() to define how data is loaded.

Parameters:
  • project (str | Project | None) – Project name, Project object, or None. Mutually exclusive with resources.

  • resources (Sequence[Resource] | None) – List of Resource objects/IDs, or None. Mutually exclusive with project.

  • auto_update (bool) – If True, sync with server on init.

  • api_key (str | None) – API key for authentication.

  • server_url (str | None) – Datamint server URL.

  • all_annotations – If True, include unpublished annotations.

  • return_metainfo (bool) – If True, include metadata in output.

  • return_segmentations (bool) – If True, process and return segmentations.

  • return_as_semantic_segmentation (bool) – If True, convert to semantic format.

  • semantic_seg_merge_strategy (Literal['union', 'intersection', 'mode'] | None) – Strategy for merging multi-annotator segs.

  • alb_transform (Callable | BaseCompose | None) – Albumentations transform.

  • include_unannotated (bool) – If True, include resources without annotations.

  • include_annotators (list[str] | None) – Whitelist of annotators.

  • exclude_annotators (list[str] | None) – Blacklist of annotators.

  • include_segmentation_names (list[str] | None) – Whitelist of segmentation labels.

  • exclude_segmentation_names (list[str] | None) – Blacklist of segmentation labels.

  • include_image_label_names (list[str] | None) – Whitelist of image labels.

  • exclude_image_label_names (list[str] | None) – Blacklist of image labels.

  • include_frame_label_names (list[str] | None) – Whitelist of frame labels.

  • exclude_frame_label_names (list[str] | None) – Blacklist of frame labels.

  • allow_external_annotations (bool) – If True, allow and automatically include annotation labels that are not part of the project’s official schema (e.g., labels from other projects or legacy annotations). If False, these annotations will be filtered out.

  • image_labels_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • image_categories_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • worklists (Sequence[AnnotationWorklist] | Literal['all'] | None)

__add__(other)

Concatenate datasets.

Parameters:

other (DatamintBaseDataset)

Return type:

ConcatDataset

__getitem__(index)

Get item with full processing.

Parameters:

index (int)

Return type:

dict[str, Any]

__iter__()

Iterate over dataset.

Return type:

Iterator[dict[str, Any]]

__len__()

Dataset length.

Return type:

int

add_transform(alb_transform)

Parameters:

alb_transform (BaseCompose)

Return type:

None

abstractmethod apply_alb_transform(img, segmentations)

Apply albumentations transform to image and masks.

Return type:

dict[str, Any]

Returns:

Dict with transformed ‘image’ and ‘segmentations’ (dict).

It is recommended that ‘image’ has shape (C, depth, H, W) and each segmentation of ‘segmentations’ has shape (num_instances, depth, H, W), so that common downstream processing can be applied. If not, please override _process_segmentations() accordingly.

Parameters:
  • img (ndarray)

  • segmentations (dict[str, ndarray])

build_mlflow_dataset()

Create a DatamintMLflowDataset for this dataset.

Return type:

DatamintMLflowDataset

Returns:

An MLflow dataset wrapper for the current dataset.

filter(*, tags=None, filename_pattern=None, has_annotations=None, annotation_names=None, custom_fn=None)

Return a new dataset containing only resources that match all specified criteria.

This method is chainable — the returned dataset supports the same interface, so you can write:

filtered = dataset.filter(tags=['busi']).filter(has_annotations=True)

or combine with split():

parts = dataset.filter(tags=['ultrasound']).split(train=0.8, test=0.2)

Parameters:
  • tags (list[str] | None) – Keep resources whose tags contain any of the given values.

  • filename_pattern (str | None) – Keep resources whose filename matches this pattern (interpreted as a glob pattern, using fnmatch() internally).

  • has_annotations (bool | None) – If True, keep only resources with at least one annotation. If False, keep only those without annotations.

  • annotation_names (list[str] | None) – Keep resources that have at least one annotation whose identifier is in this list.

  • custom_fn (Callable[[Resource, Sequence[Annotation]], bool] | None) – Arbitrary predicate receiving (resource, annotations) and returning True to keep the resource.

Return type:

DatamintBaseDataset

Returns:

A new DatamintBaseDataset containing only the matching resources.

Raises:

ValueError – If no filter criteria are specified.
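
How the criteria combine can be illustrated with a per-resource predicate. This is a sketch under stated assumptions: FakeResource and matches are hypothetical stand-ins, annotations are reduced to a list of identifier strings, and only the documented semantics (any-of tag matching, glob matching via fnmatch, all criteria ANDed together) are modelled.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class FakeResource:
    """Hypothetical stand-in for a Datamint Resource."""
    filename: str
    tags: list = field(default_factory=list)


def matches(resource, annotation_ids, *, tags=None, filename_pattern=None,
            has_annotations=None, annotation_names=None) -> bool:
    """Return True when the resource satisfies every criterion that was given."""
    if all(c is None for c in (tags, filename_pattern, has_annotations, annotation_names)):
        raise ValueError("At least one filter criterion is required.")
    # tags: keep when the resource carries any of the requested tags.
    if tags is not None and not set(tags) & set(resource.tags):
        return False
    # filename_pattern: glob-style match.
    if filename_pattern is not None and not fnmatch(resource.filename, filename_pattern):
        return False
    # has_annotations: True keeps annotated resources, False keeps unannotated ones.
    if has_annotations is not None and bool(annotation_ids) != has_annotations:
        return False
    # annotation_names: keep when at least one annotation identifier is listed.
    if annotation_names is not None and not set(annotation_names) & set(annotation_ids):
        return False
    return True
```

Because every given criterion must hold, chaining two filter() calls and passing both criteria in one call would select the same resources in this model.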

property frame_labels_set: list[str]

Frame-level label names.

get_collate_fn()

Get collate function for DataLoader.

Return type:

Callable[[list[dict]], dict]
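
For orientation, a function with the signature Callable[[list[dict]], dict] turns a list of per-item dicts into one dict per batch. The sketch below only groups values by key and is an assumption about the general shape of such a function. The collate function actually returned by get_collate_fn() may stack tensors or treat some keys specially.

```python
def collate_by_key(batch: list) -> dict:
    """Group the values of each key across all items of a batch."""
    if not batch:
        return {}
    keys = batch[0].keys()
    # Each output entry holds one value per item, in batch order.
    return {key: [item[key] for item in batch] for key in keys}


batch = [
    {"image": "img-0", "image_labels": {"ann": 1}},
    {"image": "img-1", "image_labels": {"ann": 0}},
]
collated = collate_by_key(batch)
```

Keeping variable-shaped entries such as per-annotator dicts in plain lists avoids having to stack values that do not share a shape.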

get_dataloader(*args, **kwargs)

Get DataLoader with proper collate function.

Return type:

DataLoader

get_resource(index)

Get the Resource object for a given index.

Parameters:

index (int)

Return type:

Resource

property image_categories_set: list[tuple[str, str]]

Image-level classification category names/values.

property image_labels_set: list[str]

Image-level label names.

prefetch(*, include_annotations=False)

Download and cache dataset files eagerly.

Ensures that resource file bytes are present in the local cache before training begins, so __getitem__ calls are served from disk rather than triggering on-demand network requests. When include_annotations is enabled, segmentation annotation payloads are cached too so DataLoader workers do not need to fetch them from the API.

Calls _prepare() implicitly if the dataset has not been initialised yet.

Parameters:

include_annotations (bool) – Whether to also prefetch segmentation annotation payloads.

Return type:

None
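
The idea of eager caching can be sketched with a fake downloader. Everything here is hypothetical (prefetch_files, fake_fetch, the ".bin" file naming). The point is only that a file already present in the cache is not fetched again.

```python
import tempfile
from pathlib import Path


def prefetch_files(resource_ids, fetch_bytes, cache_dir: Path) -> int:
    """Ensure every resource has a cached file; return how many were downloaded."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    downloaded = 0
    for rid in resource_ids:
        target = cache_dir / f"{rid}.bin"
        if target.exists():
            continue  # already cached, no network request needed
        target.write_bytes(fetch_bytes(rid))
        downloaded += 1
    return downloaded


calls = []


def fake_fetch(rid: str) -> bytes:
    """Stand-in for a network download that records which ids were requested."""
    calls.append(rid)
    return rid.encode()


with tempfile.TemporaryDirectory() as tmp:
    first = prefetch_files(["a", "b"], fake_fetch, Path(tmp))
    second = prefetch_files(["a", "b", "c"], fake_fetch, Path(tmp))
```

The second call downloads only the resource that was not cached by the first call.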

project: Project | None
resource_annotations: list[Sequence[Annotation]]
resources: Sequence[Resource]
property segmentation_labels_set: list[str]

Segmentation label names.

set_transform(alb_transform=None)

Set transforms after initialization.

Parameters:

alb_transform (BaseCompose | None)

Return type:

None

split(*, seed=None, use_server_splits=None, use_project_splits=None, as_of_timestamp=None, **splits)

Split the dataset into multiple named subsets.

The mode is selected automatically when no explicit split mode is given:

  • If ratio kwargs are provided (e.g. train=0.7), local splitting is used.

  • If no ratio kwargs are provided and the dataset was loaded from a project, project-scoped split assignments are used.

  • Otherwise, server-side split:* tags on resources are used.

Examples:

# Local split
parts = dataset.split(train=0.7, val=0.15, test=0.15, seed=42)
train_ds = parts['train']

# Project-scoped split — inferred for project-backed datasets
parts = dataset.split()

# Explicit override
parts = dataset.split(use_project_splits=True)

Parameters:
  • seed (int | None) – Random seed for reproducible local splitting.

  • use_project_splits (bool | None) – If True, read split assignments from the project splits API. If None (default), project-backed datasets prefer this mode when no ratios are provided.

  • as_of_timestamp (str | None) – Historical timestamp to resolve project-scoped splits against. When omitted for project-scoped splits, the current UTC timestamp is captured and stored on the resolved split datasets for later reuse.

  • use_server_splits (bool | None) – (DEPRECATED in favor of use_project_splits)

  • **splits (float) – Named split ratios (e.g. train=0.7, test=0.3). Must sum to 1.0 (±0.01 tolerance). Must be empty when use_server_splits or use_project_splits is True.

Return type:

dict[str, DatamintBaseDataset]

Returns:

Dictionary mapping split names to new dataset instances.

Raises:

ValueError – If ratios are invalid or arguments conflict.
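
The local mode can be pictured as a seeded shuffle followed by contiguous cuts. This is a minimal sketch assuming a simple rounding scheme. local_split is a hypothetical helper, and the library may assign items differently. Only the documented constraint (ratios sum to 1.0 within a 0.01 tolerance) is taken from the text above.

```python
import random


def local_split(n_items: int, seed=None, **splits: float) -> dict:
    """Partition indices 0..n_items-1 into named groups by ratio."""
    if not splits:
        raise ValueError("At least one ratio is required.")
    total = sum(splits.values())
    if abs(total - 1.0) > 0.01:
        raise ValueError(f"Ratios must sum to 1.0, got {total}")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # same seed, same order
    parts = {}
    start = 0
    names = list(splits)
    for position, name in enumerate(names):
        if position == len(names) - 1:
            end = n_items  # the last split takes the remainder
        else:
            end = min(start + round(splits[name] * n_items), n_items)
        parts[name] = indices[start:end]
        start = end
    return parts


parts = local_split(100, seed=42, train=0.7, val=0.15, test=0.15)
```

Each list of indices could then be handed to subset(indices) to build the corresponding dataset.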

subset(indices)

Create a dataset subset by slicing resources and annotations.

Parameters:

indices (list[int])

Return type:

DatamintBaseDataset
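
Since resources and resource_annotations are parallel sequences, taking a subset means selecting the same positions from both. A sketch with a hypothetical helper, subset_parallel, operating on plain lists:

```python
def subset_parallel(resources, resource_annotations, indices):
    """Select the same positions from two parallel sequences."""
    if len(resources) != len(resource_annotations):
        raise ValueError("resources and resource_annotations must have equal length.")
    picked_resources = [resources[i] for i in indices]
    picked_annotations = [resource_annotations[i] for i in indices]
    return picked_resources, picked_annotations


resources = ["r0", "r1", "r2", "r3"]
annotations = [["a0"], [], ["a2", "a2b"], ["a3"]]
sub_resources, sub_annotations = subset_parallel(resources, annotations, [2, 0])
```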

exception datamint.dataset.base.DatamintDatasetException

Bases: DatamintException

Exception raised for dataset errors.

MultiFrameDataset - Abstract base for datasets with multiple frames per resource.

Shared logic for VolumeDataset (3D medical volumes) and VideoDataset (temporal video sequences). Both handle data with shape (C, N, H, W) where N is the number of frames/slices.

class datamint.dataset.multiframe_dataset.MultiFrameDataset(project=None, resources=None, auto_update=True, api_key=None, server_url=None, return_metainfo=True, return_segmentations=True, return_as_semantic_segmentation=False, semantic_seg_merge_strategy=None, alb_transform=None, include_unannotated=True, include_annotators=None, exclude_annotators=None, include_segmentation_names=None, exclude_segmentation_names=None, include_image_label_names=None, exclude_image_label_names=None, include_frame_label_names=None, exclude_frame_label_names=None, allow_external_annotations=False, image_labels_merge_strategy=None, image_categories_merge_strategy=None, worklists='all')

Bases: DatamintBaseDataset

Abstract base for multi-frame datasets.

Handles loading and augmenting data with shape (C, N, H, W) where N is the number of frames (temporal for video) or slices (spatial for volumes).

Subclasses add modality-specific features:

  • VolumeDataset: anatomical slicing via .slice()

  • VideoDataset: frame-by-frame iteration via .frame_by_frame()

Parameters:
  • project (str | Project | None)

  • resources (Sequence[Resource] | None)

  • auto_update (bool)

  • api_key (str | None)

  • server_url (str | None)

  • return_metainfo (bool)

  • return_segmentations (bool)

  • return_as_semantic_segmentation (bool)

  • semantic_seg_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • alb_transform (Callable | BaseCompose | None)

  • include_unannotated (bool)

  • include_annotators (list[str] | None)

  • exclude_annotators (list[str] | None)

  • include_segmentation_names (list[str] | None)

  • exclude_segmentation_names (list[str] | None)

  • include_image_label_names (list[str] | None)

  • exclude_image_label_names (list[str] | None)

  • include_frame_label_names (list[str] | None)

  • exclude_frame_label_names (list[str] | None)

  • allow_external_annotations (bool)

  • image_labels_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • image_categories_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • worklists (Sequence[AnnotationWorklist] | Literal['all'] | None)

apply_alb_transform(img, segmentations)

Apply albumentations transform to 4D image and masks.

Parameters:
  • img (ndarray) – Image array of shape (C, depth, H, W).

  • segmentations (dict[str, ndarray]) – Dict of author -> mask arrays of shape (#instances, depth, H, W).

Return type:

dict[str, Any]

Returns:

Dict with transformed 'image' and 'segmentations'.

Specialised Datasets

ImageDataset

ImageDataset - Dataset for 2D images.

Handles standard 2D medical images like X-rays, pathology patches, single-frame DICOM, PNG, JPEG, etc.

class datamint.dataset.image_dataset.ImageDataset(project=None, resources=None, auto_update=True, api_key=None, server_url=None, return_metainfo=True, return_segmentations=True, return_as_semantic_segmentation=False, semantic_seg_merge_strategy=None, alb_transform=None, include_unannotated=True, include_annotators=None, exclude_annotators=None, include_segmentation_names=None, exclude_segmentation_names=None, include_image_label_names=None, exclude_image_label_names=None, include_frame_label_names=None, exclude_frame_label_names=None, allow_external_annotations=False, image_labels_merge_strategy=None, image_categories_merge_strategy=None, worklists='all')

Bases: VolumeDataset

Dataset for 2D medical images.

Parameters:
  • project (str | Project | None)

  • resources (Sequence[Resource] | None)

  • auto_update (bool)

  • api_key (str | None)

  • server_url (str | None)

  • return_metainfo (bool)

  • return_segmentations (bool)

  • return_as_semantic_segmentation (bool)

  • semantic_seg_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • alb_transform (Callable | BaseCompose | None)

  • include_unannotated (bool)

  • include_annotators (list[str] | None)

  • exclude_annotators (list[str] | None)

  • include_segmentation_names (list[str] | None)

  • exclude_segmentation_names (list[str] | None)

  • include_image_label_names (list[str] | None)

  • exclude_image_label_names (list[str] | None)

  • include_frame_label_names (list[str] | None)

  • exclude_frame_label_names (list[str] | None)

  • allow_external_annotations (bool)

  • image_labels_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • image_categories_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • worklists (Sequence[AnnotationWorklist] | Literal['all'] | None)

apply_alb_transform(img, segmentations)

Apply albumentations transform to 4D image and masks.

Parameters:
  • img (ndarray) – Image array of shape (C, depth, H, W).

  • segmentations (dict[str, ndarray]) – Dict of author -> mask arrays of shape (#instances, depth, H, W).

Return type:

dict[str, Any]

Returns:

Dict with transformed 'image' and 'segmentations'.

VolumeDataset

VolumeDataset - Dataset for 3D medical volumes.

Handles NIfTI volumes, DICOM series, and other 3D medical imaging data with support for different slice orientations and affine preservation.

class datamint.dataset.volume_dataset.VolumeDataset(project=None, resources=None, auto_update=True, api_key=None, server_url=None, return_metainfo=True, return_segmentations=True, return_as_semantic_segmentation=False, semantic_seg_merge_strategy=None, alb_transform=None, include_unannotated=True, include_annotators=None, exclude_annotators=None, include_segmentation_names=None, exclude_segmentation_names=None, include_image_label_names=None, exclude_image_label_names=None, include_frame_label_names=None, exclude_frame_label_names=None, allow_external_annotations=False, image_labels_merge_strategy=None, image_categories_merge_strategy=None, worklists='all')

Bases: MultiFrameDataset

Dataset for 3D medical volumes.

Handles NIfTI (3D/4D), DICOM series, and other volumetric data. Inherits multi-frame loading and augmentation from MultiFrameDataset.

Parameters:
  • project (str | Project | None)

  • resources (Sequence[Resource] | None)

  • auto_update (bool)

  • api_key (str | None)

  • server_url (str | None)

  • return_metainfo (bool)

  • return_segmentations (bool)

  • return_as_semantic_segmentation (bool)

  • semantic_seg_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • alb_transform (Callable | BaseCompose | None)

  • include_unannotated (bool)

  • include_annotators (list[str] | None)

  • exclude_annotators (list[str] | None)

  • include_segmentation_names (list[str] | None)

  • exclude_segmentation_names (list[str] | None)

  • include_image_label_names (list[str] | None)

  • exclude_image_label_names (list[str] | None)

  • include_frame_label_names (list[str] | None)

  • exclude_frame_label_names (list[str] | None)

  • allow_external_annotations (bool)

  • image_labels_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • image_categories_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • worklists (Sequence[AnnotationWorklist] | Literal['all'] | None)

slice(axis='axial')

Create a 2D dataset by slicing this volume along an axis.

Each 3D volume is expanded into multiple 2D slices, one per depth index along the given axis. The returned dataset yields 2D items with shape (C, H, W) instead of (C, D, H, W).

Parsed volumes are cached to disk as gzip-compressed .npy.gz files. A shared in-memory LRU cache also keeps recently used full volumes to avoid repeated decompression when iterating neighboring slices.
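
The benefit of an in-memory LRU cache for neighboring slices can be shown with functools.lru_cache. This sketch does not reproduce the library's cache: load_volume, get_slice, and the cache size of 2 are illustrative assumptions, and the "volume" is just a tuple of strings.

```python
from functools import lru_cache

load_count = {"n": 0}


@lru_cache(maxsize=2)
def load_volume(volume_id: str) -> tuple:
    """Pretend to decompress a cached volume and count how often that happens."""
    load_count["n"] += 1
    return tuple(f"{volume_id}-slice{i}" for i in range(3))


def get_slice(volume_id: str, slice_idx: int) -> str:
    return load_volume(volume_id)[slice_idx]


# Neighboring slices of the same volume trigger a single load.
slices = [get_slice("vol-a", i) for i in range(3)]
loads_after_neighbors = load_count["n"]

# Touching two other volumes evicts vol-a, so it has to be loaded again.
get_slice("vol-b", 0)
get_slice("vol-c", 0)
get_slice("vol-a", 0)
loads_after_eviction = load_count["n"]
```

Sequential access within one volume is served from memory, while access patterns that jump between many volumes lose that benefit once the cache capacity is exceeded.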

Parameters:

axis (str | int) – Slice orientation. One of 'axial' (depth), 'coronal' (height), 'sagittal' (width), or an integer axis index (0–2).

Return type:

SlicedVolumeDataset

Returns:

A SlicedVolumeDataset that iterates over individual 2D slices.

Example:

vol_ds = VolumeDataset(project='my_ct_project')
sliced = vol_ds.slice(axis='axial')
print(len(sliced))  # total number of axial slices across all volumes
item = sliced[0]
print(item['image'].shape)  # (C, H, W)
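
The expansion of volumes into slices can be sketched as building a list of (volume, slice) pairs. Both helpers are hypothetical, and the mapping of 'axial', 'coronal', and 'sagittal' to the integer indices 0, 1, and 2 is an assumption based on the (depth, height, width) ordering described above.

```python
# Assumed mapping of orientation names onto (depth, height, width).
AXIS_NAMES = {"axial": 0, "coronal": 1, "sagittal": 2}


def resolve_axis(axis) -> int:
    """Map an orientation name or an integer to an index into (depth, height, width)."""
    if isinstance(axis, str):
        if axis not in AXIS_NAMES:
            raise ValueError(f"Unknown axis name: {axis!r}")
        return AXIS_NAMES[axis]
    if axis in (0, 1, 2):
        return axis
    raise ValueError(f"Axis index must be 0, 1, or 2, got {axis!r}")


def build_slice_index(volume_shapes, axis="axial") -> list:
    """List (volume_index, slice_index) pairs, one per 2D slice."""
    ax = resolve_axis(axis)
    return [
        (volume_idx, slice_idx)
        for volume_idx, shape in enumerate(volume_shapes)
        for slice_idx in range(shape[ax])
    ]


# Two volumes given as (depth, height, width).
shapes = [(30, 512, 512), (20, 256, 256)]
axial_index = build_slice_index(shapes, "axial")
```

In this model the length of the sliced dataset is the sum of the volume sizes along the chosen axis.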

VideoDataset

VideoDataset - Dataset for video medical data.

Handles video files (MP4, AVI, etc.) and multi-frame DICOM data from modalities like ultrasound (US), angiography (XA), and fluoroscopy (RF).

class datamint.dataset.video_dataset.VideoDataset(project=None, resources=None, auto_update=True, api_key=None, server_url=None, return_metainfo=True, return_segmentations=True, return_as_semantic_segmentation=False, semantic_seg_merge_strategy=None, alb_transform=None, include_unannotated=True, include_annotators=None, exclude_annotators=None, include_segmentation_names=None, exclude_segmentation_names=None, include_image_label_names=None, exclude_image_label_names=None, include_frame_label_names=None, exclude_frame_label_names=None, allow_external_annotations=False, image_labels_merge_strategy=None, image_categories_merge_strategy=None, worklists='all')

Bases: MultiFrameDataset

Dataset for video medical data.

Each item is a full video with shape (C, N, H, W) where N is the number of frames. Inherits multi-frame loading and augmentation from MultiFrameDataset.

Supports video files (MP4, AVI, MOV) and multi-frame DICOM from temporal modalities (ultrasound, angiography, fluoroscopy).

Example:

ds = VideoDataset(project='my_ultrasound_project')
item = ds[0]
print(item['image'].shape)  # (C, N, H, W)

# Iterate frame-by-frame
frame_ds = ds.frame_by_frame()
print(frame_ds[0]['image'].shape)  # (C, H, W)

Parameters:
  • project (str | Project | None)

  • resources (Sequence[Resource] | None)

  • auto_update (bool)

  • api_key (str | None)

  • server_url (str | None)

  • return_metainfo (bool)

  • return_segmentations (bool)

  • return_as_semantic_segmentation (bool)

  • semantic_seg_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • alb_transform (Callable | BaseCompose | None)

  • include_unannotated (bool)

  • include_annotators (list[str] | None)

  • exclude_annotators (list[str] | None)

  • include_segmentation_names (list[str] | None)

  • exclude_segmentation_names (list[str] | None)

  • include_image_label_names (list[str] | None)

  • exclude_image_label_names (list[str] | None)

  • include_frame_label_names (list[str] | None)

  • exclude_frame_label_names (list[str] | None)

  • allow_external_annotations (bool)

  • image_labels_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • image_categories_merge_strategy (Literal['union', 'intersection', 'mode'] | None)

  • worklists (Sequence[AnnotationWorklist] | Literal['all'] | None)

frame_by_frame()

Create a 2D dataset iterating over individual video frames.

Each video is expanded into N individual frames. The returned dataset yields 2D items with shape (C, H, W) instead of (C, N, H, W).

Parsed frames are cached to disk as gzip-compressed .npy.gz files.

Return type:

SlicedVideoDataset

Returns:

A SlicedVideoDataset that iterates over individual frames.

Example:

vid_ds = VideoDataset(project='my_ultrasound_project')
frame_ds = vid_ds.frame_by_frame()
print(len(frame_ds))  # total number of frames across all videos
item = frame_ds[0]
print(item['image'].shape)  # (C, H, W)
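
Looking up which video a flat frame index belongs to can be done with cumulative frame counts and a binary search. locate_frame is a hypothetical helper that illustrates the indexing idea only. It is not taken from SlicedVideoDataset.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(frame_counts, flat_index: int) -> tuple:
    """Translate a flat frame index into (video_index, frame_index)."""
    ends = list(accumulate(frame_counts))  # cumulative frame totals
    total = ends[-1] if ends else 0
    if not 0 <= flat_index < total:
        raise IndexError(f"Frame index {flat_index} out of range for {total} frames")
    video_idx = bisect_right(ends, flat_index)
    start = ends[video_idx - 1] if video_idx else 0
    return video_idx, flat_index - start


# Three videos with 10, 5, and 8 frames.
counts = [10, 5, 8]
first_of_second_video = locate_frame(counts, 10)
last_frame = locate_frame(counts, 22)
```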

Sliced Datasets

SlicedVolumeDataset

SlicedVolumeDataset - 2D dataset created by slicing a VolumeDataset along an axis.

Provides a way to iterate over individual 2D slices from 3D volume data, enabling training of 2D models on volumetric medical imaging data.

class datamint.dataset.sliced_dataset.SlicedVolumeDataset(*, slice_axis='axial', parent_dataset=None, **kwargs)

Bases: DatamintBaseDataset

2D dataset created by slicing a VolumeDataset along an axis.

Each item corresponds to a single 2D slice from a 3D volume. The __getitem__ returns arrays with shape (C, H, W) for images and (num_instances, H, W) or (num_labels+1, H, W) for segmentations.

Can be instantiated directly with all the same parameters as DatamintBaseDataset plus slice_axis, or created from an already-loaded dataset via the from_dataset() factory classmethod (which avoids additional server calls).

Parameters:
  • project – Project name, Project object, or None. Mutually exclusive with resources.

  • resources – List of Resource objects, or None. Mutually exclusive with project.

  • slice_axis (Literal['axial', 'sagittal', 'coronal'] | int) – Slice orientation. One of 'axial' (depth), 'coronal' (height), 'sagittal' (width), or an integer axis index (0–2).

See DatamintBaseDataset for all remaining parameters.

Parameters:

parent_dataset (DatamintBaseDataset | None)

__getitem__(index)

Get a 2D slice item with full processing.

Returns dict with:

  • ‘image’: np.ndarray or Tensor of shape (C, H, W).

  • ‘segmentations’ (if enabled): segmentation masks with depth dimension removed.

  • ‘image_labels’: dict of annotator -> label tensor.

Parameters:

index (int)

Return type:

dict[str, Any]

apply_alb_transform(img, segmentations)

Apply 2D albumentations transform to a single-slice image and masks.

Uses the same approach as ImageDataset: treats the data as 2D.

Parameters:
  • img (ndarray) – Image array of shape (C, 1, H, W) or (C, H, W).

  • segmentations (dict[str, ndarray]) – Dict of author -> mask arrays of shape (#instances, 1, H, W) or (#instances, H, W).

Return type:

dict[str, Any]

Returns:

Dict with transformed ‘image’ and ‘segmentations’.

classmethod from_dataset(parent_dataset, slice_axis='axial')

Create a SlicedVolumeDataset from an existing dataset without additional server calls.

Copies all configuration, label mappings, and already-loaded resources from parent_dataset, then expands them into per-slice proxy resources. Use this factory when you already have a loaded dataset and want to obtain 2D slices without triggering new API requests.

Parameters:
  • parent_dataset (DatamintBaseDataset) – The source DatamintBaseDataset (e.g. VolumeDataset) providing resources, annotations, and configuration.

  • slice_axis (Literal['axial', 'sagittal', 'coronal'] | int) – Slice orientation. One of 'axial' (depth), 'coronal' (height), 'sagittal' (width), or an integer axis index (0–2).

Return type:

SlicedVolumeDataset

Returns:

A new SlicedVolumeDataset instance.

SlicedVideoDataset

SlicedVideoDataset - 2D dataset created by iterating over frames of a VideoDataset.

Provides a way to iterate over individual 2D frames from video data, enabling training of 2D models on temporal medical imaging data.

class datamint.dataset.sliced_video_dataset.SlicedVideoDataset(*args, **kwargs)

Bases: DatamintBaseDataset

2D dataset created by iterating over frames of a video.

Each item corresponds to a single frame from a video. The __getitem__ returns arrays with shape (C, H, W) for images and (num_instances, H, W) or (num_labels+1, H, W) for segmentations.

Can be instantiated directly with all the same parameters as DatamintBaseDataset, or created from an already-loaded dataset via the from_dataset() factory classmethod (which avoids additional server calls).

__getitem__(index)

Get a single frame item with full processing.

Returns dict with:

  • 'image': np.ndarray or Tensor of shape (C, H, W).

  • 'segmentations' (if enabled): segmentation masks of shape (num_instances, H, W) or (num_labels+1, H, W).

  • 'image_labels': dict of annotator -> label tensor.

Parameters:

index (int)

Return type:

dict[str, Any]

apply_alb_transform(img, segmentations)

Apply 2D albumentations transform to a single frame and masks.

Parameters:
  • img (ndarray) – Image array of shape (C, H, W).

  • segmentations (dict[str, ndarray]) – Dict of author -> mask arrays of shape (#instances, 1, H, W) or (#instances, H, W).

Return type:

dict[str, Any]

Returns:

Dict with transformed 'image' and 'segmentations'.

classmethod from_dataset(parent_dataset)

Create a SlicedVideoDataset from an existing dataset without additional server calls.

Copies all configuration, label mappings, and already-loaded resources from parent_dataset, then expands them into per-frame proxy resources.

Parameters:

parent_dataset (DatamintBaseDataset) – The source dataset (e.g. VideoDataset).

Return type:

SlicedVideoDataset

Returns:

A new SlicedVideoDataset instance.

Annotation Processing

class datamint.dataset.annotation.Annotation(id, identifier, scope, annotation_type, resource_id, created_by, annotation_worklist_id=None, status=None, frame_index=None, text_value=None, numeric_value=None, units=None, geometry=<factory>, created_at=None, approved_at=None, approved_by=None, associated_file=None, file=None, deleted=False, deleted_at=None, deleted_by=None, created_by_model=None, old_geometry=None, set_name=None, resource_filename=None, resource_modality=None, annotation_worklist_name=None, user_info=None, values=None)

Class representing an annotation from the Datamint API.

This class stores annotation data and provides methods for loading and saving annotations through the API handler.

Parameters:
  • id (str) – Unique identifier for the annotation

  • identifier (str) – The annotation identifier/label name

  • scope (str) – Whether annotation applies to ‘frame’ or ‘image’

  • annotation_type (str) – Type of annotation (‘segmentation’, ‘label’, ‘category’, etc.)

  • resource_id (str) – ID of the resource this annotation belongs to

  • annotation_worklist_id (str | None) – ID of the annotation worklist

  • created_by (str) – Email of the user who created the annotation

  • status (str | None) – Status of the annotation (‘published’, ‘new’, etc.)

  • frame_index (int | None) – Frame index for frame-scoped annotations

  • text_value (str | None) – Text value for category annotations

  • numeric_value (float | None) – Numeric value for numeric annotations

  • units (str | None) – Units for numeric annotations

  • geometry (list[Any]) – Geometry data for geometric annotations

  • created_at (str | None) – When the annotation was created

  • approved_at (str | None) – When the annotation was approved

  • approved_by (str | None) – Who approved the annotation

  • associated_file (str | None) – Path to associated file (for segmentations)

  • deleted (bool) – Whether the annotation is deleted

  • deleted_at (str | None) – When the annotation was deleted

  • deleted_by (str | None) – Who deleted the annotation

  • created_by_model (str | None) – Model ID if created by AI

  • old_geometry (Any | None) – Previous geometry data

  • set_name (str | None) – Set name for grouped annotations

  • resource_filename (str | None) – Filename of the associated resource

  • resource_modality (str | None) – Modality of the associated resource

  • annotation_worklist_name (str | None) – Name of the annotation worklist

  • user_info (dict[str, str] | None) – Information about the user who created the annotation

  • values (Any | None) – Additional values

  • file (str | None)

__repr__()

String representation of the annotation.

Return type:

str

property added_by: str

Get the creator email (alias for created_by).

annotation_type: str
annotation_worklist_id: str | None = None
annotation_worklist_name: str | None = None
approved_at: str | None = None
approved_by: str | None = None
associated_file: str | None = None
created_at: str | None = None
created_by: str
created_by_model: str | None = None
deleted: bool = False
deleted_at: str | None = None
deleted_by: str | None = None
file: str | None = None
frame_index: int | None = None
classmethod from_dict(data)

Create an Annotation instance from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary containing annotation data from API

Return type:

Annotation

Returns:

Annotation instance

geometry: list[Any]
get_created_datetime()

Get the creation datetime as a datetime object.

Return type:

datetime | None

Returns:

datetime object or None if created_at is not set
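
A conversion of this kind might look like the following sketch. It assumes that created_at holds an ISO 8601 string, which this page does not state. parse_created_at is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional


def parse_created_at(created_at: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp string, or return None when it is not set."""
    if not created_at:
        return None
    # Older Python versions reject a trailing 'Z', so normalise it first.
    return datetime.fromisoformat(created_at.replace("Z", "+00:00"))


parsed = parse_created_at("2024-05-01T12:30:00Z")
```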

id: str
identifier: str
property index: int | None

Get the frame index (alias for frame_index).

is_category()

Check if this is a category annotation.

Return type:

bool

is_frame_scoped()

Check if this annotation is frame-scoped.

Return type:

bool

is_image_scoped()

Check if this annotation is image-scoped.

Return type:

bool

is_label()

Check if this is a label annotation.

Return type:

bool

is_segmentation()

Check if this is a segmentation annotation.

Return type:

bool

property name: str

Get the annotation name (alias for identifier).

numeric_value: float | None = None
old_geometry: Any | None = None
resource_filename: str | None = None
resource_id: str
resource_modality: str | None = None
scope: str
set_name: str | None = None
status: str | None = None
text_value: str | None = None
to_dict()

Convert the annotation to a dictionary format.

Return type:

dict[str, Any]

Returns:

Dictionary representation of the annotation

property type: str

Get the annotation type.

units: str | None = None
user_info: dict[str, str] | None = None
property value: str | None

Get the annotation value (for category annotations).

values: Any | None = None

AnnotationProcessor - Handles segmentation and label processing.

This module provides annotation processing classes for different dataset types:

  • BaseAnnotationProcessor: Generic processor with shared logic for all dataset types

  • ImageAnnotationProcessor: Processor for simple 2D images (no frame/slot concept)

  • SequenceAnnotationProcessor: Extended processor for multi-frame/multi-slot data (videos, volumes)

The class hierarchy ensures that the base class contains only generic logic that works for any dataset type, while specialized logic is in subclasses.

class datamint.dataset.annotation_processor.AnnotationProcessor(seglabel2code, image_labels_set, image_lcodes, allow_external_annotations=False)

Base processor for annotations - contains only generic shared logic.

This class provides generic annotation processing that works for any dataset type:

  • Loading segmentation data from annotations (raw load, no frame handling)

  • Generic merging strategies for semantic segmentations

  • Label name conversion utilities

  • Annotation filtering utilities

Subclasses (ImageAnnotationProcessor, SequenceAnnotationProcessor) handle dataset-specific logic like frame/slot assignment and dimension handling.

Parameters:
  • seglabel2code (dict[str, int]) – Mapping from label name to code.

  • image_labels_set (list[str]) – List of image-level label names.

  • image_lcodes (dict[str, dict[str, int]]) – Mapping for image labels.

  • allow_external_annotations (bool)

apply_merge_strategy(segmentations, strategy)
Overloads:
  • self, segmentations (dict[str, Tensor]), strategy (MergeStrategy) → Tensor

  • self, segmentations (dict[str, np.ndarray]), strategy (MergeStrategy) → np.ndarray

Merge semantic segmentations from multiple annotators.

Parameters:
  • segmentations (dict[str, Tensor] | dict[str, ndarray]) – Dict of author -> semantic segmentation tensor.

  • strategy (Literal['union', 'intersection', 'mode']) – Merge strategy (‘union’, ‘intersection’, ‘mode’).

Returns:

Merged tensor if strategy is specified, otherwise original dict.

Return type:

Tensor | ndarray
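
The three strategies can be pictured with a small pure-Python sketch on flattened binary masks. This is not the library code: merge_masks is a hypothetical name, real inputs are tensors or arrays, and the tie-handling rule for 'mode' is an assumption.

```python
from __future__ import annotations


def merge_masks(masks_by_author: dict[str, list[int]], strategy: str) -> list[int]:
    """Merge flattened binary masks (one per author) element-wise."""
    # One tuple of votes per pixel, one vote per author.
    stacked = list(zip(*masks_by_author.values()))
    if strategy == "union":
        return [int(any(votes)) for votes in stacked]
    if strategy == "intersection":
        return [int(all(votes)) for votes in stacked]
    if strategy == "mode":
        # Assumption: keep a pixel when strictly more than half the authors marked it.
        return [int(sum(votes) * 2 > len(votes)) for votes in stacked]
    raise ValueError(f"unknown strategy: {strategy!r}")


masks = {"alice": [1, 1, 0, 0], "bob": [1, 0, 0, 1], "carol": [1, 1, 0, 0]}
union = merge_masks(masks, "union")
intersection = merge_masks(masks, "intersection")
majority = merge_masks(masks, "mode")
```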

collate_frame_segmentations(fr_anns, depth=None)
Parameters:
  • fr_anns (Sequence[Annotation])

  • depth (int | None)

Return type:

tuple[ndarray | None, int]

convert_image_categories(annotations)

Convert image-level category annotations to class index tensors.

For multiclass classification, exactly one valid category is expected per image (per annotator); it becomes the target class index for CrossEntropyLoss. If multiple categories exist, the first one encountered is used.

Parameters:

annotations (Sequence[Annotation]) – List of category annotations (image-scoped).

Return type:

dict[str, Tensor]

Returns:

Dict of annotator_id -> 0-D long tensor containing the class index.
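
The "first one encountered" rule can be sketched as follows. The CategoryAnn class, the class_index mapping keyed by (identifier, value), and the function name are illustrative assumptions. Plain ints stand in for the 0-D long tensors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CategoryAnn:
    """Illustrative stand-in for an image-scoped category Annotation."""
    created_by: str
    identifier: str
    value: str


def first_category_per_annotator(
    annotations: list[CategoryAnn],
    class_index: dict[tuple[str, str], int],
) -> dict[str, int]:
    """Return annotator -> class index, keeping the first category seen."""
    targets: dict[str, int] = {}
    for ann in annotations:
        if ann.created_by not in targets:
            targets[ann.created_by] = class_index[(ann.identifier, ann.value)]
    return targets


class_index = {("view", "frontal"): 0, ("view", "lateral"): 1}
anns = [
    CategoryAnn("alice", "view", "lateral"),
    CategoryAnn("alice", "view", "frontal"),  # ignored: alice already has one
    CategoryAnn("bob", "view", "frontal"),
]
targets = first_category_per_annotator(anns, class_index)
```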

convert_image_labels(annotations)

Convert image-level label annotations to one-hot tensors.

Parameters:

annotations (Sequence[Annotation]) – List of label annotations (image-scoped).

Return type:

dict[str, Tensor]

Returns:

Dict of annotator_id -> one-hot tensor of shape (num_labels,).
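
A minimal sketch of the encoding, assuming each label's slot is its position in the list of known label names (the ordering actually used by the library is not stated here). encode_labels is a hypothetical name, and lists stand in for tensors.

```python
from __future__ import annotations


def encode_labels(
    labels_by_annotator: dict[str, list[str]],
    label_names: list[str],
) -> dict[str, list[int]]:
    """Return annotator -> 0/1 vector with one slot per known label name."""
    position = {name: i for i, name in enumerate(label_names)}
    encoded: dict[str, list[int]] = {}
    for annotator, names in labels_by_annotator.items():
        vector = [0] * len(label_names)
        for name in names:
            vector[position[name]] = 1
        encoded[annotator] = vector
    return encoded


label_names = ["fracture", "implant", "artifact"]
encoded = encode_labels(
    {"alice": ["implant"], "bob": ["fracture", "artifact"]}, label_names
)
```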

static filter_annotations(annotations, type='all', scope='all')

Filter annotations by type and scope.

Parameters:
  • annotations (Sequence[Annotation]) – List of annotations.

  • type (Literal['label', 'category', 'segmentation', 'all']) – Filter by annotation type.

  • scope (Literal['frame', 'image', 'all']) – Filter by scope (frame/image).

Return type:

list[Annotation]

Returns:

Filtered list of annotations.
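
The filtering semantics can be sketched with a stand-in record. The Ann class and filter_anns function below are illustrative only, and treating 'all' as "no constraint" is inferred from the parameter descriptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ann:
    """Illustrative stand-in exposing only the two fields used for filtering."""
    type: str   # 'label', 'category' or 'segmentation'
    scope: str  # 'frame' or 'image'


def filter_anns(annotations: list[Ann], type: str = "all", scope: str = "all") -> list[Ann]:
    """Keep annotations matching both criteria; 'all' disables a criterion."""
    return [
        ann for ann in annotations
        if type in ("all", ann.type) and scope in ("all", ann.scope)
    ]


anns = [Ann("segmentation", "frame"), Ann("label", "image"), Ann("segmentation", "image")]
image_segs = filter_anns(anns, type="segmentation", scope="image")
```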

static get_author(ann)

Return a consistent author key for an annotation.

Prefers created_by, falls back to created_by_model, then "unknown".

Parameters:

ann (Annotation)

Return type:

str

group_annotations(annotations, by_author=False, by_identifier=False)

Group annotations by author and/or identifier.

Parameters:
  • annotations (Iterable[Annotation]) – Iterable of Annotation objects.

  • by_author (bool) – If True, group by author.

  • by_identifier (bool) – If True, group by identifier.

Return type:

dict[tuple, list[Annotation]]

Returns:

Dict mapping grouping keys to lists of annotations.
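
One way to picture the tuple keys is the sketch below. The key layout (author first, then identifier, and an empty tuple when both flags are False) is an assumption, and Ann and group_anns are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ann:
    """Illustrative stand-in with only the fields used for grouping."""
    created_by: str
    identifier: str


def group_anns(
    annotations: list[Ann], by_author: bool = False, by_identifier: bool = False
) -> dict[tuple, list[Ann]]:
    """Group annotations under tuple keys built from the enabled criteria."""
    groups: dict[tuple, list[Ann]] = defaultdict(list)
    for ann in annotations:
        key: tuple = ()
        if by_author:
            key += (ann.created_by,)
        if by_identifier:
            key += (ann.identifier,)
        groups[key].append(ann)
    return dict(groups)


anns = [Ann("alice", "liver"), Ann("bob", "liver"), Ann("alice", "spleen")]
by_author = group_anns(anns, by_author=True)
by_both = group_anns(anns, by_author=True, by_identifier=True)
```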

instance_to_semantic_segmentation(segmentations, seg_labels, num_labels)
Overloads:
  • self, segmentations (None), seg_labels (Tensor | np.ndarray), num_labels (int) → None

  • self, segmentations (Tensor), seg_labels (Tensor), num_labels (int) → Tensor

  • self, segmentations (np.ndarray), seg_labels (np.ndarray), num_labels (int) → np.ndarray

Convert instance segmentation to semantic segmentation for a sequence.

Parameters:
  • segmentations (Tensor | ndarray | None) – Tensor/array of shape (num_instances, depth, H, W).

  • seg_labels (Tensor | ndarray) – Tensor/array of shape (num_instances,).

  • num_labels (int)

Returns:

None if segmentations is None; otherwise a Tensor/array of shape (num_labels+1, depth, H, W).

Return type:

Tensor | ndarray | None
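
The conversion can be sketched in pure Python on flattened masks, with the (depth, H, W) axes collapsed into one pixel axis for brevity. The sketch assumes label codes run from 1 to num_labels and that channel 0 is the background; neither is confirmed above. instances_to_semantic is a hypothetical name.

```python
from __future__ import annotations


def instances_to_semantic(
    instances: list[list[int]], labels: list[int], num_labels: int
) -> list[list[int]]:
    """Collapse per-instance binary masks into one binary mask per label."""
    n_pixels = len(instances[0]) if instances else 0
    semantic = [[0] * n_pixels for _ in range(num_labels + 1)]
    for mask, code in zip(instances, labels):
        for i, value in enumerate(mask):
            if value:
                semantic[code][i] = 1  # instances sharing a label are OR-ed
    # Assumption: channel 0 marks pixels not covered by any label.
    semantic[0] = [
        int(not any(channel[i] for channel in semantic[1:])) for i in range(n_pixels)
    ]
    return semantic


instances = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]
semantic = instances_to_semantic(instances, labels=[1, 2, 2], num_labels=2)
```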

load_frame_segmentations(annotations)

Load frame-level segmentations.

Parameters:

annotations (Iterable[Annotation]) – Iterable of Annotation objects (segmentation type).

Returns:

  • segmentations: dict[author -> list of np.ndarray of shape (#num_instances, #frames, H, W)]

  • seg_labels: dict[author -> list of int codes]

  • seg_anns: dict[author -> list of list of Annotation objects]

Return type:

tuple[dict[str, list], dict[str, list], dict[str, list]]

load_image_segmentations(annotations)

Load segmentations defined at image scope.

Parameters:

annotations (Iterable[Annotation]) – Iterable of Annotation objects (segmentation type).

Returns:

  • segmentations: dict[author -> list of mask arrays of shape (#slices, H, W)]

  • seg_labels: dict[author -> list of int codes]

  • seg_anns: dict[author -> list of Annotation objects]

Return type:

tuple[dict[str, list], dict[str, list], dict[str, list]]

load_segmentation_data(ann, auto_convert_gray=True)

Load segmentation data from an annotation.

Parameters:
  • ann (Annotation) – The annotation to load data from.

  • auto_convert_gray (bool) – If True, convert multi-channel grayscale to single channel.

Returns:

Binary segmentation array with shape (N, H, W).

  • For image-level annotations: N = #frames, #slices, or depth.

  • For frame-level annotations: N = 1.

Return type:

ndarray

load_segmentations(annotations)

Load segmentations for multi-slot data (videos, volumes).

Parameters:

annotations (Iterable[Annotation]) – Iterable of Annotation objects (segmentation type).

Returns:

  • segmentations: dict[author -> np.ndarray of shape (#num_instances, depth or #slices or #frames, H, W)]

  • seg_labels: dict[author -> np.ndarray of #num_instances ints]

  • seg_metainfos: dict[author -> list of Annotation objects]

Return type:

tuple[dict[str, ndarray], dict[str, ndarray], dict[str, list]]

static merge_image_categories(categories_by_user, strategy, num_categories)

Merge per-annotator category tensors into a single tensor.

For 'mode', returns a scalar long tensor with the majority class index (-1 if empty). For 'union' and 'intersection', returns a multi-hot int tensor of shape (num_categories,).

Parameters:
  • categories_by_user (dict[str, Tensor]) – Dict of annotator_id -> scalar long tensor (class index).

  • strategy (Literal['union', 'intersection', 'mode']) – One of ‘union’, ‘intersection’, or ‘mode’.

  • num_categories (int) – Total number of (identifier, value) category classes.

Return type:

Tensor

Returns:

Scalar long tensor for ‘mode’; multi-hot int tensor for ‘union’/’intersection’.
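
The behaviour described above can be sketched with plain ints and lists in place of tensors. merge_categories is a hypothetical name, and the tie-breaking rule for 'mode' (first class encountered wins) is an assumption.

```python
from __future__ import annotations

from collections import Counter


def merge_categories(
    categories_by_user: dict[str, int], strategy: str, num_categories: int
) -> int | list[int]:
    """Merge per-annotator class indices into one index or a multi-hot vector."""
    votes = list(categories_by_user.values())
    if strategy == "mode":
        # Majority class, or -1 when there are no votes at all.
        return Counter(votes).most_common(1)[0][0] if votes else -1
    chosen = set(votes)
    if strategy == "intersection":
        # A class survives only if every annotator picked that same class.
        chosen = chosen if len(chosen) == 1 else set()
    elif strategy != "union":
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [int(i in chosen) for i in range(num_categories)]


votes = {"alice": 2, "bob": 2, "carol": 0}
mode = merge_categories(votes, "mode", num_categories=3)
union = merge_categories(votes, "union", num_categories=3)
intersection = merge_categories(votes, "intersection", num_categories=3)
```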

merge_image_labels(labels_by_user, strategy)

Merge per-annotator label tensors into a single binary tensor.

Parameters:
  • labels_by_user (dict[str, Tensor]) – Dict of annotator_id -> binary label tensor of shape (num_labels,).

  • strategy (Literal['union', 'intersection', 'mode']) – One of ‘union’, ‘intersection’, or ‘mode’.

Return type:

Tensor

Returns:

Merged label tensor of shape (num_labels,), dtype int32.

resolve_seg_code(identifier)

Resolve a segmentation label name to its integer code.

If the label is unknown and allow_external_annotations is True, a new code is assigned and stored in seglabel2code. Otherwise raises ValueError.

Parameters:

identifier (str) – Segmentation label name.

Return type:

int

Returns:

Integer code for the label.
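
A minimal sketch of the lookup-or-assign behaviour. How the library picks the new code is not documented here, so "one past the current maximum" is an assumption, and resolve_code is a hypothetical name.

```python
from __future__ import annotations


def resolve_code(
    label2code: dict[str, int], identifier: str, allow_external: bool = False
) -> int:
    """Return the code for a label, optionally registering unknown labels."""
    if identifier in label2code:
        return label2code[identifier]
    if not allow_external:
        raise ValueError(f"unknown segmentation label: {identifier!r}")
    # Assumption: the next free code is one past the largest existing code.
    code = max(label2code.values(), default=0) + 1
    label2code[identifier] = code
    return code


codes = {"liver": 1}
known = resolve_code(codes, "liver")
added = resolve_code(codes, "spleen", allow_external=True)
```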

Legacy Classes (Deprecated)

Deprecated: The classes below are kept for backwards compatibility and may be removed in a future release. Use ImageDataset or VolumeDataset instead.

class datamint.dataset.dataset.DatamintDataset(project_name, root=None, auto_update=True, api_key=None, server_url=None, return_dicom=False, return_metainfo=True, return_frame_by_frame=False, return_annotations=True, return_segmentations=True, return_as_semantic_segmentation=False, image_transform=None, mask_transform=None, alb_transform=None, semantic_seg_merge_strategy=None, include_unannotated=True, include_annotators=None, exclude_annotators=None, include_segmentation_names=None, exclude_segmentation_names=None, include_image_label_names=None, exclude_image_label_names=None, include_frame_label_names=None, exclude_frame_label_names=None, all_annotations=False)

Bases: DatamintBaseDataset

This class extends DatamintBaseDataset for direct use with PyTorch and adds processing of annotations and segmentations.

Note

Import using from datamint import Dataset.

Parameters:
  • root (str | None) – Root directory of dataset where data already exists or will be downloaded.

  • project_name (str) – Name of the project to download.

  • auto_update (bool) – If True, the dataset will be checked for updates and downloaded if necessary.

  • api_key (str | None) – API key to access the Datamint API. If not provided, it will look for the environment variable ‘DATAMINT_API_KEY’. Not necessary if you don’t want to download/update the dataset.

  • return_dicom (bool) – If True, the DICOM object will be returned, if the image is a DICOM file.

  • return_metainfo (bool) – If True, the metainfo of the image will be returned.

  • return_annotations (bool) – If True, the annotations of the image will be returned.

  • return_frame_by_frame (bool) – If True, each frame of a video/DICOM/3d-image will be returned separately.

  • include_unannotated (bool) – If True, images without annotations will be included. If False, images without annotations will be discarded.

  • all_annotations (bool) – If True, all annotations will be downloaded, including the ones that are not set as closed/done.

  • server_url (str | None) – URL of the Datamint server. If not provided, it will use the default server.

  • return_segmentations (bool) – If True (default), the segmentations of the image will be returned in the ‘segmentations’ key.

  • return_as_semantic_segmentation (bool) – If True, the segmentations will be returned as semantic segmentation.

  • image_transform (Callable[[Tensor], Any] | None) – A function to transform the image.

  • mask_transform (Callable[[Tensor], Any] | None) – A function to transform the mask.

  • semantic_seg_merge_strategy (Literal['union', 'intersection', 'mode'] | None) – If not None, the segmentations will be merged using this strategy. Possible values are ‘union’, ‘intersection’, ‘mode’.

  • include_annotators (list[str] | None) – List of annotators to include. If None, all annotators will be included. See parameter exclude_annotators.

  • exclude_annotators (list[str] | None) – List of annotators to exclude. If None, no annotators will be excluded. See parameter include_annotators.

  • include_segmentation_names (list[str] | None) – List of segmentation names to include. If None, all segmentations will be included.

  • exclude_segmentation_names (list[str] | None) – List of segmentation names to exclude. If None, no segmentations will be excluded.

  • include_image_label_names (list[str] | None) – List of image label names to include. If None, all image labels will be included.

  • exclude_image_label_names (list[str] | None) – List of image label names to exclude. If None, no image labels will be excluded.

  • include_frame_label_names (list[str] | None) – List of frame label names to include. If None, all frame labels will be included.

  • exclude_frame_label_names (list[str] | None) – List of frame label names to exclude. If None, no frame labels will be excluded.

  • alb_transform (BasicTransform | None)

__getitem__(index)

Get the item at the given index.

Parameters:

index (int) – Index of the item to return.

Returns:

A dictionary with the following keys:

  • ’image’ (Tensor): Tensor of shape (C, H, W) or (N, C, H, W), depending on self.return_frame_by_frame. If self.return_as_semantic_segmentation is True, the image is a tensor of shape (N, L, H, W) or (L, H, W), where L is the number of segmentation labels + 1 (background): L=len(self.segmentation_labels_set)+1.

  • ’metainfo’ (dict): Dictionary with metadata information.

  • ’segmentations’ (dict[str, list[Tensor]] or dict[str,Tensor] or Tensor): Segmentation masks, depending on the configuration of parameters self.return_segmentations, self.return_as_semantic_segmentation, self.return_frame_by_frame, self.semantic_seg_merge_strategy.

  • ’seg_labels’ (dict[str, list[Tensor]] or Tensor): Segmentation labels with the same length as segmentations.

  • ’frame_labels’ (dict[str, Tensor]): Frame-level labels.

  • ’image_labels’ (dict[str, Tensor]): Image-level labels.

Return type:

dict[str, Any]

apply_semantic_seg_merge_strategy(segmentations, nframes, h, w)
Parameters:
  • segmentations (dict[str, Tensor])

  • nframes (int)

Return type:

Tensor | dict[str, Tensor]

class datamint.dataset.base_dataset.DatamintBaseDataset(project_name, root=None, auto_update=True, api_key=None, server_url=None, return_dicom=False, return_metainfo=True, return_annotations=True, return_frame_by_frame=False, include_unannotated=True, all_annotations=False, include_annotators=None, exclude_annotators=None, include_segmentation_names=None, exclude_segmentation_names=None, include_image_label_names=None, exclude_image_label_names=None, include_frame_label_names=None, exclude_frame_label_names=None)

Class to download and load datasets from the Datamint API.

Parameters:
  • project_name (str) – Name of the project to download.

  • root (str | None) – Root directory of dataset where data already exists or will be downloaded.

  • auto_update (bool) – If True, the dataset will be checked for updates and downloaded if necessary.

  • api_key (str | None) – API key to access the Datamint API. If not provided, it will look for the environment variable ‘DATAMINT_API_KEY’. Not necessary if you don’t want to download/update the dataset.

  • return_dicom (bool) – If True, the DICOM object will be returned, if the image is a DICOM file.

  • return_metainfo (bool) – If True, the metainfo of the image will be returned.

  • return_annotations (bool) – If True, the annotations of the image will be returned.

  • return_frame_by_frame (bool) – If True, each frame of a video/DICOM/3d-image will be returned separately.

  • include_unannotated (bool) – If True, images without annotations will be included.

  • all_annotations (bool) – If True, all annotations will be downloaded, including the ones that are not set as closed/done.

  • server_url (str | None) – URL of the Datamint server. If not provided, it will use the default server.

  • include_annotators (list[str] | None) – List of annotators to include. If None, all annotators will be included.

  • exclude_annotators (list[str] | None) – List of annotators to exclude. If None, no annotators will be excluded.

  • include_segmentation_names (list[str] | None) – List of segmentation names to include. If None, all segmentations will be included.

  • exclude_segmentation_names (list[str] | None) – List of segmentation names to exclude. If None, no segmentations will be excluded.

  • include_image_label_names (list[str] | None) – List of image label names to include. If None, all image labels will be included.

  • exclude_image_label_names (list[str] | None) – List of image label names to exclude. If None, no image labels will be excluded.

  • include_frame_label_names (list[str] | None) – List of frame label names to include. If None, all frame labels will be included.

  • exclude_frame_label_names (list[str] | None) – List of frame label names to exclude. If None, no frame labels will be excluded.

DATAMINT_DATASETS_DIR = 'datasets'
__add__(other)

Concatenate datasets.

__getitem__(index)

Get item at index.

Parameters:

index (int) – Index

Return type:

dict[str, Tensor | FileDataset | dict | list]

Returns:

A dictionary containing ‘image’, ‘metainfo’ and ‘annotations’ keys.

__iter__()

Iterate over dataset items.

__len__()

Return dataset length.

Return type:

int

__repr__()

String representation of the dataset.

Return type:

str

property frame_categories_set: list[tuple[str, str]]

Returns the set of categories in the dataset (multi-class tasks).

property frame_labels_set: list[str]

Returns the set of independent labels in the dataset (multi-label tasks).

get_annotations(index, type='all', scope='all')

Returns the annotations of the image at the given index.

Parameters:
  • index (int) – Index of the image.

  • type (Literal['label', 'category', 'segmentation', 'all']) – The type of the annotations. Can be ‘label’, ‘category’, ‘segmentation’ or ‘all’.

  • scope (Literal['frame', 'image', 'all']) – The scope of the annotations. Can be ‘frame’, ‘image’ or ‘all’.

Return type:

list[Annotation]

Returns:

The annotations of the image.

get_collate_fn()

Get collate function for DataLoader.

Return type:

Callable

get_dataloader(*args, **kwargs)

Returns a DataLoader for the dataset with proper collate function.

Parameters:
  • *args – Positional arguments for the DataLoader.

  • **kwargs – Keyword arguments for the DataLoader.

Return type:

DataLoader

Returns:

DataLoader instance with custom collate function.

get_framelabel_distribution(normalize=False)

Returns the distribution of frame labels in the dataset.

Parameters:

normalize (bool)

Return type:

dict[str, float]

get_info()

Get project information from API.

Return type:

dict

get_resources_ids()

Get list of resource IDs.

Return type:

list[str]

get_segmentationlabel_distribution(normalize=False)

Returns the distribution of segmentation labels in the dataset.

Parameters:

normalize (bool)

Return type:

dict[str, float]
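
The normalize flag is undocumented above. A plausible reading, sketched here purely as an assumption, is that it switches between raw counts and fractions that sum to 1. label_distribution is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def label_distribution(labels: list[str], normalize: bool = False) -> dict[str, float]:
    """Count label occurrences, optionally dividing by the total count."""
    counts = Counter(labels)
    total = sum(counts.values())
    if normalize and total:
        return {name: n / total for name, n in counts.items()}
    return {name: float(n) for name, n in counts.items()}


labels = ["liver", "liver", "spleen", "liver"]
raw = label_distribution(labels)
fractions = label_distribution(labels, normalize=True)
```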

property image_categories_set: list[tuple[str, str]]

Returns the set of categories in the dataset (multi-class tasks).

property image_labels_set: list[str]

Returns the set of independent labels in the dataset (multi-label tasks).

static read_number_of_frames(filepath)

Read the number of frames in a file.

Parameters:

filepath (str)

Return type:

int

property segmentation_labels_set: list[str]

Returns the set of segmentation labels in the dataset.

subset(indices)

Create a subset of the dataset.

Parameters:

indices (list[int]) – List of indices to include in the subset.

Return type:

DatamintBaseDataset

Returns:

Self with updated subset indices.

exception datamint.dataset.base_dataset.DatamintDatasetException