Datamint

Documentation

Version: 2.17.3

A Python SDK for the Datamint platform, covering medical imaging workflows, dataset management, and machine learning experiments.

Datamint supports the full project lifecycle, from the first day you make your data available to your team to the moment you are ready to launch your model.

Quick Start

Install the package:

pip install datamint

Configure your API access:

datamint-config

Start using the API:

from datamint import Api

# Initialize API handler
api = Api()
all_projects = api.projects.get_all()

# Upload a resource
api.resources.upload_resource("/path/to/image.dcm")

# Load a dataset for training
from datamint.dataset import ImageDataset
dataset = ImageDataset(project="my-project")

Architecture Overview

The Datamint Python API is organized into several key modules:

Module              Purpose                                                            Key Classes
------------------  -----------------------------------------------------------------  ------------------------------------------
datamint.api        HTTP client and endpoint handlers for the API                      Api, ResourcesApi, ProjectsApi, etc.
datamint.entities   Pydantic data models representing platform objects                 Resource, Project, Annotation, etc.
datamint.dataset    PyTorch dataset classes for medical imaging                        ImageDataset, VolumeDataset, etc.
datamint.lightning  PyTorch Lightning integration for training workflows               DatamintDataModule, UNetPPTrainer, etc.
datamint.mlflow     MLflow integration for experiment tracking and model registration  DatamintMLflowDataset, DatamintModel, etc.

Key Concepts

The SDK is built around a few core concepts that make data ingestion, annotation, training, and deployment work together smoothly.

Resources

Manage source data.

Upload medical images, videos, and other files. Organize resources into channels and projects, then tag and annotate them.

datamint.entities.resource.Resource

Annotations

Capture labels and geometry.

Add segmentations, bounding boxes, classifications, and other geometry to resources, with support for both 2D images and 3D volumes.

datamint.entities.annotations.annotation.Annotation

Projects

Group data for workflows.

Collect resources for annotation and ML training. Projects support split assignments (train/val/test) to keep experiments reproducible.

datamint.entities.project.Project

Datasets

Train with PyTorch-ready data.

Use dataset classes that load data from Datamint projects and automatically handle DICOM, NIfTI, image, and video formats.

datamint.dataset.base.DatamintBaseDataset

Trainers

Accelerate common training loops.

Rely on high-level trainers to streamline dataset setup, model configuration, MLflow logging, and checkpointing.

datamint.lightning.trainers.BaseTrainer

Models

Package models for deployment.

Register ML models for inference on the Datamint platform, including segmentation, classification, and other custom use cases.

datamint.mlflow.flavors.model.DatamintModel
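The Projects entry above notes that split assignments (train/val/test) keep experiments reproducible. As a rough illustration of why a fixed assignment matters, the sketch below hashes each resource identifier so that the same resource lands in the same split on every run. This is a generic technique, not Datamint's implementation; `assign_split` and the resource IDs are hypothetical names used only for illustration.

```python
import hashlib


def assign_split(resource_id: str, val_fraction: float = 0.1, test_fraction: float = 0.1) -> str:
    """Map a resource ID to 'train', 'val', or 'test' deterministically."""
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to roughly [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


# Hypothetical resource IDs, used only for illustration.
resource_ids = [f"resource-{i:03d}" for i in range(10)]
splits = {rid: assign_split(rid) for rid in resource_ids}
```

Because the assignment depends only on the ID, adding new resources later does not move existing ones between splits.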

Common Workflows

Uploading Data

from datamint import Api

api = Api()

# Upload a single file
resource = api.resources.upload_resource("/path/to/image.dcm")

# Upload with options
api.resources.upload_resource(
    "/path/to/image.dcm",
    channel="CT Scans",
    tags=["baseline", "ct"],
    anonymize=True,
)

# Upload multiple files
api.resources.upload_resources(["/path/to/a.dcm", "/path/to/b.dcm"])
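When the files live in a directory tree, the list passed to `upload_resources` can be built with the standard library. The sketch below is one possible helper: `collect_files` is not part of the SDK, and the directory path in the final comment is a placeholder. It gathers files by extension, case-insensitively, and returns them in a stable order.

```python
from pathlib import Path


def collect_files(root, suffixes=(".dcm",)):
    """Return sorted paths (as strings) of files under root matching suffixes."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


# Usage with the upload call shown above (requires a configured Api instance;
# "/path/to/study" is a placeholder directory):
# api.resources.upload_resources(collect_files("/path/to/study"))
```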

Creating a Training Project

from datamint import Api
from datamint.dataset import ImageDataset

api = Api()

# Create project
project = api.projects.create(
    name="Liver Segmentation",
    description="CT liver segmentation dataset",
)

# Add resources
resources = api.resources.get_list(channel="CT Scans")
api.projects.add_resources(resources, project)

# Load dataset
dataset = ImageDataset(project="Liver Segmentation")

Training a Model

from datamint.lightning import UNetPPTrainer

trainer = UNetPPTrainer(
    project="Liver Segmentation",
    image_size=256,
    batch_size=16,
    max_epochs=50,
    accelerator="gpu",
)

results = trainer.fit()
print(results["test_results"])
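To keep runs comparable, it can help to hold the trainer arguments in one place. The following sketch assumes nothing beyond the keyword arguments shown above; `TrainingConfig` is a hypothetical helper, not an SDK class.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters mirroring the UNetPPTrainer keyword arguments above."""
    project: str
    image_size: int = 256
    batch_size: int = 16
    max_epochs: int = 50
    accelerator: str = "gpu"


config = TrainingConfig(project="Liver Segmentation")
trainer_kwargs = asdict(config)

# With the SDK installed and configured, the dictionary can be unpacked
# into the trainer from the example above:
# trainer = UNetPPTrainer(**trainer_kwargs)
```

A frozen dataclass prevents accidental mutation between runs, and the resulting dictionary can also be logged alongside the results.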

Deploying a Model

from datamint import Api

api = Api()

# Deploy a registered model
deploy_job = api.deploy.start(
    model_name="liver-segmentation-model",
    model_alias="latest",
)
print(deploy_job.status)
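A deployment may take some time to finish, so a script might want to wait until the job reaches a final state. This page does not document the possible status values or how a job object is refreshed, so the sketch below leaves both to the caller: `wait_for` is a hypothetical, SDK-independent helper that polls any zero-argument function until it returns one of the given states.

```python
import time


def wait_for(get_status, done_states, interval=5.0, timeout=600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a value in done_states or timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Hypothetical usage: both the refresh function and the state names are
# assumptions and must be replaced with whatever the platform actually reports.
# final = wait_for(lambda: fetch_current_status(deploy_job), {"<finished-state>"})
```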

Community & Support

GitHub Issues
