Entities

The datamint.entities module provides the core data structures that represent objects within the DataMint ecosystem. These entities are Pydantic models, which gives them data validation, type checking, and serialization to and from the payloads exchanged with the DataMint API.

DataMint entities package.

class datamint.entities.Annotation(*, identifier, scope, annotation_type, confiability=1.0, id=None, frame_index=None, text_value=None, numeric_value=None, units=None, geometry=None, created_at=None, created_by=None, annotation_worklist_id=None, status=None, approved_at=None, approved_by=None, resource_id=None, associated_file=None, deleted=False, deleted_at=None, deleted_by=None, created_by_model=None, set_name=None, resource_filename=None, resource_modality=None, annotation_worklist_name=None, user_info=None, values='MISSING_FIELD', file=None, **data)

Bases: AnnotationBase

Pydantic Model representing a DataMint annotation.

id: Unique identifier for the annotation.

identifier: User-friendly identifier or label for the annotation.

scope: Scope of the annotation (e.g., “frame”, “image”).

frame_index: Index of the frame if scope is frame-based.

annotation_type: Type of annotation (e.g., “segmentation”, “bbox”, “label”).

text_value: Optional text value associated with the annotation.

numeric_value: Optional numeric value associated with the annotation.

units: Optional units for numeric_value.

geometry: Optional geometry payload (e.g., polygons, masks) as a list.

created_at: ISO timestamp for when the annotation was created.

created_by: Email or identifier of the creating user.

annotation_worklist_id: Optional worklist ID associated with the annotation.

status: Lifecycle status of the annotation (e.g., “new”, “approved”).

approved_at: Optional ISO timestamp for approval time.

approved_by: Optional identifier of the approver.

resource_id: ID of the resource this annotation belongs to.

associated_file: Path or identifier of any associated file artifact.

deleted: Whether the annotation is marked as deleted.

deleted_at: Optional ISO timestamp for deletion time.

deleted_by: Optional identifier of the user who deleted the annotation.

created_by_model: Optional identifier of the model that created this annotation.

old_geometry: Optional previous geometry payload for change tracking.

set_name: Optional set name this annotation belongs to.

resource_filename: Optional filename of the resource.

resource_modality: Optional modality of the resource (e.g., CT, MR).

annotation_worklist_name: Optional worklist name associated with the annotation.

user_info: Optional user information with keys like firstname and lastname.

values: Optional extra values payload for flexible schemas.

property added_by: str: Get the creator email (alias for created_by).

annotation_worklist_id: str | None

annotation_worklist_name: str | None

approved_at: str | None

approved_by: str | None

associated_file: str | None

created_at: str | None

created_by: str | None

created_by_model: str | None

deleted: bool

deleted_at: str | None

deleted_by: str | None

fetch_file_data(save_path=None, auto_convert=True, use_cache=False)

Parameters:

save_path (PathLike | str | None)
auto_convert (bool)
use_cache (bool)

Return type:

bytes | pydicom.dataset.Dataset | Image.Image | cv2.VideoCapture | nib_FileBasedImage

file: str | None

frame_index: int | None

classmethod from_dict(data)

Create an Annotation instance from a dictionary.

Parameters:: data (dict[str, Any]) – Dictionary containing annotation data from API
Return type:: Annotation
Returns:: Annotation instance

geometry: list | dict | None

get_created_datetime()

Get the creation datetime as a datetime object.

Return type:: datetime | None
Returns:: datetime object or None if created_at is not set
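
The conversion that get_created_datetime() describes can be approximated with the standard library alone. The sketch below is not the library's implementation: parse_created_at is a hypothetical helper, and it assumes created_at is an ISO 8601 string that may end in "Z".

```python
from datetime import datetime
from typing import Optional


def parse_created_at(created_at: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper: ISO 8601 string -> datetime, or None if unset."""
    if created_at is None:
        return None
    # Older Python versions reject a trailing "Z", so normalize it to an offset.
    return datetime.fromisoformat(created_at.replace("Z", "+00:00"))


parsed = parse_created_at("2024-01-15T10:30:00Z")
```

Returning None for a missing timestamp mirrors the documented return type (datetime | None), so callers can test the result before using it.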

id: str | None

identifier: str

property index: int | None: Get the frame index (alias for frame_index).

invalidate_cache()

Invalidate all cached data for this annotation.

Return type:: None

is_category()

Check if this is a category annotation.

Return type:: bool

is_frame_scoped()

Check if this annotation is frame-scoped.

Return type:: bool

is_image_scoped()

Check if this annotation is image-scoped.

Return type:: bool

is_label()

Check if this is a label annotation.

Return type:: bool

is_segmentation()

Check if this is a segmentation annotation.

Return type:: bool
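
The scope and type predicates above are simple checks on the scope and annotation_type fields. The following is a minimal sketch under that assumption; AnnotationStub is a hypothetical stand-in, and the string values are taken from the examples in the field descriptions rather than from the library source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotationStub:
    """Hypothetical stand-in exposing only the fields the predicates need."""
    scope: str
    annotation_type: str
    frame_index: Optional[int] = None

    def is_frame_scoped(self) -> bool:
        return self.scope == "frame"

    def is_image_scoped(self) -> bool:
        return self.scope == "image"

    def is_segmentation(self) -> bool:
        return self.annotation_type == "segmentation"


ann = AnnotationStub(scope="frame", annotation_type="segmentation", frame_index=3)
frame_segs = [a for a in [ann] if a.is_frame_scoped() and a.is_segmentation()]
```

Predicates like these are mainly useful for filtering a list of annotations, as the last line shows.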

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'ser_json_bytes': 'base64', 'val_json_bytes': 'base64'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_BaseEntity__context)

Handle unknown fields by logging a warning once per class/field combination in debug mode.

Parameters:: _BaseEntity__context (Any)
Return type:: None

numeric_value: float | int | None

property resource: Resource: Lazily load and cache the associated Resource entity.

resource_filename: str | None

resource_id: str | None

resource_modality: str | None

scope: str

set_name: str | None

status: str | None

text_value: str | None

property type: str: Alias for annotation_type.

units: str | None

user_info: dict | None

property value: str | None: Get the annotation value (for category annotations).

values: list | None

class datamint.entities.BaseEntity(**data)

Bases: BaseModel

Base class for all entities in the Datamint system.

This class provides common functionality for all entities, such as serialization and deserialization from dictionaries, as well as handling unknown fields gracefully.

The API client is automatically injected by the Api class when entities are created through API endpoints.

asdict()

Convert the entity to a dictionary, including unknown fields.

Return type:: dict[str, Any]

asjson()

Convert the entity to a JSON string, including unknown fields.

Return type:: str
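
The model_config for these entities sets 'extra': 'allow', which is the Pydantic setting that keeps undeclared fields instead of rejecting them. The sketch below imitates that behavior without Pydantic, to show what "including unknown fields" means for asdict() and asjson(). EntityStub, known_fields, and new_api_field are hypothetical names used only for illustration.

```python
import json
from typing import Any


class EntityStub:
    """Hypothetical stand-in that keeps unknown keys instead of rejecting them."""

    known_fields = ("id", "name")

    def __init__(self, **data: Any) -> None:
        self.known = {k: data[k] for k in self.known_fields if k in data}
        # Anything the class does not declare is preserved as an extra.
        self.extra = {k: v for k, v in data.items() if k not in self.known_fields}

    def asdict(self) -> dict:
        return {**self.known, **self.extra}

    def asjson(self) -> str:
        return json.dumps(self.asdict())


entity = EntityStub(id="abc", name="demo", new_api_field=42)
```

Keeping extras means a payload that gains a new server-side field still round-trips through the entity without losing data.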

has_missing_attrs()

Check if the entity has any attributes that are MISSING_FIELD.

Return type:: bool
Returns:: True if any attribute is MISSING_FIELD, False otherwise

is_attr_missing(attr_name)

Check if the attribute named attr_name holds the MISSING_FIELD sentinel.

Parameters:: attr_name (str)
Return type:: bool
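
A sentinel default lets code distinguish "the API did not send this field" from "the API sent None". The sketch below illustrates the pattern; it assumes the sentinel is the string 'MISSING_FIELD' shown in the signature defaults (the library may use a dedicated object instead), and ProjectStub is a hypothetical stand-in.

```python
from dataclasses import dataclass, fields
from typing import Any

MISSING_FIELD = "MISSING_FIELD"  # assumed sentinel value, taken from the signature defaults


@dataclass
class ProjectStub:
    """Hypothetical stand-in with one required field and one sentinel-defaulted field."""
    name: str
    annotators: Any = MISSING_FIELD

    def is_attr_missing(self, attr_name: str) -> bool:
        return getattr(self, attr_name) == MISSING_FIELD

    def has_missing_attrs(self) -> bool:
        return any(self.is_attr_missing(f.name) for f in fields(self))


partial = ProjectStub(name="demo")
complete = ProjectStub(name="demo", annotators=[])
```

Here partial.has_missing_attrs() is true because annotators was never supplied, while complete carries an explicit (empty) value.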

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'ser_json_bytes': 'base64', 'val_json_bytes': 'base64'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_BaseEntity__context)

Handle unknown fields by logging a warning once per class/field combination in debug mode.

Parameters:: _BaseEntity__context (Any)
Return type:: None

class datamint.entities.CacheManager(entity_type, cache_root=None)

Bases: Generic[T]

Manages local caching of entity data with versioning support.

This class handles storing and retrieving cached data with automatic validation against server versions to ensure data consistency.

The cache uses the following directory structure:

cache_root/
    resources/
        {resource_id}/
            image_data.pkl
            metadata.json
    annotations/
        {annotation_id}/
            segmentation_data.pkl
            metadata.json

cache_root: Root directory for cache storage

entity_type: Type of entity being cached (e.g., ‘resources’, ‘annotations’)

Parameters:

entity_type (str)
cache_root (Path | str | None)
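
The directory layout described for CacheManager maps an entity type, an entity ID, and a data key to a pair of files. The helper below is a hypothetical sketch of that mapping, not part of the library; it assumes the data file is named after the data key with a .pkl suffix, as the image_data.pkl and segmentation_data.pkl examples suggest.

```python
from pathlib import Path
from typing import Tuple


def cache_paths(cache_root: Path, entity_type: str, entity_id: str,
                data_key: str) -> Tuple[Path, Path]:
    """Hypothetical helper: return (data file, metadata file) for one entity."""
    entity_dir = cache_root / entity_type / entity_id
    return entity_dir / f"{data_key}.pkl", entity_dir / "metadata.json"


data_file, meta_file = cache_paths(Path("cache_root"), "resources", "res-123", "image_data")
```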

class ItemMetadata(**data)

Bases: BaseModel

Parameters:: data (Any)

cached_at: datetime

data_path: str

data_type: str

entity_id: str | None

mimetype: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

version_hash: str | None

version_info: dict | None

clear_all()

Clear all cached data for this entity type.

Return type:: None

get(entity_id, data_key, version_info=None)

Retrieve cached data for an entity.

Parameters:

entity_id (str) – Unique identifier for the entity
data_key (str) – Key identifying the type of data
version_info (dict[str, Any] | None) – Optional version information from server to validate cache

Return type:

TypeVar(T) | None

Returns:

Cached data if valid, None if cache miss or invalid

get_cache_info(entity_id)

Get information about cached data for an entity.

Parameters:: entity_id (str) – Unique identifier for the entity
Return type:: dict[str, Any]
Returns:: Dictionary containing cache information

get_path(entity_id, data_key, version_info=None)

Get the path to cached data for an entity if valid.

Parameters:

entity_id (str) – Unique identifier for the entity
data_key (str) – Key identifying the type of data
version_info (dict[str, Any] | None) – Optional version information from server to validate cache

Return type:

Path | None

Returns:

Path to cached data if valid, None if cache miss or invalid

invalidate(entity_id, data_key=None)

Invalidate cached data for an entity.

Parameters:

entity_id (str) – Unique identifier for the entity
data_key (str | None) – Optional key for specific data. If None, invalidates all data for entity.

Return type:

None

set(entity_id, data_key, data, version_info=None)

Store data in cache for an entity.

Parameters:

entity_id (str) – Unique identifier for the entity
data_key (str) – Key identifying the type of data
data (TypeVar(T)) – Data to cache
version_info (dict[str, Any] | None) – Optional version information from server

Return type:

None
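
The interplay of set(), get(), and invalidate() with version_info can be sketched with a small stand-in. TinyCache is hypothetical and simplified: it writes one metadata file per data key rather than a shared metadata.json, and the "etag" key is an invented example of version information, not a field the server is known to return.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


class TinyCache:
    """Hypothetical, simplified cache; not the CacheManager implementation."""

    def __init__(self, entity_type: str, cache_root: Path) -> None:
        self.root = Path(cache_root) / entity_type

    def set(self, entity_id: str, data_key: str, data: Any,
            version_info: Optional[dict] = None) -> None:
        entity_dir = self.root / entity_id
        entity_dir.mkdir(parents=True, exist_ok=True)
        (entity_dir / f"{data_key}.pkl").write_bytes(pickle.dumps(data))
        # Record the server version next to the data so get() can compare later.
        meta = {"data_key": data_key, "version_info": version_info}
        (entity_dir / f"{data_key}.meta.json").write_text(json.dumps(meta))

    def get(self, entity_id: str, data_key: str,
            version_info: Optional[dict] = None) -> Optional[Any]:
        entity_dir = self.root / entity_id
        data_file = entity_dir / f"{data_key}.pkl"
        meta_file = entity_dir / f"{data_key}.meta.json"
        if not (data_file.exists() and meta_file.exists()):
            return None  # cache miss
        stored = json.loads(meta_file.read_text())["version_info"]
        if version_info is not None and stored != version_info:
            return None  # stale entry: the server version changed
        return pickle.loads(data_file.read_bytes())

    def invalidate(self, entity_id: str, data_key: Optional[str] = None) -> None:
        entity_dir = self.root / entity_id
        # No data_key means "remove everything cached for this entity".
        pattern = "*" if data_key is None else f"{data_key}.*"
        for path in entity_dir.glob(pattern):
            path.unlink()


cache = TinyCache("resources", Path(tempfile.mkdtemp()))
cache.set("res-123", "image_data", b"pixels", version_info={"etag": "v1"})
fresh = cache.get("res-123", "image_data", version_info={"etag": "v1"})
stale = cache.get("res-123", "image_data", version_info={"etag": "v2"})
cache.invalidate("res-123")
after_invalidate = cache.get("res-123", "image_data")
```

A lookup with matching version information returns the stored data, a lookup with different version information is treated as a miss, and invalidating without a data key clears every entry for the entity.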

class datamint.entities.Channel(*, channel_name, resource_data, deleted=False, created_at=None, updated_at=None, **data)

Bases: BaseEntity

Represents a channel containing multiple resources.

A channel is a collection of resources grouped together, typically for batch processing or organization purposes.

channel_name: Name identifier for the channel.

resource_data: List of resources contained in this channel.

deleted: Whether the channel has been marked as deleted.

created_at: Timestamp when the channel was created.

updated_at: Timestamp when the channel was last updated.

channel_name: str

created_at: str | None

deleted: bool

get_resource_ids()

Get list of all resource IDs in this channel.

Return type:: list[str]
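
Since resource_data is a list of ChannelResourceData entries, each carrying a resource_id, collecting the IDs amounts to a single pass over that list. The following is a sketch using a hypothetical ChannelResourceStub in place of the real model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChannelResourceStub:
    """Hypothetical stand-in for one entry of resource_data."""
    resource_id: str
    resource_file_name: str


def get_resource_ids(resource_data: List[ChannelResourceStub]) -> List[str]:
    # Preserve the order in which resources appear in the channel.
    return [entry.resource_id for entry in resource_data]


ids = get_resource_ids([
    ChannelResourceStub("r1", "scan1.dcm"),
    ChannelResourceStub("r2", "scan2.dcm"),
])
```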

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'ser_json_bytes': 'base64', 'val_json_bytes': 'base64'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_BaseEntity__context)

Handle unknown fields by logging a warning once per class/field combination in debug mode.

Parameters:: _BaseEntity__context (Any)
Return type:: None

resource_data: list[ChannelResourceData]

updated_at: str | None

class datamint.entities.ChannelResourceData(**data)

Bases: BaseModel

Represents resource data within a channel.

created_by: Email of the user who created the resource.

customer_id: UUID of the customer.

resource_id: UUID of the resource.

resource_file_name: Original filename of the resource.

resource_mimetype: MIME type of the resource.

Parameters:: data (Any)

created_by: str

customer_id: str

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

resource_file_name: str

resource_id: str

resource_mimetype: str

class datamint.entities.DatasetInfo(*, id, name, created_at, created_by, description, customer_id, updated_at, total_resource, resource_ids, **data)

Bases: BaseEntity

Pydantic Model representing a DataMint dataset.

This class provides access to dataset information and related entities like resources and projects.

created_at: str

created_by: str

customer_id: str

description: str

id: str

invalidate_cache()

Invalidate all cached relationship data.

This forces fresh data fetches on the next access.

Return type:: None

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'ser_json_bytes': 'base64', 'val_json_bytes': 'base64'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_BaseEntity__context)

Handle unknown fields by logging a warning once per class/field combination in debug mode.

Parameters:: _BaseEntity__context (Any)
Return type:: None

name: str

resource_ids: list[str]

total_resource: int

updated_at: str | None

class datamint.entities.Project(*, id, name, created_at, created_by, dataset_id, worklist_id, archived, resource_count, annotated_resource_count, description, viewable_ai_segs, editable_ai_segs, registered_model='MISSING_FIELD', ai_model_id='MISSING_FIELD', closed_resources_count='MISSING_FIELD', resources_to_annotate_count='MISSING_FIELD', most_recent_experiment='MISSING_FIELD', annotators='MISSING_FIELD', archived_on='MISSING_FIELD', archived_by='MISSING_FIELD', is_active_learning='MISSING_FIELD', two_up_display='MISSING_FIELD', require_review='MISSING_FIELD', **data)

Bases: BaseEntity

Pydantic Model representing a DataMint project.

This class models a project entity from the DataMint API, containing information about the project, its dataset, worklist, AI model, and annotation statistics.

id: Unique identifier for the project

name: Human-readable name of the project

description: Optional description of the project

created_at: ISO timestamp when the project was created

created_by: Email of the user who created the project

dataset_id: ID of the associated dataset

worklist_id: ID of the associated worklist

ai_model_id: Optional ID of the associated AI model

viewable_ai_segs: Optional configuration for viewable AI segments

editable_ai_segs: Optional configuration for editable AI segments

archived: Whether the project is archived

resource_count: Total number of resources in the project

annotated_resource_count: Number of resources that have been annotated

most_recent_experiment: Optional information about the most recent experiment

closed_resources_count: Number of resources marked as closed/completed

resources_to_annotate_count: Number of resources still needing annotation

annotators: List of annotators assigned to this project

ai_model_id: str | None

annotated_resource_count: int

annotators: list[dict]

archived: bool

archived_by: str | None

archived_on: str | None

as_torch_dataset(root_dir=None, auto_update=True, return_as_semantic_segmentation=False)

Parameters:

root_dir (str | None)
auto_update (bool)
return_as_semantic_segmentation (bool)

cache_resources(progress_bar=True)

Cache all project resources in parallel for faster subsequent access.

This method downloads and caches all resource file data concurrently, skipping resources that are already cached. This dramatically improves performance when working with large projects.

Parameters:: progress_bar (bool) – Whether to show a progress bar. Default is True.
Return type:: None

Example

>>> proj = api.projects.get_by_name("My Project")
>>> proj.cache_resources()  # Cache all resources in parallel
>>> # Now fetch_file_data() will be instantaneous for cached resources
>>> for res in proj.fetch_resources():
...     data = res.fetch_file_data(use_cache=True)
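
The general technique behind cache_resources(), downloading concurrently while skipping items that are already cached, can be sketched with a thread pool. This is an illustration of the pattern only, not the library's code: download, cache_all, and local_cache are hypothetical, and the "download" is simulated in memory.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List

local_cache: Dict[str, bytes] = {"res-1": b"already cached"}


def download(resource_id: str) -> bytes:
    """Hypothetical stand-in for a network download."""
    return f"data for {resource_id}".encode()


def cache_all(resource_ids: Iterable[str], max_workers: int = 4) -> List[str]:
    """Download every uncached resource concurrently; return the IDs fetched."""
    pending = [rid for rid in resource_ids if rid not in local_cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with pending.
        for rid, data in zip(pending, pool.map(download, pending)):
            local_cache[rid] = data
    return pending


fetched = cache_all(["res-1", "res-2", "res-3"])
```

Threads suit this workload because downloads spend most of their time waiting on the network; filtering out cached IDs first avoids submitting work that would be discarded.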

closed_resources_count: int

created_at: str

created_by: str

dataset_id: str

description: str | None

download_resources_datas(progress_bar=True)

Downloads all project resources in parallel for faster subsequent access.

This method downloads and caches all resource file data concurrently, skipping resources that are already cached. This dramatically improves performance when working with large projects.

Parameters:: progress_bar (bool) – Whether to show a progress bar. Default is True.
Return type:: None

Example

>>> proj = api.projects.get_by_name("My Project")
>>> proj.download_resources_datas()  # Download and cache all resources in parallel
>>> # Now fetch_file_data() will be instantaneous for cached resources
>>> for res in proj.fetch_resources():
...     data = res.fetch_file_data(use_cache=True)

editable_ai_segs: list | None

fetch_resources()

Fetch resources associated with this project from the API. Important: this method always fetches fresh data from the server.

Return type:: Sequence[Resource]
Returns:: List of Resource instances associated with the project.

id: str

is_active_learning: bool

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'ser_json_bytes': 'base64', 'val_json_bytes': 'base64'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_BaseEntity__context)

Handle unknown fields by logging a warning once per class/field combination in debug mode.

Parameters:: _BaseEntity__context (Any)
Return type:: None

most_recent_experiment: str | None

name: str

registered_model: Any | None

require_review: bool

resource_count: int

resources_to_annotate_count: int

set_work_status(resource, status)

Set the status of a resource.

Parameters:

resource (Resource) – The resource's unique ID or a Resource object.
status (Literal['opened', 'annotated', 'closed']) – The new status to set.

Return type:

None
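
Because status is typed as a Literal, a type checker flags invalid values but nothing stops them at runtime unless the method validates its input. A caller that receives the status from user input might pre-check it as sketched below; validate_status and WorkStatus are hypothetical names, and only the three allowed values come from the signature above.

```python
from typing import Literal, get_args

WorkStatus = Literal["opened", "annotated", "closed"]


def validate_status(status: str) -> str:
    """Hypothetical pre-check: reject values outside the documented Literal."""
    allowed = get_args(WorkStatus)
    if status not in allowed:
        raise ValueError(f"status must be one of {allowed}, got {status!r}")
    return status


ok = validate_status("closed")
```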

show()

Open the project in the default web browser.

Return type:: None

two_up_display: bool

property url: str: Get the URL to access this project in the DataMint web application.

viewable_ai_segs: list | None

worklist_id: str

class datamint.entities.Resource(*, id, resource_uri, storage, location, upload_channel, filename, mimetype, size, customer_id, status, created_at, created_by, published, deleted, upload_mechanism=None, modality=None, source_filepath=None, published_on=None, published_by=None, tags=None, deleted_at=None, deleted_by=None, instance_uid=None, series_uid=None, study_uid=None, patient_id=None, user_info='MISSING_FIELD', **data)

Bases: BaseEntity

Represents a DataMint resource with all its properties and metadata.

This class models a resource entity from the DataMint API, containing information about uploaded files, their metadata, and associated projects.

id: Unique identifier for the resource

resource_uri: URI path to access the resource file

storage: Storage type (e.g., ‘DicomResource’)

location: Storage location path

upload_channel: Channel used for upload (e.g., ‘tmp’)

filename: Original filename of the resource

modality: Medical imaging modality

mimetype: MIME type of the file

size: File size in bytes

upload_mechanism: Mechanism used for upload (e.g., ‘api’)

customer_id: Customer/organization identifier

status: Current status of the resource

created_at: ISO timestamp when resource was created

created_by: Email of the user who created the resource

published: Whether the resource is published

published_on: ISO timestamp when resource was published

published_by: Email of the user who published the resource

publish_transforms: Optional publication transforms

deleted: Whether the resource is deleted

deleted_at: Optional ISO timestamp when resource was deleted

deleted_by: Optional email of the user who deleted the resource

metadata: Resource metadata with DICOM information

source_filepath: Original source file path

tags: List of tags associated with the resource

instance_uid: DICOM SOP Instance UID (top-level)

series_uid: DICOM Series Instance UID (top-level)

study_uid: DICOM Study Instance UID (top-level)

patient_id: Patient identifier (top-level)

segmentations: Optional segmentation data

measurements: Optional measurement data

categories: Optional category data

labels: List of labels associated with the resource

user_info: Information about the user who created the resource

projects: List of projects this resource belongs to

__repr__()

Detailed string representation of the resource.

Return type:: str
Returns:: Detailed string representation for debugging

__str__()

String representation of the resource.

Return type:: str
Returns:: Human-readable string describing the resource

created_at: str

created_by: str

customer_id: str

deleted: bool

deleted_at: str | None

deleted_by: str | None

fetch_annotations(annotation_type=None)

Get annotations associated with this resource.

Parameters:: annotation_type (AnnotationType | str | None)
Return type:: Sequence[Annotation]

fetch_file_data(auto_convert=True, save_path=None, use_cache=False)

Get the file data for this resource.

This method automatically caches the file data locally. On subsequent calls, it checks the server for changes and uses cached data if unchanged.

Parameters:

use_cache (bool) – If True, uses cached data when available and valid
auto_convert (bool) – If True, automatically converts to appropriate format (pydicom.Dataset, PIL Image, etc.)
save_path (str | None) – Optional path to save the file locally

Return type:

bytes | ImagingData

Returns:

File data (format depends on auto_convert and file type)

filename: str

property filepath_cached: Path | None

Get the file path of the cached resource data, if available.

Returns:: Path to the cached file data, or None if not cached.

static from_local_file(file_path)

Create a LocalResource instance from a local file path.

Parameters:: file_path (str | Path) – Path to the local file

id: str

instance_uid: str | None

invalidate_cache()

Invalidate cached data for this resource.

Return type:: None

is_cached()

Check if the resource’s file data is already cached locally and valid.

Return type:: bool
Returns:: True if valid cached data exists, False otherwise.

is_dicom()

Check if the resource is a DICOM file.

Return type:: bool
Returns:: True if the resource is a DICOM file, False otherwise
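
How is_dicom() decides is not documented here. One plausible heuristic, offered purely as an assumption, is to compare the mimetype field against application/dicom (the registered DICOM media type) or the storage field against the 'DicomResource' example given above. looks_like_dicom is a hypothetical helper, not the library's logic.

```python
def looks_like_dicom(mimetype: str, storage: str = "") -> bool:
    """Hypothetical heuristic, not the library's is_dicom() logic."""
    return mimetype == "application/dicom" or storage == "DicomResource"


dicom_by_mimetype = looks_like_dicom("application/dicom")
dicom_by_storage = looks_like_dicom("application/octet-stream", "DicomResource")
```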

location: str

mimetype: str

modality: str | None

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'ser_json_bytes': 'base64', 'val_json_bytes': 'base64'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_BaseEntity__context)

Handle unknown fields by logging a warning once per class/field combination in debug mode.

Parameters:: _BaseEntity__context (Any)
Return type:: None

patient_id: str | None

published: bool

published_by: str | None

published_on: str | None

resource_uri: str

series_uid: str | None

show()

Open the resource in the default web browser.

Return type:: None

size: int

property size_mb: float

Get file size in megabytes.

Returns:: File size in MB rounded to 2 decimal places
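
The conversion behind size_mb is a division followed by rounding. The sketch below assumes 1 MB means 1024 * 1024 bytes; the library may divide by 10**6 instead, so treat the helper as illustrative only.

```python
def size_mb(size_bytes: int) -> float:
    """Hypothetical helper: bytes -> megabytes, rounded to 2 decimal places."""
    # Assumes 1 MB = 1024 * 1024 bytes; the library may use 10**6 instead.
    return round(size_bytes / (1024 * 1024), 2)


example_mb = size_mb(5_767_168)
```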

source_filepath: str | None

status: str

storage: str

study_uid: str | None

tags: list[str] | None

upload_channel: str

upload_mechanism: str | None

property url: str: Get the URL to access this resource in the DataMint web application.

user_info: dict[str, str | None]

class datamint.entities.User(*, email, firstname, lastname, roles, customer_id, created_at, **data)

Bases: BaseEntity

User entity model.

email: User email address (unique identifier in most cases).

firstname: First name.

lastname: Last name.

roles: List of role strings assigned to the user.

customer_id: UUID of the owning customer/tenant.

created_at: ISO 8601 timestamp of creation.

created_at: str

customer_id: str

email: str

firstname: str | None

lastname: str | None

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'ser_json_bytes': 'base64', 'val_json_bytes': 'base64'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_BaseEntity__context)

Handle unknown fields by logging a warning once per class/field combination in debug mode.

Parameters:: _BaseEntity__context (Any)
Return type:: None

roles: list[str]