Datamint Dataset
MLflow Dataset adapter for Datamint project splits.
- class datamint.mlflow.data.datamint_dataset.DatamintDatasetSource(project_id, project_name, split, extra_params=None)
Bases: DatasetSource
Source info pointing to a Datamint project.
- Parameters:
  - project_id (str)
  - project_name (str)
  - split (str | None)
  - extra_params (dict[str, Any] | None)
- classmethod from_json(source_json)
Constructs an instance of the DatasetSource from a JSON string representation.
- Parameters:
  - source_json (str) – A JSON string representation of the DatasetSource.
- Return type:
  DatamintDatasetSource
- Returns:
  A DatamintDatasetSource instance.
- load(**kwargs)
Loads the files or objects referred to by the DatasetSource. For example, depending on the type of DatasetSource, this may download source CSV files from S3 to the local filesystem, load a source Delta Table as a Spark DataFrame, etc.
- Parameters:
  - kwargs (Any)
- Return type:
  Any
- Returns:
  The downloaded source, e.g. a local filesystem path, a Spark DataFrame, etc.
- to_json()
Obtains a JSON string representation of the DatasetSource.
- Return type:
  str
- Returns:
  A JSON string representation of the DatasetSource.
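`to_json()` and `from_json()` are meant to be inverses, so a source can be stored with a logged run and rebuilt later. The minimal sketch below shows that round trip. It uses a hypothetical stand-in class, `SourceSketch`, instead of the real DatamintDatasetSource. It also assumes the JSON object simply holds the four constructor fields. The actual serialized layout is not documented on this page.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class SourceSketch:
    """Hypothetical stand-in mirroring DatamintDatasetSource's constructor fields."""
    project_id: str
    project_name: str
    split: str | None
    extra_params: dict[str, Any] | None = None

    def to_json(self) -> str:
        # Assumption: the JSON holds exactly the constructor fields.
        # sort_keys makes the output deterministic.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, source_json: str) -> SourceSketch:
        # Parse the JSON object and pass its keys back to the constructor.
        return cls(**json.loads(source_json))


src = SourceSketch(project_id="proj-123", project_name="chest-xray", split="train")
restored = SourceSketch.from_json(src.to_json())
```

Sorting keys keeps the serialized string stable across runs. That matters if the string is ever compared or hashed.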
- class datamint.mlflow.data.datamint_dataset.DatamintMLflowDataset(project_id, project_name, split, resources, extra_params=None)
Bases: Dataset
MLflow Dataset wrapping a Datamint project split for lineage tracking.
- Parameters:
  - project_id (str)
  - project_name (str)
  - split (str | None)
  - resources (Sequence[str] | Sequence[Resource])
  - extra_params (dict[str, Any] | None)
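The `resources` parameter accepts either resource ID strings or Resource objects. This minimal sketch shows one way such mixed input could be normalized and hashed into a short `digest` value, the kind of field `to_dict()` reports. It rests on several assumptions. `FakeResource` is a hypothetical stand-in that only exposes an `id` attribute. The hashing scheme (SHA-256 over the project ID, split, and sorted IDs, truncated to 8 hex characters) is illustrative only and is not the library's documented behavior.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FakeResource:
    """Hypothetical stand-in for a Datamint Resource; only `id` is assumed."""
    id: str


def resource_ids(resources: Sequence[str] | Sequence[FakeResource]) -> list[str]:
    # Accept raw ID strings or resource objects, as the `resources` parameter allows.
    return [r if isinstance(r, str) else r.id for r in resources]


def compute_digest(project_id: str, split: str | None, resources) -> str:
    # Sort IDs so the digest does not depend on resource order (assumed, not documented).
    payload = "|".join([project_id, split or ""] + sorted(resource_ids(resources)))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]


d1 = compute_digest("proj-123", "train", ["r2", "r1"])
d2 = compute_digest("proj-123", "train", [FakeResource("r1"), FakeResource("r2")])
```

Because the IDs are sorted first, the same split yields the same digest whether it was built from strings or objects, and in any order.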
- property profile: Any | None
Optional summary statistics for the dataset, such as the number of rows in a table, the mean / median / std of each table column, etc.
- property schema
Optional dataset schema, such as an instance of mlflow.types.Schema representing the features and targets of the dataset.
- to_dict()
Creates a config dictionary for the dataset.
Subclasses should override this method to provide additional fields in the config dict, e.g., schema, profile, etc.
Returns a string dictionary containing the following fields: name, digest, source, and source type.
- Return type:
  dict[str, str]
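This minimal sketch shows the shape of the dictionary `to_dict()` returns, with every value a string. It assumes the key names follow MLflow's base `Dataset.to_dict()`, where "source type" is stored under `source_type` and "source" holds the source's JSON string. The helper `to_dict_sketch`, the dataset name, the digest, and the `"datamint"` source-type label are hypothetical.

```python
from __future__ import annotations

import json


def to_dict_sketch(name: str, digest: str, source_json: str, source_type: str) -> dict[str, str]:
    # All values are strings, matching the documented dict[str, str] return type.
    return {
        "name": name,
        "digest": digest,
        "source": source_json,  # the source's to_json() output
        "source_type": source_type,
    }


source_json = json.dumps(
    {"project_id": "proj-123", "project_name": "chest-xray", "split": "train", "extra_params": None}
)
config = to_dict_sketch("chest-xray-train", "a1b2c3d4", source_json, "datamint")
```

Keeping the source as an embedded JSON string rather than a nested dict is what preserves the flat `dict[str, str]` type.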