arthurai.core.data_service.DatasetService
- class arthurai.core.data_service.DatasetService
Bases: object
Methods
- chunk_image_set: Takes in a directory path with parquet and/or json files containing image attributes.
- convert_dataframe: Convert a dataframe to parquet named {model.id}-{stage}.parquet in the system tempdir
- files_size (rtype: int)
- send_files_from_dir_iteratively: Sends parquet or json files iteratively from a specified directory to a specified url for a given model
Attributes
COUNTS
DEFAULT_MAX_IMAGE_DATA_BYTES
FAILURE
FAILURES
SUCCESS
TOTAL
- static chunk_image_set(directory_path, image_attribute, max_image_data_bytes=300000000)
Takes in a directory path with parquet and/or json files containing image attributes. Divides the images into chunks (300 MB with the default max_image_data_bytes), zips each chunk, and splits the parquet/json file to match. The output files are given random filenames, and each image zip has a matching name.
- Return type
str
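The size-based chunking described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `chunk_by_size` and the greedy grouping strategy are assumptions, and only the 300,000,000-byte default comes from the documented signature.

```python
from typing import Dict, List

# Same value as the documented default for max_image_data_bytes.
DEFAULT_MAX_IMAGE_DATA_BYTES = 300_000_000


def chunk_by_size(
    sizes: Dict[str, int],
    max_bytes: int = DEFAULT_MAX_IMAGE_DATA_BYTES,
) -> List[List[str]]:
    """Hypothetical helper: group file names into consecutive chunks whose
    combined size stays within max_bytes. A single file larger than
    max_bytes ends up alone in its own chunk."""
    chunks: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in sizes.items():
        # Close the current chunk when adding this file would exceed the limit.
        if current and current_bytes + size > max_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

Each resulting group would then be zipped on its own, with the matching rows written to a parquet/json file of the same name.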
- static convert_dataframe(model_id, stage, df, max_rows_per_file=500000)
Convert a dataframe to parquet named {model.id}-{stage}.parquet in the system tempdir
- Parameters
- Returns
The filename of the parquet file that was created
- Return type
str
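As a rough illustration of the naming scheme and the row limit, the sketch below builds the documented `{model.id}-{stage}.parquet` path in the system tempdir and computes row ranges of at most `max_rows_per_file` rows. Both helper names are hypothetical, and how the library actually names or writes multiple files when the limit is exceeded is not stated here, so the slicing is an assumption.

```python
import os
import tempfile
from typing import Iterator, Tuple


def parquet_path(model_id: str, stage: str) -> str:
    """Hypothetical helper: path of {model_id}-{stage}.parquet in the system tempdir."""
    return os.path.join(tempfile.gettempdir(), f"{model_id}-{stage}.parquet")


def row_slices(n_rows: int, max_rows_per_file: int = 500_000) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper: yield (start, stop) row ranges, each covering
    at most max_rows_per_file rows (assumed splitting behavior)."""
    for start in range(0, n_rows, max_rows_per_file):
        yield start, min(start + max_rows_per_file, n_rows)
```

With pandas, each range could be written with something like `df.iloc[start:stop].to_parquet(...)`, which requires a parquet engine such as pyarrow to be installed.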
- static send_files_from_dir_iteratively(model, directory_path, endpoint, upload_file_param_name, additional_form_params=None, retries=0)
Sends parquet or json files iteratively from a specified directory to a specified url for a given model
- Parameters
  - retries (int) – Number of times to retry the request if it results in a 400 or higher response code
  - model (ArthurModel) – the arthurai.client.apiv2.model.ArthurModel
  - directory_path (str) – local path containing parquet and/or json files to send
  - endpoint (str) – POST url endpoint to send files to
  - upload_file_param_name (str) – key to use in body with each attached file
  - additional_form_params (Optional[Dict[str, Any]]) – dictionary of additional form file params to send along with parquet or json file
- Raises
MissingParameterError – the request failed
- Returns
A list of files which failed to upload
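The retry behavior described above (retry on a 400 or higher response code, then report the files that still failed) can be sketched without the library. This is a simplified stand-in under stated assumptions: `send_files_with_retries` and the injected `post_file` callable are hypothetical names, the callable is assumed to return an HTTP status code, and the real method's form parameters and error handling are not reproduced.

```python
import os
from typing import Callable, List


def send_files_with_retries(
    directory_path: str,
    post_file: Callable[[str], int],
    retries: int = 0,
) -> List[str]:
    """Hypothetical helper: send each parquet/json file in directory_path
    via post_file, retrying up to `retries` extra times whenever the
    returned status code is 400 or higher. Returns the names of files
    that never succeeded."""
    failed: List[str] = []
    for name in sorted(os.listdir(directory_path)):
        if not name.endswith((".parquet", ".json")):
            continue  # only parquet and json files are sent
        path = os.path.join(directory_path, name)
        for _attempt in range(retries + 1):
            if post_file(path) < 400:
                break  # upload succeeded, move on to the next file
        else:
            # Every attempt returned a 400+ status code.
            failed.append(name)
    return failed
```

Passing the upload step in as a callable keeps the sketch independent of any HTTP client; in practice it would wrap a POST to the target endpoint with the file attached under `upload_file_param_name`.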