arthurai.core.data_service.DatasetService

class arthurai.core.data_service.DatasetService

Bases: object

Methods

chunk_image_set

Takes in a directory path with parquet and/or json files containing image attributes.

convert_dataframe

Convert a dataframe to parquet named {model.id}-{stage}.parquet in the system tempdir

files_size

Return type: int

send_files_from_dir_iteratively

Sends parquet or json files iteratively from a specified directory to a specified url for a given model

Attributes

COUNTS

DEFAULT_MAX_IMAGE_DATA_BYTES

FAILURE

FAILURES

SUCCESS

TOTAL

static chunk_image_set(directory_path, image_attribute, max_image_data_bytes=300000000)

Takes a directory path containing parquet and/or json files with image attributes. The images are divided into chunks of up to max_image_data_bytes (300 MB by default), and each chunk is zipped; the parquet/json files are split up to match. The output files are given random filenames, and each image zip has the same name as its matching parquet/json file.

Return type

str
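
The chunking described above can be illustrated with a minimal, standalone sketch. This is not the arthurai implementation: the helper name chunk_by_size, the greedy grouping strategy, and the example file sizes are all assumptions made for illustration. It only shows one way to group files so that each group stays within a byte limit.

```python
from typing import Dict, List


def chunk_by_size(sizes: Dict[str, int], max_bytes: int = 300_000_000) -> List[List[str]]:
    """Greedily group file names so each group's total size stays within max_bytes.

    Hypothetical helper: a file larger than max_bytes ends up alone in its own group.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in sizes.items():
        # Close the current chunk when adding this file would exceed the limit.
        if current and current_bytes + size > max_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


# Made-up sizes in bytes, with a small limit so the grouping is easy to follow.
example = chunk_by_size(
    {"a.png": 120, "b.png": 150, "c.png": 100, "d.png": 400},
    max_bytes=300,
)
```

In this sketch each resulting group would correspond to one image zip and one matching parquet/json file.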

static convert_dataframe(model_id, stage, df, max_rows_per_file=500000)

Convert a dataframe to parquet named {model.id}-{stage}.parquet in the system tempdir

Parameters
  • model_id (str) – a model id

  • stage (Optional[Stage]) – the Stage

  • df (DataFrame) – the dataframe to convert

  • max_rows_per_file – the maximum number of rows per parquet file

Returns

The filename of the parquet file that was created

Return type

str
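
The naming scheme and the row limit can be sketched in plain Python. The helpers parquet_path and row_ranges below are hypothetical and are not part of arthurai. The stage is passed as a plain string rather than a Stage value, and the way rows beyond max_rows_per_file are handled is an assumption, since the description above does not specify it.

```python
import os
import tempfile
from typing import List, Tuple


def parquet_path(model_id: str, stage: str) -> str:
    """Build '<system tempdir>/<model_id>-<stage>.parquet' (hypothetical helper)."""
    return os.path.join(tempfile.gettempdir(), f"{model_id}-{stage}.parquet")


def row_ranges(n_rows: int, max_rows_per_file: int = 500_000) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) row ranges, each at most max_rows_per_file long.

    Assumption: rows beyond the limit would be written to additional files.
    """
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")
    return [
        (start, min(start + max_rows_per_file, n_rows))
        for start in range(0, n_rows, max_rows_per_file)
    ]


# "abc123" and "inference" are made-up example values.
path = parquet_path("abc123", "inference")
ranges = row_ranges(1_200_000)
```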

static send_files_from_dir_iteratively(model, directory_path, endpoint, upload_file_param_name, additional_form_params=None, retries=0)

Sends parquet or json files iteratively from a specified directory to a specified url for a given model

Parameters
  • retries (int) – Number of times to retry the request if it results in a 400 or higher response code

  • model (ArthurModel) – the arthurai.client.apiv2.model.ArthurModel

  • directory_path (str) – local path containing parquet and/or json files to send

  • endpoint (str) – POST url endpoint to send files to

  • upload_file_param_name (str) – key to use in body with each attached file

  • additional_form_params (Optional[Dict[str, Any]]) – dictionary of additional form file params to send along with parquet or json file

Raises

MissingParameterError – the request failed

Returns

A list of files which failed to upload
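
The retry-and-collect behaviour can be sketched without any network access. This is an illustration under stated assumptions, not the library's code: send_iteratively and fake_send are hypothetical names, the sender is a caller-supplied callable that returns an HTTP status code, and failed files are reported by file name.

```python
import os
import tempfile
from typing import Callable, Dict, List


def send_iteratively(directory_path: str, send: Callable[[str], int], retries: int = 0) -> List[str]:
    """Send each parquet/json file in a directory; return the names that never succeeded."""
    failed: List[str] = []
    for name in sorted(os.listdir(directory_path)):
        if not name.endswith((".parquet", ".json")):
            continue  # ignore other file types
        path = os.path.join(directory_path, name)
        # One initial attempt plus up to `retries` further attempts.
        for _ in range(retries + 1):
            if send(path) < 400:
                break  # a status below 400 counts as success
        else:
            failed.append(name)
    return failed


# Demo with a fake sender: b.json always fails, a.parquet succeeds on its second attempt.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("a.parquet", "b.json", "notes.txt"):
        open(os.path.join(tmp, filename), "w").close()

    calls: Dict[str, int] = {}

    def fake_send(path: str) -> int:
        name = os.path.basename(path)
        calls[name] = calls.get(name, 0) + 1
        if name == "b.json":
            return 500
        return 503 if calls[name] == 1 else 200

    failed_with_retry = send_iteratively(tmp, fake_send, retries=1)
```

With retries=0 the same fake sender would also report a.parquet as failed, since its first attempt returns 503.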