Data Drift#

Querying Drift in Python#

The basic format of a drift query using the Python SDK involves specifying that the query_type parameter has the value ‘drift’:

query = {...}

arthur_model.query(query, query_type='drift')

Data Drift Endpoint#

Data drift has a dedicated endpoint at /models/{model_id}/inferences/query/data_drift.

Returns the data drift metric between a base dataset with a target dataset. This endpoint can support up to 100 properties in one request.

num_bins - Specifies the granularity of bucketing for continuous distributions and will be ignored if the attribute is categorical.
metric - Specify one metric among the data drift metrics Arthur offers.
filter - Optional blocks specific to either reference or inference set and specify which data should be used in the data drift calculation.
group_by - Global and applies to both the base and target data.
rollup - Optional parameter that will aggregate the calculated data drift value by the supported time dimension.

For HypothesisTest, the returned value is transformed as -log_10(P_value) to maintain directional parity with the other data drift metrics. That is, lower P_value is more significant and implies data drift, reflected in a higher -log_10(P_value). Further mathematical details are in the {ref}`glossary <glossary_hypothesis_test>.

Query Request:

{
    "properties": [
        "<attribute1_name> [string]",
        "<attribute2_name> [string]",
        "<attribute3_name> [string]"
    ],
    "num_bins": "<num_bins> [int]",
    "metric": "[PSI|KLDivergence|JSDivergence|HellingerDistance|HypothesisTest]",
    "base": {
        "source": "[inference|reference]",
        "filter [Optional]": [
            {
                "property": "<filter_attribute_name> [string]",
                "comparator": "<comparator> [string]",
                "value": "<filter_threshold_value> [string|int|float]"
            }
        ]
    },
    "target": {
        "source": "[inference|reference|ground_truth]",
        "filter [Optional]": [
            {
                "property": "<filter_attribute_name> [string]",
                "comparator": "<comparator> [string]",
                "value": "<filter_threshold_value> [string|int|float]"
            }
        ]
    },
    "group_by [Optional]": [
        {
            "property": "<group_by_attribute_name> [string]"
        }
    ],
    "rollup [Optional]": "minute|hour|day|month|year|batch_id"
}

Query Response:

{
    "query_result": [
        {
            "<attribute1_name>": "<attribute1_data_drift> [float]",
            "<attribute2_name>": "<attribute2_data_drift> [float]",
            "<attribute3_name>": "<attribute3_data_drift> [float]",
            "<group_by_attribute_name>": "<group_by_attribute_value> [string|int|null]",
            "rollup": "<rollup_attribute_value> [string|null]"
        }
    ]
}

See Filter Comparators for a list of valid comparators.

Example: Reference vs. Inference#

Sample Request: Calculate data drift for males, grouped by country, rolled up by hour.

{
    "properties": [
        "age"
    ],
    "num_bins": 10,
    "metric": "PSI",
    "base": {
        "source": "reference",
        "filter": [
            {
                "property": "gender",
                "comparator": "eq",
                "value": "male"
            }
        ]
    },
    "target": {
        "source": "inference",
        "filter": [
            {
                "property": "gender",
                "comparator": "eq",
                "value": "male"
            },
            {
                "property": "inference_timestamp",
                "comparator": "gte",
                "value": "2020-07-22T10:00:00Z"
            },
            {
                "property": "inference_timestamp",
                "comparator": "lt",
                "value": "2020-07-23T10:00:00Z"
            }
        ]
    },
    "group_by": [
        {
            "property": "country"
        }
    ],
    "rollup": "hour"
}

Sample Response:

{
    "query_result": [
        {
            "age": 2.3,
            "country": "Canada",
            "rollup": "2020-07-22T10:00:00Z"
        },
        {
            "age": 2.4,
            "country": "United States",
            "rollup": "2020-07-22T10:00:00Z"
        }
    ]
}

Example: Inference vs. Inference#

Sample Request: Compare data drift between two batches, with no grouping, no filters, and no rollups.

{
    "properties": [
        "age"
    ],
    "num_bins": 10,
    "metric": "PSI",
    "base": {
        "source": "inference",
        "filter": [
            {
                "property": "batch_id",
                "comparator": "eq",
                "value": "5"
            }
        ]
    },
    "target": {
        "source": "inference",
        "filter": [
            {
                "property": "batch_id",
                "comparator": "eq",
                "value": "6"
            }
        ]
    }
}

Sample Response:

{
    "query_result": [
        {
            "age": 2.3
        }
    ]
}

Example: Reference vs. Ground Truth#

Sample Request: Calculate data drift for individual ground truth class prediction probabilities, rolled up by hour.

{
    "properties": [
        "gt_1"
    ],
    "num_bins": 10,
    "metric": "PSI",
    "base": {
        "source": "reference"
    },
    "target": {
        "source": "ground_truth",
        "filter": [
            {
                "property": "ground_truth_timestamp",
                "comparator": "gte",
                "value": "2020-07-22T10:00:00Z"
            },
            {
                "property": "ground_truth_timestamp",
                "comparator": "lt",
                "value": "2020-07-23T10:00:00Z"
            }
        ]
    },
    "rollup": "hour"
}

Sample Response:

{
    "query_result": [
        {
            "gt_1": 0.03,
            "rollup": "2020-07-22T10:00:00Z"
        },
        {
            "gt_1": 0.4,
            "rollup": "2020-07-22T11:00:00Z"
        }
    ]
}

Data Drift PSI Bucket Table Values#

This metric has a dedicated endpoint at /models/{model_id}/inferences/query/data_drift_psi_bucket_calculation_table.

Returns the PSI scores by bucket using the reference set data. This query for this endpoint omits the need for metric and takes in a single property but otherwise is identical to the data drift endpoint

Note when using this endpoint with categorical features, the bucket_min and bucket_max fields will not be returned in the response. Instead, the bucket field will contain the category name.

Query Request:

{
    "property": "<attribute_name> [string]",
    "num_bins": "<num_bins> [int]",
    "base": {
        "source": "[inference|reference]",
        "filter [Optional]": [
            {
                "property": "<filter_attribute_name> [string]",
                "comparator": "<comparator> [string]",
                "value": "<filter_threshold_value> [string|int|float]"
            }
        ]
    },
    "target": {
        "source": "[inference|reference]",
        "filter [Optional]": [
            {
                "property": "<filter_attribute_name> [string]",
                "comparator": "<comparator> [string]",
                "value": "<filter_threshold_value> [string|int|float]"
            }
        ]
    },
    "group_by [Optional]": [
        {
            "property": "<group_by_attribute_name> [string]"
        }
    ],
    "rollup [Optional]": "minute|hour|day|month|year|batch_id"
}

Query Response:

{
    "query_result": [
        {
            "bucket": "string",
            "rollup": "string|null",
            "group_by_property_1": "string|null",
            "base_bucket_max": "number",
            "base_bucket_min": "number",
            "base_count_per_bucket": "number",
            "base_ln_probability_per_bucket": "number",
            "base_probability_per_bucket": "number",
            "base_total": "number",
            "target_bucket_max": "number",
            "target_bucket_min": "number",
            "target_count_per_bucket": "number",
            "target_ln_probability_per_bucket": "number",
            "target_probability_per_bucket": "number",
            "target_total": "number",
            "probability_difference": "number",
            "ln_probability_difference": "number",
            "psi": "number"
        }
    ]
}

See Filter Comparators for a list of valid comparators.

Sample Request: Calculate data drift bucket components for males, grouped by country, rolled up by hour.

{
    "property": "age",
    "num_bins": 2,
    "base": {
        "source": "reference",
        "filter": [
            {
                "property": "gender",
                "comparator": "eq",
                "value": "male"
            }
        ]
    },
    "target": {
        "source": "inference",
        "filter": [
            {
                "property": "gender",
                "comparator": "eq",
                "value": "male"
            },
            {
                "property": "inference_timestamp",
                "comparator": "gte",
                "value": "2020-07-22T10:00:00Z"
            },
            {
                "property": "inference_timestamp",
                "comparator": "lt",
                "value": "2020-07-23T10:00:00Z"
            }
        ]
    },
    "group_by": [
        {
            "property": "country"
        }
    ],
    "rollup": "hour"
}

Sample Response:

{
    "query_result": [
        {
            "bucket": "bucket_1",
            "rollup": "2020-01-01T00:00:00Z",
            "country": "Canada",
            "base_bucket_max": 0.9999971182990177,
            "base_bucket_min": 0.5009102069226075,
            "base_count_per_bucket": 4988,
            "base_ln_probability_per_bucket": -0.6955500651756032,
            "base_probability_per_bucket": 0.4988,
            "base_total": 10000,
            "target_bucket_max": 0.9999971182990177,
            "target_bucket_min": 0.5009102069226075,
            "target_count_per_bucket": 2487,
            "target_ln_probability_per_bucket": -0.6701670131762315,
            "target_probability_per_bucket": 0.5116231228142357,
            "target_total": 4861,
            "probability_difference": -0.012823122814235699,
            "ln_probability_difference": -0.025383051999371742,
            "psi": 0.00032548999318807485
        },
        {
            "bucket": "bucket_2",
            "rollup": "2020-01-01T00:00:00Z",
            "country": "United States",
            "base_bucket_max": 0.9999971182990177,
            "base_bucket_min": 0.5009102069226075,
            "base_count_per_bucket": 4988,
            "base_ln_probability_per_bucket": -0.6955500651756032,
            "base_probability_per_bucket": 0.4988,
            "base_total": 10000,
            "target_bucket_max": 0.9999971182990177,
            "target_bucket_min": 0.5009102069226075,
            "target_count_per_bucket": 2487,
            "target_ln_probability_per_bucket": -0.6701670131762315,
            "target_probability_per_bucket": 0.5116231228142357,
            "target_total": 4861,
            "probability_difference": -0.012823122814235699,
            "ln_probability_difference": -0.025383051999371742,
            "psi": 0.00032548999318807485
        },
        {
            "bucket": "bucket_1",
            "rollup": "2020-01-01T01:00:00Z",
            "country": "Canada",
            "base_bucket_max": 0.9999971182990177,
            "base_bucket_min": 0.5009102069226075,
            "base_count_per_bucket": 4988,
            "base_ln_probability_per_bucket": -0.6955500651756032,
            "base_probability_per_bucket": 0.4988,
            "base_total": 10000,
            "target_bucket_max": 0.9999971182990177,
            "target_bucket_min": 0.5009102069226075,
            "target_count_per_bucket": 2487,
            "target_ln_probability_per_bucket": -0.6701670131762315,
            "target_probability_per_bucket": 0.5116231228142357,
            "target_total": 4861,
            "probability_difference": -0.012823122814235699,
            "ln_probability_difference": -0.025383051999371742,
            "psi": 0.00032548999318807485
        },
        {
            "bucket": "bucket_2",
            "rollup": "2020-01-01T01:00:00Z",
            "country": "United States",
            "base_bucket_max": 0.9999971182990177,
            "base_bucket_min": 0.5009102069226075,
            "base_count_per_bucket": 4988,
            "base_ln_probability_per_bucket": -0.6955500651756032,
            "base_probability_per_bucket": 0.4988,
            "base_total": 10000,
            "target_bucket_max": 0.9999971182990177,
            "target_bucket_min": 0.5009102069226075,
            "target_count_per_bucket": 2487,
            "target_ln_probability_per_bucket": -0.6701670131762315,
            "target_probability_per_bucket": 0.5116231228142357,
            "target_total": 4861,
            "probability_difference": -0.012823122814235699,
            "ln_probability_difference": -0.025383051999371742,
            "psi": 0.00032548999318807485
        }
    ]
}

Sample Request: Compare data drift bucket components between two batches, with no grouping, no filters, and no rollups.

{
    "property": "age",
    "num_bins": 10,
    "base": {
        "source": "inference",
        "filter": [
            {
                "property": "batch_id",
                "comparator": "eq",
                "value": "5"
            }
        ]
    },
    "target": {
        "source": "inference",
        "filter": [
            {
                "property": "batch_id",
                "comparator": "eq",
                "value": "6"
            }
        ]
    }
}

Sample Response:

{
    "query_result": [
        {
            "bucket": "bucket_1",
            "base_bucket_max": 0.9999971182990177,
            "base_bucket_min": 0.5009102069226075,
            "base_count_per_bucket": 4988,
            "base_ln_probability_per_bucket": -0.6955500651756032,
            "base_probability_per_bucket": 0.4988,
            "base_total": 10000,
            "target_bucket_max": 0.9999971182990177,
            "target_bucket_min": 0.5009102069226075,
            "target_count_per_bucket": 2487,
            "target_ln_probability_per_bucket": -0.6701670131762315,
            "target_probability_per_bucket": 0.5116231228142357,
            "target_total": 4861,
            "probability_difference": -0.012823122814235699,
            "ln_probability_difference": -0.025383051999371742,
            "psi": 0.00032548999318807485
        },
        {
            "bucket": "bucket_2",
            "base_bucket_max": 0.9999971182990177,
            "base_bucket_min": 0.5009102069226075,
            "base_count_per_bucket": 4988,
            "base_ln_probability_per_bucket": -0.6955500651756032,
            "base_probability_per_bucket": 0.4988,
            "base_total": 10000,
            "target_bucket_max": 0.9999971182990177,
            "target_bucket_min": 0.5009102069226075,
            "target_count_per_bucket": 2487,
            "target_ln_probability_per_bucket": -0.6701670131762315,
            "target_probability_per_bucket": 0.5116231228142357,
            "target_total": 4861,
            "probability_difference": -0.012823122814235699,
            "ln_probability_difference": -0.025383051999371742,
            "psi": 0.00032548999318807485
        }
    ]
}

Data Drift for Classification Outputs#

For classification outputs, one may want to examine drift among a collection of different classes, i.e. the system of outputs, instead of the drift of the probability predictions of a single class. The query uses one of "predicted_classes": ["*"] or "ground_truth_classes": ["*"] but otherwise is identical to a standard data drift query. Rather than using the star operator to select all prediction or ground truth classes, respectively, in a model, a list of string classes can be provided for looking at drift of a subset of multiclass outputs.

predicted_classes - Specifies which prediction classes to use for predictedClass data drift.
ground_truth_classes - Specifies which prediction classes to use for groundTruthClass data drift.

properties can be included in the same query as long as the target source corresonds to the classification output tag. For example, one can query drift on input attributes and predictedClass in the same query with target source of inference; one can query drift on individual ground truth labels and groundTruthClass in the same query with target source of ground_truth.

Query Request:

{
    "properties [Optional]": [
        "<attribute1_name> [string]",
        "<attribute2_name> [string]",
        "<attribute3_name> [string]"
    ],
    "[predicted_classes|ground_truth_classes]": [
        "<class0_name> [string]"
        "<class1_name> [string]"
        ],
    "num_bins": "<num_bins> [int]",
    "metric": "[PSI|KLDivergence|JSDivergence|HellingerDistance|HypothesisTest]",
    "base": {
        "source": "[inference|reference]",
        "filter [Optional]": [
            {
                "property": "<filter_attribute_name> [string]",
                "comparator": "<comparator> [string]",
                "value": "<filter_threshold_value> [string|int|float]"
            }
        ]
    },
    "target": {
        "source": "[inference|reference|ground_truth]",
        "filter [Optional]": [
            {
                "property": "<filter_attribute_name> [string]",
                "comparator": "<comparator> [string]",
                "value": "<filter_threshold_value> [string|int|float]"
            }
        ]
    },
    "group_by [Optional]": [
        {
            "property": "<group_by_attribute_name> [string]"
        }
    ],
    "rollup [Optional]": "minute|hour|day|month|year|batch_id"
}

Query Response:

{
    "query_result": [
        {
            "<attribute1_name>": "<attribute1_data_drift> [float]",
            "<attribute2_name>": "<attribute2_data_drift> [float]",
            "<attribute3_name>": "<attribute3_data_drift> [float]",
            "[predictedClass|groundTruthClass]": "<classification_data_drift> [float]",
            "<group_by_attribute_name>": "<group_by_attribute_value> [string|int|null]",
            "rollup": "<rollup_attribute_value> [string|null]"
        }
    ]
}

See Filter Comparators for a list of valid comparators.

Sample Request: Calculate data drift on all prediction classes.

{
    "predicted_classes": [
        "*"
    ],
    "num_bins": 20,
    "base": {
        "source": "reference"
    },
    "target": {
        "source": "inference"
    },
    "metric": "PSI"
}

Sample Response:

{
    "query_result": [
        {
            "predictedClass": 0.021
        }
    ]
}

Sample Request: Calculate data drift on ground truth using the first and third ground truth classes.

{
    "predicted_classes": [
        "gt_1",
        "gt_3"
    ],
    "num_bins": 20,
    "base": {
        "source": "reference"
    },
    "target": {
        "source": "ground_truth"
    },
    "metric": "PSI"
}

Sample Response:

{
    "query_result": [
        {
            "groundTruthClass": 0.021
        }
    ]
}

Automated Data Drift Thresholds#

What is a sufficiently high data drift value to suggest that the target data has actually drifted from the base data? For HypothesisTest, we can reverse engineer -log_10(P_value) and plug in the conventional .05 alpha level to establish a lower bound of -log_10(.05).

For the other data drift metrics, it is not sufficient to pin a constant. We abstract this away for the user and allow queries to obtain automatically generated data drift thresholds (lower bounds) based on a model’s data. These thresholds can be used in alerting. For more information see: Automating Data Drift Thresholding in Machine Learning Systems.

The query uses "metric": "Thresholds" and does not require nor use "target" and "rollup" fields but otherwise is identical to a standard data drift query.

Query Request:

{
    "properties": [
        "<attribute1_name> [string]",
        "<attribute2_name> [string]",
        "<attribute3_name> [string]"
    ],
    "num_bins": "<num_bins> [int]",
    "metric": "Thresholds",
    "base": {
        "source": "reference",
        "filter [Optional]": [
            {
                "property": "<filter_attribute_name> [string]",
                "comparator": "<comparator> [string]",
                "value": "<filter_threshold_value> [string|int|float]"
            }
        ]
    },
    "group_by [Optional]": [
        {
            "property": "<group_by_attribute_name> [string]"
        }
    ]
}

Query Response:

{
    "query_result": [
        {
            "<attribute1_name>": {
                "HellingerDistance": "<threshold> [float]",
                "JSDivergence": "<threshold> [float]",
                "KLDivergence": "<threshold> [float]",
                "PSI": "<threshold> [float]"
            },
            "<attribute2_name>": {
                "HellingerDistance": "<threshold> [float]",
                "JSDivergence": "<threshold> [float]",
                "KLDivergence": "<threshold> [float]",
                "PSI": "<threshold> [float]"
            }
            
        }
    ]
}

See Filter Comparators for a list of valid comparators.

Sample Request:

{
    "properties": [
        "AGE"
    ],
    "num_bins": 20,
    "base": {
        "source": "reference"
    },
    "metric": "Thresholds"
}

Sample Response:

{
    "query_result": [
        {
            "AGE": {
                "HellingerDistance": 0.00041737395239735647,
                "JSDivergence": 2.959228131592643,
                "KLDivergence": 0.001893866910388703,
                "PSI": 0.0018945640055550161
            }
        }
    ]
}