Model Onboarding#

Overview#

This guide walks through the steps of onboarding a model deployed in production to Arthur. Once your deployed model is onboarded, you can use Arthur to retrieve insights about model performance efficiently and at scale.

Note

This walkthrough uses tabular data.

To onboard models of other input types, see CV Onboarding and NLP Onboarding.

Requirements#

You will need to have access to the data your model ingests and the predictions it produces.

The model object itself is not required, but it can be uploaded to enable the explainability enrichment. See our FAQs for more info.

Outline#

This guide will cover the three main steps to onboarding a model to the Arthur platform:

Model Registration is the process of registering the model schema with Arthur and sending reference data
Onboarding Existing Inferences sends your model’s historical predictions to the Arthur platform
Production Integration connects your model’s ongoing predictions in deployment to be logged with Arthur

Model Registration#

Connect to Arthur#

The first step is to import functions from the arthurai package and establish a connection with an Arthur username and password.

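# Standard library import, used below to timestamp the partner_model_id
from datetime import datetime

# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage, ValueType, Enrichment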

arthur = ArthurAI(url="https://app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")

Register Model Type#

To register a model, we start by creating a model object and defining its high-level metadata:

arthur_model = arthur.model(
    partner_model_id=f"OnboardingModel_123-{datetime.now().strftime('%Y%m%d%H%M%S')}",
    display_name="OnboardingModel",
    input_type=InputType.Tabular,
    output_type=OutputType.Multiclass,
    is_batch=False)

In particular, we set is_batch=False to define this as a streaming model, which means the Arthur platform will receive the model’s inferences as they are produced live in deployment.

Note - The partner_model_id parameter lets API users map the models monitored in Arthur back to their identifiers in the source system. For example, if your model is registered in an upstream model catalog under the ID ‘My-Enterprise-Model’, use that ID to construct the partner_model_id. The Arthur platform enforces a uniqueness constraint on partner_model_id across all models. Once a model has been onboarded with a specific partner_model_id, users will NOT be able to onboard new models with that same partner_model_id.

To avoid any confusion later, it is best practice to construct a partner_model_id with a timestamp embedded in the string, so that each time the model is onboarded, a new partner_model_id is generated.

See the example above for how we pass in a timestamp to the partner_model_id.
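
As a minimal sketch of that practice, the helper below appends a timestamp to an upstream catalog ID. The function name `make_partner_model_id` and the catalog ID are hypothetical and not part of the Arthur SDK. Only the timestamp format is taken from the example above.

```python
from datetime import datetime, timezone

def make_partner_model_id(catalog_id, now=None):
    """Append a second-resolution UTC timestamp to an upstream catalog ID."""
    now = now or datetime.now(timezone.utc)
    return f"{catalog_id}-{now.strftime('%Y%m%d%H%M%S')}"

# A fixed time is passed here only to make the result reproducible
example_id = make_partner_model_id(
    "My-Enterprise-Model",
    now=datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc),
)
print(example_id)  # My-Enterprise-Model-20240115093000
```

The resulting string can be passed as partner_model_id when creating the model. Keeping the catalog ID as a prefix makes it easy to find every onboarding of the same upstream model.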

Register Attributes with ArthurModel.build()#

Next we’ll add more detail to the model metadata, defining the model’s attributes. The simplest method of registering your attributes is to use ArthurModel.build(), which parses a Pandas DataFrame of your reference dataset containing inputs, metadata, predictions, and ground truth labels. In addition, a pred_to_ground_truth_map is required, which tells Arthur which of your attributes represent your model’s predicted values, and how those predicted attributes correspond to your model’s ground truth attributes.

Here we build a model with a pred_to_ground_truth_map configured for a binary classification model.

# Map PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that in the data you send to the platform,
# the `predicted_probability` column represents
# the probability that the ground-truth column has the value 1
pred_to_ground_truth_map = {
    'predicted_probability' : 1
}
arthur_model.build(
    reference_df,
    ground_truth_column='ground_truth_label',
    pred_to_ground_truth_map=pred_to_ground_truth_map)

Non Input Attributes#

Some features of your data may be important to track for monitoring model performance even though they are not model inputs or outputs. These features can be added as non input attributes in the ArthurModel:

# Specifying additional non input attributes when building a model.
# This tells Arthur to monitor ['age','sex','race','education']
# in the reference and inference data you send to the platform
arthur_model.build(
    reference_df,
    ground_truth_column='ground_truth_label',
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    non_input_columns=['age','sex','race','education']
)

Register Attributes Manually#

As an alternative to passing a DataFrame to ArthurModel.build(), attributes can also be registered for your model manually. Registering attributes manually may be preferable if you don’t use the Pandas library, or if there are attribute properties that cannot be configured by parsing your reference data alone.

ArthurModel.add_attribute() is the generic method to add any type of attribute to a model. For convenience, its docstring also links to the additional attribute registration methods tailored to specific model and data types.

Binary Classifier with Two Ground Truth Classes#

If the data you send to the platform for a binary classifier has columns for the predicted probability and ground-truth-status of class 0, as well as columns for the predicted probability and ground-truth-status of class 1, then map each predicted value column to its corresponding ground truth column:

# Map PredictedValue attributes to their corresponding GroundTruth attribute names
pred_to_ground_truth_map = {'pred_0' : 'gt_0',
                            'pred_1' : 'gt_1'}

# add the ground truth and predicted attributes to the model
# specifying that the `pred_1` attribute is the
# positive predicted attribute, which means it corresponds to the
# probability that the binary target attribute is 1
arthur_model.add_binary_classifier_output_attributes(
    positive_predicted_attr='pred_1',
    pred_to_ground_truth_map=pred_to_ground_truth_map)

More Than Two Ground Truth Classes#

If you are using a Multi-class model then you will have more than two Ground Truth classes. In order to make this work with the Arthur Platform, you will need to:

Ensure that you are using predict_proba (or a similar function) to predict the probability of a specific Ground Truth Class
Ensure that each class probability is included in its own column in your dataset
Ensure that your Ground Truth mapping contains all possible classes that might be predicted

So for example, if your model identifies the presence of an animal, specifically a dog, cat, or horse, in an image, your Ground Truth mapping must contain an item for each of these classes (even if the model output doesn’t predict a value for some of these categories).

If the data you send to the platform has ground truth one-hot encoded, then map predictions to each column name:

# Map PredictedValue attributes to their corresponding GroundTruth attribute names.
# This pred_to_ground_truth_map maps predicted values to one-hot encoded ground truth columns.
# For example, this tells Arthur that the `probability_dog` column represents
# the probability that the `dog_ground_truth` column has the value 1.
pred_to_ground_truth_map = {
    "probability_dog": "dog_ground_truth",
    "probability_cat": "cat_ground_truth",
    "probability_horse": "horse_ground_truth"
}

arthur_model.add_multiclass_classifier_output_attributes(
    pred_to_ground_truth_map=pred_to_ground_truth_map
)

If the data you send to the platform has ground truth values in a single column, then map predictions to each column value:

# Map PredictedValue attributes to their corresponding GroundTruth attribute values.
# This pred_to_ground_truth_map maps predicted values to the values of the ground truth column.
# For example, this tells Arthur that the `probability_dog` column represents
# the probability that the ground truth column has the value "dog".
pred_to_ground_truth_map = {
    "probability_dog": "dog",
    "probability_cat": "cat",
    "probability_horse": "horse"
}

arthur_model.add_classifier_output_attributes_gtclass(
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    ground_truth_column="animal"
)
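
Before registering a mapping like the ones above, it can help to check the data against the three requirements listed earlier. The sketch below is one possible check using plain Python rows. The helper name `check_rows` is hypothetical, and it assumes the class probabilities are mutually exclusive and should sum to 1, which may not hold for every model.

```python
pred_to_ground_truth_map = {
    "probability_dog": "dog",
    "probability_cat": "cat",
    "probability_horse": "horse",
}

def check_rows(rows, mapping, ground_truth_column, tol=1e-6):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    known_classes = set(mapping.values())
    for i, row in enumerate(rows):
        # every class probability must be present in its own column
        missing = [col for col in mapping if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        # assumption: probabilities across classes sum to 1
        total = sum(row[col] for col in mapping)
        if abs(total - 1.0) > tol:
            problems.append(f"row {i}: probabilities sum to {total:.3f}")
        # every ground truth value must appear in the mapping
        if row[ground_truth_column] not in known_classes:
            problems.append(f"row {i}: unmapped class {row[ground_truth_column]!r}")
    return problems

rows = [
    {"probability_dog": 0.7, "probability_cat": 0.2, "probability_horse": 0.1, "animal": "dog"},
    {"probability_dog": 0.5, "probability_cat": 0.5, "probability_horse": 0.5, "animal": "cow"},
]
problems = check_rows(rows, pred_to_ground_truth_map, "animal")
for problem in problems:
    print(problem)
```

The first row passes. The second is flagged twice, once for its probabilities and once for a ground truth class that the mapping does not cover.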

Regression Attributes#

If you are registering a regression model, then specify the type of the predicted and ground truth values when registering the attributes:

# Map PredictedValue attribute to its corresponding GroundTruth attribute
pred_to_ground_truth_map = {
    "predicted_value": "ground_truth_value",
}

# add the pred_to_ground_truth_map, and specify the type of the
# predicted and ground truth values
arthur_model.add_regression_output_attributes(
    pred_to_ground_truth_map = pred_to_ground_truth_map,
    value_type = ValueType.Float
)

Set Reference Data#

If you used your reference data to register your model’s attributes with ArthurModel.build() , you don’t need to complete this step because the dataframe you pass in as input to build() will be automatically saved as your model’s reference data in the Arthur system.

If you didn’t use build() or want to update the reference dataset to be sent to Arthur, you can set it directly by using the ArthurModel.set_reference_data() method. This is also necessary if your reference dataset is too large to fit into memory as a Pandas DataFrame.

Review Model#

The method ArthurModel.review() returns the model schema, which is a dataframe of properties for each of your model’s registered attributes. The review() method is automatically called when using build(), and can also be called on its own. Inspecting the model schema review() returns is recommended to verify that attribute properties have been inferred correctly.

Note

Some important properties to check in the model schema:

Check that attributes have the correct value types
Check that attributes are correctly marked as categorical or continuous
Check that attributes you want to monitor for bias have monitor_for_bias=True

By default, printing the model schema doesn’t display all the attribute properties. Therefore if you want to examine the model schema in its entirety, you can do so by formatting the maximum number of rows and columns to display:

import pandas as pd
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 50)
arthur_model.review()

The model schema should look like this:

name                   stage            value_type  categorical  is_unique  categories                bins  range         monitor_for_bias
X0                     PIPELINE_INPUT   FLOAT       False        False      []                        None  [16.0, 58.0]  False
ground_truth_label     GROUND_TRUTH     INTEGER     True         False      [{value: 0}, {value: 1}]  None  [None, None]  False
predicted_probability  PREDICTED_VALUE  FLOAT       False        False      []                        None  [0, 1]        False

Note

To modify attribute properties in the model schema table, see the docstring for ArthurAttribute for a complete description of model attribute properties and their configuration methods.

Save Model#

Once you have reviewed your model schema and made any necessary modifications to your model’s attributes, you are ready to save your model to Arthur.

Calling arthur_model.save() returns the unique ID Arthur creates for your model. You can easily load the model from the Arthur system later on using either this ID or the partner_model_id you specified when you first created the model.

arthur_model_id = arthur_model.save()

Once you call arthur_model.save(), Arthur will handle creating the model and provisioning the necessary infrastructure to enable data ingestion for this model. If model creation fails, you may try re-saving the model or contact support if the problem persists.

Activate Enrichments#

Enrichments are model monitoring services Arthur provides that can be activated once your model is saved to Arthur.

Models will have the Anomaly Detection enrichment enabled automatically once the reference data is uploaded, so it needs no action here. We’ll first enable Hotspots, which doesn’t require any configuration.

Second, we activate explainability, which requires more configuration and therefore comes with its own helper function.

# first activate hotspots
arthur_model.enable_hotspots()

# enable explainability using its own helper function for convenience
arthur_model.enable_explainability(
    df=X_train,
    project_directory="/path/to/model_folder/",
    requirements_file="requirements.txt",
    user_predict_function_import_path="model_entrypoint",
    ignore_dirs=["folder_to_ignore"] # optionally exclude directories within the project folder from being bundled with predict function
)

For more information on enabling enrichments and updating their configurations, see Enrichments.

Onboarding Existing Inferences#

If your model is already running in production, a good next step is to send your historical inferences to Arthur. In this section, we’ll gather those historical inferences and then send them to the platform.

Collecting Historical Inferences#

When logging inferences with Arthur, you may include:

Model Inputs which were sent to your model to make predictions
Model Predictions which you could fetch from storage or re-compute from your input data if you don’t have them saved
Non-Input Data that doesn’t feed into your model but that you registered with your Arthur model and want to include
Ground Truth labels for the inputs if you have them available
Partner Inference IDs that uniquely identify your predictions and can be used to update inferences with ground truth labels in the future (details below)
Inference Timestamps, which you can approximate with the generate_timestamps() function if you’re just simulating production data, or omit to use the current time
Ground Truth Timestamps, which you can approximate with the generate_timestamps() function if you’re just simulating production data, or omit to use the current time
Batch IDs that denote something like a unique “run ID” if your model is a batch model

You might have all the data you need in one convenient place, or more often you’ll need to gather them from a couple of tables or data stores. For example, you might:

collect your input and non-input data from your data warehouse
fetch your predictions and timestamps from blob storage used with your model deployment
match them to your ground truth labels in a different legacy system

Partner Inference IDs#

Arthur offers Partner Inference IDs as a way to match specific inferences in Arthur against your other systems and update your inferences with ground truth labels as they become available in the future. The most appropriate choice for a partner inference ID depends on your specific circumstances but common strategies include using existing IDs and joining metadata with non-unique IDs.

If you already have existing IDs that are unique to each inference and easily attached to future ground truth labels, you can simply use those (casting to strings if needed).

Another common approach is to construct a partner inference ID from multiple pieces of metadata. For example, if your model makes predictions about your customers at most once per day, you might construct your partner inference IDs as {customer_id}-{date}. This would be easy to reconstruct when sending ground truth labels much later: simply look up the labels for all the customers passed to the model on a given day and append that date to their ID.
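
A minimal sketch of the {customer_id}-{date} scheme follows. The helper name `make_partner_inference_id` and the sample customer IDs are hypothetical, and the ISO date format is just one reasonable choice.

```python
from datetime import date

def make_partner_inference_id(customer_id, prediction_date):
    """Build a deterministic ID from a customer ID and the prediction date."""
    return f"{customer_id}-{prediction_date.isoformat()}"

# IDs assigned when the predictions are made
scored = [("cust_001", date(2024, 3, 1)), ("cust_002", date(2024, 3, 1))]
inference_ids = [make_partner_inference_id(c, d) for c, d in scored]

# The scheme only works if the IDs are unique
assert len(inference_ids) == len(set(inference_ids))

# Much later, the same ID can be rebuilt from the label's metadata alone
label_id = make_partner_inference_id("cust_001", date(2024, 3, 1))
print(label_id in inference_ids)  # True
```

Because the ID is derived only from data you already keep, nothing returned by Arthur needs to be stored in order to attach ground truth later.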

If you don’t supply partner inference IDs, the SDK will generate them for you and return them from your send_inferences() call. These can be kept for future reference, or discarded if you’ve already sent ground truth values or don’t plan to in the future.

Sending Inferences#

Arthur offers many flexible options for sending your inferences. A few SDK methods accept Pandas DataFrames, native Python objects, and Parquet files, with data grouped into single datasets or spread across separate method calls and parameters. Two examples of these are outlined below, but for all the available usages see our SDK Reference for:

the ArthurModel.send_inferences() and update_inference_ground_truths() methods, which are recommended for non-Parquet datasets under 100,000 rows
the ArthurModel.send_bulk_inferences() and send_bulk_ground_truths() methods, which are recommended for sending large datasets or Parquet files

If you’d prefer to send data directly to the REST API, see the Inferences section of our API Reference.

A Simple Case#

Here we suppose we’ve gathered our input, non-input, and ground truth labels into a single DataFrame. We also fetch our predictions and the time at which they were made, and send everything in a single method call. Here we’re passing the predictions and timestamps as parameters into the method, but we could also simply add them to the inference_data DataFrame. We don’t worry about partner inference IDs here, leaving them to be auto-generated.

# load model input and non-input values, and ground truth labels + timestamps as a Pandas DataFrame
inference_data = ...

# retrieve predictions and timestamps as lists
#  note that we could also include these as columns in the DataFrame above
predictions, inference_timestamps = ...

# Send the inferences to Arthur
# just using auto-generated partner inference IDs since we're sending ground truth right now
arthur_model.send_inferences(
    inference_data,
    predictions=predictions,
    inference_timestamps=inference_timestamps)

Sending Inferences at Scale with Delayed Ground Truth#

Next, we consider a more complex case where we have a batch model with many inferences and send the ground truth separately, relying on our Partner Inference IDs to join the ground truth values to the previous inferences. We assume the data is neatly collected as described above. This may rely on an ETL job built with Spark, a Redshift or Snowflake export, an Apache Beam job in Google Cloud Dataflow, Pandas read_sql() and to_parquet() calls, or whatever data wrangling toolkit you’re most comfortable with.

# we can collect a set of folder names each corresponding to a batch run, containing one or
#  more Parquet files with the input attributes columns, non-input attribute columns, and
#  prediction attribute columns as well as a "partner_inference_id" column with our unique
#  identifiers and an "inference_timestamp" column
inference_batch_dirs = ...

# then suppose we have a directory with one or more parquet files containing matching
#  "partner_inference_id"s and our ground truth attribute columns as well as a
#  "ground_truth_timestamp" column
ground_truth_dir = ...

# send the inferences to Arthur
for batch_dir in inference_batch_dirs:
    batch_id = batch_dir.split("/")[-1]  # use the directory name as the Batch ID
    arthur_model.send_bulk_inferences(
        directory_path=batch_dir,
        batch_id=batch_id)

# send the ground truths to Arthur
arthur_model.send_bulk_ground_truths(directory_path=ground_truth_dir)
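
When the Batch ID is taken from the directory name, a trailing slash makes `split("/")[-1]` return an empty string. The sketch below shows one way to guard against that with the standard library. The helper name `batch_id_from_dir` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath

def batch_id_from_dir(batch_dir):
    """Use the last path component as the Batch ID, ignoring trailing slashes."""
    batch_id = PurePosixPath(batch_dir).name
    if not batch_id:
        raise ValueError(f"cannot derive a batch ID from {batch_dir!r}")
    return batch_id

print(batch_id_from_dir("/data/inferences/run_2024_03_01"))   # run_2024_03_01
print(batch_id_from_dir("/data/inferences/run_2024_03_01/"))  # run_2024_03_01
print("/data/inferences/run_2024_03_01/".split("/")[-1] == "")  # True
```

Raising on an empty result keeps a misconfigured path from silently sending a batch with a blank ID.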

See Model in Dashboard#

To confirm that the inferences have been sent, you can view your model and its inferences in the Arthur dashboard.

Performance Results#

Once you’ve logged your model’s inferences with Arthur you can evaluate your model performance. You can open your Arthur dashboard to view model performance in the UI, or use the code snippets below to fetch the same results right from your Python environment using Arthur’s Query API.

Query Overall Performance#

You can query overall Accuracy Rate with the following snippet, but for non-classifier models you might consider replacing the accuracyRate function with another model evaluation function.

# query model accuracy across the batches
query = {
    "select": [
        {
            "function": "accuracyRate"
        }
    ]
}
query_result = arthur_model.query(query)
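
To sanity-check the number returned by the query, you could compute an accuracy rate locally on a small sample. The sketch below assumes a binary classifier, a 0.5 decision threshold, and accuracy defined as the fraction of predictions matching the ground truth label. The platform's exact definition and threshold may differ, and the function name is hypothetical.

```python
def accuracy_rate(predicted_probabilities, ground_truth_labels, threshold=0.5):
    """Fraction of rows where the thresholded prediction equals the label."""
    if len(predicted_probabilities) != len(ground_truth_labels):
        raise ValueError("predictions and labels must have the same length")
    if not ground_truth_labels:
        raise ValueError("at least one labelled inference is required")
    correct = sum(
        int(p >= threshold) == y
        for p, y in zip(predicted_probabilities, ground_truth_labels)
    )
    return correct / len(ground_truth_labels)

local_accuracy = accuracy_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
print(local_accuracy)  # 0.5
```

A large gap between the local figure and the queried one usually points to a mismatch in the data sent, such as swapped columns or missing ground truth, rather than to the metric itself.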

Visualize Performance Results#

Visualize performance metrics over time:

# plot model performance metrics over time
arthur_model.viz.metric_series(
    ["auc", "falsePositiveRate"],
    time_resolution="hour")

Visualize data drift over time:

# plot drift over time of attributes
# from their baseline distribution in the model's reference data
arthur_model.viz.drift_series(
    ["X0", "predicted_probability"],
    drift_metric="KLDivergence",
    time_resolution="hour")
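
For intuition about the drift metric named above, the sketch below computes the KL divergence between two binned distributions using only the standard library. It is a textbook formulation, not Arthur's implementation: the binning, smoothing constant, and direction of the divergence are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) in nats for two distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("both distributions must use the same bins")
    # clamp to eps so empty bins do not cause division by zero or log(0)
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )

reference_bins = [0.5, 0.5]   # share of reference data in each bin
inference_bins = [0.9, 0.1]   # share of recent inferences in each bin

no_drift = kl_divergence(reference_bins, reference_bins)
drift = kl_divergence(inference_bins, reference_bins)
print(round(no_drift, 6), round(drift, 3))  # 0.0 0.368
```

Identical distributions give zero, and the value grows as the inference data moves away from the reference data.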

API Query Guide#

For more analysis of model performance, the API Query Guide shows how to use the Arthur API to get the model performance results you need, efficiently and at scale. Our backend query engine allows for fine-grained and customizable performance analysis.

Production Integration#

Now that you have registered your model and successfully gotten initial performance metrics on your model’s historical inferences, you are ready to connect your production pipeline to Arthur.

Arthur has several methods of receiving your production model’s inference data. Most involve some process making a call to one of the SDK methods described above, but where that process runs and reads data from depends on your production environment. We explore a few common patterns below, as well as some of Arthur’s direct integrations.

For a quick start, consider the quick integration, which only involves adding a few lines of code to your model prediction code.

If your model inputs and predictions are written out to a data stream such as a Kafka topic, consider adding a stream listener.

If you don’t mind a bit of latency between when your predictions are made and when they are logged with Arthur, or it’s much easier to read your inference data at rest, consider setting up an inference upload job.

Note that these methods can be combined for prediction and ground truth values: you might use the quick integration or streaming approach for inference data but a batch job to update ground truth labels.

API Keys#

API Keys authorize your request to send and receive data to and from the Arthur platform. With a valid API key added to your production environment, your model deployment code can be augmented to send your model’s inferences to Arthur. See the Arthur Standard Access Control Overview to obtain an Arthur API key.
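
One common way to make the key available to deployment code is an environment variable. The sketch below reads it with a clear failure message when it is missing. The helper name `require_env` is hypothetical, and the variable name ARTHUR_API_KEY is a convention rather than a requirement of the SDK.

```python
import os

def require_env(name, environ=None):
    """Return a non-empty environment variable or raise a descriptive error."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Demonstrated with an explicit mapping; in a deployment you would call
# require_env("ARTHUR_API_KEY") to read from the real environment.
api_key = require_env("ARTHUR_API_KEY", {"ARTHUR_API_KEY": "example-key"})
```

Failing at startup is usually preferable to discovering a missing key on the first prediction request.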

Quick Integration#

Quick integration with Arthur means calling the send_inferences() method when and where your model object produces inferences. This is the simplest and quickest way to connect a production model to Arthur. However, it adds some latency to each prediction your model serves. For more efficient approaches, see Streaming Integrations and Inference Upload Jobs below.

For example, suppose your model is hosted in production behind an API built with Flask. The call to arthur_model.send_inferences() just needs to be included wherever your predict function is defined, so your updated code might look something like this:

####################################################
# New code to fetch the ArthurModel
# connect to Arthur
import os
from arthurai import ArthurAI
arthur = ArthurAI(
    url="https://app.arthur.ai",
    access_key=os.environ["ARTHUR_API_KEY"])

# retrieve the arthur model
arthur_model = arthur.get_model(os.environ["ARTHUR_PARTNER_MODEL_ID"], id_type='partner_model_id')
####################################################

# your original model prediction function
# which can be on its own as a python script
# or wrapped by an API like a Flask app
def predict():
    # get data to apply model to
    inference_data = ...

    # generate inferences
    # in this example, the predictions are classification probabilities
    predictions = model.predict_proba(...)

    ####################################################
    #### NEW PART OF YOUR MODEL'S PREDICTION SCRIPT

    # SEND NEW INFERENCES TO ARTHUR
    arthur_model.send_inferences(
        inference_data,
        predictions=predictions)
    ####################################################

    return predictions

Alternatively if you have a batch model that runs in jobs, you might add similar code to the very end of your job, rather than inside the predict() function.
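
If the added latency of the quick integration is a concern, one option is to hand the logging call to a background thread. The sketch below is a generic pattern built on the standard library, not an Arthur feature. The class name `BackgroundSender` is hypothetical, and `send_fn` stands in for a callable such as arthur_model.send_inferences.

```python
import queue
import threading

class BackgroundSender:
    """Run a send function on a worker thread so callers do not wait on it."""

    def __init__(self, send_fn, max_pending=1000):
        self._send_fn = send_fn
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, inference_data, predictions):
        """Queue one payload; return False and drop it if the queue is full."""
        try:
            self._queue.put_nowait((inference_data, predictions))
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:  # sentinel placed by close()
                    return
                inference_data, predictions = item
                self._send_fn(inference_data, predictions=predictions)
            except Exception as exc:
                # a logging failure should not take down the worker
                print(f"failed to log inferences: {exc}")
            finally:
                self._queue.task_done()

    def close(self):
        """Send everything still queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()

# Stand-in for arthur_model.send_inferences so the sketch runs on its own
sent = []
sender = BackgroundSender(lambda data, predictions: sent.append((data, predictions)))
sender.submit([{"X0": 21.0}], [0.8])
sender.close()
print(len(sent))  # 1
```

The trade-off is that payloads are dropped when the queue is full and are lost if the process exits before close() is called, so this suits cases where occasional gaps in logging are acceptable.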

Streaming Integrations#

If you write your model’s inputs and outputs to a data stream, you can add a listener to that stream to log those inferences with Arthur. For example, if you have a Kafka topic you might add a new arthur consumer group to listen to new events and pass them to the send_inferences() method. If your inputs and predictions live in different topics or you want to add non-input data from another topic, you might use Kafka Streams to join the various topics before sending to Arthur.
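
A stream listener usually groups events into small batches rather than making one call per event. The sketch below shows that grouping with a plain generator. The function name `batch_events` is hypothetical, the list of events stands in for a Kafka consumer, and the `send` function stands in for arthur_model.send_inferences.

```python
def batch_events(events, batch_size):
    """Yield lists of up to batch_size events from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

# Stand-ins so the sketch runs without a broker or an Arthur connection
events = [{"X0": float(i), "predicted_probability": 0.5} for i in range(5)]
sent_batches = []

def send(batch):
    sent_batches.append(batch)

for batch in batch_events(events, batch_size=2):
    send(batch)

print([len(b) for b in sent_batches])  # [2, 2, 1]
```

This version flushes only on size. A real listener on an unbounded stream would typically also flush on a timer so that a slow topic does not hold events back indefinitely.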

Inference Upload Jobs#

Another approach is to run jobs that read data at rest and send it to the Arthur platform. These jobs might be scheduled or event-driven, depending on your architecture.

For example, you might have regularly scheduled jobs that:

look up the inference or ground truth data since the last run
format the data and write it to a few Parquet files
send the Parquet files to the Arthur platform using send_bulk_inferences() or send_bulk_ground_truths()
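
The first step of such a job, looking up data since the last run, can be sketched with a stored watermark. The helper name `select_new_records` and the record layout below are hypothetical, and where the watermark is persisted between runs is left to your scheduler or database.

```python
from datetime import datetime, timezone

def select_new_records(records, last_run, timestamp_key="inference_timestamp"):
    """Return records newer than last_run and the watermark for the next run."""
    new_records = [r for r in records if r[timestamp_key] > last_run]
    # if nothing is new, keep the old watermark
    next_watermark = max((r[timestamp_key] for r in new_records), default=last_run)
    return new_records, next_watermark

records = [
    {"partner_inference_id": "a",
     "inference_timestamp": datetime(2024, 3, 1, 8, tzinfo=timezone.utc)},
    {"partner_inference_id": "b",
     "inference_timestamp": datetime(2024, 3, 1, 12, tzinfo=timezone.utc)},
]
last_run = datetime(2024, 3, 1, 10, tzinfo=timezone.utc)

new_records, next_watermark = select_new_records(records, last_run)
print([r["partner_inference_id"] for r in new_records])  # ['b']
```

Advancing the watermark to the newest record seen, rather than to the current clock time, avoids skipping records that are written late.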

Integrations#

Rather than hand-rolling your own inference upload jobs, Arthur also offers more direct integrations.

For example, our SageMaker Data Capture Integration makes integrating with SageMaker models a breeze by utilizing Data Capture to log the inferences into files in S3, and triggering upload jobs in response to those file write events.

Our Batch Ingestion from S3 allows you to just upload your Parquet files to S3, and Arthur will automatically import them into the system.
