# Arthur Quickstart
From a Python environment with the `arthurai` package installed, this quickstart code will:

- Make binary classification predictions on a small dataset
- Onboard the model with reference data to Arthur
- Log batches of model inference data with Arthur
- Get performance results for our model
## Imports
The `arthurai` package can be pip-installed from the terminal, along with `numpy` and `pandas`:

```shell
pip install arthurai numpy pandas
```
Then you can import the functionality we'll use from the `arthurai` package like this:
```python
# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage
from arthurai.util import generate_timestamps

# Other libraries used in this example
import numpy as np
import pandas as pd
```
## Model Predictions
We write out samples from a Titanic survival prediction dataset explicitly in Python, giving the age of each passenger, the cost of their ticket, the passenger class of their ticket, and the ground-truth label of whether they survived. Our model's outputs are given by a `predict` function using only the `age` variable. We split the data into:

- `reference_data` for onboarding the model
- `inference_data` for in-production inferences the model processes
```python
# Define Titanic sample data
titanic_data = pd.DataFrame({
    "age": [19.0, 37.0, 65.0, 30.0, 22.0, 24.0, 16.0, 40.0, 58.0, 32.0],
    "fare": [8.05, 29.7, 7.75, 7.8958, 7.75, 49.5042, 86.5, 7.8958, 153.4625, 7.8958],
    "passenger_class": [3, 1, 3, 3, 3, 1, 1, 3, 1, 3],
    "survived": [1, 0, 0, 0, 1, 1, 1, 0, 1, 0]})

# Split into reference and inference data
reference_data, inference_data = titanic_data[:6].copy(), titanic_data[6:].copy()

# Predict the probability of Titanic survival as the inverse percentile of age
def predict(age):
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

# reference_data and inference_data contain the model's inputs and outputs
reference_data['pred_survived'] = reference_data['age'].apply(predict)
inference_data['pred_survived'] = inference_data['age'].apply(predict)
```
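Before onboarding, it can help to spot-check the `predict` function on its own. Here's a self-contained sketch (repeating the reference ages from the sample data above so it runs without Arthur) showing that younger passengers receive higher predicted survival probabilities:

```python
import numpy as np
import pandas as pd

# The six reference-set ages from the quickstart sample data
reference_data = pd.DataFrame({"age": [19.0, 37.0, 65.0, 30.0, 22.0, 24.0]})

def predict(age):
    # Inverse percentile: rank of the nearest reference age, flipped
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

print(predict(16.0))  # 1.0 -- nearest sorted age is the youngest
print(predict(65.0))  # 0.0 -- nearest sorted age is the oldest
```

Because the model only ranks against six reference ages, its outputs are limited to six evenly spaced values between 0 and 1.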
## Onboarding
This code will only run once you enter a valid username. First we connect to the Arthur API and create an `arthur_model` with some high-level metadata: a classification model operating on tabular data with the name "Example: Titanic Quickstart".
```python
# Connect to Arthur
arthur = ArthurAI(url="https://beta.app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")

# Register the model type with Arthur
arthur_model = arthur.model(display_name="Example: Titanic Quickstart",
                            input_type=InputType.Tabular,
                            output_type=OutputType.Multiclass)
```
Next we infer the model schema from `reference_data`, specifying which attributes are in which stage. Additionally, we configure extra settings for the `passenger_class` attribute. Then we save the model to the platform.
```python
# Map the PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that the `pred_survived` column represents
# the probability that the ground truth column has the value 1
pred_to_ground_truth_map = {'pred_survived': 1}

# Build the arthur_model schema on the reference dataset,
# specifying which attribute represents ground truth
# and which attributes are NonInputData.
# Arthur will monitor NonInputData attributes even though they are not model inputs.
arthur_model.build(reference_data,
                   ground_truth_column='survived',
                   pred_to_ground_truth_map=pred_to_ground_truth_map,
                   non_input_columns=['fare', 'passenger_class'])

# Configure the `passenger_class` attribute:
# 1. Turn on bias monitoring for the attribute.
# 2. Specify that the passenger_class attribute has possible values [1, 2, 3],
#    since that information was not present in reference_data
#    (only values 1 and 3 appear there).
arthur_model.get_attribute(name='passenger_class').set(monitor_for_bias=True,
                                                       categories=[1, 2, 3])

# Onboard the model to Arthur
arthur_model.save()
```
Once you call `arthur_model.save()`, Arthur will handle creating the model and provisioning the necessary infrastructure to enable data ingestion for this model. If model creation fails, you can try re-saving the model, or contact support if the problem persists.
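If saving fails intermittently, a small generic retry helper can automate the re-save. This is our own sketch, not part of the Arthur SDK; the function names and defaults are assumptions for illustration:

```python
import time

def retry(fn, attempts=3, delay_seconds=2.0):
    # Call fn(), retrying on any exception, up to `attempts` total tries.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds)

# Usage with an onboarded model (requires a valid Arthur connection):
# retry(arthur_model.save)
```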
## Sending Inferences
Here we send inferences from `inference_data` to Arthur. We'll oversample `inference_data` and use Arthur's `generate_timestamps` utility function to generate some fake timestamps, as though the inferences were made over the last five days.
```python
# Sample the inference dataset with predictions
inferences = inference_data.sample(100, replace=True)

# Generate mock timestamps over the last five days
timestamps = generate_timestamps(len(inferences), duration='5d')

# Send the inferences to Arthur
arthur_model.send_inferences(inferences, inference_timestamps=timestamps)
```
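To get a feel for what mock timestamps spread over a five-day window look like, here is a rough standalone approximation built with pandas. This is our own sketch, not the Arthur implementation of `generate_timestamps`:

```python
import pandas as pd

def mock_timestamps(n, days=5):
    # Spread n timestamps evenly across the last `days` days, ending now (UTC)
    end = pd.Timestamp.utcnow()
    start = end - pd.Timedelta(days=days)
    return list(pd.date_range(start=start, end=end, periods=n))

timestamps = mock_timestamps(100)
print(timestamps[0], "...", timestamps[-1])
```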
Inferences usually become available for analysis in seconds, but it can take up to a few minutes. You can wait until they’re ready for your analysis like this:
```python
# Wait until some inferences land in Arthur
arthur_model.await_inferences()
```
## Performance Results
With our model onboarded and inferences sent, we can get performance results from Arthur. View your model in your Arthur dashboard, or use the code below to fetch the overall accuracy rate:
```python
# Query overall model accuracy
query = {
    "select": [
        {
            "function": "accuracyRate"
        }
    ]
}
query_result = arthur_model.query(query)
print(query_result)
```
You should see `[{'accuracyRate': 0.8}]`, or a similar value depending on the random sampling of your inference set.
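You can sanity-check that number locally by recomputing accuracy on the four unique inference rows. The sketch below assumes a 0.5 decision threshold; the threshold is our assumption for illustration, not necessarily what `accuracyRate` uses:

```python
import numpy as np
import pandas as pd

# Rebuild the quickstart sample data (only the columns we need here)
titanic_data = pd.DataFrame({
    "age": [19.0, 37.0, 65.0, 30.0, 22.0, 24.0, 16.0, 40.0, 58.0, 32.0],
    "survived": [1, 0, 0, 0, 1, 1, 1, 0, 1, 0]})
reference_data, inference_data = titanic_data[:6].copy(), titanic_data[6:].copy()

def predict(age):
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

preds = inference_data['age'].apply(predict)
# Threshold at 0.5 (our assumption) and compare against ground truth
accuracy = ((preds >= 0.5).astype(int) == inference_data['survived']).mean()
print(accuracy)  # 0.75 on the four unique rows
```

Oversampling with replacement weights the rows unevenly, which is why the queried `accuracyRate` drifts around this value from run to run.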
## Next Steps
### Basic Concepts
Learn more about important terms and ideas to get familiar with model monitoring using the Arthur platform on the Basic Concepts page.
### In-Depth Examples
Try out more thorough examples in our Jupyter Notebooks at the Arthur Sandbox GitHub repository.
### Onboard Your Model
Use the detailed Model Onboarding walkthrough to get your own production model integrated with the Arthur platform.