Success Criteria for Machine Learning Models

Models Integration

Functionality

Description

Framework and Platform Agnostic

The monitoring system supports all of the ML platforms and frameworks the team currently uses.

Integration with existing database

The solution is able to integrate with existing databases and data lakes that store production data (e.g. S3, ADLS, etc.).

Integration with Python-based serving

The solution can integrate with Python-based serving infrastructure.

Visibility & Investigation

Functionality

Description

Model Management

There’s a centralized place where users can see all production models with their health status and recent activity.

Compare 2 versions

View the performance over time of 2 different model versions in comparison mode to quickly identify the best performing one.

Performance over time

When ground truth is available, view performance metrics (accuracy, F1, etc.) and how they change over time.

Proxy Performance Metrics

In cases where ground truth is not available, view aggregations of the predictions over time (e.g. mean, sum, std dev) as a proxy for model performance.
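As a rough illustration of proxy metrics, the sketch below groups logged predictions by day and computes count, mean, sum, and standard deviation for each day. The log format (a list of `(day, prediction)` pairs) and the function name `aggregate_predictions` are assumptions made for this example, not part of any specific product.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_predictions(records):
    """Group (day, prediction) pairs by day and summarize each group."""
    by_day = defaultdict(list)
    for day, prediction in records:
        by_day[day].append(prediction)
    return {
        day: {
            "count": len(values),
            "mean": mean(values),
            "sum": sum(values),
            "std": pstdev(values),  # population standard deviation
        }
        for day, values in sorted(by_day.items())
    }


# Hypothetical prediction log: (day, predicted score)
log = [
    ("2024-01-01", 0.2),
    ("2024-01-01", 0.4),
    ("2024-01-02", 0.9),
    ("2024-01-02", 0.7),
]
summary = aggregate_predictions(log)
for day, stats in summary.items():
    print(day, stats)
```

A sudden jump in the daily mean or spread, as between the two days in this toy log, is the kind of signal a proxy metric is meant to surface when labels arrive late or never.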

Distribution investigation & comparison

Live distribution analysis of production data & predictions.

For investigation purposes, allow distribution comparison of:

* Different model versions

* Different time frames

* Different data segments

Metrics over time analysis

The platform provides tooling to visualize various metrics and the way they change over time to identify correlations.

Data statistics

View and compare live data & prediction statistics, including the following info for each feature:

* Numeric: Feature name, Mean, Std Dev, Zeros, Min, Median, Max

* Categorical: Missing, Unique, Top, Freq. Top
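The per-feature statistics listed here can be sketched with the Python standard library alone. The function names and the convention that a missing value is represented by `None` are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, median, pstdev


def numeric_stats(name, values):
    """Summary statistics for a numeric feature; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "feature": name,
        "mean": mean(present),
        "std": pstdev(present),
        "zeros": sum(1 for v in present if v == 0),
        "min": min(present),
        "median": median(present),
        "max": max(present),
    }


def categorical_stats(name, values):
    """Summary statistics for a categorical feature; None marks a missing value."""
    present = [v for v in values if v is not None]
    top, freq = Counter(present).most_common(1)[0]
    return {
        "feature": name,
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "top": top,
        "freq_top": freq,
    }


# Hypothetical feature columns
age_stats = numeric_stats("age", [0, 10, 20, 30, None])
state_stats = categorical_stats("state", ["CA", "NY", "CA", None])
print(age_stats)
print(state_stats)
```

Both helpers assume at least one non-missing value per feature; a production implementation would need to handle empty columns explicitly.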

Segments analysis

Users can define segments of interest (e.g. state = “CA” and age > 30), and the platform provides tools to analyze prediction distribution and performance across different segments.
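One way to picture a segment is as a named predicate over a row of data. The sketch below, under that assumption, reports the size, positive-prediction rate, and accuracy of each segment. The row schema (`state`, `age`, `prediction`, `label`) and the function name `segment_report` are hypothetical.

```python
def segment_report(rows, segments):
    """Compute count, positive-prediction rate, and accuracy per segment."""
    report = {}
    for name, predicate in segments.items():
        members = [row for row in rows if predicate(row)]
        count = len(members)
        if count == 0:
            report[name] = {"count": 0, "positive_rate": None, "accuracy": None}
            continue
        positives = sum(1 for row in members if row["prediction"] == 1)
        correct = sum(1 for row in members if row["prediction"] == row["label"])
        report[name] = {
            "count": count,
            "positive_rate": positives / count,
            "accuracy": correct / count,
        }
    return report


# Hypothetical production rows with binary predictions and labels
rows = [
    {"state": "CA", "age": 35, "prediction": 1, "label": 1},
    {"state": "CA", "age": 42, "prediction": 0, "label": 1},
    {"state": "CA", "age": 25, "prediction": 1, "label": 1},
    {"state": "NY", "age": 50, "prediction": 0, "label": 0},
]
segments = {
    "CA and over 30": lambda r: r["state"] == "CA" and r["age"] > 30,
    "everyone": lambda r: True,
}
report = segment_report(rows, segments)
print(report)
```

In this toy data the "CA and over 30" segment has lower accuracy than the population as a whole, which is the kind of gap segment analysis is meant to expose.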

Segments group analysis

Slice the data by segment groups, e.g. segmenting by age groups results in: age < 10, 10 ≤ age < 20, 20 ≤ age < 30, etc. For each group, a segment analysis is available with a view of group behavior, to identify misbehavior in specific segments.
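Segment groups can be thought of as automatic binning of one feature. The sketch below assumes fixed-width, half-open bins (lower bound included, upper bound excluded) and reports the mean prediction per bin. The helper names and row schema are hypothetical.

```python
from collections import defaultdict


def group_by_bins(rows, feature, width):
    """Assign each row to a half-open bin [lower, lower + width) of `feature`."""
    groups = defaultdict(list)
    for row in rows:
        lower = (row[feature] // width) * width
        groups[(lower, lower + width)].append(row)
    return dict(sorted(groups.items()))


def mean_prediction_per_group(rows, feature, width):
    """Mean prediction for each bin, keyed by a readable bin label."""
    return {
        f"{lower} <= {feature} < {upper}": sum(r["prediction"] for r in members) / len(members)
        for (lower, upper), members in group_by_bins(rows, feature, width).items()
    }


# Hypothetical rows: age and a predicted score
rows_by_age = [
    {"age": 5, "prediction": 0.1},
    {"age": 12, "prediction": 0.4},
    {"age": 18, "prediction": 0.6},
    {"age": 25, "prediction": 0.9},
]
per_group = mean_prediction_per_group(rows_by_age, "age", 10)
print(per_group)
```

Half-open bins ensure every value falls into exactly one group, including values that sit on a boundary such as age 10 or 20.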

Drift scoring

Get a drift score for each feature and for the predictions, to quickly identify drifting features.
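The text does not say which drift score is used. As one common choice, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a current sample of a numeric feature. The bin edges, the `eps` floor for empty bins, and the function name `psi` are assumptions for this example.

```python
import math


def psi(baseline, current, edges, eps=1e-6):
    """Population Stability Index over shared bin edges (both samples non-empty)."""

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges that are <= v
            counts[sum(1 for e in edges if v >= e)] += 1
        # Floor empty bins at eps so the logarithm stays defined
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical feature samples: training-time values vs. shifted production values
baseline_sample = [1, 2, 3, 4, 5, 6, 7, 8]
shifted_sample = [5, 6, 7, 8, 9, 10, 11, 12]
bin_edges = [3, 6, 9]

no_drift = psi(baseline_sample, baseline_sample, bin_edges)
drift = psi(baseline_sample, shifted_sample, bin_edges)
print(no_drift, drift)
```

Computing such a score per feature and sorting the results in descending order gives a quick ranking of the features that moved most. Other distances (for example Kolmogorov-Smirnov or Jensen-Shannon) could be substituted in the same structure.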

Monitoring & Alerting

Functionality

Description

Data integrity monitoring

The solution supports creating customized monitors to detect Data Integrity issues:

* Missing Values

* Model Activity (inference count)

* New Values

* Out of range
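The four integrity checks above can be sketched as a single function that compares a batch of production rows against per-feature expectations. The baseline schema used here (`min_inference_count`, `allowed`, `range`, `max_missing_ratio`) is a hypothetical configuration format invented for the example.

```python
def integrity_issues(batch, baseline):
    """Return (issue_type, feature, detail) tuples for a batch of dict rows."""
    issues = []
    # Model activity: too few inferences in the batch
    if len(batch) < baseline["min_inference_count"]:
        issues.append(("model_activity", None, len(batch)))
    for feature, spec in baseline["features"].items():
        values = [row.get(feature) for row in batch]
        present = [v for v in values if v is not None]
        # Missing values: ratio of None above the allowed maximum
        missing = len(values) - len(present)
        if batch and missing / len(batch) > spec.get("max_missing_ratio", 0.0):
            issues.append(("missing_values", feature, missing))
        # New values: categories never seen in the baseline
        if "allowed" in spec:
            new_values = sorted(set(present) - set(spec["allowed"]))
            if new_values:
                issues.append(("new_values", feature, new_values))
        # Out of range: numeric values outside [low, high]
        if "range" in spec:
            low, high = spec["range"]
            outside = [v for v in present if not low <= v <= high]
            if outside:
                issues.append(("out_of_range", feature, outside))
    return issues


# Hypothetical expectations and production batch
baseline = {
    "min_inference_count": 5,
    "features": {
        "state": {"allowed": ["CA", "NY"], "max_missing_ratio": 0.0},
        "age": {"range": (0, 120), "max_missing_ratio": 0.5},
    },
}
batch = [
    {"state": "CA", "age": 30},
    {"state": "TX", "age": 150},
    {"state": None, "age": 41},
]
issues = integrity_issues(batch, baseline)
for issue in issues:
    print(issue)
```

Keeping the thresholds in the baseline configuration rather than in code is what lets each monitor be customized per model and per feature.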

Data drift monitoring

The solution supports creating customized monitors to detect data drift.

Prediction drift monitoring

The solution supports creating customized monitors to detect prediction drift.

Monitors Customization

As different models have different data and performance metrics, the solution provides an easy way to customize the thresholds and monitoring logic of each monitor.

Standard metrics monitoring

The solution supports creating customized monitors to detect anomalies and sudden changes in metrics such as:

* Avg

* Min

* Max

* Variance

* Standard Deviation
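A simple way to detect a sudden change in one of these metrics is to compare the latest value with its recent history. The sketch below uses a z-score rule with a configurable threshold. The rule itself, the default threshold of 3.0, and the function name `is_anomalous` are assumptions, not a description of any particular product's detection logic.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


# Hypothetical daily averages of a feature
daily_avg = [10.0, 10.5, 9.5, 10.2, 9.8]
normal_day = is_anomalous(daily_avg, 10.1)
spike_day = is_anomalous(daily_avg, 15.0)
print(normal_day, spike_day)
```

The same check applies unchanged to a series of daily minimums, maximums, variances, or standard deviations; only the input series differs.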

Performance degradation monitoring

The solution supports customized monitors for performance degradation and comes out of the box with the standard performance metrics:

* Accuracy

* Precision

* Recall

* F1 Score

* AUC-ROC

* MSE

* RMSE

* MAE

* Log loss

* WAPE

* MAPE
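For reference, several of the metrics in this list can be written out directly from their standard definitions. The sketch below covers accuracy, precision, recall, and F1 for binary classification, and MAE, MSE, RMSE, MAPE, and WAPE for regression. Function names are hypothetical; the regression helper assumes no true value is zero (MAPE is undefined otherwise).

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, MAPE, and WAPE; assumes non-zero true values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "wape": sum(abs(e) for e in errors) / sum(abs(t) for t in y_true),
    }


# Hypothetical labels and predictions
cls = classification_metrics([1, 0, 1, 1], [1, 1, 0, 1])
reg = regression_metrics([100, 200], [110, 180])
print(cls)
print(reg)
```

A degradation monitor would compute these values per time window once ground truth arrives and compare them against a threshold or a baseline period.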

Custom metric monitoring

Users are able to define their own custom metrics within the platform and monitor them for anomalies and degradation.

Monitoring with training as baseline

The solution allows setting the training set as a baseline for a monitor (e.g. data drift compared to training).

Monitoring for anomalies over time

The solution allows monitoring data anomalies over time (e.g. unexpected seasonal changes in missing values).

Monitoring protected populations

The solution allows monitoring specific populations (data slices) for anomalies and unexpected behavior.

Explainability

Functionality

Description

Prediction explanation

Users can easily analyze a specific prediction and see the contribution of each input to the final prediction.
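How contributions are computed depends on the model and the explanation method, which the text does not specify. For the simplest case, a linear model, each feature's contribution relative to a baseline input is its weight times its deviation from the baseline. The sketch below illustrates only that special case; the weights, baseline, and function name are hypothetical.

```python
def linear_contributions(weights, baseline, row):
    """Per-feature contribution of `row` relative to `baseline` for a linear model."""
    return {name: weights[name] * (row[name] - baseline[name]) for name in weights}


# Hypothetical linear model: prediction = 0.5 * age + 2.0 * income
weights = {"age": 0.5, "income": 2.0}
baseline_input = {"age": 40, "income": 3.0}
explained_row = {"age": 50, "income": 2.0}

contributions = linear_contributions(weights, baseline_input, explained_row)
print(contributions)
```

For a linear model the contributions sum exactly to the difference between the prediction for the row and the prediction for the baseline. Non-linear models need a dedicated attribution method (for example Shapley-value based approaches) to provide a comparable breakdown.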

What-if analysis

Users are able to explore what-if scenarios by changing some input features and watching the effect on the model’s prediction.
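At its core, a what-if scenario re-scores a copy of the input with some features overridden. The sketch below assumes the model is available as a plain callable that takes a dict of features; `toy_model` and `what_if` are hypothetical names used only for illustration.

```python
def what_if(model, row, changes):
    """Score `row` before and after applying `changes`, without mutating `row`."""
    modified = {**row, **changes}
    before = model(row)
    after = model(modified)
    return {"before": before, "after": after, "delta": after - before}


def toy_model(row):
    """Stand-in model for the example: a fixed linear score."""
    return 0.5 * row["age"] + 2.0 * row["income"]


original = {"age": 50, "income": 2.0}
scenario = what_if(toy_model, original, {"income": 4.0})
print(scenario)
```

Because the original row is copied rather than modified, several scenarios can be compared side by side against the same starting point.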

Human-readable explanation

The system is able to generate a non-technical explanation sentence for each prediction.

Integrations

Functionality

Description

E-mail alerts

Alerts can be received via e-mail.

Slack alerts

Alerts can be received via Slack.

Webhook integration

The system supports generic integration with third-party solutions by triggering a webhook.
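A webhook integration usually amounts to an HTTP POST with a JSON body sent to a URL the receiving system provides. The sketch below builds such a request with the standard library. The payload fields, the example URL, and the function name are assumptions; a real monitoring platform defines its own payload schema.

```python
import json
import urllib.request


def build_alert_request(url, monitor, model, message):
    """Build (but do not send) a JSON POST request describing an alert."""
    payload = {"monitor": monitor, "model": model, "message": message}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and alert details
request = build_alert_request(
    "https://example.com/hooks/ml-alerts",
    monitor="data_drift",
    model="churn-v2",
    message="Drift score above threshold for feature 'age'",
)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g.:
# urllib.request.urlopen(request, timeout=5)
```

Separating request construction from sending keeps the payload easy to inspect and test without network access.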
