We are excited to announce Direct Data Connectors (DDC), a novel way to monitor your Machine Learning models in production by connecting directly to your training and inference datasets. DDC allows you to monitor without duplicating any of your data. You can now monitor billions of predictions without data sampling, production code changes, or hidden cloud costs.
By connecting Aporia to a database where you already store your model predictions, you immediately get fully customizable ML dashboards tailored to your use case, customizable drift detection, live alerting, explainable AI (XAI), and root-cause analysis tools. Getting started with ML monitoring has never been easier. We are releasing this capability with support for BigQuery, Amazon S3, Athena, Glue Data Catalog, Delta Lake, Postgres, Redshift, Snowflake, Azure Data Lake Storage, and Databricks, and we are continuously adding more connectors.
Looking across the ML monitoring market, we see a gap: organizations prioritize flexibility, efficiency, and security in their monitoring solutions, yet other monitoring solutions act solely as inference stores.
This gap and the following challenges are why DDC is essential to getting the most out of your production models:
1. Outrageous cloud costs: ML monitoring solutions that are based on databases like Apache Druid, Elasticsearch, or ClickHouse can quickly become extremely expensive, reaching $10,000+ monthly in cloud costs, in addition to the monthly maintenance fees that accompany these databases.
2. Data sampling comes with a distorted view: Many ML use cases require processing billions of predictions. Common examples include recommendation systems, search ranking models, large fraud detection models, and some types of demand forecasting models.
As a result, many of the companies we spoke with were forced to monitor only a small random sample of their production data. Unfortunately, with small samples, ML monitoring becomes highly inaccurate: issues go unnoticed, false positive alerts are common, and monitoring for drift, bias, or fairness issues becomes ineffective.
3. Production data duplication: Monitoring solutions that rely on an SDK or importer for reporting data often store a copy of your data in their own database, in their own proprietary format.
This results in the following:
DDC lets ML teams monitor and track their models by integrating Aporia directly with their production database. Because Aporia reads from your existing data lake, you can monitor billions of predictions at minimal cloud cost.
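To make the sampling challenge described earlier concrete, here is a minimal sketch of the arithmetic behind it. The numbers (one billion predictions, an issue affecting 0.001% of them, a 100,000-row sample) are purely illustrative assumptions and do not come from any customer data, and the function names are hypothetical. The sketch assumes a uniform random sample in which each sampled row is affected independently with the same probability.

```python
import math

def expected_affected_in_sample(sample_size: int, affected_fraction: float) -> float:
    """Expected number of affected predictions that land in a random sample."""
    return sample_size * affected_fraction

def prob_sample_misses_issue(sample_size: int, affected_fraction: float) -> float:
    """Probability that a random sample contains zero affected predictions.

    Assumes each sampled row is affected independently with probability
    `affected_fraction` (a reasonable approximation when the sample is small
    relative to the full dataset).
    """
    return (1.0 - affected_fraction) ** sample_size

# Illustrative assumptions, not measurements.
TOTAL_PREDICTIONS = 1_000_000_000   # predictions stored in production
AFFECTED_FRACTION = 1e-5            # 0.001% of predictions hit by an issue
SAMPLE_SIZE = 100_000               # rows sent to a sampling-based monitor

affected_total = TOTAL_PREDICTIONS * AFFECTED_FRACTION
expected_in_sample = expected_affected_in_sample(SAMPLE_SIZE, AFFECTED_FRACTION)
miss_probability = prob_sample_misses_issue(SAMPLE_SIZE, AFFECTED_FRACTION)

print(f"Affected predictions in the full dataset: {affected_total:,.0f}")
print(f"Expected affected predictions in the sample: {expected_in_sample:.2f}")
print(f"Chance the sample contains none of them: {miss_probability:.1%}")
```

Under these assumptions, thousands of affected predictions exist in the full dataset, yet the sample is expected to contain about one of them, which is far too few to separate a real issue from noise.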
The Head of Data Science at a well-known US e-commerce platform that manages billions of dollars in transactions annually shared their experience with DDC: “Integrating Aporia’s DDC directly to our BigQuery was easier than expected, and we were able to onboard a dozen models in less than a day.”
With Aporia’s DDC, model monitoring is made easy, helping your ML teams shine in production, and check off necessary tasks that benefit the entire organization:
✅ Monitoring models with DDC is easy – ~7 minutes for model integration
✅ Clear and low cloud costs
✅ Monitor ALL your data at once
✅ Your data stays yours
✅ No vendor lock-in
✅ A single source of truth
We see more and more ML teams who create a centralized store for their production inference data. By doing so, they can audit and investigate historical data, have more quality data for training, and monitor their models in a matter of minutes. If you aren’t already storing your predictions, read our quick guide on Storing Your Predictions.
By decoupling the storage of inference data from the monitoring system, your data stays yours, in your own format, in your own data store. There is zero risk of losing your data to a vendor-proprietary database.
With DDC enabled, integrating Aporia with your data source takes only three simple steps:
1. Connect your data source (not limited to the databases displayed).
2. Link your dataset.
3. Define your model schema and start monitoring your production predictions.
That was simple. In just a few short clicks, monitoring is made easy, secure, and cost-saving. Now, just wait for insights to pour in and start showcasing the value of your predictions.
With Aporia’s DDC, model integration is as easy as writing an SQL query, and can be completed in minutes.
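As a rough illustration of what "as easy as writing an SQL query" can look like, the sketch below selects prediction rows from an existing table and computes daily monitoring metrics directly where the data lives, without copying it elsewhere. This is not Aporia's API: an in-memory SQLite database stands in for your warehouse, and the table name, column names, `MODEL_QUERY`, and `daily_summary` are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a production warehouse (an assumption
# made so the example is self-contained).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE predictions (
        prediction_id   TEXT PRIMARY KEY,
        predicted_at    TEXT,     -- ISO 8601 timestamp
        age             INTEGER,  -- feature
        amount          REAL,     -- feature
        predicted_fraud INTEGER,  -- model output (0 or 1)
        actual_fraud    INTEGER   -- ground truth, NULL until known
    )
    """
)
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("p1", "2024-01-01T09:00:00", 34, 100.0, 1, 1),
        ("p2", "2024-01-01T15:30:00", 51, 50.0, 0, 0),
        ("p3", "2024-01-02T08:10:00", 27, 20.0, 0, None),
        ("p4", "2024-01-02T19:45:00", 45, 40.0, 0, None),
    ],
)

# Hypothetical integration query: maps existing columns to the roles a
# monitoring schema needs (id, timestamp, features, prediction, actual).
MODEL_QUERY = """
SELECT prediction_id   AS id,
       predicted_at    AS ts,
       age,
       amount,
       predicted_fraud AS prediction,
       actual_fraud    AS actual
FROM predictions
"""

def daily_summary(connection: sqlite3.Connection) -> list:
    """Aggregate every row per day in the database itself, with no sampling or copying."""
    sql = f"""
    SELECT substr(ts, 1, 10) AS day,
           COUNT(*)          AS prediction_count,
           AVG(prediction)   AS positive_rate,
           AVG(amount)       AS avg_amount
    FROM ({MODEL_QUERY})
    GROUP BY day
    ORDER BY day
    """
    return connection.execute(sql).fetchall()

for row in daily_summary(conn):
    print(row)
```

Because the aggregation runs inside the data store, only the small per-day summary leaves it, which is the general idea behind monitoring without duplicating data.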
If you’d like to learn more about Direct Data Connectors and see how it benefits your organization, please reach out to us.