
Target, Walmart, Macy’s & Kohl’s: Demand Forecasting in a Dynamic World

Alon Gubkin
11 min read Dec 25, 2022

Demand Forecasting ML models present huge potential for retailers to generate a lot of revenue and streamline the business. While almost all big retailers use such models, many of them have been subjected to inventory purgatory since the beginning of the year.

In this post, we’ll discuss how a lack of production practices around Machine Learning keeps Demand Forecasting models from living up to their potential. We’ll break down why this happens using real-world examples and guide you toward a solution.

Demand Forecasting in Production

Demand Forecasting models utilize historical data to predict future demand. Retailers using these models have a better understanding and estimation of the number of products that will be sold in a certain cluster – state, age group, etc. This enables them to better manage their financials and resources, calculate profit margins and cash flows, and optimize product storage.

However, while Demand Forecasting makes for a more profitable business, these models come with some issues that need to be addressed before they can showcase their value:

The Confidence Challenge

Let’s say we are in Q2, and we’ve just predicted the demand for the next quarter Q3. We have just forecasted how many items will be sold per product line in each state in the US, and now we can stock our warehouse and manage our inventory. This means that we are buying items based on the model predictions; and if these are inaccurate – we can lose money, as stores will be over or under-stocked.

Traditionally, ML models are measured by comparing their predictions to the ground truth (otherwise called “labels”). But if you think about it, we will only have the full ground truth for our predictions at the end of Q3.

This presents a major challenge in production: Since it takes such a long time to get ground truth, we have long periods where we have no way of measuring these models, meaning “money is tied up”. In these time periods, how confident should we be with the model predictions?

Tip: Even though there isn’t a foolproof way to measure model performance in production before getting the ground truth, there are a lot of tests you can perform to detect issues far earlier – right after the prediction itself. For example, you could test the integrity of the model’s inputs (e.g. missing values, outliers, etc.) to avoid the “Garbage In-Garbage Out” effect. Additionally, you can measure data drift between production data and the training set to make sure that the training set represents your production data as closely as possible.
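To make the tip above concrete, here is a minimal sketch of both checks using only the Python standard library. The helper names (`missing_rate`, `psi`) and the sample data are hypothetical, and the Population Stability Index is just one common way to quantify drift, not necessarily what any particular monitoring platform uses. A PSI above roughly 0.2 is often treated as a warning sign, but that cutoff is a rule of thumb, not a standard.

```python
import math

def missing_rate(values):
    """Fraction of entries that are None or NaN (a basic input-integrity check)."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training sample (expected)
    and a production sample (actual) of one numeric feature.
    Bin edges are taken from the training sample; production values
    outside that range are clamped into the first or last bin."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0)
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values: weekly units sold per store.
train_units = [float(x) for x in range(100)]
prod_units = [x + 50.0 for x in train_units]  # production shifted upward

drift_score = psi(train_units, prod_units)
gap_rate = missing_rate([12.0, None, float("nan"), 7.0])
```

Both numbers are available right after prediction time, long before the quarter’s ground truth arrives.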

The Many-Segments Challenge

Now, let’s say we have arrived at the end of Q3 and we finally have the full ground truth of our model. You can now measure the model to understand how accurate it was in predicting demand in the last quarter… or can you?

Even if your model’s overall accuracy was high, there might be specific slices of your data where the model is underperforming; for example, a specific product line, store, brand, U.S. state – or any combination of those. In such a segment, predicted demand will be either too low or too high, meaning you’re leaving money on the table.

Tip: Look into specific important slices of your data and measure performance in each one of them separately, to make sure you are not underperforming in one of them.
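As an illustration of why the aggregate number can hide a problem, the sketch below computes WAPE (weighted absolute percentage error) overall and per segment. The row layout, field names (`actual`, `forecast`, `state`, `product_line`) and the numbers are made up for this example.

```python
from collections import defaultdict

def wape(actuals, forecasts):
    """Sum of absolute errors divided by sum of absolute actuals."""
    total = sum(abs(a) for a in actuals)
    if total == 0:
        return float("nan")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / total

def wape_by_segment(rows, segment_keys):
    """Group rows by the given keys and compute WAPE inside each group."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        key = tuple(row[k] for k in segment_keys)
        groups[key][0].append(row["actual"])
        groups[key][1].append(row["forecast"])
    return {key: wape(a, f) for key, (a, f) in groups.items()}

# Hypothetical end-of-quarter results.
rows = [
    {"state": "CA", "product_line": "toys", "actual": 100, "forecast": 95},
    {"state": "CA", "product_line": "apparel", "actual": 200, "forecast": 210},
    {"state": "TX", "product_line": "toys", "actual": 50, "forecast": 90},
    {"state": "TX", "product_line": "apparel", "actual": 150, "forecast": 145},
]

overall = wape([r["actual"] for r in rows], [r["forecast"] for r in rows])
per_segment = wape_by_segment(rows, ["state", "product_line"])
```

With this toy data the overall WAPE is 12%, which looks acceptable, while the Texas toys segment sits at 80%: exactly the kind of slice that only shows up when each segment is measured separately.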

The Trend Challenge

Models are trained on historical data that represents a specific state of the world. Therefore, if there’s a new trend in the world, model predictions could be inaccurate unless the model is updated to account for seasonality changes or new trends, once again leaving potential profits unrealized.

Examples of such trends include seasonality trends (such as holidays – Christmas, Black Friday, etc) or real-world changes (hopefully one-time events like Covid-19). Customer behavior might change in these time periods, but the model is not trained to predict them.

Consider the simple and obvious example of the swift change in the US between Thanksgiving and Christmas: turkeys and pumpkins come down, and Christmas trees and Santas go up.

Tip: By comparing general statistics of your production data to the model training set, you’d be able to identify trends that the model wasn’t trained on. Make sure to do this not only for the entire production data, but also in sub-segments, as discussed above.
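One simple way to compare general statistics is to check how far the production mean of a feature has moved from its training mean, measured in training standard deviations, and to do so per segment. The sketch below assumes that approach; the function names, the threshold of 2.0, and the sample figures are illustrative only.

```python
import statistics

def mean_shift(train, prod):
    """Shift of the production mean from the training mean,
    expressed in training standard deviations."""
    mu = statistics.fmean(train)
    sigma = statistics.pstdev(train)
    prod_mu = statistics.fmean(prod)
    if sigma == 0:
        return 0.0 if prod_mu == mu else float("inf")
    return (prod_mu - mu) / sigma

def shifted_segments(train_by_seg, prod_by_seg, threshold=2.0):
    """Return the segments whose mean moved by at least `threshold` deviations."""
    flagged = {}
    for seg, train in train_by_seg.items():
        prod = prod_by_seg.get(seg)
        if not prod:
            continue  # no production data for this segment yet
        shift = mean_shift(train, prod)
        if abs(shift) >= threshold:
            flagged[seg] = shift
    return flagged

# Hypothetical daily unit sales per product segment.
train_by_seg = {
    "christmas_trees": [2, 3, 1, 2, 2, 3, 1, 2],
    "pumpkins": [40, 42, 38, 41, 39, 40, 43, 37],
}
prod_by_seg = {
    "christmas_trees": [30, 35, 28],  # demand jumps after Thanksgiving
    "pumpkins": [39, 41, 40],
}

flagged = shifted_segments(train_by_seg, prod_by_seg)
```

Here only the Christmas tree segment is flagged, even though a statistic pooled across both segments would be dominated by the much larger, stable pumpkin volumes.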

Now we know how Demand Forecasting models are used, and what issues may challenge successful model predictions. Let’s take a deeper look at some of the biggest retailers in the US and how their inventory levels have affected their stock shares on Wall Street.

Retailers Turn to ML-Driven Demand Forecasting Models

The past couple of years have highlighted the need for Next-Gen technology to help manage inventories and save retailers from a major crisis. Demand forecasting is the go-to method for predicting which products and/or services consumers will buy.

Businesses have adopted AI/ML technologies to help improve and perfect their pipelines by backing their decision-making with concrete data. According to a McKinsey report, using machine learning (ML) forecasting models can help reduce supply chain errors by 30% to 50%, while lost sales can be reduced by up to 65%.

However, adopting ML is not something to be done casually. A key aspect of running a reliable forecasting model is the ability to monitor and explain model performance. This helps ensure model health, providing a performance compass for stakeholders to accurately predict future demand.

Many big retailers have been subjected to inventory challenges since the beginning of the year, with either too much product or not enough. While this could be attributed to problematic supply chains or a funky economic landscape – another explanation could be that seasonal changes and new global trends have been too swift for the forecasting models to catch up. In this case, proper model monitoring becomes invaluable to improve Demand Forecasting and ensure its value is showcased. ML monitoring is an essential solution to catch real-world drift in your models before they cost you money. 

How Real-World Events Impact Supply Chains & Inventory

Consumer products have been telling two totally different stories in recent years. At the beginning of the pandemic, we witnessed how the shelves at popular department stores were cleared by panicking consumers. Y’all remember the great toilet paper shortage fear of 2020, right? Then came the lockdowns and quarantines, which, when combined with stimulus checks, sparked hyper-consumerism for pandemic-catalyzed products.

However, as we shift back to “normal”, the same retailers who enjoyed uber-prosperous times find themselves in quite the inventory pickle. Product demand has changed, supply chains have been interrupted, inflation is rising, and retailers have a big demand forecasting problem on their plates. Take the ongoing chip shortage as an example – the main reason I have had to wait more than 6 months for my PS5! This ain’t funny, “I needs to game!”

Putting my gaming addiction aside, inventory and supply chain issues are heavily impacting the retail industry, forcing retailers across the board to reduce prices, spelling trouble for their bottom line.

Retail’s Inventory Ruckus Spills onto Wall Street

Retailers were riding high on specific products during the pandemic, so they stocked up, predicting that demand wouldn’t change so abruptly. However, a few months later, those same products turned into an inventory burden, preventing retail businesses from profiting from the previous high demand while limiting new purchase orders.

Other than global supply chain restraints, all of the companies below cited inventory issues as a key reason for their downward outlook.

In Wall Street talk, lowering financial guidance has been a recurring theme this year for many retail powerhouses. Share prices have dipped, causing investors to potentially lose trust (at least in the short term). Let’s take a look at some numbers from Q2 2022 and see how inventory woes have been affecting “Big Retail”:


Kohl’s has struggled mightily this year, reporting that it expects 2022 sales to drop 5% to 6% YoY. The Wisconsin chain also noted that inventory was 48% higher than the previous year for Q2.

“We have adjusted our plans, implementing actions to reduce inventory and lower expenses to account for a softer demand outlook,” said Kohl’s CEO Michelle Gass in a CNN article.


Macy’s has slashed its per-share outlook from around $4.53 to $4, a drop of roughly 12%, following a 7% increase in inventory levels from last year. The department store said it’s aiming to cut prices on seasonal goods, private brand merchandise, and any of the products that were popular during the pandemic, such as sportswear and pajamas, to name a few.


As one of the more popular retailers in the U.S., Target saw its Q2 net income fall a whopping 90% YoY, to 39 cents a share from $3.65 per share a year earlier. The company had already cut its profit outlook twice this year and has seen its stock price plummet by nearly 30% since the beginning of 2022.


Walmart’s 2022 outlook doesn’t scream 2020 at all, although food and beverage sales will help pick up the slack. The Arkansas retail giant reported cutting 200 corporate jobs after slashing its profit outlook. Additionally, Walmart is anticipating a full-year decline of 11% to 13% for its adjusted earnings per share.

Once a demand forecasting model is let loose in the real world, continuous monitoring is best practice. Because when a model is producing garbage outputs, it’s definitely not driving its intended value, and can even end up costing you more. Sound familiar? In the cases of Zillow, Unity Software, and more recently, Equifax, reliable ML monitoring could have alerted these companies’ ML stakeholders to bad predictions, guided them toward remediation, and helped avoid revenue loss and bad PR.

How to Improve Demand Forecasting Models in Production with Aporia

Aporia is the only ML Observability platform that enables you to centralize, visualize, and maximize the value of your machine learning models through customizable dashboards, advanced monitoring, explainable AI, and root cause investigation tools.

As we’ve covered so far, models can drift and deviate from their intended business and data science goals. If you’re thinking bad news, then you’re right. Without a proper monitoring platform like Aporia in place, issues with demand forecasting models could hurt growth, revenues, and optics, and lead to share-price depreciation. To keep that from happening, monitoring ML models in production is essential for tackling the challenges mentioned above.

But, what does monitoring mean? Here are the ABCs of model monitoring:

1. Store your model’s inputs and outputs in production. This is a prerequisite for model monitoring. You can use data stores like BigQuery, Databricks Lakehouse, Snowflake, and S3. For better results, monitor data sources directly rather than samples, as small samples can easily produce skewed results.

2. Create custom dashboards that visualize statistics like ground truth metrics (e.g. MAPE/WAPE), top drifted features, predicted demand, different statistics by segment, etc. This way you can centralize your production models under one source of truth that can be understood by the entire company, from data science to business stakeholders.

A dashboard displaying a Demand Forecasting model with Aporia

3. Create live alerts to let you know once there is an issue. By customizing alerting you’re able to get deeper insights into the issue faster, respond more efficiently, and accelerate retraining.
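To show how the three steps fit together, here is a generic sketch of a threshold-based alert check over per-segment metrics. It is not the Aporia API: `AlertRule`, `evaluate_alerts`, the metric names, and the thresholds are all hypothetical, and the metric values are assumed to have already been computed from the stored inputs, outputs, and ground truth.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    metric: str       # name of the metric to watch, e.g. "wape" or "psi"
    threshold: float  # alert when the metric rises above this value

def evaluate_alerts(metrics_by_segment, rules):
    """Return one message per (segment, rule) pair whose metric exceeds its threshold."""
    alerts = []
    for segment, metrics in sorted(metrics_by_segment.items()):
        for rule in rules:
            value = metrics.get(rule.metric)
            if value is not None and value > rule.threshold:
                alerts.append(
                    f"{segment}: {rule.metric}={value:.2f} exceeds {rule.threshold:.2f}"
                )
    return alerts

# Hypothetical metrics derived from stored predictions and ground truth.
metrics_by_segment = {
    "TX/toys": {"wape": 0.80, "psi": 0.05},
    "CA/toys": {"wape": 0.05, "psi": 0.31},
}
rules = [AlertRule("wape", 0.25), AlertRule("psi", 0.20)]

alerts = evaluate_alerts(metrics_by_segment, rules)
for message in alerts:
    print(message)
```

Keeping the rules as data rather than hard-coded conditions makes it easier to tune thresholds per metric as you learn what normal looks like for each segment.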

Assuming you are storing your predictions on S3, you can start monitoring your model quickly by:

monitored_model = aporia.Model("", "My Demandforecasting Model")


We’ve covered the challenges and pitfalls of demand forecasting, and discussed how to avoid and fix them. The real-world examples of Target, Walmart, Kohl’s, and Macy’s underscore the need for oversight and guardrails to ensure demand forecasting models can be trusted and leveraged with confidence by businesses. In the meantime, I’ll be waiting patiently for my PS5 to arrive :).

Convinced that it’s time to start monitoring your Demand forecasting models in production? See Aporia in action and learn how to maximize your ML goals.
