Introduction
Evaluating the performance of our ML models is an integral part of our work. A model’s performance can greatly influence its utility in real-world applications. Today, in our Production ML Academy, we’re going to talk about F1 Score, a metric that combines Precision and Recall, making it particularly useful when we have imbalanced datasets or when both false positives and false negatives are costly.
The F1 Score is a measure of a model’s accuracy on a dataset. It is used to evaluate binary classification systems, which classify examples as ‘positive’ or ‘negative’. The F1 Score combines Precision and Recall into a single value, providing a comprehensive view of both metrics.
The formula for F1 Score is:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
Let’s take an example. Consider a scenario where we’re building a machine learning model to detect spam emails. Here, a ‘positive’ example is a spam email, and a ‘negative’ example is a non-spam email. In this scenario:
A True Positive (TP) is a spam email correctly classified as spam.
A False Positive (FP) is a non-spam email incorrectly classified as spam.
A False Negative (FN) is a spam email incorrectly classified as non-spam.
Let’s say out of 100 emails, 30 are spam. The model correctly identified 25 spam emails, but classified 5 spam emails as non-spam. Additionally, the model incorrectly classified 3 non-spam emails as spam. Here, TP = 25, FP = 3, and FN = 5.
First, we calculate Precision and Recall:
Precision = TP / (TP + FP) = 25 / (25 + 3) = 0.8929 (approx)
Recall = TP / (TP + FN) = 25 / (25 + 5) = 0.8333 (approx)
Now, using the F1 Score formula:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.8929 * 0.8333) / (0.8929 + 0.8333) = 0.8621 (approx)
This tells us that the F1 Score of the model on this dataset is 0.8621, or about 86.21%.
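To double-check the arithmetic, here is a minimal Python sketch that reproduces the spam example above from the raw counts (the numbers are the ones assumed in the scenario):

    # Counts from the spam-detection example: TP = 25, FP = 3, FN = 5.
    tp, fp, fn = 25, 3, 5

    precision = tp / (tp + fp)                            # 25 / 28 ≈ 0.8929
    recall = tp / (tp + fn)                               # 25 / 30 ≈ 0.8333
    f1 = 2 * (precision * recall) / (precision + recall)  # ≈ 0.8621

    print(f"Precision: {precision:.4f}")
    print(f"Recall:    {recall:.4f}")
    print(f"F1 Score:  {f1:.4f}")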
While Precision is the ratio of correctly predicted positive observations to the total predicted positives, and Recall is the ratio of correctly predicted positive observations to all actual positive observations, the F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account.
It is worth noting that while these metrics provide critical insights into the performance of a classification model, the choice of metric depends largely on the specific application and the business requirements at hand. Some tasks might require high precision, while others necessitate high recall or a balance between both, which is encapsulated by the F1 Score.
In binary classification, the F1 Score becomes critical when the data is imbalanced. It condenses Precision and Recall into a single metric, offering a more comprehensive view of the model’s performance than accuracy alone.
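As a small illustration of why this matters (the class counts below are made up for the example), a model that always predicts the majority class can score high on accuracy while the F1 Score exposes the failure:

    from sklearn.metrics import accuracy_score, f1_score

    # Hypothetical imbalanced dataset: 95 negatives, 5 positives.
    y_true = [0] * 95 + [1] * 5
    # A degenerate model that always predicts the majority class.
    y_pred = [0] * 100

    print(accuracy_score(y_true, y_pred))             # 0.95 -- looks great
    print(f1_score(y_true, y_pred, zero_division=0))  # 0.0  -- reveals the problem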
In multi-class problems, the F1 Score can be calculated for each class separately by treating that class as positive and the rest as negative. We can then calculate a weighted average of these per-class F1 Scores, typically weighting each class by its support (the number of true instances of that class).
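For instance, scikit-learn’s f1_score exposes both views through its average parameter (the labels below are invented for illustration):

    from sklearn.metrics import f1_score

    # Hypothetical labels for a three-class problem (classes 0, 1, 2).
    y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
    y_pred = [0, 1, 1, 1, 0, 2, 2, 2, 1, 2]

    # Per-class F1: each class is treated as 'positive' in turn.
    print(f1_score(y_true, y_pred, average=None))
    # Weighted average of the per-class scores, weighted by class support.
    print(f1_score(y_true, y_pred, average="weighted"))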
When F1 Score is Critical
When F1 Score may not be the Only Priority
Practical tips for improving F1 Score
The F1 Score is not just useful during model development, but also when the model is in production. Continuous monitoring of the F1 Score can help you ensure that your model continues to perform well and catch potential issues before they become problematic.
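One way to operationalize this (shown here only as a conceptual sketch, not any specific monitoring product’s API) is to compute the F1 Score over a rolling window of labeled production predictions and alert when it drops below an agreed threshold:

    from collections import deque
    from sklearn.metrics import f1_score

    WINDOW_SIZE = 1000          # assumed window of recent labeled predictions
    F1_ALERT_THRESHOLD = 0.80   # assumed baseline; tune to your own model

    window = deque(maxlen=WINDOW_SIZE)

    def record_prediction(y_true: int, y_pred: int) -> None:
        """Store one labeled prediction and check the rolling F1 Score."""
        window.append((y_true, y_pred))
        if len(window) == WINDOW_SIZE:
            labels, preds = zip(*window)
            rolling_f1 = f1_score(labels, preds, zero_division=0)
            if rolling_f1 < F1_ALERT_THRESHOLD:
                print(f"ALERT: rolling F1 dropped to {rolling_f1:.3f}")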
Limitations and Cautions of F1 Score
Wrapping up, the F1 Score provides a balanced measure of Precision and Recall, and is a crucial tool in every ML engineer’s toolkit. However, as with all tools, it’s important to understand when and how to use it. By considering the unique requirements of your problem and using the F1 Score judiciously, you can develop ML models that perform well in practice.