April 7, 2024 - last updated

Python Machine Learning: Top 5 Libraries and a Quick Tutorial

Reah Miyara

9 min read Oct 13, 2022

What is Python Machine Learning?

Python is a high-level programming language commonly used for machine learning (ML) and artificial intelligence (AI) projects. This open-source language is backed by a strong community and abstracts away many low-level details, such as memory management, to increase developer productivity. Many of the most popular AI/ML libraries, including TensorFlow and PyTorch, use Python as their primary interface.

A Python library provides ready-made code and functionality, so you don't need to build everything from scratch. Many Python libraries support ML projects, including frameworks for data preparation, visualization, modeling, and advanced ML algorithms such as neural networks.

This is part of a series of articles about machine learning engineering.

Why is Python Preferred for Machine Learning and AI?

Python has long been a staple of machine learning and artificial intelligence developers. Python gives developers great flexibility and has features that increase developer productivity and improve code quality. It also provides numerous libraries that help ease the workload.

Here are the key features that make Python a preferred programming language for machine learning, deep learning, and artificial intelligence projects:

Free, open source, and backed by a strong community that keeps improving it
A comprehensive ecosystem of packages that solve common problems
Easy to learn, implement, and integrate for people of all skill levels
Higher productivity through less coding and debugging time
Support for soft computing and natural language processing (NLP)
Straightforward integration with C and C++ code modules

Top Python Libraries for Machine Learning

A Python library is a collection of modules that provide ready-made code and functionality, so developers don't have to build it from scratch. Tens of thousands of Python libraries are available to help machine learning developers and professionals working in fields such as data science, data visualization, and more.

PyTorch

PyTorch is an open-source machine learning library, originally developed by Facebook's AI Research lab (now Meta AI), for training and deploying machine learning models. It is based on the Torch library, an earlier scientific computing framework scripted in the Lua programming language.

PyTorch is designed to be flexible and efficient. By default it uses eager execution, where each operation runs immediately as it is called, so users can build, inspect, and debug models with ordinary Python code and control flow instead of first constructing a static computation graph. It also supports graph-based execution (for example, through TorchScript or torch.compile), which captures a model as a graph that can be optimized to run efficiently on GPUs and deployed in production.
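A minimal sketch of eager execution and automatic differentiation in PyTorch, assuming PyTorch is installed; the tensor values are arbitrary illustration data. Because each operation runs as soon as it is called, the code can branch with an ordinary Python if statement, and autograd differentiates through whichever branch actually ran:

```python
import torch

# Eager execution: each line runs immediately and returns a concrete tensor
w = torch.tensor([1.0, -2.0], requires_grad=True)  # parameters to learn
x = torch.tensor([3.0, 4.0])                        # input features
y = (w * x).sum()                                   # 1*3 + (-2)*4 = -5
print(y.item())  # -5.0

# Ordinary Python control flow can branch on tensor values
if y.item() < 0:
    loss = y ** 2
else:
    loss = y

# Autograd differentiates through the branch that actually executed:
# d(y^2)/dw = 2 * y * x = 2 * (-5) * [3, 4]
loss.backward()
print(w.grad)  # tensor([-30., -40.])
```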

TensorFlow

TensorFlow is a free and open-source library for numerical computation and machine learning. Released in 2015 by the Google Brain research team, it provides a comprehensive set of mathematical operations suitable for neural network applications and large-scale systems.

Through the companion TensorFlow Probability library, it supports probabilistic methods such as Bayesian models, with distributions such as Bernoulli, Chi2, and Gamma. In addition, it supports all common neural network architectures, such as convolutional (CNN) and recurrent (RNN) networks.

TensorFlow processes large numerical workloads efficiently and is suitable for parallel processing and distributed computing. It offers robust scalability, graph visualization through TensorBoard, and support for CPUs, GPUs, and specialized accelerators (ASICs) such as Google's TPUs. However, the library can be difficult for machine learning beginners to learn and use.
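A minimal sketch of TensorFlow's core numerical API, assuming TensorFlow 2.x: tf.GradientTape records operations for automatic differentiation, and the tf.function decorator traces a Python function into a TensorFlow graph that the runtime can optimize. The function name scaled_sum is hypothetical:

```python
import tensorflow as tf

# Automatic differentiation: gradient of y = x^2 + 3x at x = 2
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 3.0 * x
grad = tape.gradient(y, x)  # dy/dx = 2x + 3 = 7

# tf.function traces this Python function into a TensorFlow graph
@tf.function
def scaled_sum(values, scale):
    return tf.reduce_sum(values) * scale

total = scaled_sum(tf.constant([1.0, 2.0, 3.0]), tf.constant(2.0))
print(float(grad), float(total))  # 7.0 12.0
```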

Scikit-learn

scikit-learn is another popular machine learning library, built on NumPy and SciPy. It implements a wide range of classical supervised and unsupervised learning algorithms, such as linear models, decision trees, support vector machines, and clustering, and can also be used for data mining, modeling, and analysis. Its simple, consistent API makes it an easy-to-use library for machine learning beginners.
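A short sketch of scikit-learn's uniform estimator interface, using the Iris dataset that ships with the library. LogisticRegression and KMeans are just one example each of a supervised and an unsupervised model; other estimators follow the same fit/predict pattern:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Bundled dataset: 150 flowers, 4 numeric features, 3 species labels
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Supervised learning: fit on labeled data, then score on held-out data
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised learning: same fit interface, but no labels are passed
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print(f"test accuracy: {accuracy:.2f}")
print(f"clusters found: {len(set(cluster_labels))}")
```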

OpenCV

OpenCV (Open Source Computer Vision) is a free and open-source library of computer vision and machine learning algorithms. It was developed by Intel and later maintained by Willow Garage and Itseez. OpenCV was designed for computational efficiency and with a strong focus on real-time applications.

OpenCV provides a wide range of tools and functions for tasks such as image and video processing, feature detection and extraction, object detection and recognition, and more. It is implemented in C++ and has official bindings for Python and Java, while languages such as C# are supported through third-party wrappers.

SciPy

SciPy is an open-source library for scientific and technical computing, built on NumPy. It includes functions for optimization, linear algebra, signal and image processing, statistics, and more, organized into submodules such as scipy.optimize for optimization, scipy.linalg for linear algebra, and scipy.signal for signal processing.
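A brief sketch that touches the three submodules named above, using illustrative inputs: minimizing SciPy's built-in Rosenbrock test function, solving a small linear system, and smoothing a noisy signal with a Savitzky-Golay filter (one of several filters in scipy.signal):

```python
import numpy as np
from scipy import linalg, optimize, signal

# scipy.optimize: the Rosenbrock function has its minimum at (1, 1)
result = optimize.minimize(
    optimize.rosen, x0=[-1.2, 1.0], jac=optimize.rosen_der, method="BFGS"
)

# scipy.linalg: solve A @ x = b, i.e. 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)  # -> [2., 3.]

# scipy.signal: smooth a noisy 5 Hz sine wave
t = np.linspace(0, 1, 200)
noisy = np.sin(2 * np.pi * 5 * t) + 0.2 * np.random.default_rng(0).standard_normal(t.size)
smoothed = signal.savgol_filter(noisy, window_length=21, polyorder=3)

print(result.x, solution)
```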

SciPy is widely used in the scientific and technical computing communities, and it is an essential tool for tasks such as data analysis, scientific modeling, and machine learning. It is often used in conjunction with other scientific computing libraries, such as NumPy and Matplotlib, to perform complex computational tasks.

NumPy

NumPy is a popular Python library for processing multidimensional arrays and matrices. It can perform a variety of mathematical operations, including linear algebra and Fourier transforms. This makes NumPy well suited to machine learning and artificial intelligence (AI) projects, where data and model parameters are typically represented as arrays and matrices. Because NumPy's array operations run in compiled C code rather than Python loops, they are much faster than equivalent pure-Python code.
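A short sketch of the operations mentioned above, with made-up data: vectorized matrix arithmetic, linear algebra with np.linalg, and a Fourier transform with np.fft that recovers the frequency of a sampled sine wave:

```python
import numpy as np

# Matrix-vector product without any Python loop
A = np.array([[2.0, 0.0], [0.0, 3.0]])
v = np.array([1.0, 1.0])
Av = A @ v  # -> [2., 3.]

# Linear algebra: for a diagonal matrix, the eigenvalues are the diagonal entries
eigenvalues = np.linalg.eigvals(A)

# Fourier transform: find the dominant frequency of a 7 Hz sine wave
sample_rate = 100                          # samples per second
t = np.arange(sample_rate) / sample_rate   # 1 second of samples
wave = np.sin(2 * np.pi * 7 * t)
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(t.size, d=1 / sample_rate)
dominant = freqs[np.argmax(spectrum)]
print(Av, eigenvalues, dominant)  # dominant frequency: 7.0
```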

Pandas

Pandas is another Python library built on top of NumPy, used to clean, transform, and prepare datasets for machine learning and training. It relies on two main data structures: the one-dimensional Series and the two-dimensional DataFrame, a table of labeled columns. This tabular model makes Pandas useful in many fields, such as finance, engineering, and statistics. The Pandas library is fast, flexible, and runs on all major platforms.
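A minimal sketch of Series and DataFrame in a typical data-preparation step, using a small made-up dataset (the column names are hypothetical): filling a missing value with the column median and one-hot encoding a categorical column before training:

```python
import numpy as np
import pandas as pd

# Series: a one-dimensional labeled array (one value is missing)
ages = pd.Series([34, 28, np.nan, 45], name="age")

# DataFrame: a two-dimensional table of labeled columns
df = pd.DataFrame({
    "age": ages,
    "city": ["Paris", "Berlin", "Paris", "Madrid"],
    "purchased": [1, 0, 1, 0],
})

# Impute the missing age with the median of the known ages (34)
df["age"] = df["age"].fillna(df["age"].median())

# One-hot encode the categorical column; the model's features and target
features = pd.get_dummies(df[["age", "city"]], columns=["city"])
target = df["purchased"]
print(features)
```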

Keras

Keras is an open-source, standalone Python library for building neural networks. It provides built-in layers for convolutional and recurrent neural networks, among others. Keras has long served as the high-level front end for the TensorFlow framework, and Keras 3 can also run on top of JAX and PyTorch. It is designed to be readable, modular, and extensible, which makes experimentation faster.
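A minimal sketch of defining, compiling, and training a small Keras model, assuming Keras 2.x or 3.x is installed with a backend such as TensorFlow. The synthetic dataset and layer sizes are arbitrary choices for illustration:

```python
import keras
import numpy as np

# Synthetic data: the label is 1 when the sum of the 4 features is positive
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# A model is a stack of modular layers
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of class 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train quietly, then predict probabilities for the first 5 samples
history = model.fit(X, y, epochs=20, batch_size=32, verbose=0)
probabilities = model.predict(X[:5], verbose=0)
print(probabilities.ravel())
```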

Matplotlib

Matplotlib is a Python library focused on data visualization, used to create graphs, plots, histograms, and bar charts. It works directly with data from SciPy, NumPy, and Pandas. Matplotlib is one of the most widely used Python plotting tools, and Pandas' built-in plotting methods use it by default.
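A minimal sketch that draws a line plot and a histogram from NumPy data and saves them to a file. It selects the non-interactive Agg backend so it runs without a display; the output file name plots.png is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # render to files; no display window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
values = rng.normal(loc=0.0, scale=1.0, size=500)

# Two side-by-side panels: a line plot and a histogram
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()

counts, bin_edges, _ = ax_hist.hist(values, bins=20)
ax_hist.set_title("Histogram of 500 normal samples")

fig.tight_layout()
fig.savefig("plots.png")
```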

Python ML Tutorial: Face Detection in Python

Let’s see an example of machine learning with Python. The following tutorial shows how to perform face detection using the Haar Cascade algorithm with the Python version of the OpenCV library.

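Haar Cascade (proposed by Viola and Jones in 2001) is an object detection method that trains a cascade function on a large number of positive and negative images, that is, images that contain the target object and images that do not. The trained function is then used to detect the object in other images. The cascade is a sequence of increasingly complex classifier stages: a region must pass every stage to count as a detection, so most regions without the object are rejected early and cheaply. This technique is commonly used to detect faces in images.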
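The Viola-Jones detector evaluates Haar-like features, which are differences between pixel sums over adjacent rectangles, and computes each rectangle sum in constant time using an integral image (summed-area table). The NumPy sketch below illustrates only that feature computation, not OpenCV's trained cascade, and its helper names are hypothetical:

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using 4 lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Toy 4x4 "image": bright left half, dark right half
img = np.array([
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
])
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 4))          # 80
print(two_rect_feature(ii, 0, 0, 4, 4))  # 72 - 8 = 64
```

A strong response like this one signals a vertical light-to-dark edge; a trained cascade combines many such features, each with a learned threshold.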

To read images and detect faces in them using OpenCV:

Install the OpenCV library for Python using this command (in a Jupyter or Colab notebook, prefix it with !):

pip install opencv-python

Use the following code:
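import cv2

# Path to the image you want to scan for faces
demoImagePath = "sample_cv.png"

# Load the pretrained frontal-face Haar cascade bundled with opencv-python
demoFaceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

demoImage = cv2.imread(demoImagePath)
if demoImage is None:
    raise FileNotFoundError(f"Could not read image: {demoImagePath}")

# The cascade works on single-channel (grayscale) images
demoImageInGray = cv2.cvtColor(demoImage, cv2.COLOR_BGR2GRAY)

# Each detection is an (x, y, width, height) box
facesInDemoImage = demoFaceCascade.detectMultiScale(
    demoImageInGray,
    scaleFactor=1.2,
    minNeighbors=6,
    minSize=(35, 35),
    flags=cv2.CASCADE_SCALE_IMAGE
)

# Draw a green, 2-pixel-thick rectangle around each face (modifies demoImage in place)
for (x, y, w, h) in facesInDemoImage:
    cv2.rectangle(demoImage, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Present the image with OpenCV; wait for a key press, then close the window
cv2.imshow("Capture - Face detection", demoImage)
cv2.waitKey(0)
cv2.destroyAllWindows()

# cv2.imshow is not supported in Google Colab; there, replace the three lines above with:

# from google.colab.patches import cv2_imshow
# cv2_imshow(demoImage)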

Here is a quick walkthrough of the code. First, set demoImagePath to the location of your image:

demoImagePath = "sample_cv.png"

We create a CascadeClassifier from the pretrained frontal-face model that ships with OpenCV and assign it to demoFaceCascade; this loads the cascade into memory:

demoFaceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

Next, we read the image at demoImagePath and convert it to grayscale, the format the Haar cascade operates on:

demoImage = cv2.imread(demoImagePath)
demoImageInGray = cv2.cvtColor(demoImage, cv2.COLOR_BGR2GRAY)

Now we use the following code to detect faces in the image:

facesInDemoImage = demoFaceCascade.detectMultiScale(
    demoImageInGray,
    scaleFactor=1.2,
    minNeighbors=6,
    minSize=(35, 35),
    flags=cv2.CASCADE_SCALE_IMAGE
)

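The detectMultiScale function scans the image at multiple scales and returns a list of (x, y, width, height) rectangles, one per detected face. scaleFactor=1.2 means the image is reduced by a factor of 1.2 at each scale step, which lets the fixed-size detector find faces of different sizes, such as faces taken closer to the camera. minNeighbors=6 is the number of overlapping candidate detections a region needs before it is kept; higher values give fewer false positives but may miss some faces. minSize=(35, 35) ignores any object smaller than 35 × 35 pixels.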
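To make scaleFactor and minSize concrete, this sketch lists the face sizes a multi-scale search would cover in a 640 × 480 image. It assumes the cascade's base detection window is 24 × 24 pixels (the size used by haarcascade_frontalface_default.xml) and simplifies OpenCV's internal scaling loop; the function name is hypothetical:

```python
def candidate_window_sizes(image_w, image_h, base=24, scale_factor=1.2, min_size=35):
    """Hypothetical helper: side lengths (pixels) of the square detection
    windows searched at each scale, a simplified model of OpenCV's loop."""
    sizes = []
    factor = 1.0
    while True:
        size = round(base * factor)       # window size at this scale
        if size > min(image_w, image_h):  # window no longer fits in the image
            break
        if size >= min_size:              # skip windows smaller than minSize
            sizes.append(size)
        factor *= scale_factor            # move to the next, larger scale
    return sizes

sizes = candidate_window_sizes(640, 480)
print(sizes)  # [35, 41, 50, 60, 72, 86, 103, 124, 149, 178, 214, 257, 308, 370, 444]
```

A larger scaleFactor, such as 1.3, produces fewer scales, so detection runs faster but is coarser about face size.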

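Finally, cv2.rectangle() draws a green rectangle, 2 pixels thick, on the image around each detected face. Each detection gives the top-left corner plus the box's width and height, so the opposite corner is the top-left corner offset by the width and height: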

for (x, y, w, h) in facesInDemoImage:
    cv2.rectangle(demoImage, (x, y), (x + w, y + h), (0, 255, 0), 2)

Python Machine Learning with Aporia

The use of Python in combination with Aporia can provide a powerful toolset for building and tailoring ML monitoring to your data science team’s needs. Aporia is a software platform that streamlines the production ML workflow, allowing data scientists and ML engineers to quickly and easily get insights to improve model performance. With support for Python, Aporia offers code-based monitoring to effectively create model monitors that align with your goals and ensure your unique model is operating at its best.

Aporia’s ML observability platform is the ideal partner for Data Scientists and ML engineers to visualize, monitor, explain, and improve ML models in production. Our platform fits naturally into your existing ML stack and seamlessly integrates with your existing ML infrastructure in minutes. Aporia offers data science and ML teams key features and tools to ensure production models perform at their best:

Visibility

Single-pane-of-glass visibility into all production models. Custom dashboards that can be understood and accessed by all relevant stakeholders.
Track model performance and health in one place.
A centralized hub for all your models in production.
Custom metrics and widgets to ensure you see everything you need.

Monitoring

Start monitoring in minutes.
Instant alerts and advanced workflow triggers.
Custom monitors to detect data drift, model degradation, performance, etc.
Track relevant custom metrics to ensure your model is drift-free and performance is driving value.
Choose from our automated monitors or get hands-on with our code-based monitor options.

Explainable AI

Get human-readable insight into your model predictions.
Simulate ‘What if?’ situations. Play with different features and see how they affect predictions.
Gain valuable insights to optimize model performance.
Communicate predictions to relevant stakeholders and customers.

Root Cause Investigation

Slice and dice model performance, data segments, data stats, or distribution.
Identify and debug issues.
Explore and understand connections in your data.

Book a demo to see Aporia in action and understand how reliable ML observability can benefit your organization.