Selecting rows based on a condition is a common operation in data wrangling. In this short how-to article, we will learn how to use a list of values to select rows from Pandas and PySpark DataFrames.
The isin method of Pandas can be used to select rows whose value in a column appears in a given list of values. We just need to pass the values as a Python list.
df = df[df.group.isin(["A","B","D"])]
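To see the selection end to end, here is a minimal sketch on a small made-up DataFrame. The sample data and the names wanted and selected are assumptions for illustration; only the group column and the isin call come from the example above.

```python
import pandas as pd

# Hypothetical sample data; the "group" column mirrors the article's example.
df = pd.DataFrame(
    {
        "group": ["A", "B", "C", "D", "E"],
        "value": [10, 20, 30, 40, 50],
    }
)

# Values we want to keep.
wanted = ["A", "B", "D"]

# isin returns a boolean Series that is True where the value is in the list;
# indexing the DataFrame with it keeps only those rows.
selected = df[df["group"].isin(wanted)]
print(selected)
```

The bracket form df["group"] is used here instead of df.group because it also works when a column name collides with a DataFrame attribute or contains spaces.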
If you are interested in selecting rows that are not in this list, you can either add a tilde (~) operator at the beginning or set the condition as False.
# Tilde operator
df = df[~df.group.isin(["A","B","D"])]

# False condition
df = df[df.group.isin(["A","B","D"])==False]
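As a quick check that the two ways of negating the condition behave the same, the sketch below applies both to a small made-up DataFrame and compares the results. The sample data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {
        "group": ["A", "B", "C", "D", "E"],
        "value": [10, 20, 30, 40, 50],
    }
)

excluded = ["A", "B", "D"]

# Boolean mask: True where the group is in the excluded list.
mask = df["group"].isin(excluded)

# Option 1: invert the mask with the tilde operator.
with_tilde = df[~mask]

# Option 2: compare the mask to False.
with_false = df[mask == False]  # noqa: E712 (comparison kept to mirror the article)

# Both options keep exactly the same rows.
same = with_tilde.equals(with_false)
print(with_tilde)
```

The tilde form is usually the preferred one, since linters tend to flag comparisons to False.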
PySpark has an isin method that works similarly to that of Pandas. We pass the resulting condition to the filter method.
df = df.filter(df.group.isin(["A","B","D"]))
To select rows that are not in the list, we can again either add a tilde (~) operator at the beginning or compare the condition to False.
# Tilde operator
df = df.filter(~df.group.isin(["A","B","D"]))

# False condition
df = df.filter(df.group.isin(["A","B","D"])==False)