Python Pandas Compare Two Dataframes [Solved]


In this tutorial, we will learn how to compare two DataFrames in Python pandas. If you have ever worked with large data sets, you will have come across many operations that need to be performed on the data, such as analyzing, cleaning and modifying it. One important and very common operation is comparing two DataFrames. We will look at two ways to do this in this tutorial. So let us begin.

 

What is a DataFrame in Pandas?

In Python, Pandas is a library used for working with data sets. It provides a large set of methods that make data analysis, filtering and manipulation much easier. A DataFrame is a data structure provided by the Pandas library. It is a two-dimensional, size-mutable and potentially heterogeneous tabular data structure, similar to a spreadsheet, a SQL table or a dictionary of Series objects.

Each column in a DataFrame can have a different data type, such as integer, string or float. DataFrames are well suited to data manipulation, data cleaning, data analysis and data visualization tasks.
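As a minimal sketch of the description above, the snippet below builds a small DataFrame whose columns hold different data types and then inspects its shape and column types. The column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different data type
df = pd.DataFrame(
    {
        'name': ["Cat", "Lion"],       # strings
        'legs': [4, 4],                # integers
        'weight_kg': [4.5, 190.0]      # floats
    }
)

# shape is (number of rows, number of columns)
print(df.shape)

# dtypes lists the data type of each column
print(df.dtypes)
```

The exact type names printed by dtypes can vary between pandas versions, so treat the printed labels as indicative rather than fixed.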

 

Python Pandas Compare Two Dataframes [Solved]


We will talk about two methods to compare DataFrames: the built-in methods ‘equals()’ and ‘compare()’. Let us look at each of them one by one using examples.

 

1. Using ‘equals’ Method

In Pandas, the ‘equals()’ method is used to determine whether two pandas objects (DataFrame or Series) are equal. It checks that both objects have the same shape and the same elements, with corresponding columns of the same data type, and returns a single Boolean value: True if they match, False otherwise. Below is the syntax for the equals() method.

Syntax

df1.equals(df2)

 

Let us now write some code to understand how this method works. We create two DataFrames, df1 and df2, and use the equals() method to compare them. If all the elements in the two DataFrames are equal, it returns True; otherwise it returns False, as shown below.

import pandas as pd

df1 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Parrot"]
    }
)

df2 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Parrot"]
    }
)

# Compare df1 and df2 using the equals method
print(df1.equals(df2))
OUTPUT
True

 

Now, let us modify the second value of the k2 column in df2 as shown below and run the comparison again. It returns False because the k2 column in df2 now differs from the k2 column in df1.

df2 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Deer"]
    }
)

# Compare df1 with the modified df2
print(df1.equals(df2))
OUTPUT
False
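Two details of equals() are easy to miss: it also takes the column data type into account, and it treats missing values in the same position as equal. The sketch below illustrates both with small made-up frames; the variable names are hypothetical, and the `==` operator is used only to show that the values themselves match element by element.

```python
import numpy as np
import pandas as pd

# Same visible values, but one column holds integers and the other floats
df_int = pd.DataFrame({'k1': [1, 2]})
df_float = pd.DataFrame({'k1': [1.0, 2.0]})

# Element by element the values match...
same_values = bool((df_int == df_float).all().all())
# ...but equals() also requires matching column data types
same_frames = df_int.equals(df_float)
print(same_values, same_frames)

# Missing values (NaN) in the same position are considered equal by equals()
df_a = pd.DataFrame({'k1': [1.0, np.nan]})
df_b = pd.DataFrame({'k1': [1.0, np.nan]})
nan_equal = df_a.equals(df_b)
print(nan_equal)
```

If a data type difference should not count as a mismatch in your use case, one option is to convert both frames to a common type before calling equals().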

 

2. Using ‘compare’ Method

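In Pandas, the ‘compare()’ method is used to compare two DataFrame objects and show the differences. It returns a new DataFrame in which each differing value from the calling DataFrame appears under ‘self’ and the corresponding value from the other DataFrame under ‘other’. By default, rows and columns in which all values are equal are left out. Below is the syntax for the compare() method.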

Syntax

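df1.compare(df2, align_axis=1, keep_shape=False)

Here, align_axis=1 (the default) places the ‘self’ and ‘other’ values side by side as columns, while align_axis=0 stacks them as rows. Setting keep_shape=True keeps all rows and columns instead of only those that differ.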

 

NOTE:

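Please note that the compare() method is available in pandas version 1.1.0 and later. Both DataFrames must also have identical row and column labels; otherwise compare() raises a ValueError.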
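Before relying on compare(), it can help to check the installed pandas version and to be aware that the method only works on identically labeled DataFrames. The sketch below assumes pandas 1.1.0 or later; df3 is a hypothetical frame with an extra row, used only to trigger the label mismatch.

```python
import pandas as pd

# compare() needs pandas 1.1.0 or later
print(pd.__version__)

df1 = pd.DataFrame({'k1': ["Cat", "Lion"]})
# Hypothetical frame with one extra row, so its row labels differ from df1
df3 = pd.DataFrame({'k1': ["Cat", "Lion", "Tiger"]})

try:
    df1.compare(df3)
    error_message = None
except ValueError as exc:
    # Raised because the two frames are not identically labeled
    error_message = str(exc)

print(error_message)
```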

 

In the example below, we use the same df1 as in the previous example, together with a df2 whose second row differs in both columns. We then use the compare() method to compare the two DataFrames. It returns a new DataFrame with the differences between df1 and df2, as shown below.

import pandas as pd

df1 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Parrot"]
    }
)

df2 = pd.DataFrame(
    {
        'k1': ["Cat", "Tiger"],
        'k2': ["Monkey", "Peacock"]
    }
)

# Compare df1 and df2 using the compare method
df_compared = df1.compare(df2)
print(df_compared)
OUTPUT
     k1            k2
   self  other    self    other
1  Lion  Tiger  Parrot  Peacock
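The optional parameters of compare() change how the result is laid out. As a rough sketch using the same df1 and df2 as above (and assuming pandas 1.1.0 or later), keep_shape and keep_equal retain the rows and values that did not change, while align_axis=0 stacks the ‘self’ and ‘other’ values as rows instead of columns.

```python
import pandas as pd

df1 = pd.DataFrame({'k1': ["Cat", "Lion"], 'k2': ["Monkey", "Parrot"]})
df2 = pd.DataFrame({'k1': ["Cat", "Tiger"], 'k2': ["Monkey", "Peacock"]})

# keep_shape=True keeps every row and column of the original frames;
# keep_equal=True shows the equal values instead of NaN
full = df1.compare(df2, keep_shape=True, keep_equal=True)
print(full)

# align_axis=0 stacks the 'self' and 'other' values as rows
stacked = df1.compare(df2, align_axis=0)
print(stacked)
```

The first form is handy when you want to see the differences in the context of the full table; the second can be easier to read when the frames have many columns.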

 

Summary

We have learnt how to compare DataFrames using the Pandas built-in methods equals() and compare(). The Pandas library supports many more methods for working with data sets. You can learn more about Pandas at pandas.pydata.org.
