Download Free Install Free

The Quickest Ways to Sort Pandas DataFrame Values

TJ Simmons
June 25, 2019

Table of Contents

Pandas is a popular open source library used for data analysis. It’s written in Python, and it has a most ambitious mission. Per its own description in GitHub, Pandas “aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language.”

An important component in Pandas is the DataFrame—the most commonly used Pandas object.

Typically, data science practitioners often need to perform various data engineering operations, such as aggregation, sorting, and filtering data. This article aims to help the typical data science practitioner perform sorting values in the Pandas DataFrame.

I’ll discuss seven common use cases, and I’ll dive deeper into the parameters involved with the two major sorting functions. That way, if you need to perform sorting outside the seven common use cases, you’ll be better prepared to adapt.

The two major sort functions

Pandas has two key sort functions: sort_values and sort_index. You can check the API for sort_values and sort_index at the Pandas documentation for details on the parameters.

  • sort_values(): You use this to sort the Pandas DataFrame by one or more columns.
  • sort_index(): You use this to sort the Pandas DataFrame by the row index.

Sorting by the values of the selected columns

Allow me to explain the differences between the two sorting functions more clearly. sort_values is easier to understand. Imagine you have a DataFrame that looks like this:

Date/Time                | Weather
------------------|------------
2012-01-01 00:00:00 | Fog
2012-01-01 01:00:00 | Fog
2012-01-01 02:00:00 | Freezing Drizzle,Fog

When you want to sort the DataFrame by the column Weather, you’d use sort_values. Similarly, when you want to sort the DataFrame by the values of one or more columns, you’d also use sort_values.

Sorting by the labels of the DataFrame

By contrast, sort_index doesn’t indicate its meaning as obviously from its name alone. The key thing to know is that the Pandas DataFrame lets you indicate which column acts as the row index. And if you didn’t indicate a specific column to be the row index, Pandas will create a zero-based row index by default.

Therefore, when you execute sort_index, you’re sorting the DataFrame by its row index.

7 common use cases for sorting

Now that I’ve introduced the two main sorting functions, I’ll go into the seven common use cases for sorting your Pandas DataFrame. To simplify, all of the use cases given here will be demonstrated with an open dataset. I prepared the dataset using the following code:

import pandas as pd

# repo for data https://github.com/jvns/pandas-cookbook/tree/master/data
data_url = 'https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/weather_2012.csv'

# read data from url as pandas dataframe
weather = pd.read_csv(data_url)

# print the first three rows
print(weather.head(n=3))
You should expect the following result:
Date/Time ... Weather
0 2012-01-01 00:00:00 ... Fog
1 2012-01-01 01:00:00 ... Fog
2 2012-01-01 02:00:00 ... Freezing Drizzle,Fog

[3 rows x 8 columns]

You should expect the following result:

Notice that the 0, 1, and 2 columns are the row index I covered earlier. If you open the raw data (which is actually a CSV file) directly, you’ll realize that the first column is Date/Time. Recall when I explained how sort_index sorts your DataFrame by the row index? In the result set you see above, the row index is automatically generated and is shown as such.

Now, let’s get started with our first common use case.

Use case #1: Sort by one column’s values

The most common use case is to sort by a single column’s values in ascending order. So, let’s do just that with the prepared dataset. Assume we want to sort the test data by the Weather column in ascending order:

sort_by_weather = weather.sort_values('Weather')
print(sort_by_weather.head(n=3))

We can expect the following result:

     Date/Time           Temp (C)  ... Stn Press (kPa) Weather
7103 2012-10-22 23:00:00 6.4 ... 101.34 Clear
4203 2012-06-24 03:00:00 14.9 ... 101.02 Clear
4204 2012-06-24 04:00:00 14.4 ... 101.04 Clear

[3 rows x 8 columns]

Note that I didn’t explicitly state that the data is sorted in ascending order; the order is set as ascending by default. So, that makes the code easier to write for this use case. If you want to sort by a single column in descending order, all you need is to make the sort order explicit—which brings us to the next use case. 

Use case #2: Sort by one column’s values in descending order

The trick to sorting in descending order is to declare False for the ascending parameter in the sort_values function:

sort_by_weather_desc = weather.sort_values('Weather',ascending=False)
print(sort_by_weather_desc.head(n=3))

You can then expect the following result:

     Date/Time           ... Weather
4757 2012-07-17 05:00:00 ... Thunderstorms,Rain,Fog
4761 2012-07-17 09:00:00 ... Thunderstorms,Rain Showers,Fog
4323 2012-06-29 03:00:00 ... Thunderstorms,Rain Showers,Fog

[3 rows x 8 columns]

Now you’ve learned how to sort a DataFrame by a single column in ascending and descending order. The next thing to learn is how to sort a DataFrame by multiple columns.

Use case #3: Sort by multiple column values

If you recall, in the last two use cases, I simply stated the single column as a single string. If you want to sort by multiple columns, you need to state the columns as a list of strings:

sort_by_weather_temp = weather.sort_values(['Weather','Temp (C)'])
print(sort_by_weather_temp.head(n=3))

And you’ll get this result:

    Date/Time              Temp (C) ... Stn Press (kPa) Weather
344 2012-01-15 08:00:00 -23.3 ... 102.45 Clear
363 2012-01-16 03:00:00 -19.2 ... 103.07 Clear
365 2012-01-16 05:00:00 -19.1 ... 103.02 Clear

[3 rows x 8 columns]

Compare this result against the result from use case #1. The obvious difference is that the lowest temperature within the Clear weather is now at the top of the result set. One thing to appreciate about sorting by multiple columns is that there is precedence when it comes to sorting. In this case, I want to sort the DataFrame by weather first and temperature second. Hence, the list starts with Weather, followed by Temp. This is an intuitive way to write the list of columns you want to sort the DataFrame by.

Now that you know how to sort multiple columns and how to decide the precedence of the columns for sorting, you need to learn how to decide a different sorting order for the different columns.

Use case #4: Sort by multiple column values with a different sort order

Recall that the key point in the last use case was the use of a list to indicate the columns to sort our DataFrame by. Similarly, if we want to pick a different sort order for multiple columns, we would also use a list to indicate the different sort orders.

In this case, I want to sort first by weather in ascending order, and then by temperature in descending order. Here’s the code I wrote for this:

sort_by_weather_asc_temp_desc = weather.sort_values(['Weather','Temp (C)'], ascending=[True, False])
print(sort_by_weather_asc_temp_desc.head(n=3))

Note that the ascending parameter now takes in a list of Boolean values. Since we have a list of two column names in the first parameter, the ascending parameter also takes in a list of two Boolean values.

You can probably guess this, but the Boolean values in the ascending list correspond to the columns in the list for column values. Now, observe the result:

                Date/Time  Temp (C)  ...  Stn Press (kPa)  Weather
5199 2012-08-04 15:00:00 32.8 ... 101.39 Clear
5200 2012-08-04 16:00:00 32.5 ... 101.34 Clear
5201 2012-08-04 17:00:00 32.5 ... 101.32 Clear

[3 rows x 8 columns]

As you’ve noticed, so far, sort_values has an intuitive way of deciding how to sort the columns. With these four use cases, you can now fulfill most of your sorting needs.

Next, we can cover the less common use cases. For example, what happens if you have “Not Applicable” (NA) values in your columns and you want these NA values to be in first place?

Use Case #5: Sort, but put missing values first

It’s useful to compare this use case against the first one. Hence, in this case, I continue to sort in ascending order by the Weather column, with the additional requirement to put NA values at the top:

sort_na_first = weather.sort_values('Weather',na_position='first')
print(sort_na_first.head(n=3))

In the test data, there are no missing values for the Weather column, so there’s no difference in the result set as compared with the first use case. But you can experiment with this by downloading the test data and changing it:

      Date/Time            Temp (C)  ...  Stn Press (kPa)  Weather
7103 2012-10-22 23:00:00 6.4 ... 101.34 Clear
4203 2012-06-24 03:00:00 14.9 ... 101.02 Clear
4204 2012-06-24 04:00:00 14.4 ... 101.04 Clear

[3 rows x 8 columns]

You’ve probably noticed that in all the use cases so far, the code examples take the DataFrame and return a separate copy. What if you want to sort the DataFrame directly? That requirement would be sorting the DataFrame in place.

Use Case #6: Sort, but put in place

Again, I would recommend comparing this with the first use case. Once again, Pandas has this useful parameter to help you with sorting the DataFrame in place. And you guessed it—it’s called inplace:

weather.sort_values('Weather', inplace=True)
print(weather.head(n=3))

Notice how I no longer use another variable to hold the result of running sort_values? That’s because we’re changing the “weather” DataFrame directly.

You should get the same result as use case #1 when you print out the first three rows:

     Date/Time           Temp (C) ... Stn Press (kPa) Weather 
7103 2012-10-22 23:00:00 6.4 ... 101.34 Clear
4203 2012-06-24 03:00:00 14.9 ... 101.02 Clear
4204 2012-06-24 04:00:00 14.4 ... 101.04 Clear

[3 rows x 8 columns]

Finally, I want to show you how to perform sorting in place when you sort via the row index.

Use Case #7: Sort by row index, but put in place

If you remember that I mentioned that sort_index is simply using the row index as the sorting criteria, this use case will be easy to understand:

weather.sort_index(inplace=True)
print(weather.head(n=3))

If you’re doing this right, you should see that the first three rows will be those with row index 0, 1, and 2. Indeed, we get this:

                 Date/Time  ...               Weather
0 2012-01-01 00:00:00 ... Fog
1 2012-01-01 01:00:00 ... Fog
2 2012-01-01 02:00:00 ... Freezing Drizzle,Fog

[3 rows x 8 columns]

Conclusion

In this article, I explained the two main sorting functions and covered the seven quickest ways to run sorting on your Pandas DataFrame. With the concepts introduced in these seven use cases, you’ll have all the atomic understanding of the sorting functions. Now, you can mix and match this knowledge and resolve any sorting needs you may have. Good luck!