3 Quick Tricks To Speed up Pandas Workflows

Improve memory consumption and speed

Ahmed Besbes
5 min read, Jun 3, 2021
Photo by Will Porada on Unsplash

If you’re a data scientist, you’ve probably used Pandas to process your tabular data, apply matrix operations on the rows or the columns, perform complex merges between data frames, plot time series, compute aggregates, etc.

Pandas is a great, fast, and reliable tool that should never leave your toolbox.

Image by the author

However, if you’re an intensive Pandas user, you have probably run into slow execution times or out-of-memory errors.

In this article, I will share three tips to optimize your Pandas workflow even further. I have been applying them lately on a project that involved a very large dataset, and they helped me a lot.

Let’s jump right in.

1. Decrease Memory Consumption of Data Frames

Pandas can handle columns of different types:

  • object: strings or mixed types (basically, anything non-standard)
  • string: a dedicated string type, available since Pandas 1.0.0
  • int: integer numbers
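To see how much the column type matters, here is a minimal sketch on made-up data (the column names `city` and `age`, the one-million-row size and the helper `memory_mb` are all hypothetical, not taken from the article). It measures per-column memory with `DataFrame.memory_usage(deep=True)`, then converts a low-cardinality text column from object to `category` and downcasts an int64 column with `pd.to_numeric(..., downcast="integer")`. These are standard pandas techniques shown for illustration; they are not necessarily the exact steps the author goes on to describe.

```python
import numpy as np
import pandas as pd


def memory_mb(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per-column memory in MB (deep=True also counts Python string objects)."""
    return frame.memory_usage(deep=True, index=False) / 1024**2


# Hypothetical toy data: one million rows, a low-cardinality text column and small integers
n = 1_000_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    # Force the classic object dtype explicitly (newer pandas versions may infer a string dtype)
    "city": pd.Series(rng.choice(["Paris", "Tunis", "Berlin"], size=n), dtype="object"),
    # Generator.integers returns int64 by default
    "age": rng.integers(0, 100, size=n),
})

optimized = df.copy()
# Only a few distinct values: category stores each string once plus small integer codes
optimized["city"] = optimized["city"].astype("category")
# All values fit in [-128, 127], so downcast="integer" picks int8
optimized["age"] = pd.to_numeric(optimized["age"], downcast="integer")

# The dedicated string dtype (pandas >= 1.0); its footprint depends on the storage backend
as_string = df["city"].astype("string")

print(df.dtypes, optimized.dtypes, as_string.dtype, sep="\n")
print(memory_mb(df).round(2))
print(memory_mb(optimized).round(2))
```

The exact numbers depend on your pandas version and platform, but the categorical and int8 columns should take noticeably less memory than their object and int64 counterparts.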


Ahmed Besbes

Medium Top Writer (+2M views) | I write about python and productionizing ML code into scalable apps. Exclusive content here: https://thetechbuffet.substack.com/