3 Quick Tricks To Speed up Pandas Workflows
Reduce memory consumption and improve speed
If you’re a data scientist, you’ve probably used Pandas to process your tabular data, apply matrix operations on the rows or the columns, perform complex merges between data frames, plot time series, compute aggregates, etc.
Pandas is a great, fast, and reliable tool that should never leave your toolbox.
However, if you’re an intensive Pandas user, you have probably run into slow execution or out-of-memory errors.
In this article, I will share three tips to optimize your Pandas workflow even further. I applied them recently on a project that involved a very large data set, and they helped me a lot.
Let’s jump right in.
1. Decrease Memory Consumption of Data Frames
Pandas can handle columns of different types:
object: strings or mixed types (basically, anything non-standard)
string: since Pandas 1.0.0
int: integer number