Introduction To Python’s Pandas Library

Michael Jester
2 min read · Nov 10, 2020


A rare photo of the Pandas library crunching data

Pandas is one of the best libraries for Python. But don’t take my word for it; plenty of other articles say the same thing. What exactly is pandas, though? According to its website, it is “a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.” The foundation of it all is the dataframe.

What is a dataframe?

According to the pandas documentation, a dataframe is “a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.” Below is an example dataframe from GeeksforGeeks.

An example dataframe
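To make the "dict of Series objects" description concrete, here is a minimal sketch. The names and ages are made-up sample data, not taken from the GeeksforGeeks example.

```python
import pandas as pd

# Hypothetical sample data: each column is a Series with its own type.
names = pd.Series(['Michael', 'John', 'Henry'])
ages = pd.Series([31, 45, 28])

# The dict keys become the column labels; the Series become the columns.
people_df = pd.DataFrame({'name': names, 'age': ages})

print(people_df.shape)   # (number of rows, number of columns)
print(people_df.dtypes)  # one dtype per column
```

Each column keeps its own type, which is what sets a dataframe apart from a plain 2-dimensional array.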

How to create a dataframe?

It’s pretty simple to create a dataframe. Below is a block of code that will show you how to create a dataframe (usually abbreviated to ‘df’ in code) in three ways. There are many more examples in the documentation.

import pandas as pd
# Create a blank dataframe
blank_df = pd.DataFrame()
# Create a dataframe from a list of lists (column names are generated automatically)
rows = [[1, 2, 3], [2, 4, 3], [9, 6, 1]]  # avoid naming this 'list', which would shadow the built-in
list_df = pd.DataFrame(rows)
# Create a dataframe from a list of dictionaries (the keys become the column names)
dictionary = [
{'name': 'Michael', 'profession': 'software engineer'},
{'name': 'John', 'profession': 'doctor'},
{'name': 'Henry', 'profession': None}
]
dictionary_df = pd.DataFrame(dictionary)
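When a dataframe is built from a list of lists, the generated column names are just integer positions. The sketch below shows that, along with one way to supply your own names through the columns parameter. The names 'a', 'b', and 'c' are placeholders chosen for illustration.

```python
import pandas as pd

rows = [[1, 2, 3], [2, 4, 3], [9, 6, 1]]

# Without column names, pandas labels the columns 0, 1, 2.
default_df = pd.DataFrame(rows)
print(list(default_df.columns))

# Passing columns= gives the same data readable labels (placeholder names here).
named_df = pd.DataFrame(rows, columns=['a', 'b', 'c'])
print(list(named_df.columns))
```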

Selecting specific rows / columns

To select specific data from a dataframe, you need just one line of code. We will practice on the dictionary_df created in the example above.

# Selecting specific rows
# new_variable_name = df[df['column_name'] (operator) (comparison)]
# String comparison is case-sensitive, so the value must match the data exactly
michael_df = dictionary_df[dictionary_df['name'] == 'Michael']
# Selecting specific columns
# new_variable_name = df[[list of column names]]
name_df = dictionary_df[['name']]

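For more in-depth searches, check out .loc[], .iloc[], and .filter(). Note that .loc and .iloc are indexers used with square brackets, while .filter() is a regular method.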
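As a rough sketch of how those three tools differ, the snippet below rebuilds the same small dataframe and applies each one. The variable names are chosen for illustration only.

```python
import pandas as pd

dictionary_df = pd.DataFrame([
    {'name': 'Michael', 'profession': 'software engineer'},
    {'name': 'John', 'profession': 'doctor'},
    {'name': 'Henry', 'profession': None},
])

# .loc selects by label or boolean mask: rows named 'John', 'profession' column only
john_profession = dictionary_df.loc[dictionary_df['name'] == 'John', 'profession']

# .iloc selects by integer position: first two rows, first column
first_two_names = dictionary_df.iloc[:2, 0]

# .filter selects by label text (not by cell values): columns whose name contains 'prof'
prof_columns = dictionary_df.filter(like='prof', axis=1)
```

A useful rule of thumb: reach for .loc when you know the labels or a condition, .iloc when you know the positions, and .filter() when you want to pick rows or columns by their names.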

Handling Missing / Incomplete Data

Sometimes your data has missing, incomplete, or NA values. To remove the rows that contain them, use the .dropna() method. To replace them with something else, use .fillna(value). Each has its own purpose.
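Here is a small sketch of both approaches on the dataframe from earlier, where Henry has no profession. The replacement value 'unknown' is an arbitrary choice for illustration.

```python
import pandas as pd

dictionary_df = pd.DataFrame([
    {'name': 'Michael', 'profession': 'software engineer'},
    {'name': 'John', 'profession': 'doctor'},
    {'name': 'Henry', 'profession': None},
])

# Drop every row that contains at least one missing value
complete_df = dictionary_df.dropna()

# Replace missing values with a placeholder instead of dropping the row
filled_df = dictionary_df.fillna('unknown')

print(len(complete_df))  # rows remaining after dropping
print(filled_df)
```

Both methods return a new dataframe by default, so dictionary_df itself is left unchanged unless you assign the result back to it.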

Adding additional rows / columns

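# Add a column to a dataframe (a single value is repeated for every row)
new_column_value = 0  # placeholder value for illustration
dictionary_df['new_column_name'] = new_column_value
# Add a new row: pd.concat expects a list of dataframes, so wrap the new row in one
new_row_df = pd.DataFrame([{'name': 'Jane', 'profession': 'manager'}])
# ignore_index=True renumbers the rows; columns missing from the new row are filled with NaN
dictionary_df = pd.concat([dictionary_df, new_row_df], ignore_index=True)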

Additional Resources
