6. Pandas

6. Pandas#

6.1 Introduction#

Pandas offers a wide range of functions to read data from various sources, such as CSV files, Excel files, SQL databases, and more. You can perform operations like filtering, sorting, grouping, merging, reshaping, and aggregating data using Pandas. It also integrates well with other popular libraries in the Python ecosystem, such as numpy and matplotlib.

It provides powerful data structures and functions to efficiently work with structured data, such as tabular data, time series, and more. With pandas, you can easily load, manipulate, analyze, and visualize data for tasks like data cleaning, exploration, transformation, and modeling.

To start using pandas, you need to import it into your Python script or Jupyter Notebook. The standard way to import Pandas is as follows:

import pandas as pd

The two primary data structures provided by Pandas are:

Series: A Series in Pandas is a one-dimensional labeled array that can hold any data type. It is similar to a column in a spreadsheet or a single column of a SQL table. Each element in a Series is associated with a unique label called an index.
DataFrame: A two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table.

6.2 `Series`#

Here is an example where we create a list of data containing some numbers and an index list with corresponding labels. We then use these lists to create a Series using pd.Series() function.

data = [10, 20, 30, 40, 50]
index = ['A', 'B', 'C', 'D', 'E']

s = pd.Series(data, index=index)
print("\nOriginal series:")
print (s)

Original series:
A    10
B    20
C    30
D    40
E    50
dtype: int64

We also change the created labels without having any effect on the data as follows:

s.index = ['X', 'Y', 'Zebra', 'W', 'V']
print("\nUpdated series:")
print(s)

Updated series:
X        10
Y        20
Zebra    30
W        40
V        50
dtype: int64

Two helpful functions when working with pandas are the iloc[ ] and loc[ ] functions. For more information see the table below:

Function	Description	Example
`iloc[ ]`	Integer-based indexing and selection	s.iloc[0] accesses the first row of a DataFrame
		s.iloc[2:4] accesses a slice of rows in a DataFrame
`loc[ ]`	Label-based indexing and selection	s.loc[‘X’] accesses a row labeled ‘A’ in a DataFrame
		s.loc[[‘X’, ‘W’]] accesses multiple rows in a DataFrame

s.iloc[2:4] 

Zebra    30
W        40
dtype: int64

6.3 `DataFrame`#

In the example below, we start by creating a dictionary data that contains information about names, ages, and cities. We then use this dictionary to create a dataframe df using pd.DataFrame( ).

data2 = {'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 28, 22],
    'City': ['London', 'Paris', 'Berlin']}

df = pd.DataFrame(data2)

print("Original DataFrame:")
df      

Original DataFrame:

	Name	Age	City
0	John	25	London
1	Alice	28	Paris
2	Bob	22	Berlin

You can then use the loc[ ] and loc[ ] functions to locate specific data from your dataframe. For example:

print("\nUsing iloc[]:")
print(df.iloc[0])                
print(df.iloc[1:3])               
print(df.iloc[0, 1])             

print("\nUsing loc[]:")
print(df.loc[0])                 
print(df.loc[1:2])                
print(df.loc[0, 'Age'])    

Using iloc[]:
Name      John
Age         25
City    London
Name: 0, dtype: object
    Name  Age    City
1  Alice   28   Paris
2    Bob   22  Berlin
25

Using loc[]:
Name      John
Age         25
City    London
Name: 0, dtype: object
    Name  Age    City
1  Alice   28   Paris
2    Bob   22  Berlin
25

We then add and remove some data from the created dataframe. You can remove rows and columns from a datafram by using the Yourdataframename.drop() function.

df.loc[0, 'City'] = 'New York'
df['Salary'] = [5000, 6000, 4500]

df = df.drop(1)                   
df = df.drop('Age', axis=1)      

print("\nUpdated DataFrame:")
df

Updated DataFrame:

	Name	City	Salary
0	John	New York	5000
2	Bob	Berlin	4500

6.4 Importing data into `DataFrames` and exploring its attributes#

You can also make a dataframes from any csv file (excel files, delimited text files, etc). The table below summarizes the important functions you need to remember when applying the pd.read_csv() command.

Aspect	Description	Function
File Path	Specify the path or filename of the CSV file	pd.read_csv(‘file_path.csv’)
Delimiter	Define the delimiter used in the CSV file	pd.read_csv(‘file.csv’, delimiter=’,’)
Header	Specify whether the CSV file has a header row	pd.read_csv(‘file.csv’, header=0)
Columns	Select specific columns from the CSV file	pd.read_csv(‘file.csv’, usecols=[‘col1’, ‘col2’])
Index Column	Set a specific column as the index of the DataFrame	pd.read_csv(‘file.csv’, index_col=’column’)
Data Types	Specify data types for columns in the DataFrame	pd.read_csv(‘file.csv’, dtype={‘col1’: int, ‘col2’: float})

Note

The excel file needs to be in the same place in the folder that the python file is at for this to work!

6. Pandas

Contents

6. Pandas#

6.1 Introduction#

6.2 Series#

6.3 DataFrame#

6.4 Importing data into DataFrames and exploring its attributes#

6.2 `Series`#

6.3 `DataFrame`#

6.4 Importing data into `DataFrames` and exploring its attributes#