Learn Data Analysis with Pandas

Introduction to Pandas

Applying Functions with Pandas

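Pandas provides the `apply()` method, which runs a custom function on each value of a column (a Series) or, when called on a whole DataFrame with `axis=1`, on each row. It is a flexible way to transform data, although built-in vectorized operations such as column arithmetic are usually faster for simple calculations.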

                                

import pandas as pd

# Example DataFrame (hypothetical values for illustration)

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})

# Function to double the value

def double(x):
  return 2 * x

# Apply this function to each value in a column

df['column1'] = df['column1'].apply(double)

# You can also use lambda functions for quick operations

df['column2'] = df['column2'].apply(lambda x: 3 * x)

# Apply a function to each row (axis=1) to create a new column

df['newColumn'] = df.apply(lambda row: row['column1'] * 1.5 + row['column2'], axis=1)
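The row-wise `apply()` call above can usually be replaced by plain column arithmetic, which pandas evaluates on whole columns at once. A minimal sketch comparing the two, using hypothetical values for `column1` and `column2`:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})

# Row-wise apply: the lambda is called once for every row
with_apply = df.apply(lambda row: row['column1'] * 1.5 + row['column2'], axis=1)

# Vectorized version: the same arithmetic on entire columns
vectorized = df['column1'] * 1.5 + df['column2']

print(with_apply.tolist())  # [5.5, 8.0, 10.5]
print(vectorized.tolist())  # [5.5, 8.0, 10.5]
```

Both produce the same values; `apply()` remains useful when the per-row logic cannot be written as column operations.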

Adding Columns to DataFrames

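You can add a new column to a DataFrame by assigning values to a new column name. You can assign a list with one value per row (its length must match the number of rows), a single value that is repeated in every row, or the result of a calculation on an existing column.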

                                

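import pandas as pd

# Example DataFrame with four rows (hypothetical values)

df = pd.DataFrame({'oldColumn': [10, 20, 30, 40]})

# Add a new column with specific values (one value per row)

df['newColumn'] = [1, 2, 3, 4]

# Set all values in a new column to the same value (replaces the column above)

df['newColumn'] = 1

# Create a new column by calculating from an existing column

df['newColumn'] = df['oldColumn'] * 5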
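Two details of column assignment are easy to trip over: a list whose length does not match the number of rows is rejected, and `DataFrame.assign()` offers an alternative that returns a new DataFrame instead of modifying the original. A short sketch, with hypothetical column names `badColumn` and `flag`:

```python
import pandas as pd

df = pd.DataFrame({'oldColumn': [10, 20, 30, 40]})

# A list must supply exactly one value per row; 3 values for 4 rows fails
raised = False
try:
    df['badColumn'] = [1, 2, 3]
except ValueError as err:
    raised = True
    print('ValueError:', err)

# assign() returns a new DataFrame with the extra columns; df is unchanged
df2 = df.assign(newColumn=df['oldColumn'] * 5, flag=1)
print(df2)
```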

Creating DataFrames from Scratch

You can create a Pandas DataFrame from a dictionary, from a list of lists, or by reading a file such as a CSV. This is useful when you are starting with new data.

                                

import pandas as pd

# Create DataFrame from a dictionary (keys become column names)

data = {'name': ['Anthony', 'Maria'], 'age': [30, 28]}
df = pd.DataFrame(data)

# Create DataFrame from a list of lists (each inner list is one row)

data = [['Tom', 20], ['Jack', 30], ['Meera', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Create DataFrame by reading a CSV file (assumes students.csv is in the working directory)

df = pd.read_csv('students.csv')
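`pd.read_csv()` accepts a file-like object as well as a path, which makes it easy to try out without a file on disk. In this sketch the CSV text is hypothetical and only stands in for the contents of `students.csv`:

```python
import io

import pandas as pd

# Hypothetical contents standing in for students.csv
csv_text = """name,age
Tom,20
Jack,30
Meera,25
"""

# The first line of the CSV becomes the column names
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # ['name', 'age']
print(df.shape)             # (3, 2)
```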

What is a Pandas DataFrame?

A Pandas DataFrame is a two-dimensional, table-like data structure with labeled rows (the index) and labeled columns, where each column can hold a different data type. Pandas is conventionally imported under the alias `pd`.

                                

# Import pandas under its conventional alias, pd

import pandas as pd
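A few attributes show the row-and-column structure of a DataFrame directly. This sketch reuses the small dictionary example from the previous section:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Anthony', 'Maria'], 'age': [30, 28]})

print(df.shape)             # (2, 2): number of rows, number of columns
print(df.columns.tolist())  # ['name', 'age']: column labels
print(df.index.tolist())    # [0, 1]: default integer row labels

# Selecting a single column returns a Series
ages = df['age']
print(ages.tolist())        # [30, 28]
```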

Using Aggregates in Pandas

Grouping Data with Pandas

Pandas allows you to group data by one or more columns and then apply statistical functions to each group. This is useful for summarizing data, for example computing each student's average grade.

                                

import pandas as pd

# Create a DataFrame

df = pd.DataFrame([
  ['Amy', 'Assignment 1', 75],
  ['Amy', 'Assignment 2', 35],
  ['Bob', 'Assignment 1', 99],
  ['Bob', 'Assignment 2', 35]
], columns=['Name', 'Assignment', 'Grade'])

# Group by 'Name' and calculate the average grade

df.groupby('Name')['Grade'].mean()

# Output (a Series indexed by Name):
# Name
# Amy    55.0
# Bob    67.0
# Name: Grade, dtype: float64
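To compute several statistics per group at once, the grouped column can be passed to `agg()`, and `reset_index()` turns the group labels back into an ordinary column so the result is a regular table. A sketch on the same grades data; grouping by `Assignment` instead of `Name` is shown for comparison:

```python
import pandas as pd

df = pd.DataFrame([
    ['Amy', 'Assignment 1', 75],
    ['Amy', 'Assignment 2', 35],
    ['Bob', 'Assignment 1', 99],
    ['Bob', 'Assignment 2', 35],
], columns=['Name', 'Assignment', 'Grade'])

# Each function name becomes a column of the result
summary = df.groupby('Name')['Grade'].agg(['mean', 'max']).reset_index()
print(summary)

# Grouping by a different column answers a different question
by_assignment = df.groupby('Assignment')['Grade'].mean()
print(by_assignment)
```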

Calculating Statistics in Pandas

Pandas provides methods to calculate summary statistics such as the mean, standard deviation, and median of a column. These give a quick picture of how the values are distributed.

                                

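import pandas as pd

# Example DataFrame ('columnName' stands in for one of your columns; values are hypothetical)

df = pd.DataFrame({'columnName': [75, 35, 99, 35]})

# Calculate different statistics for a column

df['columnName'].mean()    # Average (mean)

df['columnName'].std()     # Sample standard deviation (ddof=1 by default)

df['columnName'].median()  # Median

df['columnName'].max()     # Maximum value

df['columnName'].min()     # Minimum value

df['columnName'].count()   # Number of non-missing values

df['columnName'].nunique() # Number of unique values

df['columnName'].unique()  # Array of unique values, in order of appearance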
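Two related shortcuts: `describe()` reports count, mean, standard deviation, minimum, quartiles, and maximum in one call, and `value_counts()` shows how often each distinct value occurs. A sketch with a hypothetical `Grade` column:

```python
import pandas as pd

grades = pd.Series([75, 35, 99, 35], name='Grade')

# count, mean, std, min, 25%, 50% (median), 75%, max
stats = grades.describe()
print(stats)

# Frequency of each distinct value, most common first
counts = grades.value_counts()
print(counts)
```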

Working with Multiple Tables in Pandas

Storing Data Across Multiple Tables

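In practice, related data is often split across several tables, each describing one kind of thing (for example, customers and their orders), so that each piece of information is stored only once and the tables are linked by a shared key column. Relational databases are commonly organized this way.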
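A minimal sketch of two related tables, assuming a hypothetical `customer_id` key: each customer's details appear once, and each order refers to its customer only by id.

```python
import pandas as pd

# Hypothetical customers table: one row per customer
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Ana', 'Ben', 'Chen'],
})

# Hypothetical orders table: customer details are not repeated,
# only the customer_id that links back to the customers table
orders = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'customer_id': [1, 1, 3, 2],
    'amount': [20.0, 35.5, 12.0, 50.0],
})

# Every order's customer_id should exist in the customers table
print(orders['customer_id'].isin(customers['customer_id']).all())  # True
```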

Merging Tables in Pandas

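Pandas combines tables with `pd.merge()` (or the equivalent `DataFrame.merge()` method), which matches rows on one or more shared key columns, much like a SQL join. This is useful when you need to bring together information stored in different tables.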
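A sketch of `pd.merge()` on hypothetical `orders` and `customers` tables sharing a `customer_id` column. The default `how='inner'` keeps only rows whose key appears in both tables, while `how='left'` keeps every row of the left table and fills missing details with NaN:

```python
import pandas as pd

customers = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Ana', 'Ben', 'Chen']})
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 3, 4],  # customer 4 has no row in customers
    'amount': [20.0, 12.0, 50.0],
})

# Inner merge (default): order 103 is dropped because customer 4 is unknown
inner = pd.merge(orders, customers, on='customer_id')

# Left merge: all orders are kept; order 103 gets NaN for name
left = pd.merge(orders, customers, on='customer_id', how='left')

print(inner)
print(left)
```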


ManageEngine Site24x7, a leading IT monitoring and observability platform, is committed to equipping developers and IT professionals with the tools and insights needed to excel in their fields.