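Pandas provides an `apply()` method for running a custom function across your data. On a Series (a single column), it calls the function once per value. On a DataFrame, it calls the function once per column by default, or once per row with `axis=1`, passing each row in as a Series. This is useful for transformations that are awkward to express with built-in operations. For simple arithmetic, vectorized expressions such as `df['column1'] * 2` are usually much faster.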
```python
# Function to double the value
def double(x):
    return 2 * x

# Apply this function to each value in a column
df['column1'] = df['column1'].apply(double)

# You can also use lambda functions for quick operations
df['column2'] = df['column2'].apply(lambda x: 3 * x)

# Apply a function across rows (axis=1) to create a new column
df['newColumn'] = df.apply(lambda row: row['column1'] * 1.5 + row['column2'], axis=1)
```
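A self-contained sketch of both forms of `apply()`, using a small made-up DataFrame that borrows the column names from the snippet above. With `axis=1`, each row arrives as a Series indexed by column name. The last computation is the vectorized equivalent, which produces the same values without calling a Python function once per row.

```python
import pandas as pd

# Hypothetical sample data; column names follow the snippet above
df = pd.DataFrame({"column1": [1, 2, 3], "column2": [10, 20, 30]})

# Series.apply: the function receives one scalar value at a time
doubled = df["column1"].apply(lambda x: 2 * x)

# DataFrame.apply with axis=1: the function receives each row as a Series
row_result = df.apply(lambda row: row["column1"] * 1.5 + row["column2"], axis=1)

# Vectorized equivalent: whole-column arithmetic, no per-row Python call
vectorized = df["column1"] * 1.5 + df["column2"]

print(doubled.tolist())               # [2, 4, 6]
print(row_result.tolist())            # [11.5, 23.0, 34.5]
print(row_result.equals(vectorized))  # True
```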
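You can add a new column to a DataFrame by assigning to a new column name. The value you assign can be a list with one entry per row (its length must equal the number of rows), a single scalar that is copied to every row, or a Series computed from existing columns.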
```python
# Add a new column with specific values (list length must match the number of rows)
df['newColumn'] = [1, 2, 3, 4]

# Set all values in a new column to the same value
df['newColumn'] = 1

# Create a new column by calculating from an existing column
df['newColumn'] = df['oldColumn'] * 5
```
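A minimal sketch of the three assignment styles on a hypothetical four-row DataFrame (the names `fromList`, `constant`, `scaled`, and `bad` are made up for illustration). It also shows `DataFrame.assign()`, which returns a new DataFrame instead of modifying the original, and what happens when a list has the wrong length.

```python
import pandas as pd

# Hypothetical four-row DataFrame
df = pd.DataFrame({"oldColumn": [1, 2, 3, 4]})

# A list must supply exactly one value per row
df["fromList"] = [10, 20, 30, 40]

# A scalar is broadcast to every row
df["constant"] = 1

# assign() returns a new DataFrame with the extra column; df is left unchanged
df2 = df.assign(scaled=df["oldColumn"] * 5)

# A list of the wrong length raises ValueError and no column is added
try:
    df["bad"] = [1, 2, 3]
except ValueError as err:
    print("ValueError:", err)

print(df2)
```

`assign()` is convenient when chaining several operations, since it never mutates the DataFrame it is called on.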
You can create a Pandas DataFrame from a dictionary of columns, from a list of rows, or by reading a file such as a CSV. Which method fits best depends on where your data starts out.
```python
import pandas as pd

# Create DataFrame from a dictionary (keys become column names)
data = {'name': ['Anthony', 'Maria'], 'age': [30, 28]}
df = pd.DataFrame(data)

# Create DataFrame from a list of lists (each inner list is one row)
data = [['Tom', 20], ['Jack', 30], ['Meera', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Create DataFrame by reading a CSV file
df = pd.read_csv('students.csv')
```
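Two more construction routes, sketched so they run without any file on disk. A list of dictionaries produces one row per dictionary. `pd.read_csv()` also accepts a file-like object, so an in-memory `io.StringIO` string stands in here for a real file; the CSV contents are hypothetical and simply mirror the list-of-lists example above.

```python
import io
import pandas as pd

# From a list of dictionaries: each dict becomes one row, keys become columns
records = [{"name": "Anthony", "age": 30}, {"name": "Maria", "age": 28}]
df_records = pd.DataFrame(records)

# Hypothetical CSV contents standing in for a file such as 'students.csv'
csv_text = "Name,Age\nTom,20\nJack,30\nMeera,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_records)
print(df_csv.dtypes)  # read_csv infers a type for each column
```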
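A Pandas DataFrame is a two-dimensional, table-like structure: data is organized into labeled rows (the index) and named columns, and each column can hold its own data type. It is the main structure pandas uses for loading, cleaning, and analyzing tabular data. By convention, pandas is imported under the alias `pd`: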
```python
import pandas as pd
```
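A short sketch of the attributes that describe a DataFrame's shape and contents, using small hypothetical data. `loc` selects by row and column label.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Name": ["Tom", "Jack", "Meera"], "Age": [20, 30, 25]})

print(df.shape)          # (3, 2): rows, columns
print(list(df.columns))  # ['Name', 'Age']
print(df.dtypes)         # one data type per column
print(df.head(2))        # first two rows
print(df.loc[1, "Age"])  # value at row label 1, column 'Age'
```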
Pandas lets you group rows by the values in one or more columns and then compute a statistic for each group, such as a mean or a count. This is a quick way to summarize data by category.
```python
# Create a DataFrame
df = pd.DataFrame([
    ['Amy', 'Assignment 1', 75],
    ['Amy', 'Assignment 2', 35],
    ['Bob', 'Assignment 1', 99],
    ['Bob', 'Assignment 2', 35]
], columns=['Name', 'Assignment', 'Grade'])

# Group by 'Name' and calculate the average grade
df.groupby('Name')['Grade'].mean()
# Output:
# Name
# Amy    55.0
# Bob    67.0
# Name: Grade, dtype: float64
```
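A sketch extending the grades example: `agg()` computes several statistics per group in one call, and grouping by a different column answers a different question (here, the average grade per assignment). The choice of statistics is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Amy", "Assignment 1", 75],
        ["Amy", "Assignment 2", 35],
        ["Bob", "Assignment 1", 99],
        ["Bob", "Assignment 2", 35],
    ],
    columns=["Name", "Assignment", "Grade"],
)

# Several statistics per student in one call; one output column per statistic
summary = df.groupby("Name")["Grade"].agg(["mean", "min", "max"])

# Group by a different column: average grade per assignment
per_assignment = df.groupby("Assignment")["Grade"].mean()

print(summary)
print(per_assignment)
```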
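Pandas provides methods that compute summary statistics such as the mean, standard deviation, and median. Called on a single column (a Series), each returns one value; called on a DataFrame of numeric columns, each returns one result per column. Missing values (NaN) are skipped by default.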
```python
# Calculate different statistics for a column
df['columnName'].mean()     # Average
df['columnName'].std()      # Standard deviation (sample, ddof=1)
df['columnName'].median()   # Median
df['columnName'].max()      # Maximum value
df['columnName'].min()      # Minimum value
df['columnName'].count()    # Number of non-missing values
df['columnName'].nunique()  # Number of unique values (NaN excluded by default)
df['columnName'].unique()   # Array of unique values, in order of appearance
```
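A small sketch showing how a missing value affects these methods, using a hypothetical Series with one NaN. `describe()` bundles the common statistics into a single call.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
s = pd.Series([4.0, 8.0, np.nan, 8.0, 10.0], name="score")

print(s.count())     # 4: NaN is not counted
print(len(s))        # 5: total rows, including NaN
print(s.mean())      # 7.5: NaN is skipped
print(s.nunique())   # 3: distinct non-missing values
print(s.describe())  # count, mean, std, min, quartiles, max
```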
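In practice, related data is often split across multiple tables linked by shared key columns, for example a students table and a grades table that both contain a student ID. Relational databases commonly organize data this way to avoid storing the same information more than once.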
Pandas can combine such tables with `merge()`, which matches rows on one or more key columns, much like a SQL join. This is useful when you need to integrate information from different sources.
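A minimal sketch of `merge()` on two hypothetical tables that share a `student_id` key. The default `how="inner"` keeps only keys found in both tables; `how="left"` keeps every row of the left table and fills unmatched columns with NaN.

```python
import pandas as pd

# Two hypothetical tables sharing the key column 'student_id'
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Amy", "Bob", "Cara"],
})
grades = pd.DataFrame({
    "student_id": [1, 1, 2, 4],
    "grade": [75, 35, 99, 60],
})

# Inner join (the default): only student_ids present in both tables
inner = pd.merge(students, grades, on="student_id")

# Left join: every student is kept; Cara has no grade, so it becomes NaN
left = students.merge(grades, on="student_id", how="left")

print(inner)
print(left)
```

Note that student 1 appears twice in the result because two grade rows match that key, and the grade for `student_id` 4 is dropped in both joins because no student has that ID.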
ManageEngine Site24x7, a leading IT monitoring and observability platform, is committed to equipping developers and IT professionals with the tools and insights needed to excel in their fields.