# Pandas for Everyone: Python Data Analysis

(PYTHON-PANDAS.AP1) / ISBN: 978-1-64459-413-1

This course includes:
• Lessons
• TestPrep
• Hands-On Labs

Pandas is an open-source Python library for data analysis. The Pandas for Everyone: Python Data Analysis course focuses on loading data into Python and working with it using the Pandas library. The course contains interactive lessons with knowledge checks, quizzes, and hands-on labs that build a deeper understanding of topics such as Pandas DataFrame and data structure basics, plotting basics, tidy data, data assembly, data normalization, linear regression, and survival models.


### Lessons

47+ Lessons | 100+ Exercises | 90+ Quizzes | 109+ Flashcards | 109+ Glossary Terms

### TestPrep

50+ Pre-Assessment Questions | 50+ Post-Assessment Questions

### Hands-On Labs

30+ LiveLabs | 20+ Video Tutorials | 43+ Minutes

### 1. Preface

• Breakdown of the Course
• How to Read This Course
• Setup

### 2. Pandas DataFrame Basics

• Introduction
• Look at Columns, Rows, and Cells
• Grouped and Aggregated Calculations
• Basic Plot
• Conclusion

### 3. Pandas Data Structures Basics

• Create Your Own Data
• The Series
• The DataFrame
• Making Changes to Series and DataFrames
• Exporting and Importing Data
• Conclusion

### 4. Plotting Basics

• Why Visualize Data?
• Matplotlib Basics
• Statistical Graphics Using matplotlib
• Seaborn
• Pandas Plotting Method
• Conclusion

### 5. Tidy Data

• Columns Contain Values, Not Variables
• Columns Contain Multiple Variables
• Variables in Both Rows and Columns
• Conclusion

### 6. Apply Functions

• Primer on Functions
• Apply (Basics)
• Vectorized Functions
• Lambda Functions (Anonymous Functions)
• Conclusion

### 7. Data Assembly

• Combine Data Sets
• Concatenation
• Observational Units Across Multiple Tables
• Merge Multiple Data Sets
• Conclusion

### 8. Data Normalization

• Multiple Observational Units in a Table (Normalization)
• Conclusion

### 9. Groupby Operations: Split-Apply-Combine

• Aggregate
• Transform
• Filter
• The pandas.core.groupby.DataFrameGroupBy Object
• Working With a MultiIndex
• Conclusion

### 10. Missing Data

• What Is a NaN Value?
• Where Do Missing Values Come From?
• Working With Missing Data
• Pandas Built-In NA Missing
• Conclusion

### 11. Data Types

• Data Types
• Converting Types
• Categorical Data
• Conclusion

### 12. Strings and Text Data

• Introduction
• Strings
• String Methods
• More String Methods
• String Formatting (F-Strings)
• Regular Expressions (RegEx)
• The regex Library
• Conclusion

### 13. Dates and Times

• Python's datetime Object
• Converting to datetime
• Extracting Date Components
• Date Calculations and Timedeltas
• Datetime Methods
• Getting Stock Data
• Subsetting Data Based on Dates
• Date Ranges
• Shifting Values
• Resampling
• Time Zones
• Arrow for Better Dates and Times
• Conclusion

### 14. Linear Regression (Continuous Outcome Variable)

• Simple Linear Regression
• Multiple Regression
• Models with Categorical Variables
• One-Hot Encoding in scikit-learn with Transformer Pipelines
• Conclusion

### 15. Generalized Linear Models

• Logistic Regression (Binary Outcome Variable)
• Poisson Regression (Count Outcome Variable)
• More Generalized Linear Models
• Conclusion

### 16. Survival Analysis

• Survival Data
• Kaplan-Meier Curves
• Cox Proportional Hazard Model
• Conclusion

### 17. Model Diagnostics

• Residuals
• Comparing Multiple Models
• k-Fold Cross-Validation
• Conclusion

### 18. Regularization

• Why Regularize?
• LASSO Regression
• Ridge Regression
• Elastic Net
• Cross-Validation
• Conclusion

### 19. Clustering

• k-Means
• Hierarchical Clustering
• Conclusion

### 20. Life Outside of Pandas

• The (Scientific) Computing Stack
• Performance
• Siuba
• Ibis
• Polars
• PyJanitor
• Pandera
• Machine Learning
• Publishing
• Dashboards
• Conclusion

### 21. It’s Dangerous To Go Alone!

• Local Meetups
• Conferences
• The Carpentries
• Podcasts
• Other Resources
• Conclusion

### Appendix B: Installation and Setup

• B.1 Install Python
• B.2 Install Python Packages

### Appendix C: Command Line

• C.1 Installation
• C.2 Basics

### Appendix E: Using Python

• E.1 Command Line and Text Editor
• E.2 Python and IPython
• E.3 Jupyter
• E.4 Integrated Development Environments (IDEs)

### Appendix G: Environments

• G.1 Conda Environments
• G.2 Pyenv + Pipenv

### Appendix H: Install Packages

• H.1 Updating Packages

### Appendix J: Code Style

• J.1 Line Breaks in Code

### Appendix K: Containers: Lists, Tuples, and Dictionaries

• K.1 Lists
• K.2 Tuples
• K.3 Dictionaries

### Appendix O: Functions

• O.1 Default Parameters
• O.2 Arbitrary Parameters

### Appendix T: SettingWithCopyWarning

• T.1 Modifying a Subset of Data
• T.2 Replacing a Value
• T.3 More Resources

### Appendix W: String Formatting

• W.1 C-Style
• W.2 String Formatting: .format() Method
• W.3 Formatting Numbers

### Appendix Z: Replicating Results in R

• Z.1 Linear Regression
• Z.2 Logistic Regression
• Z.3 Poisson Regression

## Hands-On Labs


### 1. Pandas DataFrame Basics

• Performing Grouped and Aggregated Calculations Using the .groupby() Method

### 2. Pandas Data Structures Basics

• Creating a DataFrame and Making Changes to it

### 3. Plotting Basics

• Creating a Scatter Plot Using Multivariate Data
• Creating a Density Plot Using Bivariate Data

### 4. Tidy Data

• Using Functions and Methods to Process and Tidy Data

### 5. Apply Functions

• Performing Calculations Across DataFrames
• Vectorizing Functions

### 6. Data Assembly

• Performing Concatenation Using the concat() Function
• Merging Multiple Data Sets Using the .merge() Function

### 7. Data Normalization

• Understanding Multiple Observational Units in a Data Set

### 8. Groupby Operations: Split-Apply-Combine

• Performing Data Summarization Using Group-by Operations
• Performing Boolean Subsetting on the Data
• Performing Operations on Grouped Objects

### 9. Missing Data

• Finding and Cleaning Missing Data

### 10. Data Types

• Performing Data Type Conversion

### 11. Strings and Text Data

• Finding and Substituting a Pattern

### 12. Dates and Times

• Converting an Object Type into a datetime Type
• Extracting Date Components from the Data
• Getting Stock Data and Subsetting it Based on Dates
• Resampling Dates Using the .resample() Method

### 13. Linear Regression (Continuous Outcome Variable)

• Performing Linear Regression
• Performing Multiple Regression

### 14. Generalized Linear Models

• Performing Logistic Regression
• Performing Poisson Regression Using the poisson() Function

### 15. Survival Analysis

• Performing Survival Analysis Using the KaplanMeierFitter() Function

### 16. Model Diagnostics

• Comparing Models Using Cross-Validation

### 17. Regularization

• Performing L1 Regularization Using the Lasso() Function
• Performing L2 Regularization Using the Ridge() Function

### 18. Clustering

• Performing k-Means Clustering
• Using Hierarchical Clustering Algorithms