# Pandas for Everyone: Python Data Analysis

(PYTHON-PANDAS.AP1) / ISBN: 978-1-64459-413-1

Pandas is an open-source Python library for data analysis. The Pandas for Everyone: Python Data Analysis course focuses on loading data into Python with the Pandas library. It includes interactive lessons with knowledge checks, quizzes, and hands-on labs that build a deeper understanding of concepts such as Pandas DataFrame and data structure basics, plotting basics, tidy data, data assembly, data normalization, linear regression, and survival models.

1

### Preface

• Breakdown of the Course
• How to Read This Course
• Setup
2

### Pandas DataFrame Basics

• Introduction
• Look at Columns, Rows, and Cells
• Grouped and Aggregated Calculations
• Basic Plot
• Conclusion
3

### Pandas Data Structures Basics

• Create Your Own Data
• The Series
• The DataFrame
• Making Changes to Series and DataFrames
• Exporting and Importing Data
• Conclusion
4

### Plotting Basics

• Why Visualize Data?
• Matplotlib Basics
• Statistical Graphics Using matplotlib
• Seaborn
• Pandas Plotting Method
• Conclusion
5

### Tidy Data

• Columns Contain Values, Not Variables
• Columns Contain Multiple Variables
• Variables in Both Rows and Columns
• Conclusion
6

### Apply Functions

• Primer on Functions
• Apply (Basics)
• Vectorized Functions
• Lambda Functions (Anonymous Functions)
• Conclusion
7

### Data Assembly

• Combine Data Sets
• Concatenation
• Observational Units Across Multiple Tables
• Merge Multiple Data Sets
• Conclusion
8

### Data Normalization

• Multiple Observational Units in a Table (Normalization)
• Conclusion
9

### Groupby Operations: Split-Apply-Combine

• Aggregate
• Transform
• Filter
• The pandas.core.groupby.DataFrameGroupBy object
• Working With a MultiIndex
• Conclusion
10

### Missing Data

• What Is a NaN Value?
• Where Do Missing Values Come From?
• Working With Missing Data
• Pandas Built-In NA Missing
• Conclusion
11

### Data Types

• Data Types
• Converting Types
• Categorical Data
• Conclusion
12

### Strings and Text Data

• Introduction
• Strings
• String Methods
• More String Methods
• String Formatting (F-Strings)
• Regular Expressions (RegEx)
• The regex Library
• Conclusion
13

### Dates and Times

• Python's datetime Object
• Converting to datetime
• Extracting Date Components
• Date Calculations and Timedeltas
• Datetime Methods
• Getting Stock Data
• Subsetting Data Based on Dates
• Date Ranges
• Shifting Values
• Resampling
• Time Zones
• Arrow for Better Dates and Times
• Conclusion
14

### Linear Regression (Continuous Outcome Variable)

• Simple Linear Regression
• Multiple Regression
• Models with Categorical Variables
• One-Hot Encoding in scikit-learn with Transformer Pipelines
• Conclusion
15

### Generalized Linear Models

• Logistic Regression (Binary Outcome Variable)
• Poisson Regression (Count Outcome Variable)
• More Generalized Linear Models
• Conclusion
16

### Survival Analysis

• Survival Data
• Kaplan Meier Curves
• Cox Proportional Hazard Model
• Conclusion
17

### Model Diagnostics

• Residuals
• Comparing Multiple Models
• k-Fold Cross-Validation
• Conclusion
18

### Regularization

• Why Regularize?
• LASSO Regression
• Ridge Regression
• Elastic Net
• Cross-Validation
• Conclusion
19

### Clustering

• k-Means
• Hierarchical Clustering
• Conclusion
20

### Life Outside of Pandas

• The (Scientific) Computing Stack
• Performance
• Siuba
• Ibis
• Polars
• PyJanitor
• Pandera
• Machine Learning
• Publishing
• Dashboards
• Conclusion
21

### It’s Dangerous To Go Alone!

• Local Meetups
• Conferences
• The Carpentries
• Podcasts
• Other Resources
• Conclusion

### Appendix B: Installation and Setup

• B.1 Install Python
• B.2 Install Python Packages

### Appendix C: Command Line

• C.1 Installation
• C.2 Basics

### Appendix E: Using Python

• E.1 Command Line and Text Editor
• E.2 Python and IPython
• E.3 Jupyter
• E.4 Integrated Development Environments (IDEs)

### Appendix G: Environments

• G.1 Conda Environments
• G.2 Pyenv + Pipenv

### Appendix H: Install Packages

• H.1 Updating Packages

### Appendix J: Code Style

• J.1 Line Breaks in Code

### Appendix K: Containers: Lists, Tuples, and Dictionaries

• K.1 Lists
• K.2 Tuples
• K.3 Dictionaries

### Appendix O: Functions

• O.1 Default Parameters
• O.2 Arbitrary Parameters

### Appendix T: SettingWithCopyWarning

• T.1 Modifying a Subset of Data
• T.2 Replacing a Value
• T.3 More Resources

### Appendix W: String Formatting

• W.1 C-Style
• W.2 String Formatting: .format() Method
• W.3 Formatting Numbers

### Appendix Z: Replicating Results in R

• Z.1 Linear Regression
• Z.2 Logistic Regression
• Z.3 Poisson Regression
