Pandas for Everyone: Python Data Analysis

(PYTHON-PANDAS.AP1) / ISBN: 978-1-64459-413-1

This course includes:

  • Lessons
  • TestPrep
  • Hands-On Labs
  • AI Tutor (Add-on)

Pandas is an open-source Python library for data analysis. The Pandas for Everyone: Python Data Analysis course focuses on loading data into Python with the Pandas library. It contains interactive lessons with knowledge checks, quizzes, and hands-on labs that build a deeper understanding of concepts such as Pandas DataFrame and data structure basics, plotting basics, tidy data, data assembly, data normalization, linear regression, and survival models.

Lessons

47+ Lessons | 100+ Exercises | 90+ Quizzes | 109+ Flashcards | 109+ Glossary Terms

TestPrep

50+ Pre-Assessment Questions | 50+ Post-Assessment Questions

Hands-On Labs

30+ LiveLabs | 20+ Video Tutorials | 43+ Minutes

Here's what you will learn

Lesson 1: Preface

  • Breakdown of the Course
  • How to Read This Course
  • Setup

Lesson 2: Pandas DataFrame Basics

  • Introduction
  • Load Your First Data Set
  • Look at Columns, Rows, and Cells
  • Grouped and Aggregated Calculations
  • Basic Plot
  • Conclusion

Lesson 3: Pandas Data Structures Basics

  • Create Your Own Data
  • The Series
  • The DataFrame
  • Making Changes to Series and DataFrames
  • Exporting and Importing Data
  • Conclusion

Lesson 4: Plotting Basics

  • Why Visualize Data?
  • Matplotlib Basics
  • Statistical Graphics Using matplotlib
  • Seaborn
  • Pandas Plotting Method
  • Conclusion

Lesson 5: Tidy Data

  • Columns Contain Values, Not Variables
  • Columns Contain Multiple Variables
  • Variables in Both Rows and Columns
  • Conclusion

Lesson 6: Apply Functions

  • Primer on Functions
  • Apply (Basics)
  • Vectorized Functions
  • Lambda Functions (Anonymous Functions)
  • Conclusion

Lesson 7: Data Assembly

  • Combine Data Sets
  • Concatenation
  • Observational Units Across Multiple Tables
  • Merge Multiple Data Sets
  • Conclusion

Lesson 8: Data Normalization

  • Multiple Observational Units in a Table (Normalization)
  • Conclusion

Lesson 9: Groupby Operations: Split-Apply-Combine

  • Aggregate
  • Transform
  • Filter
  • The pandas.core.groupby.DataFrameGroupBy Object
  • Working With a MultiIndex
  • Conclusion

Lesson 10: Missing Data

  • What Is a NaN Value?
  • Where Do Missing Values Come From?
  • Working With Missing Data
  • Pandas Built-In NA Missing
  • Conclusion

Lesson 11: Data Types

  • Data Types
  • Converting Types
  • Categorical Data
  • Conclusion

Lesson 12: Strings and Text Data

  • Introduction
  • Strings
  • String Methods
  • More String Methods
  • String Formatting (F-Strings)
  • Regular Expressions (RegEx)
  • The regex Library
  • Conclusion

Lesson 13: Dates and Times

  • Python's datetime Object
  • Converting to datetime
  • Loading Data That Include Dates
  • Extracting Date Components
  • Date Calculations and Timedeltas
  • Datetime Methods
  • Getting Stock Data
  • Subsetting Data Based on Dates
  • Date Ranges
  • Shifting Values
  • Resampling
  • Time Zones
  • Arrow for Better Dates and Times
  • Conclusion

Lesson 14: Linear Regression (Continuous Outcome Variable)

  • Simple Linear Regression
  • Multiple Regression
  • Models with Categorical Variables
  • One-Hot Encoding in scikit-learn with Transformer Pipelines
  • Conclusion

Lesson 15: Generalized Linear Models

  • About This Lesson
  • Logistic Regression (Binary Outcome Variable)
  • Poisson Regression (Count Outcome Variable)
  • More Generalized Linear Models
  • Conclusion

Lesson 16: Survival Analysis

  • Survival Data
  • Kaplan-Meier Curves
  • Cox Proportional Hazard Model
  • Conclusion

Lesson 17: Model Diagnostics

  • Residuals
  • Comparing Multiple Models
  • k-Fold Cross-Validation
  • Conclusion

Lesson 18: Regularization

  • Why Regularize?
  • LASSO Regression
  • Ridge Regression
  • Elastic Net
  • Cross-Validation
  • Conclusion

Lesson 19: Clustering

  • k-Means
  • Hierarchical Clustering
  • Conclusion

Lesson 20: Life Outside of Pandas

  • The (Scientific) Computing Stack
  • Performance
  • Dask
  • Siuba
  • Ibis
  • Polars
  • PyJanitor
  • Pandera
  • Machine Learning
  • Publishing
  • Dashboards
  • Conclusion

Lesson 21: It’s Dangerous To Go Alone!

  • Local Meetups
  • Conferences
  • The Carpentries
  • Podcasts
  • Other Resources
  • Conclusion

Appendix A: Concept Maps

Appendix B: Installation and Setup

  • B.1 Install Python
  • B.2 Install Python Packages
  • B.3 Download Book Data

Appendix C: Command Line

  • C.1 Installation
  • C.2 Basics

Appendix D: Project Templates

Appendix E: Using Python

  • E.1 Command Line and Text Editor
  • E.2 Python and IPython
  • E.3 Jupyter
  • E.4 Integrated Development Environments (IDEs)

Appendix F: Working Directories

Appendix G: Environments

  • G.1 Conda Environments
  • G.2 Pyenv + Pipenv

Appendix H: Install Packages

  • H.1 Updating Packages

Appendix I: Importing Libraries

Appendix J: Code Style

  • J.1 Line Breaks in Code

Appendix K: Containers: Lists, Tuples, and Dictionaries

  • K.1 Lists
  • K.2 Tuples
  • K.3 Dictionaries

Appendix L: Slice Values

Appendix M: Loops

Appendix N: Comprehensions

Appendix O: Functions

  • O.1 Default Parameters
  • O.2 Arbitrary Parameters

Appendix P: Ranges and Generators

Appendix Q: Multiple Assignment

Appendix R: Numpy ndarray

Appendix S: Classes

Appendix T: SettingWithCopyWarning

  • T.1 Modifying a Subset of Data
  • T.2 Replacing a Value
  • T.3 More Resources

Appendix U: Method Chaining

Appendix V: Timing Code

Appendix W: String Formatting

  • W.1 C-Style
  • W.2 String Formatting: .format() Method
  • W.3 Formatting Numbers

Appendix X: Conditionals (if-elif-else)

Appendix Y: New York ACS Logistic Regression Example

Appendix Z: Replicating Results in R

  • Z.1 Linear Regression
  • Z.2 Logistic Regression
  • Z.3 Poisson Regression

Hands-On Lab Activities

Pandas DataFrame Basics

  • Performing Grouped and Aggregated Calculations Using the .groupby() Method

Pandas Data Structures Basics

  • Creating a DataFrame and Making Changes to It

Plotting Basics

  • Creating a Scatter Plot Using Multivariate Data
  • Creating a Density Plot Using Bivariate Data

Tidy Data

  • Using Functions and Methods to Process and Tidy Data

Apply Functions

  • Performing Calculations Across DataFrames
  • Vectorizing Functions

Data Assembly

  • Performing Concatenation Using the concat() Function
  • Merging Multiple Data Sets Using the .merge() Method

Data Normalization

  • Understanding Multiple Observational Units in a Data Set

Groupby Operations: Split-Apply-Combine

  • Performing Data Summarization Using Group-by Operations
  • Performing Boolean Subsetting on the Data
  • Performing Operations on Grouped Objects

Missing Data

  • Finding and Cleaning Missing Data

Data Types

  • Performing Data Type Conversion

Strings and Text Data

  • Finding and Substituting a Pattern

Dates and Times

  • Converting an Object Type into a datetime Type
  • Extracting Date Components from the Data
  • Getting Stock Data and Subsetting It Based on Dates
  • Resampling Dates Using the .resample() Method

Linear Regression (Continuous Outcome Variable)

  • Performing Linear Regression
  • Performing Multiple Regression

Generalized Linear Models

  • Performing Logistic Regression
  • Performing Poisson Regression Using the poisson() Function

Survival Analysis

  • Performing Survival Analysis Using the KaplanMeierFitter Class

Model Diagnostics

  • Comparing Models Using Cross-Validation

Regularization

  • Performing L1 Regularization Using the Lasso Class
  • Performing L2 Regularization Using the Ridge Class

Clustering

  • Performing k-Means Clustering
  • Using Hierarchical Clustering Algorithms