Lecture 9 – Data 100, Summer 2021

by Suraj Rampure, updates by Fernando Pérez.

adapted from Ani Adhikari

Bar Plots

We often use bar plots to display distributions of a categorical variable:
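As a minimal sketch of this idea, the snippet below counts the values of a categorical column and draws one bar per category. The data here is synthetic and the column name `category` is a hypothetical stand-in, not the dataset used in lecture.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (an assumption): one categorical column with three levels.
rng = np.random.default_rng(0)
df = pd.DataFrame({"category": rng.choice(["a", "b", "c"], size=200)})

# Count how often each category occurs, then draw one bar per category.
counts = df["category"].value_counts().sort_index()
ax = counts.plot(kind="bar")
ax.set_xlabel("category")
ax.set_ylabel("count");
```

The bar heights are the category counts, so together they describe the distribution of the variable.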

Note: ending a plot call with a semicolon suppresses the text representation of the returned object that the notebook would otherwise print after it (the <matplotlib.axes_....>).

But we can also use them to display a numerical variable that has been measured on individuals in different categories.
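One way to sketch this second use is with seaborn's `barplot`, which by default plots the mean of a numerical column within each category and adds error bars. The data and the column names `group` and `value` below are synthetic assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (an assumption): a numerical measurement
# recorded for individuals in three groups.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 40),
    "value": np.concatenate([
        rng.normal(10, 2, 40),
        rng.normal(12, 2, 40),
        rng.normal(15, 2, 40),
    ]),
})

# Per-group means, computed directly for comparison with the bar heights.
means = df.groupby("group")["value"].mean()

# seaborn's default estimator for barplot is the mean of y within each x category.
ax = sns.barplot(data=df, x="group", y="value");
```

Here each bar summarizes a numerical variable within a category, rather than counting how many rows fall in that category.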

Rug Plots

Used for visualizing a single quantitative variable. Rug plots show us each and every value.

Histograms

Our old friend!

The above plot shows counts. If we want to see a distribution instead, where the areas of the bars sum to 1, we can use the density keyword:
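The sketch below illustrates the idea with matplotlib's `hist`, where the keyword is `density=True`; in seaborn's `histplot` the equivalent option is `stat="density"`. Which of the two the lecture used is not shown here, and the data is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (an assumption).
rng = np.random.default_rng(3)
values = rng.normal(loc=120, scale=20, size=1000)

fig, ax = plt.subplots()

# With density=True, bar heights are rescaled so the total bar area equals 1.
heights, edges, _ = ax.hist(values, bins=20, density=True)
ax.set_ylabel("density")

# Area of each bar is height times width; the areas should sum to 1.
total_area = (heights * np.diff(edges)).sum()
```

On the density scale, the height of a bar is the proportion of data in the bin divided by the bin width, so it is the area, not the height, that represents a proportion.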

Increasing bin width loses granularity, but this may be fine for our purposes.

The bin widths don't all need to be the same!
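As a sketch, matplotlib's `hist` accepts an explicit list of bin edges, which need not be evenly spaced. The edges and the synthetic data below are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (an assumption): a right-skewed variable.
rng = np.random.default_rng(4)
values = rng.exponential(scale=2, size=500)

# Hypothetical bin edges: narrow bins where data is dense, wide bins in the tail.
bin_edges = [0, 0.5, 1, 2, 4, 8, 30]

fig, ax = plt.subplots()

# density=True keeps bars comparable: area, not height, reflects the share of data.
heights, edges, _ = ax.hist(values, bins=bin_edges, density=True)
total_area = (heights * np.diff(edges)).sum()
```

With unequal widths, plotting on the density scale matters: raw counts would make wide bins look taller simply because they cover more of the axis.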

Density Curves

Seaborn has several related functions for plotting distributions: kdeplot, histplot, rugplot and displot. The last is the most generic, and uses the others under the hood:

We can even show a rug plot with it!

displot is quite flexible, so instead of a histogram we can ask it, for example, to show the density curve and rugplot only:

Box Plots

Violin Plots

Overlaid Histograms and Density Curves

Side-by-Side Box Plots and Violin Plots

A less fancy version of the above two plots:

Scatter Plots