Lecture 24 – Clustering

by Josh Hug (Fall 2019)

Important: This lecture is taken from the Fall 2019 semester.

  • The variant of K-Means discussed throughout this lecture seeks to minimize distortion, but most packages that implement K-Means (including scikit-learn) seek to minimize inertia instead. You will work with scikit-learn’s implementation of K-Means in Lab 14.
  • In the lecture, there are a couple of plots that you might not be familiar with. The initial clustering example from 24.1 is taken from the first problem of Fall 2019 Midterm 2, and you will see the state plot from the beginning of 24.5 on Homework 8.
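To make the inertia/distortion distinction concrete, here is a minimal sketch in plain Python. It assumes the common definitions used in this course: inertia is the total squared distance from each point to its cluster center, while distortion additionally divides each cluster's contribution by that cluster's size (treat the exact definition as an assumption; check the lecture). The data and function names are illustrative.

```python
def inertia(clusters):
    """Sum of squared distances from every point to its cluster's center."""
    total = 0.0
    for points in clusters:
        center = sum(points) / len(points)
        total += sum((x - center) ** 2 for x in points)
    return total

def distortion(clusters):
    """Like inertia, but each cluster contributes its *average* squared
    distance, so large clusters are not automatically penalized more.
    (Assumed definition; verify against the lecture.)"""
    total = 0.0
    for points in clusters:
        center = sum(points) / len(points)
        total += sum((x - center) ** 2 for x in points) / len(points)
    return total

# Hypothetical 1-D data, already assigned to two clusters.
clusters = [[0.0, 2.0], [10.0, 11.0, 12.0]]
print(inertia(clusters))     # 4.0  (2.0 from each cluster)
print(distortion(clusters))  # ~1.667  (2/2 + 2/3)
```

Note that the two losses can disagree about which clustering is better, which is why it matters that scikit-learn optimizes inertia.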
| Video | Quick Check |
| --- | --- |
| **24.1** Introduction to clustering. Taxonomy of machine learning. Examples of clustering in practice. | 24.1 |
| **24.2** The K-Means clustering algorithm. Example of K-Means clustering. | 24.2 |
| **24.3** Loss functions for K-Means. Inertia and distortion. Optimizing distortion. | 24.3 |
| **24.4** Agglomerative clustering as an alternative to K-Means. Example of agglomerative clustering. Dendrograms and other clustering algorithms. | 24.4 |
| **24.5** Picking the number of clusters. The elbow method and silhouette scores. Summary of clustering and machine learning. | 24.5 |
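Since 24.2 walks through the K-Means algorithm itself, a minimal one-dimensional sketch of the standard alternating procedure (Lloyd's algorithm) may help before watching. The function name and data are illustrative, not from the lecture, and real implementations like scikit-learn's add multiple restarts and smarter initialization.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D K-Means (Lloyd's algorithm) sketch: alternate between
    assigning points to their nearest center and moving each center to
    the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # naive init: k random data points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            i = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[i].append(x)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster ends up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pts = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(sorted(kmeans(pts, 2)))  # converges to the two group means: [1.5, 10.5]
```

Note the algorithm only finds a local optimum; different initializations can give different clusterings, which motivates the loss functions discussed in 24.3.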