Lecture 24 – Big Data
Presented by Anthony D. Joseph
Content by Anthony D. Joseph, Joseph Gonzalez, Josh Hug
The Quick Check for this lecture is due Monday, December 7th at 11:59PM. Once you submit, a randomly chosen one of the following Google Forms will give you an alphanumeric code; enter that code into the “Lecture 24” question in the “Quick Check Codes” assignment on Gradescope to receive credit for this Quick Check.
An overview of big data, with several pertinent examples. Operational data stores and data warehouses. Extract, transform, load (ETL).
The multidimensional data model. Fact tables and dimension tables. Star schemas and snowflake schemas. Online analytical processing (OLAP).
Data warehouses and data lakes.
Distributed file systems and fault tolerance.
Distributed aggregation with MapReduce. The MapReduce abstraction.
Hadoop and Spark. Resilient Distributed Datasets (RDDs). Modin.
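As a rough illustration of the MapReduce abstraction listed above, the sketch below runs a word count on a single machine in plain Python. It is not taken from the lecture: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample input are hypothetical, and a real system such as Hadoop or Spark would distribute each phase across many workers rather than run them in one process.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would
    do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input: each string stands in for one input record.
counts = map_reduce(["big data is big", "data lakes hold data"])
print(counts)
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` depends only on the values for its own key, both phases can in principle run in parallel on separate machines.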