Overview

This 24-minute conference talk explores the misconception that all data processing requires distributed systems like Spark. Learn why processing gigabytes, or even hundreds of megabytes, of data with heavyweight distributed frameworks is often unnecessary and wasteful. Discover how advances in memory density and CPU performance, combined with efficient data engines like DuckDB, distributed storage, and increased bandwidth, enable doing more with less in today's post-ZIRP economy. Explore the benefits of small-data approaches, understand why more data doesn't always mean better results, and see how a single machine can deliver efficient, powerful solutions along with the simplicity of local development.
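As a minimal illustration of the single-machine approach the talk advocates (this sketch is not from the talk itself), the snippet below aggregates a multi-gigabyte Parquet dataset with DuckDB running in-process, with no cluster to provision. The file glob and column name (`events/*.parquet`, `event_type`) are hypothetical placeholders.

```python
# Sketch: single-machine aggregation over Parquet files with DuckDB.
import duckdb

con = duckdb.connect()  # in-process engine: no cluster, no JVM, no job submission

# DuckDB scans Parquet directly and parallelizes across local CPU cores.
result = con.execute("""
    SELECT event_type, COUNT(*) AS n_events
    FROM read_parquet('events/*.parquet')  -- hypothetical dataset
    GROUP BY event_type
    ORDER BY n_events DESC
""").fetchdf()

print(result)
```

Because the engine runs in-process and vectorizes work across local cores, a query like this on a modern laptop often finishes before a distributed cluster would even spin up, which is the "doing more with less" trade-off the talk examines.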
Syllabus
Data infrastructure to build bigger with less
Taught by
Open Data Science