Explore data formats for data science beyond CSV and HDFS in this 43-minute EuroPython Conference talk. Examine the strengths and limitations of different storage solutions, including plain text, structured formats, and specialized scientific data formats such as HDF5, ROOT, and NetCDF. Compare Python libraries such as xarray, PyROOT, rootpy, h5py, PyTables, bcolz, and blaze for handling data of different structures and sizes. Get practical guidelines for choosing a format based on data characteristics and computational requirements, and learn about emerging columnar databases such as MonetDB for high-speed in-memory analytics.
Overview
Syllabus
Intro
Data formats for data science
Textual data format
LowTake
CSV
Python
Textual Data
Binary Data
New HDF5 File
PyTables
Groups
Data Chunking
ROOT
ROOT Files
rootpy
root_numpy
NoSQL DB
HDFS
HDFS III
Example
Python Code Example
Tools
Taught by
EuroPython Conference