What you'll learn:
- In-depth knowledge of Druid's components and architecture
- Real-time data ingestion from Apache Kafka using a Twitter producer application
- Tuning Apache Druid for better throughput
- Accessing Apache Druid tables through the Avatica JDBC driver
- Schema evolution in Druid
- Complete Druid-Hive integration with hands-on exercises
- Presto Druid Connector
What will you learn from this course?
In this course, we cover Apache Druid's salient features end to end, along with its integrations with Apache Hive, PrestoDB (Trino), Apache Spark, and Schema Registry, one by one.
We start the course by building theoretical knowledge of Druid and its key features. We then write our own Twitter producer application, which pulls tweets from Twitter in real time and pushes them to Apache Kafka. Next, we create a Kafka streaming ingestion task in Druid that pulls the tweets from Kafka and stores them in Druid. Along the way, we learn how to apply transformations, filters, schema configuration, and tuning during Kafka ingestion.
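To make the Kafka ingestion step concrete, here is a minimal sketch of submitting a Kafka supervisor spec to the Druid Overlord over HTTP. The endpoint host, the "tweets" topic and datasource, and the column names are illustrative assumptions, not the exact spec used in the course.

```python
# Minimal sketch: create a Kafka ingestion supervisor in Druid.
# Host, topic, datasource, and columns below are assumptions.
import requests

OVERLORD_URL = "http://localhost:8081"  # assumed Overlord (or Router) endpoint

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "tweets",
            "timestampSpec": {"column": "created_at", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user", "lang", "text"]},
            "granularitySpec": {"segmentGranularity": "HOUR", "queryGranularity": "MINUTE"},
            # transformSpec applies filters/derived columns during ingestion
            "transformSpec": {
                "filter": {"type": "selector", "dimension": "lang", "value": "en"}
            },
        },
        "ioConfig": {
            "topic": "tweets",
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
        },
        "tuningConfig": {"type": "kafka", "maxRowsPerSegment": 5000000},
    },
}

# Submit the supervisor spec; Druid starts consuming from Kafka once accepted.
resp = requests.post(f"{OVERLORD_URL}/druid/indexer/v1/supervisor", json=supervisor_spec)
resp.raise_for_status()
print("Supervisor created:", resp.json())
```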
In the third module, we explore Druid's native and SQL-based batch ingestion methods in depth. If you want to load a dataset into Druid after the extract and transform steps of an ETL pipeline, this is the section to check out: we automate the entire load-to-Druid step.
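As one way to automate that load step, the sketch below submits a native batch (index_parallel) task to Druid and polls until it finishes. The datasource name, input directory, and column names are assumptions for illustration.

```python
# Minimal sketch: submit a native batch ingestion task and wait for completion.
# Datasource, paths, and columns are assumptions.
import time
import requests

OVERLORD_URL = "http://localhost:8081"  # assumed Overlord (or Router) endpoint

task_spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "wikipedia",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["channel", "page", "user"]},
            "granularitySpec": {"segmentGranularity": "DAY"},
        },
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/tmp/data", "filter": "*.json"},
            "inputFormat": {"type": "json"},
        },
        "tuningConfig": {"type": "index_parallel"},
    },
}

# Submit the task; the Overlord returns the assigned task id.
task_id = requests.post(f"{OVERLORD_URL}/druid/indexer/v1/task", json=task_spec).json()["task"]

# Poll the task status until it reaches a terminal state.
while True:
    status = requests.get(
        f"{OVERLORD_URL}/druid/indexer/v1/task/{task_id}/status"
    ).json()["status"]["status"]
    if status in ("SUCCESS", "FAILED"):
        break
    time.sleep(10)

print(f"Batch ingestion finished with status: {status}")
```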
In the fourth module, we learn how to read Druid tables from Spark and build Spark DataFrames from them. We also explore Spark's predicate and aggregate pushdown features.
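One minimal way to get a Druid table into a Spark DataFrame is through Druid's Avatica JDBC endpoint, as sketched below; the broker URL, Avatica version, and table name are assumptions, and the course's Spark integration (which is what enables deeper predicate and aggregate pushdown) may use a dedicated connector instead.

```python
# Minimal sketch: read a Druid table into a Spark DataFrame via Avatica JDBC.
# Broker URL, jar version, and table name are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("druid-read")
    # Assumption: pull the Avatica thin-client JDBC driver onto the classpath.
    .config("spark.jars.packages", "org.apache.calcite.avatica:avatica:1.23.0")
    .getOrCreate()
)

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/")
    .option("driver", "org.apache.calcite.avatica.remote.Driver")
    # The full SQL statement executes on the Druid Broker; Spark receives only the result rows.
    .option("query", "SELECT __time, lang, COUNT(*) AS cnt FROM tweets WHERE lang = 'en' GROUP BY 1, 2")
    .load()
)

df.show()
```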
Section 5 covers Schema Registry. We learn how Druid talks to Schema Registry to achieve schema validation, and how Druid parses Avro records.
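For reference, this is roughly the inputFormat fragment that tells Druid to decode Avro records using Confluent Schema Registry; the registry URL is an assumption, and the druid-avro-extensions extension must be loaded on the cluster.

```python
# Minimal sketch: inputFormat fragment for Avro records validated against Schema Registry.
# The registry URL is an assumption.
avro_input_format = {
    "type": "avro_stream",
    "avroBytesDecoder": {
        "type": "schema_registry",
        "url": "http://schema-registry:8081",  # assumed Schema Registry endpoint
    },
    # Optional flattening rules for nested Avro fields
    "flattenSpec": {"useFieldDiscovery": True},
}

# This object would replace the plain {"type": "json"} inputFormat in the Kafka
# supervisor spec shown earlier; Druid then looks up the writer schema referenced
# by each Kafka message and validates the record against it.
```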
Section 6 & 7 exposes the out of box druid capabilities. which are hive and presto integration. If your organisation data resides in hive or presto and you would like to join with druid table, then you should accomplish hive or presto integration.