"In-Memory Data Management" focuses on the management of enterprise data in column-oriented in-memory databases. Recent hardware and software trends have led to the development of a revolutionary new database technology that enables flexible and lightning-fast analysis of massive amounts of enterprise data. The basic concepts and design principles of this technology are explained in detail, as are the implications of those principles for future enterprise applications and their development. The course explains in detail the differences and advantages of an in-memory column-oriented database in contrast to traditional row-oriented disk-based storage. The following topics are covered (among others):
Requirements for Modern Enterprise Computing, Enterprise Application Characteristics
Hardware Trends, Columnar Storage vs. Row Storage
Dictionary Encoding, Compression
Scans, Selects, Deletes, Inserts, Updates
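To give a flavor of one of the listed topics, here is a minimal sketch of dictionary encoding. The function names and list-based structures are illustrative simplifications (real systems bit-pack the attribute vector); the idea is that each distinct column value is stored once in a sorted dictionary, and the column itself becomes a vector of small integer IDs.

```python
def dictionary_encode(values):
    """Return (sorted dictionary, attribute vector of value IDs)."""
    dictionary = sorted(set(values))
    value_id = {v: i for i, v in enumerate(dictionary)}
    attribute_vector = [value_id[v] for v in values]
    return dictionary, attribute_vector

def decode(dictionary, attribute_vector):
    """Reconstruct the original column from its encoded form."""
    return [dictionary[i] for i in attribute_vector]

cities = ["Berlin", "Potsdam", "Berlin", "Hamburg", "Potsdam"]
dictionary, av = dictionary_encode(cities)
# dictionary == ["Berlin", "Hamburg", "Potsdam"]
# av == [0, 2, 0, 1, 2]
assert decode(dictionary, av) == cities
```

Because the attribute vector holds only small integers, it compresses well and can be scanned very quickly, which is the basis for several of the operations covered later in the course.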
Week 1: The first week will give you an understanding of the origins of enterprise computing. To understand the decisions made in the past, it is vital to know the historical development that led to current hardware. Many characteristics of current applications, such as materialized aggregates and a reduced level of detail in the stored information, have their roots in the past. While these measures were helpful in former systems, they now form an obstacle that has to be overcome in order to allow for new, dynamic applications.
Week 2: In the second week, the differences between a horizontal, row-oriented layout and a columnar layout are discussed, and concepts like compression and partitioning are introduced. Based on that, you will get an explanation of the internal steps the database performs to carry out the fundamental relational operations insert, update, and delete. The week concludes with a fundamental difference between SanssouciDB and most other databases: the insert-only approach. Following this concept, we circumvent several pitfalls concerning referential integrity and additionally gain the foundation for a gapless time-travel feature.
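The insert-only idea can be sketched in a few lines. This is a hypothetical toy model (the class name, the logical clock, and the validity-interval bookkeeping are my illustrative assumptions, not SanssouciDB's actual implementation): updates never overwrite rows, they append a new version and invalidate the old one, which is what makes time travel possible.

```python
class InsertOnlyTable:
    def __init__(self):
        self.rows = []   # [row_data, valid_from, valid_to]; valid_to=None means current
        self.clock = 0   # simple logical clock for versioning

    def insert(self, row):
        self.clock += 1
        self.rows.append([row, self.clock, None])

    def update(self, index, new_row):
        """An 'update' appends a new version; the old row is only invalidated."""
        self.clock += 1
        self.rows[index][2] = self.clock
        self.rows.append([new_row, self.clock, None])

    def current(self):
        return [r for r, _, until in self.rows if until is None]

    def as_of(self, t):
        """Time travel: rows that were valid at logical time t."""
        return [r for r, frm, until in self.rows
                if frm <= t and (until is None or until > t)]

t = InsertOnlyTable()
t.insert({"id": 1, "qty": 10})
t.update(0, {"id": 1, "qty": 15})
# t.current() == [{"id": 1, "qty": 15}]
# t.as_of(1)  == [{"id": 1, "qty": 10}]
```

Because no row is ever physically deleted, every past state of the table remains reconstructable, which is the gapless history the course description refers to.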
Week 3: Week 3 focuses on more advanced structures and operations within the database. The differential buffer, a means to avoid frequent resorting of the dictionaries and rewriting of the attribute vectors, is explained in further detail. Subsequently, the merge process, which incorporates the changes from the differential buffer into the main store, is illustrated. The retrieval of information via the select statement, as well as related concepts like tuple reconstruction, early and late materialization, and a closer examination of the achieved scan speed, are also part of this week's schedule. The description of the join operation, which is used to combine information from different tables, concludes the week.
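A minimal sketch of the differential buffer and merge, under simplifying assumptions of my own (plain Python lists instead of bit-packed vectors, a linear scan instead of index structures): inserts go into a small unsorted delta, reads consult both stores, and a merge rebuilds the dictionary-encoded main store.

```python
class Column:
    def __init__(self):
        self.main_dict = []  # sorted dictionary of the read-optimized main store
        self.main_av = []    # attribute vector: value IDs into main_dict
        self.delta = []      # write-optimized differential buffer of raw values

    def insert(self, value):
        self.delta.append(value)  # cheap append; no dictionary resorting needed

    def scan_eq(self, value):
        """Row positions of `value` across both main store and delta."""
        hits = []
        if value in self.main_dict:
            vid = self.main_dict.index(value)
            hits += [i for i, x in enumerate(self.main_av) if x == vid]
        offset = len(self.main_av)
        hits += [offset + i for i, x in enumerate(self.delta) if x == value]
        return hits

    def merge(self):
        """Fold the delta into the main store, re-encoding dictionary and vector."""
        values = [self.main_dict[i] for i in self.main_av] + self.delta
        self.main_dict = sorted(set(values))
        ids = {v: i for i, v in enumerate(self.main_dict)}
        self.main_av = [ids[v] for v in values]
        self.delta = []
```

The trade-off this illustrates: writes stay cheap because only the small delta is touched, while reads pay a modest extra cost until the next merge rebuilds the compressed main store.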
Week 4: Week 4 is all about aggregation. Aggregations are the centerpiece of every business analytics application. Given the huge impact of aggregates on all parts of a business, it is important to understand what aggregate functions are and why we remove all materialized aggregates in favor of aggregation on the fly. You will further learn how to greatly reduce the cost of this on-demand approach by using the aggregate cache, and understand its connection to the differential buffer and the merge process. In the units concluding the week, you will see new prototype applications that use the aggregate cache to deliver complex simulations in real time.
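The interplay of on-the-fly aggregation, the aggregate cache, and the delta can be sketched as follows. The class and its caching policy are my own illustrative assumptions, not the course's exact mechanism: totals are computed on demand, the result over the large main store is cached, and only the small delta of rows inserted since then is re-aggregated on each query.

```python
class SalesTable:
    def __init__(self, rows):
        self.main = rows    # (customer, amount) pairs in the main store
        self.delta = []     # rows inserted since the last merge
        self._cache = {}    # aggregate cache keyed by customer

    def insert(self, customer, amount):
        self.delta.append((customer, amount))

    def total_for(self, customer):
        # Aggregate the large main store only once, then reuse the cached value.
        if customer not in self._cache:
            self._cache[customer] = sum(a for c, a in self.main if c == customer)
        # Each query re-aggregates only the small delta on top of the cache.
        return self._cache[customer] + sum(a for c, a in self.delta if c == customer)

sales = SalesTable([("acme", 100), ("acme", 50), ("globex", 70)])
sales.insert("acme", 25)
# sales.total_for("acme") == 175
```

This also shows why the aggregate cache is tied to the merge process: once the delta is merged into the main store, cached aggregates must be rebuilt or adjusted.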
Week 5: Week 5 sheds light on some of the database's inner mechanisms. What happens in emergency situations, for example when the power is cut? Logging and recovery are vital to understanding why an in-memory database is as safe as a traditional disk-based one. Further, the benefits of replicas are explained. We conclude the week with an outlook on the implications of the tremendously increased speed at hand.
Week 6: Week 6 is centered on applications. The last conceptual unit is about the separation of data into active and passive parts. After that, we showcase several prototypes and sketch out potential fields in which to apply the technology, thereby also leaving the domain of pure enterprise solutions by using main-memory databases in weather simulations and medicine.
Exam: The final exam will cover all content from the previous weeks and test your understanding of the course as a whole.
HChan completed this course, spending 3 hours a week on it and found the course difficulty to be very easy.
This course is basically an extended tech talk about the architecture of SanssouciDB, which is the R&D precursor of the famous SAP HANA database. Prof. Plattner introduces several key design choices in SanssouciDB. Primary among these are (1) the ability to load the entire working store into main memory, (2) columnar orientation and the design to support it (dictionaries and attribute vectors), and (3) additional operational architecture such as the differential store and the hot/cold stores. It is important to understand that this is a review of some solid design choices in a single implementation rather than an introduction to a broad range of concepts across the landscape, so the content of the course is a little less general than implied. However, it remains one of the best modern database implementation online courses available at the intermediate level.
The execution of the course is excellent, with very comprehensive reading materials and helpful quizzes at the end of every week. Prof. Plattner's lecturing style takes some getting used to, since he often does not cover the material in the slides but instead discusses the opinions and experience that influenced these design decisions, often with helpful anecdotes. Don't be put off by this: listening to an industry veteran (with decades of experience as one of the technical founders of one of the largest database companies in the world) directly discuss his experience and opinions is a rare treat, and is much better than having him simply read the basic material that you should have read yourself anyway.
A couple of caveats: (1) I feel this course is biased towards the design of a single system. For example, I could not consider a course a good OS course if it only covered the design choices of Windows; in the same way, I cannot consider this a sufficiently balanced and broad databases course. With that caveat aside, after taking the course you will really understand the design implications of SAP HANA, which is always very helpful for system architects. (2) When the course does digress towards breadth or other technologies, it is at its weakest. For example, there is a small section on MapReduce which does not really fit in with the rest of the course. There is a short section explaining concepts like joins and aggregates, but surely anyone taking an intermediate-level systems implementation course already knows those well, and if not, the coverage does not seem to be at a level that would help. Be prepared to skip or skim those sections to make the best use of your study time.
Tijl De Backer completed this course, spending 3 hours a week on it and found the course difficulty to be medium.