Overview
Prepare for a career in the high-growth field of data engineering. In this program, you’ll learn in-demand skills such as Python, SQL, and databases to become job-ready in less than 5 months.
Data engineering is the practice of building systems that gather raw data, process and organize it into usable information, and manage it. The work data engineers do provides the foundational information that data scientists and business intelligence (BI) analysts use to make recommendations and decisions.
This program will teach you the foundational data engineering skills employers are seeking for entry-level data engineering roles, including Python, one of the most widely used programming languages. You’ll also master SQL, RDBMS, ETL, Data Warehousing, NoSQL, Big Data, and Spark through hands-on labs and projects.
You’ll learn to use the Python programming language and Linux/UNIX shell scripts to extract, transform, and load (ETL) data. You’ll also work with relational databases (RDBMS), query them using SQL statements, and use NoSQL databases and unstructured data.
When you complete the full program, you’ll have a portfolio of projects and a Professional Certificate from IBM to showcase your expertise. You’ll also earn an IBM digital badge and gain access to career resources to help you in your job search, including mock interviews and resume support.
This program is ACE® recommended: upon completion, you can earn up to 12 college credits.
Syllabus
Course 1: Introduction to Data Engineering
- Offered by IBM. Start your journey in one of the fastest growing professions today with this beginner-friendly Data Engineering course! You ...
Course 2: Python for Data Science, AI & Development
- Offered by IBM. Kickstart your learning of Python with this beginner-friendly self-paced course taught by an expert. Python is one of the ...
Course 3: Python Project for Data Engineering
- Offered by IBM. Showcase your Python skills in this Data Engineering Project! This short course is designed to apply your basic Python ...
Course 4: Introduction to Relational Databases (RDBMS)
- Offered by IBM. Are you ready to dive into the world of data engineering? In this beginner level course, you will gain a solid understanding ...
Course 5: Databases and SQL for Data Science with Python
- Offered by IBM. Working knowledge of SQL (or Structured Query Language) is a must for data professionals like Data Scientists, Data Analysts ...
Course 6: Hands-on Introduction to Linux Commands and Shell Scripting
- Offered by IBM. This course provides a practical understanding of common Linux / UNIX shell commands. In this beginner friendly course, you ...
Course 7: Relational Database Administration (DBA)
- Offered by IBM. Get started with Relational Database Administration and Database Management in this self-paced course! This course begins ...
Course 8: ETL and Data Pipelines with Shell, Airflow and Kafka
- Offered by IBM. Delve into the two different approaches to converting raw data into analytics-ready data. One approach is the Extract, ...
Course 9: Data Warehouse Fundamentals
- Offered by IBM. Whether you’re an aspiring data engineer, data architect, business analyst, or data scientist, strong data warehousing ...
Course 10: Introduction to NoSQL Databases
- Offered by IBM. Get started with NoSQL Databases with this beginner-friendly introductory course! This course will provide technical, ...
Course 11: Introduction to Big Data with Spark and Hadoop
- Offered by IBM. This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data ...
Course 12: Machine Learning with Apache Spark
- Offered by IBM. Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking ...
Course 13: Data Engineering Capstone Project
- Offered by IBM. Showcase your skills in this Data Engineering project! In this course you will apply a variety of data engineering skills ...
Courses
-
Working knowledge of SQL (Structured Query Language) is a must for data professionals such as Data Scientists, Data Analysts, and Data Engineers. Much of the world's data resides in databases, and SQL is a powerful language for communicating with databases and extracting data from them. In this course you will learn SQL inside out, from the very basics of SELECT statements to advanced concepts like JOINs. You will: write foundational SQL statements such as SELECT, INSERT, UPDATE, and DELETE; filter result sets using WHERE, COUNT, DISTINCT, and LIMIT; differentiate between DML and DDL; CREATE, ALTER, DROP, and load tables; use string patterns and ranges, ORDER and GROUP result sets, and apply built-in database functions; build sub-queries and query data from multiple tables; access databases as a data scientist using Jupyter notebooks with SQL and Python; and work with advanced concepts like Stored Procedures, Views, ACID Transactions, and Inner & Outer JOINs through hands-on labs and projects. You will practice building SQL queries, work with real databases on the Cloud, and use real data science tools. In the final project you’ll analyze multiple real-world datasets to demonstrate your skills.
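To make several of the statements listed above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the cloud databases the course uses. The tables, columns, and rows are hypothetical, invented only for illustration. It covers DDL (CREATE), DML (INSERT), filtering with WHERE/ORDER BY/LIMIT, an inner JOIN with GROUP BY and COUNT, and a LEFT OUTER JOIN.

```python
import sqlite3

# In-memory SQLite database: a dependency-free stand-in (assumption), not the course's setup.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define two hypothetical tables.
cur.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE employees ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL, dept_id INTEGER)"
)

# DML: insert made-up sample rows.
cur.executemany("INSERT INTO departments VALUES (?, ?)",
                [(1, "Engineering"), (2, "Sales")])
cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Ana", 95000, 1),
    (2, "Ben", 87000, 1),
    (3, "Cy", 60000, 2),
    (4, "Dee", 52000, None),  # no department: dropped by INNER JOIN, kept by LEFT JOIN
])

# WHERE + ORDER BY + LIMIT: the two best-paid employees above a threshold.
top_paid = cur.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY salary DESC LIMIT 2",
    (55000,),
).fetchall()

# INNER JOIN + GROUP BY + COUNT: headcount per department.
per_dept = cur.execute(
    "SELECT d.name, COUNT(*) FROM employees e "
    "JOIN departments d ON e.dept_id = d.dept_id "
    "GROUP BY d.name ORDER BY d.name"
).fetchall()

# LEFT OUTER JOIN keeps employees that have no matching department.
with_dept = cur.execute(
    "SELECT e.name, d.name FROM employees e "
    "LEFT JOIN departments d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()

print(top_paid)       # [('Ana',), ('Ben',)]
print(per_dept)       # [('Engineering', 2), ('Sales', 1)]
print(with_dept[-1])  # ('Dee', None)
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.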
-
Kickstart your learning of Python with this beginner-friendly, self-paced course taught by an expert. Python is one of the most popular languages in programming and data science, and demand for people who can apply Python has never been higher. This introductory course will take you from zero to programming in Python in a matter of hours; no prior programming experience is necessary! You will learn Python basics and the different data types. You will become familiar with Python data structures such as lists and tuples, as well as logic concepts like conditions and branching. You will use Python libraries such as pandas, NumPy, and Beautiful Soup. You’ll also use Python to perform tasks such as data collection, web scraping, and working with APIs. You will practice and apply what you learn through hands-on labs using Jupyter Notebooks. By the end of this course, you’ll feel comfortable creating basic programs, working with data, and automating real-world tasks using Python. This course is suitable for anyone interested in Data Science, Data Analytics, Software Development, Data Engineering, AI, DevOps, or a number of other job roles.
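A short sketch of the list, tuple, and branching concepts mentioned above, using only built-in Python. The prices, coordinates, and price-band thresholds are made up for illustration.

```python
# Lists are mutable sequences; tuples are immutable.
prices = [19.99, 5.49, 120.00]   # list: can be changed in place
point = (40.7128, -74.0060)      # tuple: a fixed (latitude, longitude) pair

prices.append(3.25)              # lists can grow; tuples have no append()

def price_band(price):
    """Classify a price with if/elif/else branching (thresholds are arbitrary)."""
    if price < 10:
        return "low"
    elif price < 100:
        return "medium"
    else:
        return "high"

# A list comprehension applies the function to every element.
bands = [price_band(p) for p in prices]
print(bands)  # ['medium', 'low', 'high', 'low']

try:
    point[0] = 0.0  # tuples reject item assignment
except TypeError as exc:
    print("tuple is immutable:", exc)
```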
-
Start your journey in one of the fastest-growing professions today with this beginner-friendly Data Engineering course! You will be introduced to the core concepts, processes, and tools you need to gain a foundational knowledge of data engineering. You will begin this course by understanding what data engineering is, as well as the roles that Data Engineers, Data Scientists, and Data Analysts play in this exciting field. Next you will learn about the data engineering ecosystem, the different types of data structures, file formats, sources of data, and the languages data professionals use in their day-to-day tasks. You will become familiar with the components of a data platform and gain an understanding of several different types of data repositories such as Relational (RDBMS) and NoSQL databases, Data Warehouses, Data Marts, Data Lakes, and Data Lakehouses. You’ll then learn about Big Data processing tools like Apache Hadoop and Spark. You will also become familiar with ETL, ELT, Data Pipelines, and Data Integration. This course provides you with an understanding of a typical Data Engineering lifecycle, which includes architecting data platforms, designing data stores, and gathering, importing, wrangling, querying, and analyzing data. You will also learn about security, governance, and compliance. You will learn about career opportunities in the field of Data Engineering and the different paths you can take to become skilled as a Data Engineer. You will hear from several experienced Data Engineers sharing their insights and advice. By the end of this course, you will also have completed several hands-on labs in which you worked with a relational database, loaded data into it, and performed some basic querying operations.
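The passage mentions data structures and file formats. As a minimal sketch, using only the Python standard library and made-up sample data, here is the same set of records moved from a structured format (CSV) to a semi-structured one (JSON):

```python
import csv
import io
import json

# Hypothetical CSV content; in practice this would come from a file or an API.
raw_csv = """id,city,temp_c
1,Toronto,21.5
2,Nairobi,24.0
"""

# csv.DictReader yields one dict per row, keyed by the header line.
# Every value arrives as a string, so types are converted explicitly.
rows = [
    {"id": int(r["id"]), "city": r["city"], "temp_c": float(r["temp_c"])}
    for r in csv.DictReader(io.StringIO(raw_csv))
]

# Serialize the same records as JSON.
as_json = json.dumps(rows)
print(as_json)
# [{"id": 1, "city": "Toronto", "temp_c": 21.5}, {"id": 2, "city": "Nairobi", "temp_c": 24.0}]
```

Note that CSV carries no type information, so the conversion step is where a schema is effectively imposed; JSON preserves numbers and strings as distinct types.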
-
Are you ready to dive into the world of data engineering? In this beginner-level course, you will gain a solid understanding of how data is stored, processed, and accessed in relational databases (RDBMSes). You will work with different types of databases that are appropriate for various data processing requirements. You will begin this course by being introduced to relational database concepts, as well as several industry-standard relational databases, including IBM Db2, MySQL, and PostgreSQL. Next, you’ll use RDBMS tools favored by professionals, such as phpMyAdmin and pgAdmin, to create and maintain relational databases. You will also use the command line and SQL statements to create and manage tables. This course incorporates hands-on, practical exercises to help you demonstrate your learning. You will work with real databases and explore real-world datasets. You will create database instances and populate them with tables and data. At the end of this course, you will complete a final assignment in which you apply your accumulated knowledge and demonstrate that you have the skills to: design a database for a specific analytics requirement, normalize tables, create tables and views in the database, and load and access data. No prior knowledge of databases or programming is required. Anyone can audit this course at no charge. If you choose to take this course and earn the Coursera course certificate, you can also earn an IBM digital badge upon successful completion of the course.
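The final assignment above involves normalizing tables and creating views. A minimal sketch of both ideas, using SQLite through Python's sqlite3 module rather than Db2, MySQL, or PostgreSQL; the customer and order tables are hypothetical. Customer details are stored once and referenced by key, a foreign key rejects orphaned rows, and a view re-joins the tables for reporting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Normalized design: customer details live in one table and orders
# reference them by key, instead of repeating name/email on every order row.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT UNIQUE NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
-- A view re-joins the normalized tables for analysts.
CREATE VIEW customer_totals AS
    SELECT c.name, SUM(o.amount) AS total_spent
    FROM customers c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id;
""")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ana", "ana@example.com"), (2, "Ben", "ben@example.com")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 30.0), (11, 1, 12.5), (12, 2, 99.0)])

totals = conn.execute("SELECT * FROM customer_totals ORDER BY name").fetchall()
print(totals)  # [('Ana', 42.5), ('Ben', 99.0)]

# The foreign key rejects an order for a customer that does not exist.
rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```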
-
Showcase your Python skills in this Data Engineering Project! This short course is designed to apply your basic Python skills through the implementation of various techniques for gathering and manipulating data. You will take on the role of a Data Engineer by extracting data from multiple sources, converting the data into specific formats, and making it ready for loading into a database for analysis. You will also demonstrate your knowledge of web scraping and of using APIs to extract data. By the end of this hands-on project, you will have shown your proficiency with important skills to Extract, Transform, and Load (ETL) data using an IDE and, of course, Python programming. Upon completion of this course, you will also have a great new addition to your portfolio! PREREQUISITE: The **Python for Data Science, AI and Development** course from IBM is a prerequisite for this project course. Please ensure that before taking this course you have either completed the Python for Data Science, AI and Development course from IBM or have equivalent proficiency in working with Python and data. NOTE: This course is not intended to teach you Python and has little new instructional content. It is intended mostly for you to apply prior Python knowledge.
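The extract, transform, and load steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's actual code: the JSON payload stands in for an API response or scraped page, the GDP figures are invented, and SQLite stands in for the target database.

```python
import json
import sqlite3

# --- Extract ---------------------------------------------------------------
# Hypothetical source payload standing in for an API response or scraped page.
source = json.loads("""
[{"country": "A", "gdp_usd_millions": 1500000},
 {"country": "B", "gdp_usd_millions": 820500},
 {"country": "C", "gdp_usd_millions": null}]
""")

# --- Transform -------------------------------------------------------------
def transform(records):
    """Drop rows with missing values and convert millions to billions (2 dp)."""
    out = []
    for r in records:
        if r["gdp_usd_millions"] is None:
            continue
        out.append((r["country"], round(r["gdp_usd_millions"] / 1000, 2)))
    return out

clean = transform(source)

# --- Load ------------------------------------------------------------------
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gdp (country TEXT PRIMARY KEY, gdp_usd_billions REAL)")
conn.executemany("INSERT INTO gdp VALUES (?, ?)", clean)
conn.commit()

loaded = conn.execute("SELECT * FROM gdp ORDER BY gdp_usd_billions DESC").fetchall()
print(loaded)  # [('A', 1500.0), ('B', 820.5)]
```

Keeping each stage as a separate step makes it easy to test the transform in isolation and to swap the extract source or load target later.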
-
Get started with NoSQL Databases with this beginner-friendly introductory course! This course will provide technical, hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained a lot of relevance in the database landscape. Their main advantage is their ability to address the scalability and flexibility demands of modern applications. You will start this course by learning the history and basics of NoSQL databases and their four categories (document, key-value, column, and graph), including how these categories differ and their key characteristics and benefits. You’ll also explore the differences between the ACID and BASE consistency models, the pros and cons of distributed systems, and when to use an RDBMS versus NoSQL. You will also learn about vector databases, an emerging class of databases popular in AI. Next, you will explore the architecture and features of several implementations of NoSQL databases, namely MongoDB, Cassandra, and IBM Cloudant. You will learn about the common tasks they each perform and their key defining characteristics. You will then get hands-on experience using those NoSQL databases to perform standard database management tasks, such as creating and replicating databases, loading and querying data, modifying database permissions, indexing and aggregating data, and sharding (or partitioning) data. At the end of this course, you will complete a final project where you will apply all your knowledge of the course content to a specific scenario and work with several NoSQL databases. This course suits anyone wanting to expand their Data Management and Information Technology skill set.
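Sharding, mentioned above, spreads documents across servers by a partition key. The sketch below is a toy, in-memory model of hash-based sharding, not the API of MongoDB, Cassandra, or Cloudant; the shard count, function names, and documents are hypothetical. Production systems use more elaborate schemes (for example, Cassandra's token ring) so that adding a node does not remap most keys, which simple modulo hashing would do.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard index with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    hashlib digest is used to get the same answer on every node.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each shard is modelled as a dict: a key-value store holding JSON-like documents.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(doc: dict) -> None:
    """Store a document on the shard chosen by its _id (hypothetical helper)."""
    shards[shard_for(doc["_id"])][doc["_id"]] = doc

def get(doc_id: str):
    """A lookup by key touches exactly one shard (hypothetical helper)."""
    return shards[shard_for(doc_id)].get(doc_id)

for i in range(10):
    put({"_id": f"user{i}", "visits": i})

print(get("user7"))              # {'_id': 'user7', 'visits': 7}
print([len(s) for s in shards])  # documents per shard
```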
-
Kickstart your Data Warehousing and Business Intelligence (BI) Analytics journey with this self-paced course. You will learn how to design, deploy, load, manage, and query data warehouses and data marts. You will also work with BI tools to analyze data in these repositories. You will begin this course by understanding different kinds of analytics repositories, including data marts, data warehouses, data lakes, data lakehouses, and data reservoirs, and their functions and uses. These repositories are designed to enable rapid business decision-making through accurate and flexible reporting and data analysis. A data warehouse is one of the most fundamental business intelligence tools in use today, and one that successful Data Engineers must understand. In this course, you will learn to design, model, and implement data warehouses and explore data warehousing architectures such as Star and Snowflake schemas. You will also learn how to populate data warehouses using ETL and ELT processes, verify and query data, and use Cubes, Rollups, and materialized views/tables. You will become familiar with different BI tools used by industry experts, such as IBM Cognos Analytics, Tableau, and Microsoft Power BI. You will also use a BI tool to create data visualizations and build interactive dashboards to gain insights from data. The hands-on labs in this course will enable you to apply what you learn and gain practical knowledge of Data Warehousing and BI Analytics. You will work with repositories like MySQL, PostgreSQL, and IBM Db2. You will also use BI tools like Cognos Analytics. At the end of this course, you will complete a project to demonstrate the skills you acquired in each module.
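A star schema and a rollup can be sketched with SQLite from Python. The table names and sales figures are hypothetical. SQLite has no `GROUP BY ROLLUP`, so the rollup levels are emulated here with `UNION ALL`; in PostgreSQL and Db2, which the course uses, the same result can be written as `GROUP BY ROLLUP (d.quarter, p.category)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Star schema: one fact table surrounded by dimension tables (names are hypothetical).
conn.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (date_id INTEGER REFERENCES dim_date,
                          product_id INTEGER REFERENCES dim_product,
                          amount REAL);
INSERT INTO dim_date    VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO fact_sales  VALUES (1, 1, 100), (1, 2, 50), (2, 1, 70), (2, 2, 30);
""")

# Emulated rollup: (quarter, category) detail rows, per-quarter subtotals
# (category NULL), and a grand total (both NULL).
rollup_sql = """
SELECT d.quarter, p.category, SUM(f.amount) AS total
FROM fact_sales f JOIN dim_date d USING (date_id) JOIN dim_product p USING (product_id)
GROUP BY d.quarter, p.category
UNION ALL
SELECT d.quarter, NULL, SUM(f.amount)
FROM fact_sales f JOIN dim_date d USING (date_id)
GROUP BY d.quarter
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM fact_sales
"""
rows = conn.execute(rollup_sql).fetchall()
for row in rows:
    print(row)
```

The NULLs in the output mark the aggregated levels, which is the same convention `ROLLUP` uses in databases that support it.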
-
Delve into the two different approaches to converting raw data into analytics-ready data. One approach is the Extract, Transform, Load (ETL) process. The other, contrasting approach is the Extract, Load, and Transform (ELT) process. ETL processes typically apply to data warehouses and data marts, while ELT processes typically apply to data lakes, where the data is transformed on demand by the requesting/calling application. In this course, you will learn about the different tools and techniques used with ETL and data pipelines. Both ETL and ELT extract data from source systems, move the data through the data pipeline, and store the data in destination systems. During this course, you will experience how ELT and ETL processing differ and identify use cases for both. You will identify methods and tools used for extracting data, merging extracted data either logically or physically, and loading data into data repositories. You will also define transformations to apply to source data to make the data credible, contextual, and accessible to data users. You will be able to outline several methods for loading data into the destination system, verifying data quality, monitoring load failures, and using recovery mechanisms in case of failure. By the end of this course, you will also know how to use Apache Airflow to build data pipelines and understand the advantages of this approach. You will also learn how to use Apache Kafka to build streaming pipelines, as well as the core components of Kafka, which include brokers, topics, partitions, replication, producers, and consumers. Finally, you will complete a shareable final project that enables you to demonstrate the skills you acquired in each module.
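To illustrate how topics, partitions, producers, and consumers relate, here is a toy in-memory model in Python. It is not the Kafka client API: the `Topic` and `Consumer` classes are hypothetical, and `zlib.crc32` stands in for Kafka's own key-hashing partitioner. It shows two properties the text alludes to: records with the same key go to the same partition (so per-key order is preserved), and a consumer tracks an offset per partition so it only reads new records.

```python
import zlib
from collections import defaultdict

class Topic:
    """Toy stand-in for a Kafka topic: an append-only log per partition (hypothetical)."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Same key -> same partition, which preserves ordering per key.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

class Consumer:
    """Reads each partition sequentially and remembers its position (offset)."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition index -> next offset to read

    def poll(self):
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return batch

topic = Topic(num_partitions=2)
for i in range(3):
    topic.produce("sensor-a", f"reading {i}")
topic.produce("sensor-b", "reading 0")

consumer = Consumer(topic)
first = consumer.poll()   # all four records
second = consumer.poll()  # empty: nothing new since the last poll
```

Real Kafka adds what this sketch leaves out: brokers that host partitions, replication of each partition across brokers, and consumer groups that divide partitions among consumers.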
-
Get started with Relational Database Administration and Database Management in this self-paced course! This course begins with an introduction to database management; you will learn about topics such as the Database Management Lifecycle, the roles of a Database Administrator (DBA), and database storage. You will then discover some of the activities, techniques, and best practices for managing a database. You will also learn about database optimization, including updating statistics, identifying slow queries, types of indexes, and index creation and usage. You will learn about configuring and upgrading database server software and related products. You’ll also learn about database security: how to implement user authentication, assign roles, and assign object-level permissions. You will also learn how to perform backup and restore procedures in case of system failures. You will learn how to optimize databases for performance, monitor databases, collect diagnostic data, and access error information to help you resolve issues that may occur. Many of these tasks are repetitive, so you will learn how to schedule maintenance activities and regular diagnostic tests and send automated messages about the success or failure of a task. The course includes both video-based lectures and hands-on labs to practice and apply what you learn. This course ends with a final project where you will assume the role of a database administrator and complete a number of database administration tasks across many different databases.
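Index creation, statistics updates, and backups can be tried out on SQLite from Python. This is a sketch only: the table and index names are hypothetical, and the exact wording of SQLite's query-plan text varies between versions. It compares the plan for a lookup before and after creating an index, refreshes optimizer statistics with `ANALYZE`, and copies the live database with `Connection.backup`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany("INSERT INTO users (email, city) VALUES (?, ?)",
                 [(f"user{i}@example.com", "Paris") for i in range(1000)])

query = "SELECT * FROM users WHERE email = ?"

def plan(sql, params=()):
    """Return the detail text of EXPLAIN QUERY PLAN (4th column of each row)."""
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

before = plan(query, ("user500@example.com",))   # full table scan
conn.execute("CREATE INDEX idx_users_email ON users(email)")
conn.execute("ANALYZE")  # refresh the optimizer's statistics
after = plan(query, ("user500@example.com",))    # index lookup
conn.commit()

print(before)  # e.g. "SCAN users" (wording varies by SQLite version)
print(after)   # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"

# Online backup of the live database into a second connection.
backup = sqlite3.connect(":memory:")
conn.backup(backup)
count = backup.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1000
```

In production DBMSs such as Db2, MySQL, or PostgreSQL the same ideas apply, but each has its own plan-inspection command and backup tooling.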
-
This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark. Bernard Marr defines big data as the digital trace that we are generating in this digital era. You will start the course by understanding what big data is and exploring how insights from big data can be harnessed for a variety of use cases. You’ll also explore how big data uses technologies like parallel processing, scaling, and data parallelism. Next, you will learn about Hadoop, an open-source framework for the distributed processing of large data sets, and its ecosystem. You will discover important components that go hand in hand with Hadoop, such as the Hadoop Distributed File System (HDFS), MapReduce, and HBase. You will become familiar with Hive, data warehouse software that provides an SQL-like interface to efficiently query and manipulate large data sets. You’ll then gain insights into Apache Spark, an open-source processing engine that provides users with new ways to store and use big data. In this course, you will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the components that make up Apache Spark. You’ll learn about DataFrames, perform basic DataFrame operations, and work with Spark SQL. You will also explore how Spark processes and monitors the requests your application submits and how you can track work using the Spark Application UI. This course has several hands-on labs to help you apply and practice the concepts you learn. You will complete Hadoop and Spark labs using various tools and technologies, including Docker, Kubernetes, Python, and Jupyter Notebooks.
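The MapReduce model mentioned above can be illustrated with the classic word count, simulated in a single Python process. The input text is made up, and on a real cluster each map and reduce call would run on different nodes, with the framework performing the shuffle.

```python
from collections import defaultdict
from itertools import chain

# Input split into chunks, loosely like HDFS splitting a large file into blocks.
chunks = [
    "big data needs big tools",
    "spark and hadoop process big data",
]

def map_phase(chunk):
    """Emit (word, 1) for every word; each chunk could be mapped on a different node."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine each key's values independently; reducers can also run in parallel."""
    return {key: sum(values) for key, values in groups.items()}

mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts["big"], counts["data"])  # 3 2
```

In PySpark the same computation is commonly expressed on an RDD as `flatMap(lambda line: line.split())`, then `map(lambda w: (w, 1))`, then `reduceByKey(operator.add)`.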
-
This course provides a practical understanding of common Linux / UNIX shell commands. In this beginner-friendly course, you will learn about Linux basics, shell commands, and Bash shell scripting. You will begin this course with an introduction to Linux and explore the Linux architecture. You will interact with the Linux Terminal, execute commands, navigate directories, edit files, and install and update software. Next, you’ll become familiar with commonly used Linux commands. You will work with general-purpose commands like id, date, uname, ps, top, echo, and man; directory management commands such as pwd, cd, mkdir, rmdir, find, and df; file management commands like cat, wget, more, head, tail, cp, mv, touch, tar, zip, and unzip; the access control command chmod; text processing commands such as wc, grep, and tr; and networking commands such as hostname, ping, ifconfig, and curl. You will then move on to learning the basics of shell scripting to automate a variety of tasks. You’ll create simple to more advanced shell scripts that involve Metacharacters, Quoting, Variables, Command substitution, I/O Redirection, Pipes & Filters, and Command line arguments. You will also schedule cron jobs using crontab. The course includes both video-based lectures and hands-on labs to practice and apply what you learn. You will have no-charge access to a virtual Linux server that you can reach through your web browser, so you don't need to download and install anything to complete the labs. You’ll end this course with a final project as well as a final exam. In the final project you will demonstrate your knowledge of course concepts by performing your own Extract, Transform, and Load (ETL) process and creating a scheduled backup script.
This course is ideal for data engineers, data scientists, software developers, and cloud practitioners who want to get familiar with frequently used commands on Linux, macOS, and other Unix-like operating systems, as well as get started with creating shell scripts.
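The course writes its scheduled backup script in Bash. Since the examples here use Python, the following is a rough Python counterpart, a sketch only: the function name, archive naming pattern, and paths are hypothetical. It creates a timestamped, gzip-compressed tar archive of a directory, roughly what `tar -czf` does in a shell script, and demonstrates it on a throwaway directory so it is safe to run.

```python
import tarfile
import tempfile
import time
from pathlib import Path

def backup_directory(source: Path, dest_dir: Path) -> Path:
    """Archive `source` into a timestamped .tar.gz inside `dest_dir` (hypothetical helper).

    Roughly mirrors the shell idiom:
        tar -czf "backup-$(date +%Y%m%d%H%M%S).tar.gz" "$SOURCE"
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d%H%M%S")
    archive = dest_dir / f"backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)  # adds the directory recursively
    return archive

# Demo on a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "project"
    src.mkdir()
    (src / "notes.txt").write_text("hello\n")
    archive = backup_directory(src, Path(tmp) / "backups")
    with tarfile.open(archive) as tar:
        names = tar.getnames()
    print(names)  # ['project', 'project/notes.txt']
```

After replacing the demo with real source and destination paths, a crontab entry such as `0 2 * * * /usr/bin/python3 /path/to/backup.py` would run it daily at 02:00 (the script path is hypothetical).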
-
Showcase your skills in this Data Engineering project! In this course you will apply a variety of data engineering skills and techniques you have learned in the previous courses of the IBM Data Engineering Professional Certificate. You will demonstrate your knowledge of Data Engineering by assuming the role of a Junior Data Engineer who has recently joined an organization and will be presented with a real-world use case that requires architecting and implementing a data analytics platform. In this Capstone project you will complete numerous hands-on labs. You will create and query data repositories using relational and NoSQL databases such as MySQL and MongoDB. You’ll also design and populate a data warehouse using PostgreSQL and IBM Db2 and write queries to perform Cube and Rollup operations. You will generate reports from the data in the data warehouse and build a dashboard using Cognos Analytics. You will also show your proficiency in Extract, Transform, and Load (ETL) processes by creating data pipelines for moving data between different repositories. You will perform big data analytics using Apache Spark to make predictions with the help of a machine learning model. This course is the final course in the IBM Data Engineering Professional Certificate. It is recommended that you complete all the previous courses in this Professional Certificate before starting this course.
-
Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking the power of Apache Spark to build and deploy ML models for data engineering applications. Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional readings and videos. Gain hands-on experience with Spark Structured Streaming, develop an understanding of data engineering and ML pipelines, and become proficient in evaluating ML models using SparkML. In practical labs, you'll use SparkML for regression, classification, and clustering, enabling you to construct prediction and classification models. Connect to Spark clusters, analyze SparkSQL datasets, perform ETL activities, and create ML models using SparkML and scikit-learn. Finally, demonstrate your acquired skills through a final assignment. This intermediate course is suitable for aspiring and experienced data engineers, as well as working professionals in data analysis and machine learning. Prior knowledge of Big Data, Hadoop, Spark, Python, and ETL is highly recommended for this course.
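The regression-and-evaluation workflow described above can be shown at its smallest scale in pure Python: fit a one-feature linear model by ordinary least squares, then score it on held-out points with root mean squared error (RMSE). This is a conceptual sketch with made-up data, not SparkML code; in Spark, `pyspark.ml.regression.LinearRegression` and `pyspark.ml.evaluation.RegressionEvaluator` play these roles on distributed DataFrames.

```python
import math

def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def rmse(y_true, y_pred):
    """Root mean squared error, a common regression evaluation metric."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical training data lying exactly on y = 1 + 2x
train_x, train_y = [0, 1, 2, 3], [1, 3, 5, 7]
a, b = fit_simple_linear(train_x, train_y)

# Evaluate on held-out points that do not lie on the line
test_x, test_y = [4, 5], [9.5, 10.5]
preds = [a + b * x for x in test_x]
print(a, b)                 # 1.0 2.0
print(rmse(test_y, preds))  # 0.5
```

Evaluating on data the model did not see during fitting is what makes the RMSE an estimate of real-world error rather than a measure of how well the model memorized its training set.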
Taught by
Aije Egwaikhide, Hima Vasudevan, Jeff Grossman, Joseph Santarcangelo, Karthik Muthuraman, Lin Joyner, Priya Kapoor, Ramesh Sannareddy, Rav Ahuja, Sabrina Spillner, Sam Prokopchuk, Sandip Saha Joy, Skills Network, Steve Ryan and Yan Luo