Data engineers and Big Data professionals are in high demand. NoSQL and Big Data technology skills such as Apache Spark are must-haves for modern-day, data-driven decision-making. This three-course Professional Certificate from IBM opens the door to data engineering and big data careers.
The program starts with NoSQL Database Basics, which introduces you to NoSQL fundamentals, including the four key non-relational database categories. By the end of the course, you will have hands-on skills working with MongoDB, Cassandra, and IBM Cloudant NoSQL databases.
A crucial aspect of data engineering is acquiring and managing Big Data while keeping analytics scalable and performant. When you enroll in Big Data, Hadoop, and Spark Basics, you'll discover the characteristics, features, benefits, limitations, and applications of some of the most popular Big Data processing tools. You'll explore the open-source ecosystem of Apache tools, including Apache Hadoop, Apache Hive, and Apache Spark (including Spark on Kubernetes), and discover how to leverage Spark to deliver reliable insights. You'll gain hands-on data analysis skills using PySpark and Spark SQL, create a streaming analytics application using Spark Streaming, and more.
Then enroll in Apache Spark for Data Engineering and Machine Learning to discover how data and machine learning engineers use Spark Structured Streaming, GraphFrames, regression, classification, and clustering. Learn about clustering and how to apply the k-means clustering algorithm using Spark MLlib. Extract, transform, and load (ETL) is at the heart of data and machine learning engineering, and you'll gain skills using Spark to perform ETL tasks. This course culminates with a hands-on Spark project.
This Professional Certificate does not require any prior programming or data science skills; however, prior basic data literacy and SQL skills will prove valuable in completing this program.
Courses under this program:

Course 1: NoSQL Database Basics
This course introduces you to the fundamentals of NoSQL, including the four key non-relational database categories. By the end of the course, you will have hands-on skills for working with MongoDB, Cassandra, and IBM Cloudant NoSQL databases.
Course 2: Big Data, Hadoop, and Spark Basics
This course provides foundational big data practitioner knowledge and analytical skills using popular big data tools, including Hadoop and Spark. Learn and practice your big data skills hands-on.
Course 3: Apache Spark for Data Engineering and Machine Learning
This short course introduces you to the fundamentals of Data Engineering and Machine Learning with Apache Spark, including Spark Structured Streaming, ETL for Machine Learning (ML) Pipelines, and Spark ML. By the end of the course, you will have hands-on experience applying Spark skills to ETL and ML workflows.
This course will provide you with hands-on technical knowledge of NoSQL databases and Database-as-a-Service (DBaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained a lot of relevance in the database landscape. Their main advantage is the ability to effectively handle the scalability and flexibility demands of modern applications.
You will start by learning the history and the basics of NoSQL databases and discover their key characteristics and benefits. You will learn about the four categories of NoSQL databases and how they differ from each other.
You will explore the architecture and features of several different implementations of NoSQL databases, namely MongoDB, Cassandra, and IBM Cloudant.
Throughout the course you will get practical experience using these NoSQL databases to perform standard database management tasks, such as creating and replicating databases, loading and querying data, modifying database permissions, indexing and aggregating data, and sharding (or partitioning) data.
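Of the tasks above, sharding is the easiest to picture in miniature. Here is a plain-Python sketch of hash-based sharding (the shard count and the `user_id` shard key are hypothetical for this illustration; real NoSQL databases such as MongoDB route documents like this internally):

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster of three shards

def shard_for(key: str) -> int:
    # Hash the shard key and map it onto one of the shards.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route each document to a shard based on its "user_id" shard key.
shards = {i: [] for i in range(NUM_SHARDS)}
documents = [{"user_id": f"user{i}", "score": i} for i in range(10)]
for doc in documents:
    shards[shard_for(doc["user_id"])].append(doc)
```

Because the shard is derived from a hash of the key, every document lands on exactly one shard, and reads for a given key always go to the same place.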
The course ends with a hands-on project to test your understanding of some of the basics of working with several NoSQL database offerings.
Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, and satellite imagery to identify the behaviors and preferences of prospects, clients, competitors, and others.
This course introduces you to Big Data concepts and practices. You will understand the characteristics, features, benefits, and limitations of Big Data and explore some of the Big Data processing tools. You'll explore how Hadoop, Hive, and Spark can help organizations overcome Big Data challenges and reap the rewards of its acquisition.
Hadoop, an open-source framework, enables distributed processing of large data sets across clusters of computers using simple programming models. Each computer, or node, offers local computation and storage, allowing datasets to be processed faster and more efficiently. Hive, a data warehouse software, provides an SQL-like interface to efficiently query and manipulate large data sets in various databases and file systems that integrate with Hadoop.
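The programming model Hadoop popularized, MapReduce, can be sketched in plain Python with a toy word count (in real Hadoop the map phase runs on many nodes in parallel; all names here are illustrative):

```python
from collections import defaultdict

def map_phase(document: str):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in document.lower().split():
        yield word, 1

def reduce_phase(pairs):
    # Reduce: sum the counts for each word across all mappers.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Each "node" maps its own document; the emitted pairs are reduced together.
docs = ["big data tools", "big data big insights"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
word_counts = reduce_phase(pairs)
```

The key idea is that the map step needs no coordination between documents, which is what lets Hadoop spread it across a cluster.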
Open-source Apache Spark is a processing engine built around speed, ease of use, and analytics that provides users with new ways to store and use big data.
You will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the different components that make up Apache Spark. In this course, you will also learn how Resilient Distributed Datasets, known as RDDs, enable parallel processing across the nodes of a Spark cluster.
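As a rough illustration of the RDD idea (plain Python, not actual PySpark; the partition layout and helper names are invented for this sketch), an RDD is data split into partitions, with transformations applied per partition and results combined at the end:

```python
from functools import reduce

# A toy "RDD": data split into partitions that could live on different nodes.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

def rdd_map(parts, fn):
    # Transformations apply independently to each partition (in Spark, in parallel).
    return [[fn(x) for x in part] for part in parts]

def rdd_reduce(parts, fn):
    # Each partition reduces locally, then the partial results are combined.
    partials = [reduce(fn, part) for part in parts if part]
    return reduce(fn, partials)

squared = rdd_map(partitions, lambda x: x * x)
total = rdd_reduce(squared, lambda a, b: a + b)  # sum of squares of 1..9
```

Because each partition is processed independently, the map step parallelizes across the nodes of a cluster, which is exactly what Spark does with real RDDs.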
You'll gain practical skills when you learn how to analyze data in Spark using PySpark and Spark SQL and how to create a streaming analytics application using Spark Streaming, and more.
Apache® Spark™ is a fast, flexible, and developer-friendly open-source platform for large-scale SQL, batch processing, stream processing, and machine learning. Users can take advantage of its open-source ecosystem, speed, ease of use, and analytic capabilities to work with Big Data in new ways.
In this short course, you'll explore concepts and gain hands-on skills to use Spark for data engineering and machine learning applications. You'll learn about Spark Structured Streaming, including data sources, output modes, and operations. Then, you'll explore how graph theory works and discover how GraphFrames supports Spark DataFrames and popular graph algorithms.
Organizations can acquire data from structured and unstructured sources and deliver it to users in formats they can use. Learn how to use Spark to extract, transform, and load (ETL) data. Then, you'll hone your newly acquired skills in the "ETL for Machine Learning Pipelines" lab.
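The shape of an ETL job can be sketched in plain Python with the standard library (not Spark code; the sample data, field names, and the score threshold are invented for this illustration):

```python
import csv
import io
import json

# Extract: read raw rows from a CSV source (an in-memory string here;
# in practice this would be a file, database, or stream).
raw = "name,score\nana,90\nben,75\ncara,82\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: cast types, normalize names, and keep only passing scores.
transformed = [
    {"name": r["name"].title(), "score": int(r["score"])}
    for r in rows
    if int(r["score"]) >= 80
]

# Load: serialize to the target format (JSON here).
output = json.dumps(transformed)
```

Spark follows the same extract, transform, load pattern, but distributes each stage across a cluster so it scales to data that does not fit on one machine.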
Next, discover why machine learning practitioners prefer Spark. You'll learn how to create pipelines and quickly implement feature extraction, selection, and transformation on structured data sets. Discover how to perform classification and regression using Spark. You'll be able to define and identify both supervised and unsupervised learning. Learn about clustering and how to apply the k-means clustering algorithm using Spark MLlib. You'll reinforce your knowledge with focused, hands-on labs and a final project where you will apply Spark to a real-world inspired problem.
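As a sketch of what k-means does under the hood (plain Python, not Spark MLlib code; MLlib distributes the same two steps across a cluster, and the sample points here are invented):

```python
import math
import random

def kmeans(points, k, iterations=10, seed=0):
    # A minimal k-means: assign each point to its nearest centroid,
    # recompute each centroid as its cluster's mean, and repeat.
    random.seed(seed)
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Two well-separated blobs: the centroids settle near each blob's mean.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

The alternation between an assignment step and an update step is the whole algorithm; what MLlib adds is running the assignment step in parallel over partitioned data.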
Prior to taking this course, please ensure you have foundational Spark knowledge and skills, for example, by first completing the IBM course titled "Big Data, Hadoop, and Spark Basics."
Aije Egwaikhide, Karthik Muthuraman, Romeo Kienzler, Rav Ahuja, Steve Ryan and Ramesh Sannareddy