Learn SQL Basics for Data Science

University of California, Davis via Coursera Specialization

Go to class Write review

Details

Go to class

Provider

Coursera Specialization
Pricing

Paid Course
Languages

English
Certificate

Certificate Available
Duration & workload

17 weeks, 5 hours a week
Level

Beginner

Found in

Overview

Unlock Unlimited Opportunities: Get 50% Off Your First Month of Coursera Plus

Grab it

This Specialization is intended for a learner with no previous coding experience seeking to develop SQL query fluency. Through four progressively more difficult SQL projects with data science applications, you will cover topics such as SQL basics, data wrangling, SQL analysis, AB testing, distributed computing using Apache Spark, Delta Lake and more. These topics will prepare you to apply SQL creatively to analyze and explore data; demonstrate efficiency in writing queries; create data analysis datasets; conduct feature engineering, use SQL with other data analysis and machine learning toolsets; and use SQL with unstructured data sets.

Syllabus

Course 1: SQL for Data Science
- Offered by University of California, Davis. As data collection has increased exponentially, so has the need for people skilled at using and ... Enroll for free.

Course 2: Data Wrangling, Analysis and AB Testing with SQL
- Offered by University of California, Davis. This course allows you to apply the SQL skills taught in “SQL for Data Science” to four ... Enroll for free.

Course 3: Distributed Computing with Spark SQL
- Offered by University of California, Davis. This course is all about big data. It’s for students with SQL experience that want to take the ... Enroll for free.

Course 4: SQL for Data Science Capstone Project
- Offered by University of California, Davis. Data science is a dynamic and growing career field that demands knowledge and skills-based in ... Enroll for free.

Courses

5 reviews
15 hours 20 minutes
View details

As data collection has increased exponentially, so has the need for people skilled at using and interacting with data; to be able to think critically, and provide insights to make better decisions and optimize their businesses. This is a data scientist, “part mathematician, part computer scientist, and part trend spotter” (SAS Institute, Inc.). According to Glassdoor, being a data scientist is the best job in America; with a median base salary of $110,000 and thousands of job openings at a time. The skills necessary to be a good data scientist include being able to retrieve and work with data, and to do that you need to be well versed in SQL, the standard language for communicating with database systems. This course is designed to give you a primer in the fundamentals of SQL and working with data so that you can begin analyzing it for data science purposes. You will begin to ask the right questions and come up with good answers to deliver valuable insights for your organization. This course starts with the basics and assumes you do not have any knowledge or skills in SQL. It will build on that foundation and gradually have you write both simple and complex queries to help you select data from tables. You'll start to work with different types of data like strings and numbers and discuss methods to filter and pare down your results. You will create new tables and be able to move data into them. You will learn common operators and how to combine the data. You will use case statements and concepts like data governance and profiling. You will discuss topics on data, and practice using real-world programming assignments. You will interpret the structure, meaning, and relationships in source data and use SQL as a professional to shape your data for targeted analysis purposes. Although we do not have any specific prerequisites or software requirements to take this course, a simple text editor is recommended for the final project. So what are you waiting for? This is your first step in landing a job in the best occupation in the US and soon the world!
0 reviews
15 hours 33 minutes
View details

This course allows you to apply the SQL skills taught in “SQL for Data Science” to four increasingly complex and authentic data science inquiry case studies. We'll learn how to convert timestamps of all types to common formats and perform date/time calculations. We'll select and perform the optimal JOIN for a data science inquiry and clean data within an analysis dataset by deduping, running quality checks, backfilling, and handling nulls. We'll learn how to segment and analyze data per segment using windowing functions and use case statements to execute conditional logic to address a data science inquiry. We'll also describe how to convert a query into a scheduled job and how to insert data into a date partition. Finally, given a predictive analysis need, we'll engineer a feature from raw data using the tools and skills we've built over the course. The real-world application of these skills will give you the framework for performing the analysis of an AB test.
0 reviews
13 hours 32 minutes
View details

This course is all about big data. It’s for students with SQL experience that want to take the next step on their data journey by learning distributed computing using Apache Spark. Students will gain a thorough understanding of this open-source standard for working with large datasets. Students will gain an understanding of the fundamentals of data analysis using SQL on Spark, setting the foundation for how to combine data with advanced analytics at scale and in production environments. The four modules build on one another and by the end of the course you will understand: the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines. The first module introduces Spark and the Databricks environment including how Spark distributes computation and Spark SQL. Module 2 covers the core concepts of Spark such as storage vs. compute, caching, partitions, and troubleshooting performance issues via the Spark UI. It also covers new features in Apache Spark 3.x such as Adaptive Query Execution. The third module focuses on Engineering Data Pipelines including connecting to databases, schemas and data types, file formats, and writing reliable data. The final module covers data lakes, data warehouses, and lakehouses. Students build production grade data pipelines by combining Spark with the open-source project Delta Lake. By the end of this course, students will hone their SQL and distributed computing skills to become more adept at advanced analysis and to set the stage for transitioning to more advanced analytics as Data Scientists.
0 reviews
1 day 11 hours 11 minutes
View details

Data science is a dynamic and growing career field that demands knowledge and skills-based in SQL to be successful. This course is designed to provide you with a solid foundation in applying SQL skills to analyze data and solve real business problems. Whether you have successfully completed the other courses in the Learn SQL Basics for Data Science Specialization or are taking just this course, this project is your chance to apply the knowledge and skills you have acquired to practice important SQL querying and solve problems with data. You will participate in your own personal or professional journey to create a portfolio-worthy piece from start to finish. You will choose a dataset and develop a project proposal. You will explore your data and perform some initial statistics you have learned through this specialization. You will uncover analytics for qualitative data and consider new metrics that make sense from the patterns that surface in your analysis. You will put all of your work together in the form of a presentation where you will tell the story of your findings. Along the way, you will receive feedback through the peer-review process. This community of fellow learners will provide additional input to help you refine your approach to data analysis with SQL and present your findings to clients and management.