The Introduction to Data Science class will survey the foundational topics in data science, namely:

Data Manipulation

Data Analysis with Statistics and Machine Learning

Data Communication with Information Visualization

Data at Scale -- Working with Big Data

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

This course is also a part of our Data Analyst Nanodegree.

Why Take This Course?

You will have an opportunity to work through a data science project end to end, from analyzing a dataset to visualizing and communicating your data analysis.

Through working on the class project, you will be exposed to and understand the skills that are needed to become a data scientist yourself.

Syllabus

Lesson 1: Introduction to Data Science

Introduction to Data Science

What is a Data Scientist

Pi-Chaun (Data Scientist @ Google): What is Data Science?

Gabor (Data Scientist @ Twitter): What is Data Science?

Problems Solved by Data Science

Pandas

Dataframes

Create a New Dataframe

Lesson 2: Data Wrangling

What is Data Wrangling?

Acquiring Data

Common Data Formats

What are Relational Databases?

Aadhaar Data

Aadhaar Data and Relational Databases

Introduction to Database Schemas

APIs

Data in JSON Format

How to Access an API Efficiently

Missing Values

Easy Imputation

Impute using Linear Regression

Tip of the Imputation Iceberg

Lesson 3: Data Analysis

Statistical Rigor

Kurt (Data Scientist @ Twitter) - Why is Stats Useful?

Introduction to Normal Distribution

T Test

Welch T Test

Non-Parametric Tests

Non-Normal Data

Stats vs. Machine Learning

Different Types of Machine Learning

Prediction with Regression

Cost Function

How to Minimize Cost Function

Coefficients of Determination

Lesson 4: Data Visualization

Effective Information Visualization

Napoleon's March on Russia

Don (Principal Data Scientist @ AT&T): Communicating Findings

Rishiraj (Principal Data Scientist @ AT&T): Communicating Findings Well

Intro to Data Science is an intermediate-level course that assumes basic Python programming skills and knowledge of statistics. The course focuses on gathering, manipulating, analyzing and visualizing data using Python and various Python packages such as NumPy, SciPy and pandas. One of the best parts of this course is the exposure to Python packages in the SciPy stack, although I wish more time was devoted to explaining what the various modules in the SciPy stack do, how to set them up at home and when to use them.

The first lesson was a fairly gentle introduction with an interesting homework project dealing with data from the Titanic disaster. Lesson 2 goes into more detail about gathering and cleaning data using pandas and an additional module that lets you make SQL queries to extract data from pandas data frames. Lesson 3 jumps into data analysis with a T test and linear regression using gradient descent. Going from basic data manipulation into these topics was a bit jarring in terms of difficulty, and more time could have been spent explaining how the functions worked. I left without a great appreciation of what gradient descent is really doing. Lesson 4 is focused on making visualizations using a module that attempts to port the functionality of the R language's ggplot2 plotting package. Finally, lesson 5 introduces the concept of big data and MapReduce as a solution for dealing with large data sets. Each homework assignment after the first has students working with New York subway turnstile data, which lets them build familiarity with the data throughout the course. This was a very good decision, since it lets students focus on learning new concepts rather than spending time familiarizing themselves with new data sets over and over again.
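As an illustration of the gradient descent step that Lesson 3 relies on, here is a minimal sketch in Python with NumPy. It is not the course's code: the function name gradient_descent, the parameters alpha and num_iterations, and the synthetic data are all assumptions made for this example. Each iteration measures the squared-error cost of the current weights, then moves the weights a small step against the gradient of that cost.

```python
import numpy as np


def gradient_descent(features, values, alpha=0.1, num_iterations=1000):
    """Fit linear-regression weights with batch gradient descent.

    Hypothetical helper for illustration only.
    features: (m, n) array, including a column of ones for the intercept.
    values:   (m,) array of observed targets.
    """
    m = len(values)
    theta = np.zeros(features.shape[1])  # start from all-zero weights
    cost_history = []
    for _ in range(num_iterations):
        # Prediction error of the current weights on every sample.
        errors = features @ theta - values
        # Squared-error cost: J(theta) = sum(errors^2) / (2m).
        cost_history.append((errors ** 2).sum() / (2 * m))
        # Gradient of J is (features^T @ errors) / m; step against it.
        theta = theta - (alpha / m) * (features.T @ errors)
    return theta, cost_history


# Synthetic, noise-free data (an assumption for the demo): y = 2 + 3x.
x = np.linspace(0, 1, 50)
features = np.column_stack([np.ones_like(x), x])
values = 2.0 + 3.0 * x

theta, cost_history = gradient_descent(
    features, values, alpha=0.5, num_iterations=5000
)
print("fitted weights:", theta)
print("first cost:", cost_history[0], "last cost:", cost_history[-1])
```

The recorded cost should shrink from one iteration to the next when alpha is small enough. If alpha is too large the updates overshoot and the cost grows instead, which is why the learning rate usually needs some tuning.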

by
Lukas completed this course and found the course difficulty to be medium.

It introduces many areas but does not go into depth in any of them. For more advanced classes, look for other courses on Udacity. Good as an introduction.

by
Joe is taking this course right now, spending 8 hours a week on it and found the course difficulty to be medium.

I was skeptical when I enrolled in Udacity's Data Analyst Nanodegree program, but not only have they provided the experience they said they would, they have steadily made improvements since I enrolled. How many times in your life have you had that experience? Here are SOME of the improvements they have made while I have been enrolled. Initially one could get one-on-one help; it was usually 1 to 2 days out, but then it was video chat.

This was great. I had tried a competitor's course, and sometimes one just cannot figure out why something is not working. But not with Udacity. Then they scrapped that and instituted a MENTOR program. Here one could instant message someone who would get back to you in a few hours. Then they scrapped that and now offer LIVE HELP. It is a chat box that you type the gist of your question into. In less than 10 minutes, often in 3, someone comes on. Usually they can immediately figure out your mistake (it seems students make a finite number of errors), but if they can't, they ask you to copy and paste your code. And if they still cannot figure it out, i.e., if you have really made a mess of things, they do a screen sharing session to get you back on the rails. Don't make a mistake. Just sign up for Udacity.

by
Shahrukh completed this course, spending 5 hours a week on it and found the course difficulty to be easy.

Though the course uses interesting examples for teaching data science concepts, the overreliance on the online grader for practice often makes learning redundant. A big part of learning programming is experimentation, which the grader does not allow for.