What you'll learn:
- You will learn both Python and R Programming with Data Science in this course.
- Python: You will first learn how to install Anaconda and Jupyter on your desktop or laptop.
- Python: You will learn the basics of for loops and more advanced for loops. You will gain clarity on Python generators and master the flow of your code using if/else statements.
- Python: You will understand the foundations: how to modify lists and dictionaries and how to write functions. Learn how to analyze, retrieve and clean data with Python.
- Python: Learn Concatenation (Combining Tables) with Python and Pandas and Manipulating Time and Date Data with Python Datetime
- Python: You will learn to Use Pandas with Large Data Sets, Time Series Analysis and Effective Data Visualization in Python
- R: You will learn the most important tools in R that will allow you to do data science
- R: You will have the tools to tackle a wide variety of data science challenges, using the best parts of R.
- R: You will learn how to Tidy the data. Tidying your data means storing it in a consistent form that matches the semantics of the dataset with the way it is stored.
- R: You will learn visualisation, which is a fundamentally human activity. A good visualisation will show you things that you did not expect, or raise new questions about the data.
- R: You will learn models, which are complementary tools to visualisation. Once you have made your questions sufficiently precise, you can use a model to answer them. Models are a fundamentally mathematical or computational tool, so they generally scale well.
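As a small taste of the Python topics listed above (for loops, generators, if/else and the datetime module), here is a minimal sketch using only the standard library. The function names `daily_readings` and `label`, the threshold of 20.0 and the sample readings are made up for illustration and are not taken from the course material.

```python
from datetime import datetime, timedelta


def daily_readings(start, values):
    """Generator: lazily yield (date, value) pairs, one day apart."""
    for offset, value in enumerate(values):
        yield start + timedelta(days=offset), value


def label(value, threshold=20.0):
    """Classify a reading with if/else (the threshold is an arbitrary example)."""
    if value >= threshold:
        return "high"
    else:
        return "low"


# Parse a date string into a datetime object.
start = datetime.strptime("2024-01-30", "%Y-%m-%d")
readings = [18.5, 21.0, 19.9, 23.4]  # made-up sample data

# A for loop consumes the generator one item at a time.
report = []
for day, value in daily_readings(start, readings):
    report.append((day.date().isoformat(), value, label(value)))

for row in report:
    print(row)
```

The generator produces each date only when the loop asks for it, which is the same idea that lets Python work through data too large to hold in memory at once.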
Both Python and R are popular programming languages for Data Science. While R’s functionality is developed with statisticians in mind (think of R's strong data visualization capabilities!), Python is often praised for its easy-to-understand syntax.
Ross Ihaka and Robert Gentleman created R in the early 1990s as an implementation of the S programming language and released it as open source in 1995. The purpose was to develop a language that focused on delivering a better and more user-friendly way to do data analysis, statistics and graphical models.
Python was created by Guido van Rossum in 1991 and emphasizes productivity and code readability. Programmers who want to delve into data analysis or apply statistical techniques are some of the main users of Python for statistical purposes.
As a data scientist, it’s your job to pick the language that best fits your needs. Some questions that can help you decide:
- What problems do you want to solve?
- What are the net costs of learning a language?
- What are the commonly used tools in your field?
- What are the other available tools and how do these relate to the commonly used tools?
When and how to use R?
R is mainly used when the data analysis task requires standalone computing or analysis on individual servers. It’s great for exploratory work, and it's handy for almost any type of data analysis because of the huge number of packages and readily usable tests that often provide you with the necessary tools to get up and running quickly. R can even be part of a big data solution.
When getting started with R, a good first step is to install the RStudio IDE. Once this is done, we recommend having a look at the following popular packages:
- dplyr, plyr and data.table to easily manipulate data,
- stringr to manipulate strings,
- zoo to work with regular and irregular time series,
- ggvis, lattice, and ggplot2 to visualize data, and
- caret for machine learning.
When and how to use Python?
You can use Python when your data analysis tasks need to be integrated with web apps or if statistics code needs to be incorporated into a production database. Being a fully fledged programming language, it’s a great tool to implement algorithms for production use.
While the immaturity of Python's data analysis packages was an issue in the past, this has improved significantly over the years. Make sure to install NumPy/SciPy (scientific computing) and pandas (data manipulation) to make Python usable for data analysis. Also have a look at matplotlib to make graphics, and scikit-learn for machine learning.
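To give a feel for how NumPy and pandas work together, here is a minimal sketch, assuming both packages are installed. The tables `q1` and `q2` and their sales figures are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up sales figures for two quarters (illustrative data only).
q1 = pd.DataFrame({"region": ["north", "south"], "sales": [120.0, 95.0]})
q2 = pd.DataFrame({"region": ["north", "south"], "sales": [130.0, 105.0]})

# Concatenation: stack the two tables into one and renumber the rows.
sales = pd.concat([q1, q2], ignore_index=True)

# NumPy does the numeric work; pandas keeps the result in a labelled table.
sales["log_sales"] = np.log(sales["sales"])

# Group by region and add up the sales across both quarters.
totals = sales.groupby("region")["sales"].sum()

print(sales)
print(totals)
```

Here pandas supplies the table structure and grouping, while NumPy supplies the element-wise maths, which is the usual division of labour between the two libraries.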
Unlike R, Python has no clear “winning” IDE. We recommend that you have a look at Spyder, IPython Notebook (now Jupyter Notebook) and Rodeo to see which one best fits your needs.
* We recommend that all our students learn both programming languages and use each where appropriate, since many Data Science teams today are bilingual, leveraging both R and Python in their work.