Duke University

Data Science with NumPy, Sets, and Dictionaries

Duke University via Coursera


Become proficient in NumPy, a fundamental Python package for careers in data science. This course is tailored to novice programmers aspiring to become data scientists, software developers, data analysts, machine learning engineers, data engineers, or database administrators. Starting with foundational computer science concepts, such as object-oriented programming and data organization using sets and dictionaries, you'll progress to more intricate data structures like arrays, vectors, and matrices. Hands-on practice with NumPy will equip you with the skills to tackle big data challenges and solve data problems effectively. You'll write Python programs to manipulate and filter data and to extract useful insights from large datasets. By the end of the course, you'll be adept at summarizing datasets, such as calculating averages, minimums, and maximums. You'll also learn to speed up data analysis with vectorization and to randomize data. Throughout the course, you'll apply many kinds of data structures and analytic techniques to a variety of data science challenges, including mathematical operations, text file analysis, and image processing. Stepwise, guided assignments each week will reinforce your skills, enabling you to solve problems and draw data-driven conclusions independently. Prepare yourself for a rewarding career in data science by mastering NumPy and honing your programming skills. Start this learning experience today!
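To give a feel for the kind of summarizing and filtering the description mentions, here is a minimal sketch. It is not taken from the course materials; the `incomes` values and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: monthly incomes (in dollars) for six households.
incomes = np.array([3200.0, 4100.0, 2750.0, 5900.0, 4100.0, 3600.0])

# Summaries are computed over the whole array in a single call each.
mean_income = incomes.mean()
min_income = incomes.min()
max_income = incomes.max()

# Boolean filtering: the comparison yields a mask of True/False values,
# and indexing with that mask keeps only the matching entries.
above_mean = incomes[incomes > mean_income]

# Vectorized arithmetic: one expression updates every element,
# with no explicit Python loop.
raised = incomes * 1.05
```

Each operation works on the array as a whole, which is the idea behind the vectorization the description refers to.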


  • Sets and Dictionaries: Storing and Working with Data
    • This week, you will learn the basics of object-oriented programming as well as how to use sets and dictionaries to store and work with data in Python. You will apply these concepts in Python to perform mathematical operations and analytical tasks, including solving geometric problems with circles and counting words in a document.
  • NumPy and Vectors
    • This week, you will learn how to use NumPy, one of the most useful Python packages in data science, and meet a new data structure, the array, beginning with its simplest form, the vector. With NumPy and your new understanding of vectors, you will build histograms and analyze household income distribution data in the United States, drawing your own data-driven conclusions.
  • Matrices and Arrays
    • This week, you will first learn how NumPy handles data in your program using views and copies of your data. You will then learn how to work with more complex arrays called matrices, as well as how you can subset, filter, and modify data in matrices. Finally, you will write your own programs to manipulate data matrices and report your results for a given dataset.
  • Summarizing Datasets, Performance Optimization, and Data Randomization
    • This week, you will learn how to use NumPy to summarize data from matrices (e.g., calculating averages, minimums, and maximums) and how to begin analyzing and manipulating image data. You will also explore two new data science techniques: making your analysis of data matrices more computationally efficient (vectorization) and randomizing data (randomization).
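Several of the weekly topics above can be sketched in a few lines of Python. The following is an illustrative example only, not course material: the `count_words` helper, the sample sentence, and the small matrix are all hypothetical.

```python
import numpy as np

# Week 1: a dictionary maps each word to its count.
def count_words(text):
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "the cat saw the dog"
word_counts = count_words(sample)
# A set keeps each distinct word exactly once.
unique_words = set(sample.split())

# Week 3: basic indexing returns a view that shares data with the
# original array, while .copy() produces independent data.
matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
first_row_view = matrix[0]
first_row_copy = matrix[0].copy()
first_row_view[0] = 99                # also changes matrix[0, 0]
# first_row_copy is unaffected by the assignment above.

# Week 4: randomization with a seeded generator, so the shuffle
# is reproducible from run to run.
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(matrix.ravel())
```

Writing through a view changes the original matrix, which is why knowing whether an operation returns a view or a copy matters when modifying data.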

Taught by

Genevieve M. Lipp, Nick Eubank, Kyle Bradbury and Andrew D. Hilton

