This course, Machine Learning for Accounting with Python, introduces machine learning algorithms (models) and their applications to accounting problems. It covers classification, regression, clustering, text analysis, and time series analysis. It also discusses model evaluation and model optimization. This course provides an entry point for students to apply appropriate machine learning models to business-related datasets with Python to solve a variety of problems.
Accounting Data Analytics with Python is a prerequisite for this course. This course runs on the same platform (Jupyter Notebook) as the prerequisite course. While Accounting Data Analytics with Python covers data understanding and data preparation in the data analytics process, this course covers the next two steps in the process: modeling and model evaluation. Upon completing both courses, students should be able to carry out an entire data analytics process with Python.
INTRODUCTION TO THE COURSE
In this module, you will become familiar with the course, your instructor and your classmates, and our learning environment. This orientation will also help you obtain the technical skills required to navigate and be successful in this course.
MODULE 1: INTRODUCTION TO MACHINE LEARNING
This module provides the basis for the rest of the course by introducing the basic concepts behind machine learning and, specifically, how to perform machine learning using Python and the scikit-learn machine learning library. First, you will learn about the basic types of machine learning. Next, you will learn about data pre-processing, an important step before applying machine learning algorithms. Finally, you will learn how to use different types of machine learning algorithms in a Python script.
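A common pre-processing step is rescaling numeric features so that no feature dominates simply because of its units. The sketch below standardizes a feature to zero mean and unit standard deviation using only the Python standard library. The function names `fit_scaler` and `transform` are hypothetical, and the invoice amounts are made up. The scaling parameters are learned from training data only and then reused on new data, which mirrors the fit/transform pattern of scikit-learn's `StandardScaler`.

```python
from statistics import mean, pstdev

def fit_scaler(train_values):
    """Learn the mean and population standard deviation from the training data only."""
    return mean(train_values), pstdev(train_values)

def transform(values, params):
    """Standardize values with parameters learned by fit_scaler: z = (x - mean) / std."""
    mu, sigma = params
    if sigma == 0:
        # A constant feature carries no information; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical invoice amounts (in dollars) used as a training feature
train_amounts = [120.0, 80.0, 100.0, 140.0, 60.0]
params = fit_scaler(train_amounts)

train_scaled = transform(train_amounts, params)
# New, unseen amounts are scaled with the *training* mean and standard deviation
new_scaled = transform([100.0, 150.0], params)
print([round(z, 3) for z in train_scaled], [round(z, 3) for z in new_scaled])
```

Reusing the training parameters on new data keeps information from the test set from leaking into the model.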
MODULE 2: FUNDAMENTAL ALGORITHMS I
This module introduces three machine learning algorithms. First, you will learn how linear regression can be framed as a machine learning problem whose parameters must be determined computationally by minimizing a cost function. Next, you will learn about logistic regression; despite its name, logistic regression is a classification algorithm. Lastly, you will learn about decision trees, a popular machine learning algorithm that can be used for both classification and regression. This module dives deeper into the concept of machine classification, where algorithms learn from existing, labeled data to classify new, unseen data into specific categories, and the concept of machine regression, where algorithms learn a model from data to predict continuous values for new, unseen data. Although these algorithms differ in their mathematical underpinnings, they are all widely used to classify numerical, text, and image data or to perform regression in a variety of domains.
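The idea that linear regression parameters are found "computationally by minimizing a cost function" can be made concrete with a small sketch. It assumes mean squared error (MSE) as the cost and uses batch gradient descent to find the slope `w` and intercept `b`. The learning rate, number of epochs, and data are illustrative choices, not values from the course. For comparison, scikit-learn's `LinearRegression` fits the same model with a least-squares solver rather than gradient descent.

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Estimate slope w and intercept b by batch gradient descent on the MSE cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE cost with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the cost
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data that lies exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = fit_linear_regression(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, cost = {mse_cost(w, b, xs, ys):.6f}")
```

If the learning rate is too large, the updates overshoot and the cost grows instead of shrinking, so in practice it is tuned or the features are standardized first.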
MODULE 3: FUNDAMENTAL ALGORITHMS II
This module introduces three more machine learning algorithms: k-nearest neighbors, support vector machines, and random forests. All of them can be used for either classification or regression tasks.
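Of the three algorithms, k-nearest neighbors is the easiest to write out by hand. The sketch below classifies a new point by majority vote among its `k` closest training points, using Euclidean distance. The function name `knn_predict` and the two-feature data are hypothetical. scikit-learn provides this algorithm as `KNeighborsClassifier` (and `KNeighborsRegressor`); support vector machines and random forests are not shown here.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points (Euclidean distance)."""
    ranked = sorted(zip(train_points, train_labels), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training data with two classes
train_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (6.0, 7.0), (7.0, 6.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (6.5, 6.2)))
```

For regression, the majority vote is replaced by the average of the `k` neighbors' target values. Because the method relies on distances, features measured on very different scales are usually standardized first.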
MODULE 4: MODEL EVALUATION
Model evaluation is an integral component of any data analytics project. It helps determine how well a model will predict future (out-of-sample) data. This module introduces basic evaluation metrics for machine learning algorithms. First, the evaluation metrics for regression are presented. Next, the metrics and techniques for evaluating classification models are introduced.
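The sketch below computes a few standard metrics from their definitions: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) for regression, and accuracy, precision, and recall for binary classification. The function names and the sample values are hypothetical; scikit-learn's `sklearn.metrics` module offers equivalents such as `mean_squared_error`, `accuracy_score`, `precision_score`, and `recall_score`.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error, and root mean squared error."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return mae, mse, sqrt(mse)

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier, from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical predictions of account balances (regression)
mae, mse, rmse = regression_metrics([100.0, 150.0, 200.0], [110.0, 140.0, 190.0])

# Hypothetical fraud labels: 1 = fraudulent, 0 = legitimate (classification)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(mae, mse, rmse)
print(accuracy, precision, recall)
```

Here the classifier catches only half of the fraudulent cases (recall 0.5) even though its overall accuracy is higher, which is why accuracy alone can be misleading for imbalanced problems such as fraud detection.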
MODULE 5: MODEL OPTIMIZATION
This module introduces techniques for model optimization. First, the basic techniques of feature selection are presented. Next, cross-validation is introduced, which provides a more reliable evaluation of models than a single train/test split. Finally, model selection, or hyperparameter tuning, which relies on cross-validation, is introduced.
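The sketch below shows how cross-validation and hyperparameter tuning fit together. It splits the data into contiguous folds without shuffling (the default behavior of scikit-learn's `KFold`), holds each fold out once for testing, and averages the accuracy. It then picks the number of neighbors for a one-dimensional k-nearest-neighbors classifier by comparing cross-validated scores, which is the idea behind scikit-learn's `cross_val_score` and `GridSearchCV`. All function names and data here are hypothetical.

```python
from collections import Counter

def k_fold_indices(n_samples, n_folds):
    """Split 0..n_samples-1 into n_folds contiguous folds (no shuffling); early folds get any extra sample."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0) for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def knn_1d_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to query (1-D absolute distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def cross_val_accuracy(xs, ys, k, n_folds):
    """Average accuracy of k-NN over n_folds folds; each fold is held out once for testing."""
    scores = []
    for test_idx in k_fold_indices(len(xs), n_folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        correct = sum(knn_1d_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical 1-D data; the "A" at 8.5 is a deliberately noisy point among the "B"s
xs = [1.0, 8.0, 2.0, 9.0, 3.0, 7.0, 4.0, 10.0, 8.5, 6.0]
ys = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]

# Hyperparameter tuning: pick the number of neighbors with the best cross-validated accuracy
candidates = [1, 3, 5]
cv_scores = {k: cross_val_accuracy(xs, ys, k, n_folds=5) for k in candidates}
best_k = max(candidates, key=cv_scores.get)
print(cv_scores, best_k)
```

With these data, `k = 1` is misled by the noisy point more often than `k = 3`, so the cross-validated score favors the larger neighborhood. When two candidates tie, `max` keeps the first one listed.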
MODULE 6: INTRODUCTION TO TEXT ANALYSIS
In this module, you will start applying your new machine learning skills to an exciting data analytics topic: text analysis. First, we will review the process by which textual data is converted into numerical data that can be processed by a computer. Along with this process come a number of new concepts focused on manipulating these data to improve machine learning predictions. Second, we will apply machine learning algorithms, specifically classification, to text data. Finally, we will explore more advanced concepts in text analysis and introduce a special kind of text classification: sentiment analysis.
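One simple way to convert text into numbers is the bag-of-words representation: build a vocabulary of distinct words, then describe each document by how often each vocabulary word occurs in it. The sketch below does this with the standard library. The tokenizer (lowercase letters and apostrophes) and the example sentences are simplifying assumptions; scikit-learn's `CountVectorizer` performs a similar conversion with its own default tokenization rules.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(docs):
    """Sorted list of every distinct token that appears in the documents."""
    return sorted({token for doc in docs for token in tokenize(doc)})

def vectorize(doc, vocabulary):
    """Count how often each vocabulary term occurs in doc (a bag-of-words vector)."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]

# Hypothetical snippets from financial disclosures
docs = ["Revenue increased this quarter", "Revenue decreased and costs increased"]
vocabulary = build_vocabulary(docs)
vectors = [vectorize(doc, vocabulary) for doc in docs]
print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length numeric vector, so the classification algorithms from earlier modules can be trained on these vectors directly. Words that are not in the vocabulary are simply ignored.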
MODULE 7: INTRODUCTION TO CLUSTERING
This module introduces clustering, where data points are assigned to subgroups based on specific properties, such as spatial distance or the local density of points. While humans can often find clusters visually with ease in a given data set, computationally the problem is more challenging. This module starts by exploring the basic ideas behind this unsupervised learning technique. Next, one of the most popular clustering techniques, K-means, is introduced, followed by a K-means case study. Finally, the density-based DBSCAN technique is introduced.
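K-means alternates between two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until the centroids stop moving. The sketch below implements that loop in plain Python. To keep the result reproducible, the initial centroids are chosen by hand, which is an assumption of this example; scikit-learn's `KMeans` uses k-means++ initialization by default. DBSCAN is not shown.

```python
from math import dist

def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def k_means(points, initial_centroids, max_iter=100):
    """Return final centroids and a cluster label for each point."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its points (keep it if its cluster is empty)
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving, so the algorithm has converged
            break
        centroids = new_centroids
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels

# Hypothetical 2-D points forming two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids, labels)
```

K-means needs the number of clusters up front and favors roughly round clusters; density-based methods such as DBSCAN relax both assumptions.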
MODULE 8: INTRODUCTION TO TIME SERIES DATA
This module introduces time and date data, which provide unique learning opportunities and challenges. First, we will discuss how to properly handle time and date features within a Python program. Next, we will extend this discussion to handle data indexed by time and date information, which is known as time series data.
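The sketch below illustrates both steps with the standard `datetime` module: parsing date strings into date objects, deriving calendar features (year, month, weekday, quarter) that a model can use as numeric inputs, and aggregating records into totals indexed by month, a simple form of time series. The function names and sales records are hypothetical. In pandas, `pd.to_datetime` and `DataFrame.resample` cover the same tasks for larger datasets.

```python
from collections import defaultdict
from datetime import datetime

def parse_date(text, fmt="%Y-%m-%d"):
    """Convert a date string such as '2023-01-05' into a datetime.date object."""
    return datetime.strptime(text, fmt).date()

def date_features(d):
    """Derive simple calendar features that a model can use as numeric inputs."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": d.weekday(),  # Monday = 0, Sunday = 6
        "quarter": (d.month - 1) // 3 + 1,
    }

def monthly_totals(records):
    """Aggregate (date string, amount) records into totals indexed by (year, month), in time order."""
    totals = defaultdict(float)
    for date_text, amount in records:
        d = parse_date(date_text)
        totals[(d.year, d.month)] += amount
    return dict(sorted(totals.items()))

# Hypothetical daily sales records
records = [("2023-01-05", 100.0), ("2023-01-20", 50.0), ("2023-02-03", 75.0), ("2023-03-15", 20.0)]
print(date_features(parse_date("2023-03-15")))
print(monthly_totals(records))
```

Sorting the aggregated keys keeps the series in chronological order, which matters because time series models depend on the sequence of observations.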