The objective of this course is to impart knowledge on the use of text mining techniques for deriving business intelligence to achieve organizational goals. The course uses a Python-based software platform to build, assess, and compare models on real datasets and cases, with an easy-to-follow learning curve.
INTENDED AUDIENCE: UG & PG engineering students (all branches); MBA students; professionals working in or aspiring to Business Analyst, Data Analyst, Data Scientist, and Data Engineer roles
PREREQUISITES: Relevant sessions from the courses Business Analytics & Data Mining Modelling Using R, Parts I and II
INDUSTRY SUPPORT: Big Data companies, Analytics & Consultancy companies, Companies with Analytics Division
COURSE LAYOUT
Week 1: Introductory overview of Text Mining
- Introductory Thoughts
- Data Mining vs. Text Mining
- Text Mining and Text Characteristics
- Predictive Text Analytics
- Text Mining Problems
- Prediction & Evaluation
- Python as a Data Science Platform
Python for Analytics
- Introduction to Python Installation
- Jupyter Notebook Introduction
Week 2: Python Basics
- Python Programming Features
- Commands for common tasks and control
- Essential Python programming concepts & language mechanics
Built in Capabilities of Python
- Data structures: tuples, lists, dicts, and sets
Week 3: Built in Capabilities of Python
- Functions, Namespaces, Scope, Local functions, Writing more reusable generic functions
Week 4: Built in Capabilities of Python
- Generators
- Errors & Exception Handling
- Working with files
Numerical Python
- N-dimensional array objects
Week 5: Numerical Python
- Vectorized array operations
- File management using arrays
- Linear algebra operations
- Pseudo-random number generation
- Random walks
Python pandas
- Data structures: Series and DataFrame
Week 6: Python pandas
- Applying functions and methods
- Descriptive Statistics
- Correlation and Covariance
Working with Data in Python
- Working with CSV, EXCEL files
- Working with Web APIs
Week 7: Working with Data in Python
- Filtering out missing data, Filling in the missing data, removing duplicates
- Perform transformations based on mappings
- Binning continuous variables
- Random sampling and random reordering of rows
- Dummy variables
- String and text processing
- Regular expressions
- Categorical type
Data Visualization using Python
- Matplotlib Library
- Plots & Subplots
Week 8: Text mining modeling using NLTK
- Text Corpus
- Sentence Tokenization
- Word Tokenization
- Removing special Characters
- Expanding contractions
- Removing Stopwords
- Correcting words: repeated characters
- Stemming & lemmatization
- Part of Speech Tagging
- Feature Extraction
- Bag of words model
- TF-IDF model
- Text classification problem
- Building a classifier using support vector machine
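
To give a flavour of the Week 8 feature-extraction topics, here is a minimal sketch of a bag-of-words and TF-IDF computation. It is not taken from the course materials: the toy corpus, the `tokenize` and `tf_idf` helper names, and the regex tokenizer are all assumptions made for illustration. It uses only the Python standard library rather than NLTK, and the weighting shown (raw count times log(N / df)) is one common TF-IDF variant among several.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "The product is good and the service is good",
    "The service was slow",
    "Good product, slow delivery",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Bag of words: one term -> count mapping per document.
bags = [Counter(tokenize(d)) for d in docs]

# Document frequency: the number of documents that contain each term.
df = Counter(term for bag in bags for term in bag)


def tf_idf(bag, df, n_docs):
    """Weight raw term counts by log(N / df), one common TF-IDF variant."""
    return {term: count * math.log(n_docs / df[term])
            for term, count in bag.items()}


# One term -> weight mapping per document.
weights = [tf_idf(bag, df, len(docs)) for bag in bags]
```

Terms that occur in few documents (such as "delivery" here) receive a higher weight per occurrence than terms spread across many documents. Weight mappings like these, converted to fixed-length vectors, are the kind of features a text classifier such as a support vector machine would be trained on.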