Learn about string manipulation and become a master at using regular expressions.
As a data scientist, you will encounter many situations where you will need to extract key information from huge corpora of text, clean messy data containing strings, or detect and match patterns to find useful words. All of these situations are part of text mining and are an important step before applying machine learning algorithms. This course will take you through understanding compelling concepts about string manipulation and regular expressions. You will learn how to split strings, join them back together, interpolate them, as well as detect, extract, replace, and match strings using regular expressions. On the journey to master these skills, you will work with datasets containing movie reviews or streamed tweets that can be used to determine opinion, as well as with raw text scraped from the web.
Basic Concepts of String Manipulation
-Start your journey into the regular expression world! From slicing and concatenating, adjusting the case, removing spaces, to finding and replacing strings. You will learn how to master basic operation for string manipulation using a movie review dataset.
-Following your journey, you will learn the main approaches that can be used to format or interpolate strings in python using a dataset containing information scraped from the web. You will explore the advantages and disadvantages of using positional formatting, embedding expressing inside string constants, and using the Template class.
Regular Expressions for Pattern Matching
-Time to discover the fundamental concepts of regular expressions! In this key chapter, you will learn to understand the basic concepts of regular expression syntax. Using a real dataset with tweets meant for sentiment analysis, you will learn how to apply pattern matching using normal and special characters, and greedy and lazy quantifiers.
Advanced Regular Expression Concepts
-In the last step of your journey, you will learn more complex methods of pattern matching using parentheses to group strings together or to match the same text as matched previously. Also, you will get an idea of how you can look around expressions.