2020 Python Regular Expressions (Regex) with Projects
Genre: eLearning | MP4 | Video: h264, 1280x720 | Audio: aac, 44100 Hz
Language: English | VTT | Size: 1.44 GB | Duration: 3.5 hours
This article is part one in the series: 'Regular Expressions.' Read part two for more information on lookaheads, lookbehinds, and configuring the matching engine. To get a more visual look into how regular expressions work, try our visual java regex tester. You can also watch a video to see how the visual regex tester works. In short, to master regular expressions is to master your data. A regular expression is a series of characters that defines a pattern. The pattern is then compared to a target string to see whether there are any matches to the pattern in the target string. Regular expressions are almost a language in themselves.
What you'll learn
Regular expression as a powerful data cleanup tool
Gain practical tips with hands-on projects
Understand potential performance issues and techniques to address them
Complements Machine Learning Skills
Requirements
All material and software instructions are covered in the housekeeping lecture.
Familiarity with a Programming Language
Description
*** UPDATE: OCT-2020. Five new assignments on lookahead, lookbehind and more ***
*** UPDATE: SEP-2020. New interactive coding videos, updated tools and practical tips on using regex for complex problems ***
*** UPDATE: NOV-2019. Subtitle/Closed Caption is now available for the course! I spent several hours cleaning up the closed caption text to provide you with high quality closed caption.***
Welcome to Python Regular Expressions Course!
In just a couple of hours, you will master the regular expression language and learn the internals of the regular expression engine.
You will apply your new skills with four hands-on real-world projects.
You will gain a solid understanding of the types of performance issues regex can run into, and techniques to address them.
As part of the resources in this course, you will get a high-quality quick reference guide, an interactive tool, all the source code and downloadable slides.
Why Learn Regular Expressions?
Very often, the data that we need is not readily accessible or useful.
Data preparation and clean-up is often one of the most time-consuming activities in a software automation project.
Instead of spending time writing code for all this, you can specify data patterns of interest and let the regular expression engine do the work for you.
Regular expressions are cross-platform: you can learn the concepts once and use them in multiple programming languages and environments.
Looking forward to seeing you in the course!
God Speed!
Who this course is for:
Data Scientists, Software Engineers and Developers
https://www.udemy.com/course/python-regular-expressions/
If you've programmed in Perl or any other language with built-in regular-expression capabilities, then you probably know how much easier regular expressions make text processing and pattern matching. If you're unfamiliar with the term, a regular expression is simply a string of characters that defines a pattern used to search for a matching string. The AlertSite keyword match facility allows you to use the power of regular expressions to create complex pattern matches to monitor your sites.
Note: The regular expression feature is being offered to customers as a courtesy to provide expanded matching functionality. Please note that we do not offer technical support for the use of regular expressions. A verbose set of help and examples are provided below.
The following is a brief introduction to regular expression syntax to get you started.
- A simple match: Suppose you want to search for a string with the word 'cat' in it; your regular expression would simply be 'cat'. If your search is case-insensitive, the words 'catalog', 'Catherine', or 'sophisticated' would also match:
Regular expression: cat
Matches: cat, catalog, Catherine, sophisticated
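As a minimal sketch of the simple match above, assuming Python's standard `re` module (the word list is made up for illustration and is not part of the AlertSite facility), a case-insensitive substring search might look like this:

```python
import re

# Illustrative word list; not taken from any real page.
words = ["cat", "catalog", "Catherine", "sophisticated", "dog"]

# re.IGNORECASE lets 'cat' also match the 'Cat' in 'Catherine'.
pattern = re.compile(r"cat", re.IGNORECASE)

# search() looks for the pattern anywhere inside each word.
matched = [w for w in words if pattern.search(w)]
print(matched)
```

Without the `re.IGNORECASE` flag, 'Catherine' would no longer match, while 'catalog' and 'sophisticated' still would.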
- Period notation: To match a three-letter word starting with the letter 't' and ending with the letter 'n' as a regular expression, you can use a wildcard notation -- the period (.) character. The regular expression would then be 't.n' and would match 'tan', 'Ten', 'tin', and 'ton'; it would also match 't#n', 'tpn', and even 't n', as well as many other nonsensical words. This is because the period character matches any single character, including the space, the tab character, and even line breaks:
Regular expression: t.n
Matches: tan, Ten, tin, ton, t n, t#n, tpn, etc.
- The bracket notation: To solve the problem of the period's indiscriminate matches, you can specify characters you consider meaningful with the bracket ('[ ]') expression, so that only those characters would match the regular expression. Thus, 't[aeio]n' would just match 'tan', 'Ten', 'tin', and 'ton'. 'Toon' would not match because you can only match a single character within the bracket notation:
Regular expression: t[aeio]n
Matches: tan, Ten, tin, ton
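The difference between the period and the bracket notation can be checked with a short sketch. It assumes Python's `re` module and a made-up candidate list; `fullmatch()` is used so that each candidate must match the pattern as a whole.

```python
import re

# Hypothetical candidates, including the "nonsensical" ones from the text.
candidates = ["tan", "Ten", "tin", "ton", "t n", "t#n", "tpn", "toon"]

dot = re.compile(r"t.n", re.IGNORECASE)          # any single character in the middle
bracket = re.compile(r"t[aeio]n", re.IGNORECASE)  # only a, e, i or o in the middle

# fullmatch() requires the whole candidate to match the pattern.
dot_hits = [c for c in candidates if dot.fullmatch(c)]
bracket_hits = [c for c in candidates if bracket.fullmatch(c)]

print(dot_hits)
print(bracket_hits)
```

Neither pattern accepts 'toon', because both the period and a bracket expression stand for exactly one character.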
- The OR operator: If you want to match 'toon' in addition to all the words matched in the previous section, you can use the '|' notation, which is basically an OR operator. To match 'toon', use the regular expression 't(a|e|i|o|oo)n'. You cannot use the bracket notation here because it will only match a single character. Instead, use parentheses '( )':
Regular expression: t(a|e|i|o|oo)n
Matches: tan, Ten, tin, ton, toon
As you can see, parentheses may be used for grouping contiguous sets of character patterns together with an optional '|' operator to provide alternative selections during matching. That is, any of the alternative patterns within the group may produce a match (with left to right precedence):
Regular expression: Good (morning|afternoon|evening)!
Matches: Good morning!, Good afternoon!, Good evening!
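A small sketch of grouping with alternation, assuming Python's `re` module; the sample sentence is invented for illustration. Because parentheses also capture, the alternative that matched can be read back from the group.

```python
import re

vowels = re.compile(r"t(a|e|i|o|oo)n", re.IGNORECASE)
greeting = re.compile(r"Good (morning|afternoon|evening)!")

# 'o' is tried before 'oo'; when 'o' cannot be followed by 'n',
# the engine backtracks and tries the 'oo' alternative.
toon_ok = vowels.fullmatch("toon") is not None

# group(1) holds whichever alternative inside the parentheses matched.
m = greeting.search("She said: Good evening! to everyone.")
time_of_day = m.group(1)

# A phrase outside the listed alternatives does not match.
night = greeting.search("Good night!")
```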
- The quantifier notations: You may also want to append quantifier notations to specify how often a particular character or group of characters should repeat. For example, you can use the '*' notation to specify that the previous character should match zero or more times:
Regular expression: Surprise!*
Matches: Surprise, Surprise!, Surprise!!, Surprise!!!, and so on.
If the '*' notation is combined with the wildcard (period) character, it will match all (zero or more) characters, including spaces, tabs and line breaks between two separate notations:
Regular expression: Hello.*There!
Matches: HelloThere!, Hello There!, Hello everyone over There!, and so on.
The following quantifier notations may be used to determine how many times a given notation to the immediate left of the quantifier notation should repeat itself:
Quantifier notations:
* | 0 or more times |
+ | 1 or more times |
? | 0 or 1 time |
{n} | Exactly n number of times |
{n,} | At least n times |
{n,m} | At least n but not more than m times |
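The quantifiers in the table can be tried out with a small helper. This is only a sketch in Python's `re` module; the helper name `full` and the sample strings are assumptions made for illustration.

```python
import re

def full(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "star_zero": full(r"Surprise!*", "Surprise"),     # '*' allows zero '!'
    "star_many": full(r"Surprise!*", "Surprise!!!"),  # ... or several
    "plus_zero": full(r"Surprise!+", "Surprise"),     # '+' needs at least one '!'
    "optional": full(r"colou?r", "color"),            # '?' allows zero or one 'u'
    "exact": full(r"a{2}", "aa"),                     # exactly two
    "at_least": full(r"a{2,}", "aaaa"),               # two or more
    "bounded": full(r"a{2,3}", "aaaa"),               # two to three, so four fails
    "dot_star": full(r"Hello.*There!", "Hello everyone over There!"),
}
print(checks)
```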
- Template matching: You may also want to match a particular format or 'template' of text, rather than a literal pattern of static characters. Let's say you want to match a generic social security number pattern. The format for US social security numbers is 999-99-9999. The regular expression you would use to match this is as follows:
Regular expression: [0-9]{3}\-[0-9]{2}\-[0-9]{4}
Matches: All social security numbers of the form 123-12-1234
In regular expressions, the hyphen ('-') notation has special meaning inside brackets; it indicates a (sequential) range of possible characters such as A-Z, a-z, or 0-9. Thus, the notation [0-9]{3} in the first element of the pattern matches any string of exactly 3 digits, each of which may range from 0-9. This is followed by an 'escaped' hyphen character. Escaping the '-' character with a backslash ('\') makes it a literal hyphen. This matters inside brackets, where an unescaped hyphen between two characters denotes a range; outside brackets a hyphen is already literal, so the escape there is optional.
If, in your template pattern, you wish to make the hyphen optional -- if, say, you consider both 999-99-9999 and 999999999 acceptable formats -- you can use the '?' quantifier notation as shown:
Regular expression: [0-9]{3}\-?[0-9]{2}\-?[0-9]{4}
Matches: All social security numbers of the forms 123-12-1234 and 123121234
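The optional-hyphen template can be exercised as follows. This is a sketch in Python's `re` module with invented sample values; note that because each hyphen is optional on its own, a mixed form with only one hyphen is accepted as well.

```python
import re

# Each '\-?' makes one hyphen optional, independently of the other.
ssn = re.compile(r"[0-9]{3}\-?[0-9]{2}\-?[0-9]{4}")

# Made-up sample values for illustration only.
samples = ["123-12-1234", "123121234", "123-121234", "12-123-1234"]

# fullmatch() rejects values with extra or misplaced characters.
results = {s: ssn.fullmatch(s) is not None for s in samples}
print(results)
```

If only the two formats 999-99-9999 and 999999999 should be accepted, one option is an alternation of the two complete templates instead of two independent '?' quantifiers.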
Let's take a look at another example. One format for US car plate numbers consists of four numeric characters followed by two letters. Thus, a regular expression might first include a '[0-9]{4}' numeric part, followed by a '[A-Z]{2}' textual part:
Regular expression: [0-9]{4}[A-Z]{2}
Matches: US car plate numbers of the format: 8836KV
- The NOT notation: The '^' notation is also called the NOT notation. If used in brackets, '^' indicates the character(s) you don't want to match. For example, the expression below matches all words except those starting with the letter x or y:
Regular expression: \b[^xy][a-z]+\b
Matches: All (lowercase) words except those that start with the letter x or y.
In the above example, the '+' quantifier is used to specify one or more characters in the range a-z, and the '\b' notation is used to match at word boundaries. Note that '[^xy]' accepts any character other than x or y, including a space or a digit, so a stricter class such as '[a-wz]' may be preferable when searching within longer text.
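The following sketch, assuming Python's `re` module and a made-up word list, tests the negated class on isolated words and then on running text. The `strict` pattern using `[a-wz]` is an illustrative variant, not part of the original tutorial.

```python
import re

loose = re.compile(r"\b[^xy][a-z]+\b")    # pattern from the text
strict = re.compile(r"\b[a-wz][a-z]+\b")  # variant: first character must be a letter other than x or y

words = ["apple", "xylophone", "yellow", "zebra"]

# On isolated words the negated class behaves as described.
loose_hits = [w for w in words if loose.fullmatch(w)]

# In running text, '[^xy]' can consume the space before a word,
# so words starting with x or y slip through with a leading space.
text = "apple xylophone yellow zebra"
loose_in_text = loose.findall(text)
strict_in_text = strict.findall(text)

print(loose_hits)
print(loose_in_text)
print(strict_in_text)
```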
- Other miscellaneous notations: To make life easier, some shorthand notations for commonly used regular expressions also exist, as shown below:
Commonly used notations:
\d | [0-9] |
\D | [^0-9] |
\w | [A-Za-z0-9_] |
\W | [^A-Za-z0-9_] |
\s | [ \t\n\r\f] |
\S | [^ \t\n\r\f] |
To illustrate, we can use '\d' for all instances of '[0-9]' we used before, as was the case with our social security number expressions. The revised regular expression is:
Regular expression: \d{3}\-\d{2}\-\d{4}
Matches: All social security numbers of the form 123-12-1234
Or, suppose you want to match an IP address. It consists of four 1-byte segments (octets), each segment has a value between 0 and 255 and is separated from the others by a period. Thus, in each individual segment of the IP address, you have at least one and at most three digits. The following regular expression might be used to match just such a construct:
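Regular expression: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Matches: Strings shaped like IP addresses, that is, four segments of one to three digits separated by periods. The pattern checks only the shape; it does not restrict each segment to values between 0 and 255, so 999.999.999.999 also matches.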
You need to escape the period character because you literally want it to be there; you do not want it read in terms of its special meaning in regular expression syntax, as explained earlier. Other special characters that need to be escaped when used in a literal match are discussed in the 'Additional Considerations' section below.
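A sketch of the IP address pattern in Python's `re` module follows. The function name `is_ipv4` and the numeric range check are assumptions added for illustration: the regular expression verifies only the shape, and the 0 to 255 limit is checked afterwards in ordinary code.

```python
import re

# Escaped periods match literal dots between the segments.
shape = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def is_ipv4(text):
    """Hypothetical helper: shape check by regex, range check by arithmetic."""
    if shape.fullmatch(text) is None:
        return False
    return all(0 <= int(part) <= 255 for part in text.split("."))

# With unescaped periods, any character is accepted as a separator.
unescaped = re.fullmatch(r"\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}", "192x168x0x1")

print(is_ipv4("192.168.0.1"), is_ipv4("999.999.999.999"), unescaped is not None)
```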
Perhaps you're trying to match a particular type of date string. A typical date format might be: June 26, 1951. One example of a regular expression to match strings of this type would be:
Regular expression: [A-Za-z]+\s+[0-9]{1,2},\s*[0-9]{4}
Matches: All dates with the format of Month DD, YYYY
Broken down, the first element of the expression ('[A-Za-z]+') matches the Month (rather, a word consisting of at least 1 alphabetic character), followed by a mandatory space ('\s+'), followed by the day of the month up to 2 digits ('[0-9]{1,2}'), followed by a mandatory comma, followed by an optional space ('\s*') followed by a four-digit year field ('[0-9]{4}'). This pattern may be adequate, but you might also choose to enclose the full set of month names within a parenthetical grouping, separated by the '|' notation, such as (January|February|March .. ) instead of the weaker '[A-Za-z]+' notation.
Note that the '\s' is shorthand notation for whitespace, and matches either a blank space, tab, newline, return, or form-feed character.
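The date pattern can be sketched in Python's `re` module as shown here. The capturing parentheses around month, day and year, and the sample sentences, are additions made for illustration so the pieces can be read back after a match.

```python
import re

# Same pattern as in the text, with groups added around each field.
date = re.compile(r"([A-Za-z]+)\s+([0-9]{1,2}),\s*([0-9]{4})")

m = date.search("Born on June 26, 1951 in Ohio.")
month, day, year = m.groups()

# The space after the comma is optional, the comma itself is not.
no_space = date.search("June 26,1951") is not None
no_comma = date.search("June 26 1951") is not None

print(month, day, year, no_space, no_comma)
```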
- More special character notations: The following table defines additional notations that may be useful in your regular expression pattern matches:
Additional Special Character Definitions:
\ | Quote the next metacharacter |
^ | Match the beginning of the line |
. | Match any character |
$ | Match the end of the line |
| | Alternation (OR) |
() | Grouping |
[] | Character class |
\w | Match a 'word' character (alphanumerics and '_' chars) |
\W | Match a non-word character |
\s | Match a whitespace character |
\S | Match a non-whitespace character |
\d | Match a digit character |
\D | Match a non-digit character |
\b | Match a word boundary |
\B | Match a non-(word boundary) |
\A | Match only at beginning of string (same as ^) |
\Z | Match only at end of string (same as $) |
Site Keyword Matching
The AlertSite keyword matching facility treats an entire web page as one continuous line of text. Therefore, both the Plain Text and Regular Expression keyword match types permit matches across multiple lines of HTML source text. Typical HTML source text usually includes plain text mixed together with HTML tags and attributes, and may optionally include snippets of programmatic scripting code.
It may be possible for your regular expression to satisfy multiple pattern matches on the same web page. Which pattern ultimately gets matched may or may not be what you desire. For example, you may only want to consider a match successful if the keyword or pattern is found at a particular location on the web page, or only if it appears on the page along with another keyword located somewhere else on the same page.
Let's say that the following HTML source code sample was retrieved from viewing the source of a page on your web site:
<title>CompanyName : My Company Website</title>
.. additional HTML source code ..
.. additional HTML source code ..
Copyright©1999-2004 CompanyName.
.. additional HTML source code ..
If you wanted to create a regular expression to match 'CompanyName', but only when it appears in the title of your web page, you might use the following regular expression:
Regular expression: <title>.*CompanyName.*</title>
Matches: Any occurrence of CompanyName between the <title> and </title> HTML tags.
Similarly, if you wanted your match to require multiple keywords from different areas of the page, say for example, CompanyName followed somewhere by 'Login Successful', it might look something like this:
Regular expression: CompanyName.*Login Successful
Matches: The string 'CompanyName', followed by any number of characters (all the 'middle stuff'), followed by the string 'Login Successful'.
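To approximate the "one continuous line" behaviour described above outside of AlertSite, the sketch below uses Python's `re.DOTALL` flag so that the period also matches line breaks. The page content is invented for illustration, and the use of `re.DOTALL` is an assumption about how to emulate the facility, not a description of how it is implemented.

```python
import re

# Hypothetical page source spread over several lines.
page = (
    "<html><head>\n"
    "<title>CompanyName : My Company Website</title>\n"
    "</head><body>\n"
    "<p>Login Successful</p>\n"
    "<p>Copyright 1999-2004 CompanyName.</p>\n"
    "</body></html>"
)

# Keyword restricted to the page title.
in_title = re.search(r"<title>.*CompanyName.*</title>", page, re.DOTALL) is not None

# Two keywords from different areas of the page, in this order.
both = re.search(r"CompanyName.*Login Successful", page, re.DOTALL) is not None

# Without DOTALL, '.*' stops at each line break and the match fails.
without_dotall = re.search(r"CompanyName.*Login Successful", page) is not None

print(in_title, both, without_dotall)
```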
In the above examples, the '.*' quantifier will generally match as much of the source text as possible while still allowing the whole regular expression to match. Quantifiers that grab as much text as possible are called maximal match or greedy quantifiers (see Quantifier notations above).
But there are times when we would like these quantifiers to match a minimal piece of a text, rather than a maximal piece. The minimal match or non-greedy quantifiers are: ??, *?, +?, and {}?. These are the same standard quantifiers but with a ? appended to them. They have the following meanings:
?? | Match 0 or 1 times. Try 0 first, then 1 |
*? | Match 0 or more times, but as few as possible |
+? | Match 1 or more times, but as few as possible |
{n}? | Match exactly n times. Equivalent to {n} |
{n,}? | Match at least n times, but as few as possible |
{n,m}? | Match at least n but not more than m times, as few as possible |
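The difference between greedy and non-greedy quantifiers shows up clearly on markup with repeated tags. The snippet is a sketch in Python's `re` module using a made-up HTML fragment.

```python
import re

html = "<b>first</b> and <b>second</b>"

# Greedy: '.*' runs to the LAST '</b>' that still lets the pattern match.
greedy = re.search(r"<b>.*</b>", html).group(0)

# Non-greedy: '.*?' stops at the FIRST '</b>' it can reach.
lazy = re.search(r"<b>.*?</b>", html).group(0)

# With a group, the non-greedy form extracts each tag body separately.
bodies = re.findall(r"<b>(.*?)</b>", html)

print(greedy)
print(lazy)
print(bodies)
```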
Since a regular expression can match a string in several different ways, we can use some of the following principles to predict which way the regular expression will match:
Principle 1: Taken as a whole, any regular expression will be matched at the earliest possible position in the string.
Principle 2: In an alternation a|b|c.., the leftmost alternative that allows a match for the whole regular expression will be the one used.
Principle 3: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole regular expression to match.
Principle 4: If there are two or more elements in a regular expression, the leftmost greedy quantifier, if any, will match as much of the string as possible while still allowing the whole regular expression to match. The next leftmost greedy quantifier, if any, will try to match as much of the string remaining available to it as possible, while still allowing the whole regular expression to match. And so on, until all the regular expression elements are satisfied.
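These principles can be observed directly. The sketch below uses Python's `re` module, whose backtracking engine behaves this way for these patterns; the sample strings are invented for illustration.

```python
import re

# Principle 1: the earliest possible position wins, even though a
# longer run of 'o' appears later in the string.
earliest = re.search(r"o+", "foo fooooo")

# Principle 2: alternatives are tried from left to right.
first_alt = re.search(r"a|ab", "abc").group(0)     # 'a' succeeds first
whole_alt = re.search(r"(a|ab)c", "abc").group(1)  # 'a' cannot be followed by 'c', so 'ab' is used

# Principles 3 and 4: the leftmost greedy quantifier takes as much as it
# can, leaving the next one only what it needs for the overall match.
prefix, digits = re.search(r"(.*)(\d+)", "item 12345").groups()

print(earliest.start(), earliest.group(0), first_alt, whole_alt, prefix, digits)
```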
Express Yourself
Now that you've been introduced to the pattern matching power of regular expressions, it's up to you to decide whether to use either a Plain Text match or the more powerful Regular Expression type. When used appropriately, regular expressions can help a great deal in constructing complex pattern matches for your site monitoring needs. This tutorial touches only briefly on the full capabilities of regular expression pattern matching. For additional information, you may wish to consult one of the many widely available regular expression tutorials on the internet.
Advanced Pattern Matching:
In order to handle more complex pattern matching requirements, you may choose to use some of the more advanced features of regular expression syntax such as subpattern location independence and lookahead assertions. Suggested solutions to some of these situations are presented below. For more detailed information, please consider reviewing an online tutorial on regular expression syntax.
- Location Independence: You might want to construct patterns where multiple search subpatterns may appear anywhere on the page, in any order. Here are some potential solutions (where ALPHA and BETA are your keywords or sub-patterns):
Regular expression: ALPHA|BETA
Matches: Any occurrence of either ALPHA or BETA, anywhere on the page (overlapping permitted).
Regular expression: (?=.*ALPHA).*BETA
Matches: When both ALPHA and BETA occur, anywhere on the page (overlapping permitted).
Regular expression: (?:^.*ALPHA.*BETA)|(?:^.*BETA.*ALPHA)
Matches: When both ALPHA and BETA occur, anywhere on the page (non-overlapping).
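The location-independent patterns can be generated for arbitrary keywords. In this sketch the helper names `both_any_order` and `both_non_overlapping` are hypothetical, `re.escape` is used so that keywords are taken literally, and `re.DOTALL` is assumed in order to emulate a page treated as a single line.

```python
import re

def both_any_order(alpha, beta):
    """Lookahead form: both keywords present, overlapping permitted."""
    a, b = re.escape(alpha), re.escape(beta)
    return re.compile(rf"(?=.*{a}).*{b}", re.DOTALL)

def both_non_overlapping(alpha, beta):
    """Two explicit orderings: both keywords present, no shared characters."""
    a, b = re.escape(alpha), re.escape(beta)
    return re.compile(rf"(?:^.*{a}.*{b})|(?:^.*{b}.*{a})", re.DOTALL)

def found(pattern, page):
    return pattern.search(page) is not None

# Keywords in either order are accepted; a single keyword is not enough.
reversed_order = found(both_any_order("ALPHA", "BETA"), "x BETA y ALPHA z")
only_one = found(both_any_order("ALPHA", "BETA"), "only ALPHA")

# 'abc' and 'cde' share the letter 'c' inside 'abcde'.
overlap_allowed = found(both_any_order("abc", "cde"), "abcde")
overlap_rejected = found(both_non_overlapping("abc", "cde"), "abcde")
separate_ok = found(both_non_overlapping("abc", "cde"), "abc cde")
```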
- Lookaround Assertions: You might want to construct patterns which make use of 'look-ahead' and 'look-behind' assertions. Here are some potential solutions (where ALPHA and BETA are your keywords or sub-patterns):
Regular expression: ALPHA(?!BETA)
Matches: Any occurrence of ALPHA which is not followed by BETA (negative look-ahead assertion).
Regular expression: (?<=ALPHA)BETA
Matches: Any occurrence of BETA that is preceded by ALPHA (positive look-behind assertion).
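The two assertions can be compared on one sample string. This is a sketch in Python's `re` module with an invented string; note that Python requires the look-behind pattern to have a fixed width, which the literal ALPHA satisfies.

```python
import re

# Positions: 'ALPHABETA' starts at 0, the second 'ALPHA' at 10,
# and the standalone 'BETA' at 22.
text = "ALPHABETA ALPHA-GAMMA BETA"

# Negative look-ahead: ALPHA only where BETA does not follow immediately.
not_followed = [m.start() for m in re.finditer(r"ALPHA(?!BETA)", text)]

# Positive look-behind: BETA only where ALPHA comes immediately before.
preceded = [m.start() for m in re.finditer(r"(?<=ALPHA)BETA", text)]

print(not_followed, preceded)
```

The assertions themselves consume no characters, so each reported match covers only ALPHA or only BETA.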
- Case Sensitivity: To make your match criteria wholly or partially case insensitive, you may embed the (?i) and (?i:pattern) notations within your regular expressions, respectively. Here are some potential solutions (where ALPHA and BETA are your keywords or sub-patterns):
Regular expression: (?i)alpha-beta
Matches: Any occurrence of ALPHA and BETA, regardless of case, separated by a dash (e.g., alpha-beta, ALPHA-BETA, aLpHa-BetA, etc).
Regular expression: (?i:alpha)-BETA
Matches: Any occurrence of ALPHA regardless of case, followed by a dash and an uppercase BETA (e.g., aLpHa-BETA, Alpha-BETA, alphA-BETA, etc).
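Both inline notations are also available in Python's `re` module, which makes for an easy check; the scoped `(?i:...)` form requires Python 3.6 or later. The sample strings are taken from the examples in the text.

```python
import re

whole = re.compile(r"(?i)alpha-beta")     # flag applies to the entire pattern
partial = re.compile(r"(?i:alpha)-BETA")  # flag applies only inside the group

whole_mixed = whole.fullmatch("aLpHa-BetA") is not None
partial_upper = partial.fullmatch("aLpHa-BETA") is not None

# Outside the scoped group, BETA must still be uppercase.
partial_lower = partial.fullmatch("alpha-beta") is not None

print(whole_mixed, partial_upper, partial_lower)
```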
Additional Considerations
Some other things you may want to consider when constructing your regular expressions:
You should not enclose your regular expression patterns between forward slashes, as they are already assumed.
The following special characters should be escaped (using a '\' backslash) if you are trying to literally match these characters: ^ . $ | ( ) [ ] * + ? { } ,
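The effect of leaving special characters unescaped can be seen in a short sketch. It assumes Python's `re` module, where `re.escape` adds the backslashes automatically; in the AlertSite keyword field the backslashes would have to be typed by hand, as in the `manual` pattern. The sample text is invented.

```python
import re

literal = "Price: $5.00 (USD)"
page = "Today only! Price: $5.00 (USD) while stocks last."

# Unescaped, '$' means end of line and '(' ')' form a group,
# so the keyword does not match its own text.
unescaped = re.search(literal, page)

# re.escape() puts a backslash before every special character.
escaped = re.search(re.escape(literal), page)

# Equivalent hand-written pattern with explicit backslashes.
manual = re.search(r"Price: \$5\.00 \(USD\)", page)

print(unescaped, escaped.group(0), manual.group(0))
```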
Regular expression translation and substitution features are not used by the AlertSite keyword matching facility and thus are not supported.