Pop Quiz #16 (RE)

  • Due Apr 13, 2022 at 11:59pm
  • Points 10
  • Questions 6
  • Available after Apr 11, 2022 at 12am
  • Time Limit None
  • Allowed Attempts 2

Instructions

Review lecture notes week.10.ch16.ch22.ipynb

# RegEx Functions (Chapter 16.)

* The re module offers a set of functions that allows us to search a string for a match:
* findall  
    * Returns a list containing all matches
* split    
    * Returns a list where the string has been split at each match
* search    
    * Returns a Match object if there is a match anywhere in the string
    * group(), groups()
* sub    
    * Replaces one or many matches with a string

 

Metacharacters used in the regular expression

[]    A set of characters
    example:  "[a-m]"    a set of characters from "a" to "m"
\    Signals a special sequence (can also be used to escape special characters)    "\d"    
.    Any character (except newline character)    
    example:  "he..o"    
^    Starts with
    example:  "^hello"    
$    Ends with     
    example:  "world$"    
*    Zero or more occurrences    
    example:  "[a-zA-Z]*"    
+    One or more occurrences
    example: "[a-zA-Z0-9]*+"    
{}    Exactly the specified number of occurrences    
    example: "\d{2}"    
|    Either or    
    example: "falls|stays"    
()    Capture and group

Special Sequences used in regular expressions
* A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

* \A    
    * Returns a match if the specified characters are at the beginning of the string    
        * "\AThe"    
* \b    
    * Returns a match where the specified characters are at the beginning or at the end of a word
        * (the "r" in the beginning is making sure that the string is being treated as a "raw string")    
            * r"\bain"
            * r"ain\b"    
* \B    
    * Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word
    * (the "r" in the beginning is making sure that the string is being treated as a "raw string")    
        * r"\Bain"
        * r"ain\B"    
* \d    
    * Returns a match where the string contains digits (numbers from 0-9)    
        * "\d"    
* \D    
    * Returns a match where the string DOES NOT contain digits    
        * "\D"    
* \s    
    * Returns a match where the string contains a white space character    
    * "\s"    
* \S    
    * Returns a match where the string DOES NOT contain a white space character    
    * "\S"    
* \w    
    * Returns a match where the string contains any word characters 
    * (characters from a to Z, digits from 0-9, and the underscore _ character)    "\w"    
* \W    
    *  \Returns a match where the string DOES NOT contain any word characters    
        * "\W"    
* \Z    
    *  Returns a match if the specified characters are at the end of the string    
        * "Spain\Z"    

 Sets:    A set is a set of characters inside a pair of square brackets [] with a special meaning:

Set        Description    
[a|r|n]   Returns a match where one of the specified characters (a, r, or n) are present  
[a-n]     Returns a match for any lower case character, alphabetically between a and n  
[^arn]    Returns a match for any character EXCEPT a, r, and n    
[0123]    Returns a match where any of the specified digits (0, 1, 2, or 3) are present    
[0-9]     Returns a match for any digit between 0 and 9  
[0-5][0-9]    Returns a match for any two-digit numbers from 00 and 59    
[a-zA-Z]  Returns a match for any character alphabetically between a and z, lower case OR upper case  
[+]     In sets, +, *, ., |, (), $,{} has no special meaning, so [+] means: return a match for any + character in the string
Only registered, enrolled users can take graded quizzes