Practice Exercises
Exercise 1: Basic Pattern Matching
Consider this example string:
The quick brown fox jumps over the lazy dog. This is outside (this is inside)
| Question | Answer |
|---|---|
| Match the string "fox" and provide its range | 16-19 |
| How many times does "is" appear in the string? | 4 |
| Match the pattern "(this is inside)" and provide its range | 61-77 |
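
The answers in the table can be checked with a short script. This is a minimal sketch that assumes Python's built-in `re` module; the variable names are illustrative only. Ranges are reported the way `span()` returns them: zero-based start, exclusive end.

```python
import re

text = "The quick brown fox jumps over the lazy dog. This is outside (this is inside)"

# span() returns (start, end) with a zero-based start and an exclusive end.
fox_span = re.search(r"fox", text).span()

# findall() returns every non-overlapping occurrence of "is",
# including the ones inside "This" and "this".
is_count = len(re.findall(r"is", text))

# Parentheses are metacharacters, so they are escaped to match them literally.
paren_span = re.search(r"\(this is inside\)", text).span()

print(fox_span, is_count, paren_span)  # (16, 19) 4 (61, 77)
```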
Exercise 2: Using the OR Operator (pipe)
For the string:
The sun rises in the east and sets in the west. Birds sing in the morning or evening.
| Question | Pattern |
|---|---|
| Match either "sun" or "moon" | `sun\|moon` |
| Match either "east" or "west" | `east\|west` |
| Match either "morning" or "evening" | `morning\|evening` |
| Match either "rises", "sets", or "sing" | `rises\|sets\|sing` |
| Match either "The" or "Birds" | `The\|Birds` |
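
To see what each alternation finds in the sentence, the patterns can be run with `re.findall`. This is a small sketch assuming Python's `re` module; the `patterns` list and `results` dictionary are names chosen for illustration.

```python
import re

text = ("The sun rises in the east and sets in the west. "
        "Birds sing in the morning or evening.")

# Each pattern uses the pipe to accept any one of the listed alternatives.
patterns = [
    r"sun|moon",
    r"east|west",
    r"morning|evening",
    r"rises|sets|sing",
    r"The|Birds",
]

# Map every pattern to the list of substrings it matched, in order of appearance.
results = {p: re.findall(p, text) for p in patterns}

for pattern, found in results.items():
    print(pattern, "->", found)
```

Note that `sun|moon` only returns "sun": an alternative that never occurs in the text simply contributes no matches.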
Exercise 3: Character Sets and the Dot (.)
For this data:
Contact Information:
John Doe - john.doe@example.com - (555) 123-4567
Mary Smith - mary_smith@email.net - 555.987.6543
Tom Johnson - tom-johnson@company.org - (555)246-8910
Sarah Brown - sarah@brown.co.uk - +1-555-369-7412
Mike Wilson - mike.wilson@subdomain.example.edu - 555 741 0258
| Question | Pattern |
|---|---|
| Match any single vowel | [aeiou] |
| Match either "John" or "Tom" | `John\|Tom` |
| Match any character that is NOT a digit | [^0-9] |
| Match either "com" or "net" in email domains | `com\|net` |
| Match any single digit in phone numbers | [0-9] |
| Match any character between 'T' and 'm' in "Tom" | T.m |
| Match any single uppercase letter | [A-Z] |
| Match any character that is not a letter or number | [^0-9a-zA-Z] |
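
A few of these patterns can be tried on a single contact line. The sketch below assumes Python's `re` module and uses the "Tom Johnson" line from the data; the variable names are illustrative.

```python
import re

line = "Tom Johnson - tom-johnson@company.org - (555)246-8910"

# [A-Z] matches one uppercase letter at a time.
uppercase = re.findall(r"[A-Z]", line)

# [0-9] matches one digit at a time; joined, they give the phone digits.
digits = "".join(re.findall(r"[0-9]", line))

# The dot stands for any single character between 'T' and 'm'.
t_dot_m = re.findall(r"T.m", line)

# Alternation is case-sensitive, so "tom" and "johnson" in the email are skipped.
names = re.findall(r"John|Tom", line)

# A negated set: everything that is neither a letter nor a digit.
separators = set(re.findall(r"[^0-9a-zA-Z]", line))

print(uppercase, digits, t_dot_m, names, sorted(separators))
```

Note that `John` matches inside "Johnson" because nothing in the pattern requires the word to end there.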
Exercise 4: Quantifiers
Use the text "The Curious Case of the Missing Code" below to answer the following questions.
The Curious Case of the Missing Code
John_Smith123 was panicking. It was 9:30 AM on April 15, 2025, and he had just realized that the crucial code files for Project-X2021 were missing from his laptop. Yesterday at 17:45, everything had been fine when he left the office at 42 Maple Street, Suite #301.
He quickly sent an email to his boss (anna.director@techcorp.com) and his team members (dev.team@techcorp.com):
Subject: URGENT - Missing Project Files
Body: Team, I can't locate the following files:
- main_v3.2.py
- config_prod.json
- api_keys.txt (IP: 192.168.1.100)
I've checked my backups from 2023-12-01 through 2025-03-15 but found nothing. Has anyone committed changes to the repository at http://git.techcorp.com/projects/x2021? My phone number is (555) 123-4567 if you need to reach me urgently. The project deadline is in 72 hours!
Lisa responded first at 9:42 AM: "I saved a copy at C:\Projects\Backup\X2021-backup.zip. The password is XB21-9$f5. You can also check with Mark who was working late yesterday."
John sighed with relief. Crisis averted! Now he needed to update the project documentation with proper file paths like /usr/local/bin/project-x/ for Linux users and C:\Program Files\Project-X\ for Windows users.
He made a note to call Lisa later at +1-555-987-6543 to thank her properly.
Create regular expressions that match exactly what's requested (nothing more, nothing less).
Basic Character Sets
- Create a pattern that matches all instances of dates in the format YYYY-MM-DD.
- Write a regex that finds all alphanumeric identifiers that contain both letters and numbers (like "John_Smith123" or "Project-X2021").
- Match all times in the HH:MM AM/PM format.
Predefined Character Classes
- Create a pattern using \d and \w to extract all phone numbers in the format (555) 123-4567 or +1-555-987-6543.
- Write a regex using \s and \S to find all file paths (both Windows and Linux style).
- Develop a pattern using \w, \d, and \s to match all file names with version numbers (like "main_v3.2.py").
Metacharacters and Alternation
- Use the pipe operator (|) to match either email addresses or web URLs.
- Create a pattern with the dot (.) metacharacter to find all text within parentheses.
- Write a regex that matches IP addresses like 192.168.1.100.
Combined Challenge
- Create a comprehensive pattern that extracts all forms of contact information (emails and phone numbers) from the text.
Solution
- Pattern to match dates in YYYY-MM-DD format:
  \d{4}-\d{2}-\d{2}
  Matches: "2023-12-01", "2025-03-15"
- Pattern for alphanumeric identifiers with both letters and numbers:
  [A-Za-z][A-Za-z0-9_]*\d+[A-Za-z0-9_]*
  Matches include: "John_Smith123", "X2021" (from "Project-X2021"; the hyphen is not in the character class, so "Project-" is left out), "XB21" (part of "XB21-9$f5")
- Pattern for times in HH:MM AM/PM format:
  \d{1,2}:\d{2}\sAM|\d{1,2}:\d{2}\sPM
  Matches: "9:30 AM", "9:42 AM"
- Pattern for phone numbers (built from \d and \s; \w is not needed here):
  \(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}
  Matches: "(555) 123-4567", "+1-555-987-6543"
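- Pattern for file paths using \s and \S:
  [A-Z]:\\[^\s]+|/\S+/
  Matches: "C:\Projects\Backup\X2021-backup.zip." (the trailing period is included), "C:\Program" (the match stops at the space in "Program Files"), "/usr/local/bin/project-x/". It also picks up "//git.techcorp.com/projects/" from the URL, so the pattern needs tightening to match paths only.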
- Pattern for file names with version numbers (uses \w and \d; \s is not needed here):
  \w+_v\d+\.\d+\.\w+
  Matches: "main_v3.2.py"
- Pattern for email addresses or web URLs using the pipe operator:
  [a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+|http://[^\s]+
  Matches: "anna.director@techcorp.com", "dev.team@techcorp.com", "http://git.techcorp.com/projects/x2021?" (the trailing "?" is included because [^\s]+ only stops at whitespace)
- Pattern with the dot metacharacter to find text in parentheses:
  \(.*?\)
  Matches: "(anna.director@techcorp.com)", "(dev.team@techcorp.com)", "(IP: 192.168.1.100)", "(555)"
- Pattern for IP addresses:
  \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
  Matches: "192.168.1.100"
- Comprehensive pattern for contact information:
  [a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+|\(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}
  Matches all email addresses and phone numbers in the text
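
One way to check these solutions is to run them against the story. The sketch below assumes Python's `re` module and, to stay short, uses only a few lines excerpted from the text above (the first line is shortened); the variable names are illustrative.

```python
import re

# Excerpts from "The Curious Case of the Missing Code" (first line shortened).
text = """It was 9:30 AM on April 15, 2025.
He quickly sent an email to his boss (anna.director@techcorp.com) and his team members (dev.team@techcorp.com):
- main_v3.2.py
- api_keys.txt (IP: 192.168.1.100)
I've checked my backups from 2023-12-01 through 2025-03-15 but found nothing.
My phone number is (555) 123-4567 if you need to reach me urgently.
He made a note to call Lisa later at +1-555-987-6543 to thank her properly."""

dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
times = re.findall(r"\d{1,2}:\d{2}\sAM|\d{1,2}:\d{2}\sPM", text)
phones = re.findall(r"\(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}", text)
emails = re.findall(r"[a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+", text)
ips = re.findall(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", text)

# The lazy quantifier .*? stops at the first closing parenthesis,
# so the phone number contributes only "(555)".
in_parens = re.findall(r"\(.*?\)", text)

for name, found in [("dates", dates), ("times", times), ("phones", phones),
                    ("emails", emails), ("ips", ips), ("in_parens", in_parens)]:
    print(name, found)
```

Printing the actual matches is a quick way to spot patterns that match more or less than intended, which is the point of the "nothing more, nothing less" requirement.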
Exercise 6 - Boundary Matchers
These questions are based on the following example string:
Hello world! This is line one.
World, hello! This is line two.
HelloWorld is a single word.
The word "hello" appears in quotes.
This line ends with hello
hello starts this line and world ends it with world
com.example.domain is a domain name
user@example.com is an email address.
2023-05-15 is a date format.
The final line ends the entire text.
Questions on ^ (Caret) Boundary
- Write a regex pattern that matches any line beginning with the word "Hello".
- Write a regex pattern that matches any line beginning with either "Hello" or "hello".
- How many lines in the example string start with a capital letter?
Questions on $ (Dollar) Boundary
- Write a regex pattern that matches any line ending with the word "hello".
- How many lines in the example string end with a period (dot)?
- Write a regex pattern that matches any line ending with the exact word "world".
Questions on \b (Word Boundary)
- Write a regex pattern that matches the standalone word "hello" (case-insensitive) in the example text.
- How many times does the standalone word "world" (lowercase only) appear in the example text?
- Write a regex pattern that matches the word "is" only when it appears as a complete word.
Questions on \B (Non-word Boundary)
- Write a regex pattern that matches "World" only when it's part of another word without word boundaries.
- In the example text, what word contains "World" without word boundaries on either side?
- Write a regex that matches "example" when it's part of a larger word or token.
Questions on \A (Start of String)
- What single word would a regex pattern \AHello match in our example text?
- Write a regex that matches the first 5 characters of the entire example text.
- How does the pattern \AThe perform on our example text?
Questions on \Z (End of String)
- Write a regex pattern that matches the last sentence of the entire example text.
- What's the last word in the entire example text that would be matched by \w+.\Z?
- Write a regex that matches the last 10 characters of the entire example text.
Solution
- Write a regex pattern that matches any line beginning with the word "Hello".
^Hello.+
- Write a regex pattern that matches any line beginning with either "Hello" or "hello".
^[Hh]ello.+
- How many lines in the example string start with a capital letter?
^[A-Z].+ matches 6 lines (with multiline mode enabled, so that ^ anchors at the start of every line)
- Write a regex pattern that matches any line ending with the word "hello".
.+hello$
- How many lines in the example string end with a period (dot)?
.+\.$ matches 7 lines (with multiline mode enabled, so that $ anchors at the end of every line)
- Write a regex pattern that matches any line ending with the exact word "world".
.+\bworld\b$
- Write a regex pattern that matches the standalone word "hello" (case-insensitive) in the example text.
\b[Hh]ello\b
- How many times does the standalone word "world" (lowercase only) appear in the example text?
\bworld\b matches 3 times (once in the first line and twice in the line that starts with "hello")
- Write a regex pattern that matches the word "is" only when it appears as a complete word.
\bis\b
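- Write a regex pattern that matches "World" only when it's part of another word without word boundaries.
\BWorld\B
Note: in the example text, "World" in "HelloWorld" is followed by a space, so there is a word boundary after it and \BWorld\B finds no match; \BWorld (no boundary before "World") does match it.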
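- In the example text, what word contains "World" without word boundaries on either side?
HelloWorld is the only word with "World" embedded in it (there is no boundary before "World", though one follows it); \w+\BWorld matches the whole word.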
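- Write a regex that matches "example" when it's part of a larger word or token.
\Bexample\B
Note: \B requires word characters on both sides, so this only matches "example" inside a longer word. In the example text, "example" sits next to "." and "@", which create word boundaries; \S+example\S+ matches those whole tokens instead.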
- What single word would a regex pattern \AHello match in our example text?
It matches the word "Hello" at the very start of the entire string
- Write a regex that matches the first 5 characters of the entire example text.
\A.{5}
- How does the pattern \AThe perform on our example text?
It finds no match, because the text starts with "Hello", not "The"
- Write a regex pattern that matches the last sentence of the entire example text.
.+\Z
- What's the last word in the entire example text that would be matched by \w+.\Z?
text.
- Write a regex that matches the last 10 characters of the entire example text.
.{10}\Z
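
The boundary solutions can be checked against the example text. This is a minimal sketch assuming Python's `re` module, with `re.MULTILINE` passed wherever `^` and `$` are meant to anchor at every line rather than only at the ends of the whole string; the variable names are illustrative.

```python
import re

text = """Hello world! This is line one.
World, hello! This is line two.
HelloWorld is a single word.
The word "hello" appears in quotes.
This line ends with hello
hello starts this line and world ends it with world
com.example.domain is a domain name
user@example.com is an email address.
2023-05-15 is a date format.
The final line ends the entire text."""

# ^ and $ are per-line anchors only when MULTILINE is set.
capital_lines = re.findall(r"^[A-Z].+", text, flags=re.MULTILINE)
period_lines = re.findall(r".+\.$", text, flags=re.MULTILINE)

# \b marks a word boundary, \B the absence of one.
standalone_world = re.findall(r"\bworld\b", text)
embedded_world = re.findall(r"\w+\BWorld", text)
strict_embedded = re.findall(r"\BWorld\B", text)

# \A and \Z anchor to the whole string, regardless of MULTILINE.
first_five = re.search(r"\A.{5}", text).group()
last_word = re.search(r"\w+.\Z", text).group()
last_ten = re.search(r".{10}\Z", text).group()

print(len(capital_lines), len(period_lines), len(standalone_world))
print(embedded_world, strict_embedded, first_five, last_word, last_ten)
```

The empty result for `\BWorld\B` shows that a non-word boundary on both sides requires word characters directly before and after "World", which never happens in this text.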
Warning
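The dot (.) is a very powerful metacharacter that can create problems if not used properly: it matches any single character (except a newline by default), so an unescaped dot can match far more than intended. To match a literal period, escape it as \.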
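
A small illustration of the warning, assuming Python's `re` module. The candidate strings are made up for the example.

```python
import re

# Hypothetical inputs: only the first one is the intended domain.
candidates = ["example.com", "exampleXcom", "example-com"]

# Unescaped dot: any single character is accepted between "example" and "com".
loose = [s for s in candidates if re.fullmatch(r"example.com", s)]

# Escaped dot: only a literal period is accepted.
strict = [s for s in candidates if re.fullmatch(r"example\.com", s)]

# By default the dot does not match a newline; re.DOTALL changes that.
no_newline = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)

print(loose)   # ['example.com', 'exampleXcom', 'example-com']
print(strict)  # ['example.com']
print(no_newline is None, with_dotall is not None)  # True True
```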
Source: Data Science Anywhere