Practice Exercises

Exercise 1: Basic Pattern Matching

Consider this example string:

The quick brown fox jumps over the lazy dog. This is outside (this is inside)

Question                                                    Answer
Match the string "fox" and provide its range                16-19
How many times does "is" appear in the string?              4
Match the pattern "(this is inside)" and provide its range  61-77

Ranges are zero-based character offsets, with the end offset exclusive.
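
The answers above can be checked with Python's `re` module. This is a minimal sketch; the variable names are illustrative and not part of the exercise.

```python
import re

text = "The quick brown fox jumps over the lazy dog. This is outside (this is inside)"

# span() returns (start, end) with an exclusive end index.
fox_span = re.search(r"fox", text).span()

# findall() returns every non-overlapping match, including the "is" inside "This" and "this".
is_count = len(re.findall(r"is", text))

# Parentheses are metacharacters, so they are escaped to match them literally.
paren_span = re.search(r"\(this is inside\)", text).span()

print(fox_span, is_count, paren_span)  # (16, 19) 4 (61, 77)
```

Note that a plain `is` also counts the occurrences inside "This" and "this"; a word-boundary pattern would be needed to count only the standalone word.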

Exercise 2: Using OR Operator (pipe)

For the string:

The sun rises in the east and sets in the west. Birds sing in the morning or evening.

Question                                  Pattern
Match either "sun" or "moon"              sun|moon
Match either "east" or "west"             east|west
Match either "morning" or "evening"       morning|evening
Match either "rises", "sets", or "sing"   rises|sets|sing
Match either "The" or "Birds"             The|Birds
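
As a rough check, the alternation patterns can be run against the sentence with `re.findall`. This is a sketch that assumes the pipe patterns listed above; the dictionary name `results` is illustrative.

```python
import re

text = ("The sun rises in the east and sets in the west. "
        "Birds sing in the morning or evening.")

patterns = ["sun|moon", "east|west", "morning|evening", "rises|sets|sing", "The|Birds"]

# Each alternative is tried at every position; findall collects all matches in order.
results = {p: re.findall(p, text) for p in patterns}

for pattern, matches in results.items():
    print(f"{pattern!r:20} -> {matches}")
```

Since "moon" never occurs in the sentence, `sun|moon` returns only `['sun']`. The other patterns return each alternative that appears, in text order.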

Exercise 3: Character Sets and the Dot (.)

For this data:

Contact Information:
John Doe - john.doe@example.com - (555) 123-4567
Mary Smith - mary_smith@email.net - 555.987.6543
Tom Johnson - tom-johnson@company.org - (555)246-8910
Sarah Brown - sarah@brown.co.uk - +1-555-369-7412
Mike Wilson - mike.wilson@subdomain.example.edu - 555 741 0258

Question                                            Pattern
Match any single vowel                              [aeiou]
Match either "John" or "Tom"                        John|Tom
Match any character that is NOT a digit             [^0-9]
Match either "com" or "net" in email domains        com|net
Match any single digit in phone numbers             [0-9]
Match any character between 'T' and 'm' in "Tom"    T.m
Match any single uppercase letter                   [A-Z]
Match any character that is not a letter or number  [^0-9a-zA-Z]
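
The character-set patterns can be tried on a single contact line. The sketch below uses the "Tom Johnson" line from the data; the variable names are illustrative.

```python
import re

line = "Tom Johnson - tom-johnson@company.org - (555)246-8910"

uppercase = re.findall(r"[A-Z]", line)       # uppercase letters only
digits = re.findall(r"[0-9]", line)          # one list entry per digit
names = re.findall(r"John|Tom", line)        # "John" also matches inside "Johnson"
t_dot_m = re.findall(r"T.m", line)           # the dot stands for any one character
domain_part = re.findall(r"com|net", line)   # matches "com" inside "company"
symbols = set(re.findall(r"[^0-9a-zA-Z]", line))  # everything that is not a letter or digit

print(uppercase, len(digits), names, t_dot_m, domain_part, sorted(symbols))
```

Two results are worth noticing: `John|Tom` finds "John" inside "Johnson", and `com|net` finds "com" inside "company" rather than in a top-level domain. Neither pattern is anchored to a word or to a position in the address.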

Exercise 4: Quantifiers

Use the text below, "The Curious Case of the Missing Code", to answer the following questions.

The Curious Case of the Missing Code

John_Smith123 was panicking. It was 9:30 AM on April 15, 2025, and he had just realized that the crucial code files for Project-X2021 were missing from his laptop. Yesterday at 17:45, everything had been fine when he left the office at 42 Maple Street, Suite #301.

He quickly sent an email to his boss (anna.director@techcorp.com) and his team members (dev.team@techcorp.com):

Subject: URGENT - Missing Project Files
Body: Team, I can't locate the following files:
- main_v3.2.py
- config_prod.json
- api_keys.txt (IP: 192.168.1.100)

I've checked my backups from 2023-12-01 through 2025-03-15 but found nothing. Has anyone committed changes to the repository at http://git.techcorp.com/projects/x2021? My phone number is (555) 123-4567 if you need to reach me urgently. The project deadline is in 72 hours!

Lisa responded first at 9:42 AM: "I saved a copy at C:\Projects\Backup\X2021-backup.zip. The password is XB21-9$f5. You can also check with Mark who was working late yesterday."

John sighed with relief. Crisis averted! Now he needed to update the project documentation with proper file paths like /usr/local/bin/project-x/ for Linux users and C:\Program Files\Project-X\ for Windows users.

He made a note to call Lisa later at +1-555-987-6543 to thank her properly.

Create regular expressions that match exactly what's requested (nothing more, nothing less).

Basic Character Sets

  1. Create a pattern that matches all instances of dates in the format YYYY-MM-DD.
  2. Write a regex that finds all alphanumeric identifiers that contain both letters and numbers (like "John_Smith123" or "Project-X2021").
  3. Match all times in the HH:MM AM/PM format.

Predefined Character Classes

  1. Create a pattern using \d and \w to extract all phone numbers in the format (555) 123-4567 or +1-555-987-6543.
  2. Write a regex using \s and \S to find all file paths (both Windows and Linux style).
  3. Develop a pattern using \w, \d, and \s to match all file names with version numbers (like "main_v3.2.py").

Metacharacters and Alternation

  1. Use the pipe operator (|) to match either email addresses or web URLs.
  2. Create a pattern with the dot (.) metacharacter to find all text within parentheses.
  3. Write a regex that matches IP addresses like 192.168.1.100.

Combined Challenge

  1. Create a comprehensive pattern that extracts all forms of contact information (emails and phone numbers) from the text.

Solution

  1. Pattern to match dates in YYYY-MM-DD format:

    \d{4}-\d{2}-\d{2}

    Matches: "2023-12-01", "2025-03-15"

  2. Pattern for alphanumeric identifiers with both letters and numbers:

    [A-Za-z][A-Za-z0-9_]*\d+[A-Za-z0-9_]*

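    Matches: "John_Smith123" in full. The hyphen and "$" are not in the character classes, so hyphenated tokens match only in part: "X2021" from "Project-X2021", and "XB21" and "f5" from "XB21-9$f5". Other letter-and-digit runs such as "main_v3" also match.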

  3. Pattern for times in HH:MM AM/PM format:

    \d{1,2}:\d{2}\sAM|\d{1,2}:\d{2}\sPM

    Matches: "9:30 AM", "9:42 AM"

  4. Pattern for phone numbers using \d (with \s for the space):

    \(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}

    Matches: "(555) 123-4567", "+1-555-987-6543"

  5. Pattern for file paths using \s and \S:

    [A-Z]:\\[^\s]+|/\S+/

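    Matches: "C:\Projects\Backup\X2021-backup.zip." (including the sentence's trailing period), "C:\Program" (the match stops at the space in "Program Files"), "/usr/local/bin/project-x/", and also "//git.techcorp.com/projects/" inside the URL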

  6. Pattern for filenames with version numbers using \w and \d:

    \w+_v\d+\.\d+\.\w+

    Matches: "main_v3.2.py"

  7. Pattern for email addresses or web URLs using the pipe operator:

    [a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+|http://[^\s]+

    Matches: "anna.director@techcorp.com", "dev.team@techcorp.com", "http://git.techcorp.com/projects/x2021?" (the trailing "?" is included because [^\s]+ runs to the next whitespace)

  8. Pattern with dot metacharacter to find text in parentheses:

    \(.*?\)

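    Matches: "(anna.director@techcorp.com)", "(dev.team@techcorp.com)", "(IP: 192.168.1.100)", "(555)". The lazy .*? stops at the first closing parenthesis, so the phone number contributes only "(555)".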

  9. Pattern for IP addresses:

    \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

    Matches: "192.168.1.100"

  10. Comprehensive pattern for contact information:

    [a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+|\(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}

    Matches all email addresses and phone numbers in the text
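
Several of the solution patterns can be sanity-checked in Python. The sketch below runs them against an abridged excerpt of the story rather than the full text, so the match lists are shorter than for the whole passage; the variable names are illustrative.

```python
import re

# Abridged excerpt of the story; the raw-string piece keeps the Windows backslashes literal.
text = (
    "It was 9:30 AM on April 15, 2025. Yesterday at 17:45, everything had been fine. "
    "I've checked my backups from 2023-12-01 through 2025-03-15 but found nothing. "
    "My phone number is (555) 123-4567 if you need to reach me urgently. "
    r"Use file paths like /usr/local/bin/project-x/ for Linux users and C:\Program Files\Project-X\ for Windows users. "
    "He made a note to call Lisa later at +1-555-987-6543."
)

dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
times = re.findall(r"\d{1,2}:\d{2}\sAM|\d{1,2}:\d{2}\sPM", text)
phones = re.findall(r"\(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}", text)
paths = re.findall(r"[A-Z]:\\[^\s]+|/\S+/", text)
parens = re.findall(r"\(.*?\)", text)

for name, found in [("dates", dates), ("times", times), ("phones", phones),
                    ("paths", paths), ("parens", parens)]:
    print(f"{name:7} {found}")
```

On this excerpt the path pattern returns `C:\Program`, cut off at the space in "Program Files", and the parentheses pattern returns only `(555)`. A path pattern that tolerates spaces would need an explicit rule for where the path ends.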

Exercise 6: Boundary Matchers

These questions are based on the following example string:

Hello world! This is line one.
World, hello! This is line two.
HelloWorld is a single word.
The word "hello" appears in quotes.
This line ends with hello
hello starts this line and world ends it with world
com.example.domain is a domain name
user@example.com is an email address.
2023-05-15 is a date format.
The final line ends the entire text.

Questions on ^ (Caret) Boundary

  1. Write a regex pattern that matches any line beginning with the word "Hello".
  2. Write a regex pattern that matches any line beginning with either "Hello" or "hello".
  3. How many lines in the example string start with a capital letter?

Questions on $ (Dollar) Boundary

  1. Write a regex pattern that matches any line ending with the word "hello".
  2. How many lines in the example string end with a period (dot)?
  3. Write a regex pattern that matches any line ending with the exact word "world".

Questions on \b (Word Boundary)

  1. Write a regex pattern that matches the standalone word "hello" (case-insensitive) in the example text.
  2. How many times does the standalone word "world" (lowercase only) appear in the example text?
  3. Write a regex pattern that matches the word "is" only when it appears as a complete word.

Questions on \B (Non-word Boundary)

  1. Write a regex pattern that matches "World" only when it's part of another word without word boundaries.
  2. In the example text, what word contains "World" without word boundaries on either side?
  3. Write a regex that matches "example" when it's part of a larger word or token.

Questions on \A (Start of String)

  1. What single word would a regex pattern \AHello match in our example text?
  2. Write a regex that matches the first 5 characters of the entire example text.
  3. How does the pattern \AThe perform on our example text?

Questions on \Z (End of String)

  1. Write a regex pattern that matches the last sentence of the entire example text.
  2. What's the last word in the entire example text that would be matched by \w+.\Z?
  3. Write a regex that matches the last 10 characters of the entire example text.

Solution

  The ^ and $ patterns below assume multiline mode, so that ^ and $ match at the start and end of every line.

  1. Lines beginning with the word "Hello":

    ^Hello.+

  2. Lines beginning with either "Hello" or "hello":

    ^[Hh]ello.+

  3. Lines starting with a capital letter:

    ^[A-Z].+

    Count: 6 lines

  4. Lines ending with the word "hello":

    .+hello$

  5. Lines ending with a period (dot):

    .+\.$

    Count: 7 lines

  6. Lines ending with the exact word "world":

    .+\bworld\b$

  7. The standalone word "hello" in either case:

    \b[Hh]ello\b

  8. The standalone lowercase word "world":

    \bworld\b

    Count: 3 occurrences

  9. The word "is" as a complete word:

    \bis\b

  10. "World" as part of another word:

    \BWorld

    Matches "World" in "HelloWorld". Adding a second \B after it (\BWorld\B) finds nothing in this text, because "HelloWorld" is followed by a space, which is a word boundary.

  11. The word that contains "World": "HelloWorld". There is no word boundary before "World", but there is one after it, so strictly no word in the text has "World" without a boundary on both sides.

  12. "example" as part of a larger word:

    \Bexample\B

    This matches "example" inside a word such as "counterexamples". In the example text it finds nothing, because "." and "@" are non-word characters and so create word boundaries around "example" in "com.example.domain" and "user@example.com". For such tokens, a lookaround pattern like (?<=\S)example(?=\S) matches instead.

  13. \AHello matches "Hello" once, at the very start of the text.

  14. First 5 characters of the entire text:

    \A.{5}

    Matches: "Hello"

  15. \AThe finds no match: the text starts with "Hello", and \A anchors only at the very start of the string, never at later line starts.

  16. Last sentence of the entire text:

    .+\Z

    Matches: "The final line ends the entire text."

  17. \w+.\Z matches "text.", that is, the last word "text" plus the final period.

  18. Last 10 characters of the entire text:

    .{10}\Z

    Matches: "tire text."
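
The boundary questions, including the counting ones, can be checked with a short Python sketch. It assumes the example text has no trailing newline and uses `re.MULTILINE` for the ^ and $ patterns; the variable names are illustrative.

```python
import re

text = """Hello world! This is line one.
World, hello! This is line two.
HelloWorld is a single word.
The word "hello" appears in quotes.
This line ends with hello
hello starts this line and world ends it with world
com.example.domain is a domain name
user@example.com is an email address.
2023-05-15 is a date format.
The final line ends the entire text."""

M = re.MULTILINE  # lets ^ and $ match at every line start and end

capital_lines = len(re.findall(r"^[A-Z].+", text, M))
period_lines = len(re.findall(r".+\.$", text, M))
standalone_hello = len(re.findall(r"\b[Hh]ello\b", text))
standalone_world = len(re.findall(r"\bworld\b", text))

embedded_world = re.findall(r"\BWorld", text)      # no boundary before "World"
both_sides = re.findall(r"\BWorld\B", text)        # also requires no boundary after it
inner_example = re.findall(r"\Bexample\B", text)   # "." and "@" create boundaries
token_example = re.findall(r"(?<=\S)example(?=\S)", text)

first_five = re.search(r"\A.{5}", text).group()
starts_with_the = re.search(r"\AThe", text, M)     # \A ignores MULTILINE
last_word = re.search(r"\w+.\Z", text).group()
last_ten = re.search(r".{10}\Z", text).group()

print(capital_lines, period_lines, standalone_hello, standalone_world)
print(embedded_world, both_sides, inner_example, token_example)
print(repr(first_five), starts_with_the, repr(last_word), repr(last_ten))
```

With this text, `\BWorld\B` and `\Bexample\B` both return empty lists, while `\BWorld` and the lookaround variant do find matches. In Python `\Z` matches only at the very end of the string, so a trailing newline in the input would change the last three results.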

Warning

The dot (.) is a very powerful metacharacter that can create problems if not used properly: by default it matches any character except a newline, so a pattern built on it can match far more than intended.
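
A small Python sketch of two common ways the dot over-matches. The sample strings are made up for illustration.

```python
import re

text = "Call (555) 123-4567 or see (IP: 192.168.1.100)"

# Greedy .* runs to the LAST closing parenthesis, swallowing both groups.
greedy = re.findall(r"\(.*\)", text)

# Lazy .*? stops at the first closing parenthesis.
lazy = re.findall(r"\(.*?\)", text)

# A negated class states the intent directly: anything except ")".
negated = re.findall(r"\([^)]*\)", text)

# Unescaped dots accept any character, not only a literal period.
loose = re.fullmatch(r"192.168.1.100", "192x168y1z100") is not None
strict = re.fullmatch(r"192\.168\.1\.100", "192x168y1z100") is not None

print(greedy)
print(lazy, negated)
print(loose, strict)  # True False
```

Escaping the dot (`\.`) when a literal period is meant, and preferring a negated character class over `.*` where the terminator is known, are two ways to keep matches narrow.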


Source: Data Science Anywhere