How to work with regular expressions in Python

Feb 26, 2023 | Python How To’s

Regular expressions are a powerful programming tool for finding and matching patterns in text.

In this blog post, we will look at what regular expressions are, how to use them in Python, and how they can be used to solve real-world problems.

How to use Python Regular Expressions

What are regular expressions?

  • A regular expression, also known as a regex, is a sequence of characters that defines a search pattern.
  • Regular expressions are used to search, replace, and manipulate text based on specific rules.
  • Regular expressions can be used to match patterns in text, such as email addresses, phone numbers, and URLs.

For example, let’s say we have a list of email addresses and we want to extract only the domain names.

For instance, given Joe@company.com, we would want to get only company.com as the domain name.

We can use regular expressions in this case to match the domain names and extract them.
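As a rough sketch of the domain-extraction idea described above: the addresses below are made up for illustration, and the pattern is deliberately simple (it is an assumption of this example, not a full email validator).

```python
import re

# Hypothetical sample data, invented for this example.
emails = ["Joe@company.com", "ann@example.org", "not-an-email"]

# "@" followed by one or more non-whitespace characters up to the end
# of the string. The parentheses capture only the part after the "@".
domain_pattern = re.compile(r"@(\S+)$")

domains = []
for email in emails:
    match = domain_pattern.search(email)
    if match:  # search() returns None when there is no "@..." part
        domains.append(match.group(1))

print(domains)  # ['company.com', 'example.org']
```

The entry without an "@" is simply skipped, because search() returns None for it.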

How to use regular expressions in Python

Python has a built-in module called re that provides support for regular expressions.

The re module contains functions for searching, replacing, and manipulating strings based on regular expressions.

To use regular expressions in Python, we first need to import the re module:

import re

Once we have imported the re module, we can use its functions to work with regular expressions.
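To give a feel for what the re module offers, here is a small sketch of four commonly used functions: re.search, re.findall, re.sub, and re.split. The sample text is made up for this example.

```python
import re

text = "apples 3, pears 12, plums 7"

# re.search returns a match object for the first match, or None.
first = re.search(r"\d+", text)

# re.findall returns all non-overlapping matches as a list of strings.
numbers = re.findall(r"\d+", text)

# re.sub replaces every match with the given replacement string.
masked = re.sub(r"\d+", "#", text)

# re.split splits the string wherever the pattern matches.
parts = re.split(r",\s*", text)

print(first.group())  # 3
print(numbers)        # ['3', '12', '7']
print(masked)         # apples #, pears #, plums #
print(parts)          # ['apples 3', 'pears 12', 'plums 7']
```

The patterns are written as raw strings (the r prefix) so that Python does not try to interpret the backslashes itself.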

Basic regular expressions

A regular expression can be a simple string, such as “hello world”, or it can be more complex, such as “^hello\sworld$”.

The regular expression “^hello\sworld$” matches only if “hello” appears at the beginning of the string (^), is followed by a single whitespace character (\s) and then “world”, and “world” sits at the end of the string ($). Note that \s matches any whitespace character, such as a space or a tab, not just a space.
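One way to check how this pattern behaves is to try it against a few candidate strings. The candidates below are chosen for illustration only.

```python
import re

# A raw string keeps the backslash in \s intact for the regex engine.
pattern = r"^hello\sworld$"

candidates = [
    "hello world",      # exact match
    "hello\tworld",     # \s also matches a tab
    "say hello world",  # extra text before "hello"
    "hello world!",     # extra text after "world"
    "helloworld",       # no whitespace between the words
]

results = {text: bool(re.search(pattern, text)) for text in candidates}

for text, matched in results.items():
    print(repr(text), matched)
```

Only the first two candidates match, because the anchors leave no room for anything before “hello” or after “world”.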

Here are some basic regular expressions that can be used in Python:

  • . – Matches any single character except a newline character.
  • ^ – Matches the beginning of the string (and of each line when the re.MULTILINE flag is used).
  • $ – Matches the end of the string (and of each line when the re.MULTILINE flag is used).
  • * – Matches zero or more occurrences of the preceding character or group.
  • + – Matches one or more occurrences of the preceding character or group.
  • ? – Matches zero or one occurrence of the preceding character or group.
  • [] – Matches any one of the characters enclosed in the square brackets.
  • () – Groups part of a pattern together so it can be treated as a single unit.

Do not worry if these are not fully clear yet. We are going to look at examples for each of them, and by the end of this post you should have a good grasp of regular expressions and how to apply them!

Let’s look at some examples now.

Searching for a pattern in a string

To search for a pattern in a string using regular expressions, we can use the search() function from the re module.

The search() function scans the whole string and returns a match object for the first place where the pattern is found, or None if the pattern is not found anywhere.

Here’s an example of how to search for the word “Python” in a string:

import re

string = "I love Python"
pattern = "Python"

# search() looks for the pattern anywhere in the string
result = re.search(pattern, string)

if result:
    print("Match found!")
else:
    print("Match not found.")

The output of this code will be “Match found!”.
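Besides acting as a true/false value, the match object returned by search() carries details about the match. A minimal sketch using its group() and span() methods:

```python
import re

result = re.search(r"Python", "I love Python")

matched_text = result.group()  # the text that matched
position = result.span()       # (start, end) indexes of the match

print(matched_text)  # Python
print(position)      # (7, 13)

# When nothing matches, search() returns None, so check before
# calling methods on the result.
missing = re.search(r"Java", "I love Python")
print(missing)  # None
```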

Matching a pattern at the beginning of a string with ^

To match a pattern at the beginning of a string, we can use the ^ character.

For example, let’s say we want to check if a string starts with the word “Hello”.

We can use the following regular expression:

import re

string = "Hello, world!"
pattern = "^Hello"

# ^ anchors the pattern to the start of the string
result = re.search(pattern, string)

if result:
    print("Match found!")
else:
    print("Match not found.")

The output of this code will be “Match found!”.
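By default, ^ matches only at the very start of the string. The sketch below, using a made-up two-line string, shows how the re.MULTILINE flag changes that so ^ also matches at the start of each line.

```python
import re

text = "Hello, world!\nHello again"

# Default behaviour: ^ matches only at the start of the whole string.
default_matches = re.findall(r"^Hello", text)

# With re.MULTILINE, ^ also matches right after each newline.
multiline_matches = re.findall(r"^Hello", text, flags=re.MULTILINE)

print(default_matches)    # ['Hello']
print(multiline_matches)  # ['Hello', 'Hello']

# re.match() only ever tries to match at the start of the string,
# so it behaves like search() with a leading ^.
starts_with_world = re.match(r"world", text)
print(starts_with_world)  # None
```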

Matching a pattern at the end of a string with $

To match a pattern at the end of a string, we can use the $ character. For example, let’s say we want to check if a string ends with the word “world”. We can use the following regular expression:

import re

# No trailing "!" here: $ requires "world" to be the last thing in the string
string = "Hello, world"
pattern = "world$"

result = re.search(pattern, string)

if result:
    print("Match found!")
else:
    print("Match not found.")

The output of this code will be “Match found!”.
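Because $ anchors the pattern to the very end of the string, trailing punctuation matters. The following sketch compares a few variations; the strings are chosen for illustration.

```python
import re

# "world" is the last thing in the string, so this matches.
ends_with_world = bool(re.search(r"world$", "Hello, world"))

# The trailing "!" sits between "world" and the end, so this does not match.
with_exclamation = bool(re.search(r"world$", "Hello, world!"))

# Allowing an optional "!" before the end makes it match again.
optional_bang = bool(re.search(r"world!?$", "Hello, world!"))

# $ also matches just before a single trailing newline.
trailing_newline = bool(re.search(r"world$", "Hello, world\n"))

print(ends_with_world, with_exclamation, optional_bang, trailing_newline)
# True False True True
```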

Matching a pattern with wildcards using dot .

To match a pattern with a wildcard character, we can use the dot ( . ) character. The . character matches any single character except a newline character. For example, let’s say we want to check if a string contains the word “cat” followed by any three characters. We can use the following regular expression:

import re

string = "I love cats and dogs."
pattern = "cat..."

# Each dot stands for exactly one character (any character except a newline)
result = re.search(pattern, string)

if result:
    print("Match found!")
else:
    print("Match not found.")

The output of this code will be “Match found!”.
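To see exactly what the dots consumed, and how the newline exception works, here is a short sketch. The re.DOTALL flag and the escaped dot (\.) are standard features of the re module; the test strings are made up.

```python
import re

# The three dots match "s", " " and "a".
three_chars = re.search(r"cat...", "I love cats and dogs.").group()
print(repr(three_chars))  # 'cats a'

# By default the dot does not match a newline character.
no_newline = re.search(r"a.b", "a\nb")
print(no_newline)  # None

# With re.DOTALL the dot matches newlines as well.
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)
print(with_dotall is not None)  # True

# To match a literal dot, escape it with a backslash.
literal_dot = re.search(r"dogs\.", "I love cats and dogs.")
print(literal_dot.group())  # dogs.
```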

Matching a pattern with character sets using []

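To match a pattern with a set of characters, we can use square brackets []. For example, let’s say we want to check if a string contains the word “cat” followed by either “s” or “t”. We can use the following regular expression. Note that the characters are simply listed inside the brackets; a | inside square brackets is treated as a literal | character, not as “or”.

import re

string = "I love cats and dogs."

# [st] matches exactly one character: either "s" or "t"
pattern = "cat[st]"

result = re.search(pattern, string)

if result:
    print("Match found!")
else:
    print("Match not found.")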

The output of this code will be “Match found!”.
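A few more things can go inside square brackets. The sketch below shows a plain set, what happens when a | is placed inside the brackets, a range, and a negated set. The sample strings are invented for this example.

```python
import re

sample = "cats catt cat| cart"

# A plain set: "cat" followed by "s" or "t".
plain_set = re.findall(r"cat[st]", sample)
print(plain_set)  # ['cats', 'catt']

# Inside brackets, | is just another character in the set,
# so this pattern also accepts a literal "|".
with_pipe = re.findall(r"cat[s|t]", sample)
print(with_pipe)  # ['cats', 'catt', 'cat|']

# A range: any one character from "a" to "c".
in_range = re.findall(r"[a-c]", "abcdef")
print(in_range)  # ['a', 'b', 'c']

# A negated set: any one character that is NOT a digit.
not_digits = re.findall(r"[^0-9]", "a1b2")
print(not_digits)  # ['a', 'b']
```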

Matching a pattern with repetitions using *, + and ?

To match a pattern with repetitions, we can use the *, +, and ? characters. The * character matches zero or more occurrences of the preceding character. The + character matches one or more occurrences of the preceding character. The ? character matches zero or one occurrences of the preceding character. For example, let’s say we want to check if a string contains any number of “a” characters followed by the letter “b”. We can use the following regular expression:

import re

string = "ab"
pattern = "a*b"

# a* allows any number of "a" characters (including none) before the "b"
result = re.search(pattern, string)

if result:
    print("Match found!")
else:
    print("Match not found.")

The output of this code will be “Match found!”.
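To compare the three repetition characters side by side, the sketch below runs each one against the same made-up samples. It uses re.fullmatch(), which requires the whole string to match, so that a match on just part of the string does not hide the difference.

```python
import re

samples = ["b", "ab", "aaab"]

# a*b : zero or more "a" characters, then "b"
star = [bool(re.fullmatch(r"a*b", s)) for s in samples]

# a+b : one or more "a" characters, then "b"
plus = [bool(re.fullmatch(r"a+b", s)) for s in samples]

# a?b : zero or one "a" character, then "b"
optional = [bool(re.fullmatch(r"a?b", s)) for s in samples]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
```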

Grouping regular expressions using ()

To group parts of a regular expression, we can use parentheses (). Inside a group, the | character means “or”. For example, let’s say we want to check if a string contains the word “cat” followed by either “s” or “t”, then the word “and”, and then either “dog” or “bird”. We can use the following regular expression:

import re

string = "I love cats and dogs."

# [st] picks one character; (dog|bird) picks one of two whole words
pattern = "cat[st] and (dog|bird)"

result = re.search(pattern, string)

if result:
    print("Match found!")
else:
    print("Match not found.")

The output of this code will be “Match found!”.
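Groups do more than control alternation: the text matched by each group can be read back from the match object, and a repetition character can be applied to a whole group. A minimal sketch:

```python
import re

match = re.search(r"cat[st] and (dog|bird)", "I love cats and dogs.")

whole = match.group(0)   # the entire match
animal = match.group(1)  # only the text matched by the first group

print(whole)   # cats and dog
print(animal)  # dog

# A repetition character after a group applies to the whole group.
repeated = re.fullmatch(r"(ab)+", "ababab")
print(repeated is not None)  # True
```

Note that the match stops at “dog”: the final “s” in “dogs” is not part of the pattern, so it is not included.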

Conclusion

Regular expressions can be a powerful tool for manipulating text in Python.

By using regular expressions, we can easily search, replace, and manipulate strings based on specific patterns.

In this article, we have covered the basics of regular expressions and shown how they can be used to solve real-world problems.

With practice and experimentation, you can become proficient in working with regular expressions and take advantage of their power in your Python programming.

Hope you now have a good grasp on applying regular expressions and that you learned something new today 😊
