A regular expression (or regex) is a string of characters, some of which are reserved control characters, that represents a pattern [1], i.e. a string designed to match a particular class of strings. Regular expressions are a basic tool for searching text and are ubiquitous in computing.

Getting started

Many editors offer regex functionality. Here are a few examples (please feel free to add or remove entries if you find better ones).

Learning materials

A lightning introduction

There are several "dialects" of regular expressions (e.g. JavaScript, Perl, PHP, Python) which differ slightly in grammar. Let us focus on Python regex for the moment (because I happen to have a reference [2] for it).

Control characters

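  • Python regex has the control characters:

\-.*+?$<!=|()[]{}^:#

  • Some of these (- < ! = : #) are special only in certain contexts, for example inside a character class [ ] or right after "(?".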
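To see what "control character" means in practice, here is a minimal sketch using Python's standard re module. The sample string and the variable names are chosen only for illustration; re.escape and re.fullmatch are standard library functions.

```python
import re

# A sample of control characters taken from the list above (illustrative subset).
sample = r"\.*+?$|()[]^"

# re.escape() inserts backslashes so that every character is taken literally.
escaped = re.escape(sample)
literal_match = re.fullmatch(escaped, sample)

# Unescaped, the dot is a wildcard: it matches the "b" in "abc".
unescaped_dot = re.fullmatch(r"a.c", "abc")

# Escaped, the dot only matches a literal dot, so "abc" no longer matches.
escaped_dot = re.fullmatch(r"a\.c", "abc")

print(literal_match is not None)  # True
print(unescaped_dot is not None)  # True
print(escaped_dot is not None)    # False
```

Writing patterns as raw strings (r"...") keeps Python's own string escaping from interfering with the regex escaping.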

First examples

[please verify]

  • Any string (e.g. abcdefg) which does not contain any control characters is trivially a regular expression ("regex") pattern. It matches only itself.
  • The pattern [A-Z] matches a single character between A and Z (in the ASCII table), i.e. one uppercase letter.
  • A backslash (\) followed by any control character, such as \. or even the backslash itself \\, refers to the control character itself (this pattern is called an "escape"). In our examples, \. matches a single dot . and \\ matches a single backslash.
  • Combining the two examples above, the pattern [A-Za-z0-9\-] matches any single ASCII letter or digit, or the dash "-".
  • The pattern \n matches a newline.
  • The pattern abc.xyz matches a string which starts with abc, ends with xyz, and has exactly one character in between. The dot matches any single character except a newline.
  • The pattern a* matches zero or more consecutive "a" characters, taking as many as are available; it therefore also matches the empty string "".
  • Combining the previous two examples, we get a very common pattern: abc.*xyz matches a string which starts with "abc" and ends with "xyz", with the longest available run (possibly empty) of any characters except the newline in between.
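The examples above can be tried out in Python. The following is a small sketch, not a definitive test suite: the test strings and the variable names are made up for illustration. It uses re.fullmatch, which succeeds only when the whole string matches the pattern.

```python
import re

# Each entry: (pattern, test string, whether the whole string should match).
examples = [
    (r"abcdefg", "abcdefg", True),   # a plain string matches itself
    (r"[A-Z]", "Q", True),           # one uppercase letter
    (r"[A-Z]", "q", False),          # lowercase is outside the range
    (r"\.", ".", True),              # escaped dot matches a literal dot
    (r"\\", "\\", True),             # escaped backslash matches one backslash
    (r"[A-Za-z0-9\-]", "-", True),   # the dash is included in the class
    (r"\n", "\n", True),             # newline
    (r"abc.xyz", "abc-xyz", True),   # the dot stands for one character
    (r"abc.xyz", "abc\nxyz", False), # ... but not for a newline
    (r"a*", "", True),               # zero repetitions are allowed
    (r"a*", "aaaa", True),
]

for pattern, text, expected in examples:
    matched = re.fullmatch(pattern, text) is not None
    assert matched == expected, (pattern, text)

# .* takes the longest available run: the match extends to the LAST "xyz".
greedy = re.search(r"abc.*xyz", "abc 1 xyz 2 xyz 3").group()
print(greedy)  # abc 1 xyz 2 xyz
```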

Exercises

  • Question: What does the pattern [A-Za-z0-9\-] match?
  • Write a regular expression for (a) the URL of any Wikiversity page; (b) the URL of any page on any Wikimedia site; (c) the email addresses of all your friends. Check with a regex editor that your regex actually works.
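Besides a regex editor, a candidate pattern can be checked with a few lines of Python. The helper below is a hypothetical sketch (check_pattern is not a library function, and the sample strings are invented); it deliberately uses a pattern from the examples above rather than a solution to the exercises.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return the test strings on which the pattern gives the wrong answer."""
    compiled = re.compile(pattern)
    failures = []
    for text in should_match:
        if compiled.fullmatch(text) is None:
            failures.append(text)
    for text in should_not_match:
        if compiled.fullmatch(text) is not None:
            failures.append(text)
    return failures

# Illustrative check: zero or more letters, digits, or dashes.
failures = check_pattern(
    r"[A-Za-z0-9\-]*",
    should_match=["", "regex-101", "Python"],
    should_not_match=["two words", "a_b"],
)
print(failures)  # []
```

An empty list means the pattern behaved as expected on every sample; any string that appears in the list points to a case worth examining.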

Write your proposed solutions below

    Further lessons

    [proposals]

    • /Basics - the bare minimum to get started
    • /Groups
    • /How a regex engine works
    • /Lookahead and lookbehind
    • /Regex objects in python
    • /The good and the bad
    • /Cookbook

    Notes

    1. Alex Martelli, Python in a Nutshell, p. 203
    2. Alex Martelli, Python in a Nutshell, ISBN 0596100469
    This article is issued from Wikiversity. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.