Regular Expressions: Constructing Regular Expressions - Doc JavaScript | WebReference


Unix Regular Expressions

Constructing Regular Expressions

In this section we'll discuss the basics of regular expressions. Before we dive into interpretation rules, let's examine some characteristics of regular expressions.

Most characters in a regular expression simply match themselves. If you string several characters in a row, they must match in order. So, if you write the pattern:

/Bart/

it won't match unless the string contains the substring "Bart" somewhere. The following pattern can be used to check roughly whether a string looks like a real e-mail address:


As we proceed, we will discuss much more reliable patterns for e-mail verification.
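A minimal JavaScript sketch of literal matching with the /Bart/ pattern, using the standard RegExp test() method. The sample strings are our own. Note that matching is case-sensitive unless the i flag is added.

```javascript
// Ordinary characters match themselves, in order, anywhere in the string.
const bart = /Bart/;

const samples = ["Bart Simpson", "El Barto", "bart", "Bar t"];
for (const s of samples) {
  console.log(`${JSON.stringify(s)}: ${bart.test(s)}`);
}
// "Bart Simpson": true   (contains "Bart")
// "El Barto": true       (the substring "Bart" is inside "Barto")
// "bart": false          (matching is case-sensitive)
// "Bar t": false         (the characters must be adjacent)

// The i flag makes the same pattern case-insensitive.
console.log(/Bart/i.test("bart")); // true
```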

Some characters don't match themselves; they are metacharacters. You can match one of these characters literally by placing a backslash in front of it. For example, "\\" matches a backslash and "\$" matches a dollar sign. Here's the list of metacharacters:

\ | () [ { ^ $ * + ? .
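The same escaping rule applies in JavaScript regex literals. This sketch matches a literal dollar sign and a literal backslash, then uses a small helper, escapeRegExp (our own hypothetical name, not a built-in), that escapes every metacharacter in the list above, plus the closing ] and } for safety, before building a pattern with the RegExp constructor.

```javascript
// A backslash turns a metacharacter back into an ordinary character.
const dollar = /\$/;     // a literal dollar sign
const backslash = /\\/;  // a literal backslash

console.log(dollar.test("Price: $5"));   // true
console.log(dollar.test("Price: 5"));    // false
console.log(backslash.test("C:\\temp")); // true: the string holds one backslash

// Hypothetical helper: prefix every metacharacter with a backslash so that
// an arbitrary string can be matched literally.
function escapeRegExp(text) {
  return text.replace(/[\\|()[\]{}^$*+?.]/g, "\\$&");
}

const plus = new RegExp(escapeRegExp("a+b")); // equivalent to /a\+b/
console.log(plus.test("a+b")); // true
console.log(plus.test("aab")); // false: "+" is no longer a quantifier
```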

A backslash also turns an alphanumeric character into a metacharacter. So whenever you see a backslash followed by an alphanumeric character:

\d \D \w \W \t \s \3

you'll know that the sequence matches something special. For example, \t matches a tab character, while \d matches any digit. Some sequences are actually zero characters wide. For instance, "\b" matches a word boundary, which is a position between characters rather than a real character; it is zero characters wide.
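A short JavaScript sketch of a few backslash sequences. It also shows that \b is zero-width: adding word boundaries around "Bart" restricts where the pattern may match, but the matched text is still just the four letters. The sample strings are our own.

```javascript
// \d: any digit; \t: a tab character.
console.log(/\d/.test("Room 101")); // true
console.log(/\d/.test("Room one")); // false
console.log(/\t/.test("a\tb"));     // true

// \b matches the position between a word character and a non-word
// character (or the start/end of the string), not a character itself.
console.log(/\bBart\b/.test("Bart Simpson")); // true
console.log(/\bBart\b/.test("Bartholomew"));  // false: "h" follows "Bart"

const match = "Bart Simpson".match(/\bBart\b/);
console.log(match[0]);        // "Bart": the boundaries consumed nothing
console.log(match[0].length); // 4
```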

Regular expressions consist mostly of assertions, i.e., plain characters that simply assert that they match themselves. From here on, though, we'll reserve the term "assertions" for the zero-width ones; assertions that do have width are called atoms. As there is no standard terminology, we use the one from "Programming Perl." As a matter of fact, most of our explanations are based on this great book.

Regular expressions can include non-assertions, such as the alternation operator, which is indicated with a vertical bar:

/Homer|Marge|Bart|Lisa|Maggie/

Any of those strings can trigger a match. That is, the preceding expression matches all of the following strings:

"Homer"
"Marge"
"Bart"
"Lisa"
"Maggie"

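A sketch of alternation in JavaScript, assuming a pattern that lists the five Simpson names separated by vertical bars. When several alternatives could match, match() reports the match that starts earliest in the string. The sample strings are our own.

```javascript
// Any one of the alternatives can produce a match.
const simpsons = /Homer|Marge|Bart|Lisa|Maggie/;

console.log(simpsons.test("Lisa plays the saxophone")); // true
console.log(simpsons.test("Ned Flanders"));             // false

// The match that starts earliest in the string wins.
console.log("Maggie and Bart".match(simpsons)[0]); // "Maggie"
```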
You can group various sorts of things with parentheses, as in the following expression:

/(Homer|Marge|Bart|Lisa|Maggie) Simpson/
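A minimal JavaScript sketch of the grouped pattern. Besides limiting the alternation, parentheses in JavaScript also capture the text they matched, which match() returns at index 1 of its result array. The sample sentences are our own.

```javascript
const fullName = /(Homer|Marge|Bart|Lisa|Maggie) Simpson/;

console.log(fullName.test("Lisa Simpson")); // true
console.log(fullName.test("Lisa"));         // false: " Simpson" must follow

const m = "I met Bart Simpson today".match(fullName);
console.log(m[0]); // "Bart Simpson": the whole match
console.log(m[1]); // "Bart": the text captured by the parentheses
```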

A common mistake is to forget the parentheses:

/Homer|Marge|Bart|Lisa|Maggie Simpson/

Unlike the previous pattern, this one matches the following strings, because "Simpson" belongs only to "Maggie":

"Homer"
"Marge"
"Bart"
"Lisa"
"Maggie Simpson"

but it does not match the string "Maggie".
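The difference between the two patterns is easy to check in JavaScript. This sketch runs a few sample strings (our own) through both versions.

```javascript
const withParens = /(Homer|Marge|Bart|Lisa|Maggie) Simpson/;
const withoutParens = /Homer|Marge|Bart|Lisa|Maggie Simpson/; // "Simpson" binds to "Maggie" only

for (const s of ["Homer", "Homer Simpson", "Maggie", "Maggie Simpson"]) {
  console.log(`${s}: with=${withParens.test(s)}, without=${withoutParens.test(s)}`);
}
// Homer: with=false, without=true
// Homer Simpson: with=true, without=true
// Maggie: with=false, without=false
// Maggie Simpson: with=true, without=true
```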

Quantifiers say how many times the preceding item should match in a row. Here are a few quantifiers:

* + ? {4,8} {5,}
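Each quantifier in the list can be tried in JavaScript. In this sketch, the ^ and $ anchors pin the pattern to the start and end of the string so that the repetition counts are exact. The sample strings are our own.

```javascript
// *  : zero or more     +  : one or more     ?  : zero or one
// {4,8}: four to eight  {5,}: five or more
console.log(/^ba*d$/.test("bd"));           // true:  * allows zero "a"s
console.log(/^ba+d$/.test("bd"));           // false: + needs at least one "a"
console.log(/^ba+d$/.test("baaad"));        // true
console.log(/^colou?r$/.test("color"));     // true:  ? makes the "u" optional
console.log(/^\d{4,8}$/.test("12345"));     // true:  five digits
console.log(/^\d{4,8}$/.test("123"));       // false: too few digits
console.log(/^\d{5,}$/.test("1234567890")); // true:  ten digits
```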

Quantifiers can only be put after atoms, that is, assertions with width. A quantifier attaches only to the atom immediately before it, so if you want it to apply to multiple characters, you must group them together with parentheses, like this:

/(Bart){3}/

This pattern matches "BartBartBart", whereas the following pattern matches the string "Barttt":

/Bart{3}/

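The difference between the grouped and ungrouped forms can be checked in JavaScript. This sketch assumes the patterns /(Bart){3}/ and /Bart{3}/. It also shows that JavaScript rejects a quantifier with no atom to attach to, such as one placed right after the zero-width ^ assertion.

```javascript
const groupedBart = /(Bart){3}/; // {3} applies to the whole group "Bart"
const plainBart = /Bart{3}/;     // {3} applies only to the "t"

console.log(groupedBart.test("BartBartBart")); // true
console.log(groupedBart.test("Barttt"));       // false
console.log(plainBart.test("Barttt"));         // true
console.log(plainBart.test("BartBartBart"));   // false

// A quantifier needs an atom before it; ^ has zero width, so this fails.
try {
  new RegExp("^*");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```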
Created: October 23, 1997
Revised: December 4, 1997