This form of regular expression searching complies with the extended regular expressions (EREs) defined
in IEEE POSIX 1003.2. Extended regular expressions are similar in syntax to traditional Unix regular
expressions, with some exceptions. The real power of EREs is that they can be combined to form more
precise searches. POSIX EREs are built from three kinds of constructs: Brackets, Quantifiers and
Predefined Character Ranges.
• Brackets -
Brackets ([ ]) match any one character from a set or range of characters when searching
through strings. Without brackets, a search made against a string for the term "dog" finds only
words that contain the letters d-o-g in sequence. With brackets, a search using [dog] finds any
word that contains the letter "d", the letter "o" or the letter "g". The following is a list of
the most commonly used character ranges:
[0-9] matches any numeric digit from 0-9
[a-z] matches any lowercase character from "a" through "z"
[A-Z] matches any uppercase character from "A" through "Z"
[A-Za-z] matches any letter, either uppercase "A" through "Z" or lowercase "a" through "z"
The ranges can be narrowed to control the search, such as [4-7] to find all strings that
contain the digits 4, 5, 6 or 7.
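The difference between a literal term and a bracket expression can be sketched in a few lines. This is only an illustration: it uses Python's `re` module as a stand-in, which is not a POSIX ERE engine but treats these simple bracket patterns the same way. The sample words and the `matching` helper are assumptions made up for the example.

```python
import re

# Hypothetical sample words, chosen only for illustration.
words = ["dog", "good", "cat", "sky", "Room 5", "42"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern matches somewhere."""
    regex = re.compile(pattern)
    return [c for c in candidates if regex.search(c)]

# The literal pattern needs d-o-g in sequence.
literal_hits = matching(r"dog", words)

# The bracket expression needs only one of "d", "o" or "g" anywhere.
bracket_hits = matching(r"[dog]", words)

# A narrowed range: any string containing 4, 5, 6 or 7.
digit_hits = matching(r"[4-7]", words)

print(literal_hits)   # ['dog']
print(bracket_hits)   # ['dog', 'good', 'Room 5']
print(digit_hits)     # ['Room 5', '42']
```

Note that "good" and "Room 5" match [dog] even though neither contains the word "dog"; one listed character is enough.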
• Quantifiers -
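Quantifiers are characters that have special meaning in EREs: they specify how many times the
preceding item may repeat. Many of them are useful only in combination with another expression.
The list below also includes the anchors ^ and $ and the wildcard ., which are not quantifiers
in the strict sense but are commonly used alongside them. The following is a list of the most
commonly used: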
a+ matches any string containing at least one "a"
a* matches any string containing zero or more of the letter "a"
a? matches any string containing zero or one letter "a"
a{3} matches any string containing "aaa"
a{2,3} matches any string containing "aa" or "aaa"
a{3,} matches any string containing at least "aaa", which also includes strings that
contain "aaaa", "aaaaa" and so on
^a matches any string that begins with the letter "a"
a$ matches any string that ends with the letter "a"
a.a matches any string that contains the letter "a", followed by any single character, and then
again by the letter "a", such as madam, radar and again
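A few of the entries above can be checked with a short sketch. It again uses Python's `re` module as a stand-in for a POSIX ERE engine (an assumption; the two agree on these particular patterns), and the `found` helper and test strings are hypothetical.

```python
import re

def found(pattern, text):
    """True when the pattern matches anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None

# (pattern, text) pairs picked to show both matches and non-matches.
cases = [
    (r"a+", "bbb"),        # no "a" at all
    (r"a*", "bbb"),        # zero occurrences still count as a match
    (r"a{3}", "baab"),     # only two "a"s in a row
    (r"a{3}", "baaab"),    # three in a row
    (r"a{2,3}", "baab"),   # two in a row is enough
    (r"^a", "apple"),      # starts with "a"
    (r"^a", "banana"),     # starts with "b"
    (r"a$", "banana"),     # ends with "a"
    (r"a.a", "radar"),     # "ada" in the middle
    (r"a.a", "aa"),        # no character between the two "a"s
]

results = {(p, t): found(p, t) for p, t in cases}

for (p, t), ok in results.items():
    print(f"{p!r:10} {t!r:10} {ok}")
```

The a* case is worth noticing: because zero repetitions are allowed, the pattern matches every string, which is why it is normally combined with other expressions.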
The following list demonstrates how EREs can be combined to produce more precise
searches:
[^a-zA-Z] matches any string containing at least one character that is not a letter, that is,
one outside the ranges "a" to "z" and "A" to "Z"
^.{2}$ matches any string containing exactly two characters
<table>(.*)</table> matches any string containing text enclosed in <table> tags (the text
between the tags can be any length, or no characters at all)
a(bc)* matches any string containing a letter "a" followed by zero or more of the sequence
"bc"
• Predefined Character Ranges -
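PHP allows the searching of character ranges that have been predefined
(also known as character classes). A class name is only valid inside a bracket expression, so in
a pattern it is written as, for example, [[:alpha:]] or [[:alpha:][:digit:]]. The equivalents
shown below assume ASCII text in the default locale. The standard predefined classes are: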
[:alpha:] matches any alphabetical character, equivalent to [A-Za-z]
[:alnum:] matches any alphabetical character or numeric digit from 0-9, equivalent to
[A-Za-z0-9]
[:cntrl:] matches control characters such as tab, escape and backspace
[:digit:] matches any numeric digit from 0-9, equivalent to [0-9]
[:graph:] matches any of the printable characters in the range of ASCII 33 (!) to ASCII 126
(~)
[:lower:] matches any lowercase character from "a" to "z", equivalent to [a-z]
[:upper:] matches any uppercase character from "A" to "Z", equivalent to [A-Z]
[:punct:] matches any of the punctuation characters, including ~ ` ! @ # $ % ^ & * ( ) - _ = +
{ } [ ] ; : ' < > , . ? and /
[:space:] matches any white space characters including the space, horizontal tab, vertical tab,
new line, form feed or carriage return
[:xdigit:] matches any of the hexadecimal characters, equivalent to [a-fA-F0-9]
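The equivalences in this list can be made concrete with a small translation sketch. In a real POSIX ERE engine a class is used directly inside brackets, as in [[:digit:]]. Python's `re` module, used here only for illustration, does not understand that syntax, so the sketch rewrites each class name into the ASCII range given above before compiling. The `POSIX_CLASSES` table and `expand_posix_classes` function are hypothetical helpers, not part of any library, and they cover only some of the classes.

```python
import re

# ASCII expansions taken from the list above (hypothetical helper table).
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "A-Fa-f0-9",
    "space": " \t\n\r\f\v",
}

def expand_posix_classes(pattern):
    """Replace each [:name:] token with its ASCII range text.

    Raises KeyError for a class name that is not in POSIX_CLASSES.
    """
    def substitute(match):
        return POSIX_CLASSES[match.group(1)]
    return re.sub(r"\[:([a-z]+):\]", substitute, pattern)

# A letter followed by any number of letters or digits.
identifier = expand_posix_classes("^[[:alpha:]][[:alnum:]]*$")

# One or more hexadecimal characters and nothing else.
hex_only = expand_posix_classes("^[[:xdigit:]]+$")

print(identifier)  # ^[A-Za-z][A-Za-z0-9]*$
print(hex_only)    # ^[A-Fa-f0-9]+$
print(bool(re.search(identifier, "abc123")))    # True
print(bool(re.search(identifier, "9lives")))    # False
print(bool(re.search(hex_only, "DEADbeef42")))  # True
```

The expansion is purely textual, so the outer brackets of [[:alpha:]] become the brackets of [A-Za-z], which mirrors how the class sits inside a bracket expression in POSIX syntax.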