The Perl programming language is exceptionally good at parsing strings because it gives the programmer
a comprehensive regular expression language. Rather than creating their own regular expression language,
the developers of PHP made the Perl regular expression syntax available to PHP users. The Perl-style regular
expression syntax was built on the POSIX regular expression syntax and thus holds many of the same
features. In fact, PHP programmers can use much of the same POSIX regular expression syntax when
writing Perl-style regular expressions. The basic Perl-style regular expression syntax uses
forward slashes (/ /) to delimit the pattern that will be searched for: /mysql/ will match any string that
contains the pattern "mysql", /m+/ will match any string that contains one or more consecutive occurrences
of the letter "m" (mysql, mom, mudd, my, etc.) and /m{2,4}/ will match any string that contains the letter "m"
repeated two to four times in a row (mummy, hammer, etc.). Perl-style regular expressions are built from
three kinds of elements: Quantifiers, Modifiers and Metacharacters.
• Quantifiers - The quantifiers used in Perl-style regular expressions are identical to those used in
POSIX EREs (for example *, +, ? and {n,m}).
• Modifiers - These are flags that refine a search to suit its specific needs. They are written
immediately after the closing slash of a standard Perl-style regular expression. The following
is a list of the most common Perl-style modifiers:
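▪ i - performs a case-insensitive search (ex. /mysql/i will find "mysql" and "MYSQL" or any
other combination of upper and lower case characters).
▪ g - finds all occurrences of the search pattern (ex. /iss/g will find 2 occurrences of "iss" in
"Mississippi"). This is useful when performing a global search and replace. Note that g is a Perl
modifier: PHP's preg functions do not accept it, and preg_match_all() or preg_replace() (which
replaces every occurrence by default) is used instead.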
▪ m - treats a multiple-line string as a series of individual lines. With this modifier, the
characters ^ (which normally matches at the beginning of the string) and $ (which normally
matches at the end of the string) also match at the beginning and end of each line. (ex.
/^Thomas/ would not match the following string but /^Thomas/m would: "My mother went to
the store.\nThomas went with her.")
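▪ x - this Perl-style modifier will ignore white space and comments within the regular
expression.
▪ U - this modifier, which is specific to PHP's PCRE functions rather than Perl itself, inverts the
greediness of quantifiers so that they match as little as possible by default (ex. /<.+>/U
matches only "<b>" in "<b>bold</b>", whereas /<.+>/ matches the whole string).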
• Metacharacters - These are characters that carry a special meaning instead of matching
themselves literally. Many are written as a backslash followed by a letter. The following are a
few of the metacharacters associated with Perl-style regular expressions:
▪ \A - matches patterns only at the beginning of the string
▪ \b - matches a word boundary (ex. /\bis\b/ will find only occurrences of the word "is" but not
words like "dish" or "mist"; /is\b/ would find not only "is" but also the end of "his", because the
boundary is required only after "is" and whatever comes in front of it is ignored). \B is the
opposite: it matches any position that is not a word boundary.
▪ \d - matches any numeric digit character. \D is the opposite, matching any non-numeric
character.
▪ \s - matches a white space character. \S matches non-white space characters.
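The modifiers and backslash metacharacters listed above can be tried out with PHP's preg functions. The following is a minimal sketch rather than a definitive reference: the sample strings and variable names are made up for illustration, preg_match_all() stands in for Perl's g modifier (which PHP does not accept), and the U example relies on PCRE's behavior of making quantifiers non-greedy.

```php
<?php
// i: case-insensitive search.
$hasMysql = preg_match('/mysql/i', 'Powered by MySQL');    // 1

// PHP has no g modifier; preg_match_all() counts every occurrence.
$issCount = preg_match_all('/iss/', 'Mississippi');        // 2

// m: ^ also matches at the start of each line.
$text = "My mother went to the store.\nThomas went with her.";
$withoutM = preg_match('/^Thomas/', $text);                // 0
$withM    = preg_match('/^Thomas/m', $text);               // 1

// \A matches only at the very start of the string, even with m.
$startOnly = preg_match('/\AThomas/m', $text);             // 0

// x: white space and # comments inside the pattern are ignored.
$pattern = '/
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
/x';
$hasNumber = preg_match($pattern, 'Call 555-1234 today');  // 1

// U: quantifiers match as little as possible.
preg_match('/<.+>/', '<b>bold</b>', $greedy);   // $greedy[0] is '<b>bold</b>'
preg_match('/<.+>/U', '<b>bold</b>', $lazy);    // $lazy[0] is '<b>'

// \b: word boundaries.
$wholeWord = preg_match_all('/\bis\b/', 'This is his dish');  // 1
$wordEnd   = preg_match_all('/is\b/', 'This is his dish');    // 3

// \d and \s: digits and white space.
$digits    = preg_match_all('/\d/', 'Room 101');           // 3
$collapsed = preg_replace('/\s+/', ' ', "a \t\n b");       // 'a b'
```

preg_match() returns 1 when the pattern matches and 0 when it does not, while preg_match_all() returns the number of matches found, which is why it serves as the counterpart of a global search here.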
Other metacharacters that do not require the backslash include:
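▪ $ - matches at the end of the string (or at the end of each line when the m modifier is used)
▪ ^ - matches at the beginning of the string (or at the beginning of each line when the m
modifier is used)
▪ . - matches any single character except for the new line
▪ ( ) - groups part of the pattern and captures the text it matched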
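As a rough illustration of the quantifiers and of the metacharacters that need no backslash, the sketch below uses preg_match() with a few invented sample strings. The variable names are hypothetical and the example is only meant to show the behavior described above, not to be a complete treatment.

```php
<?php
// + means "one or more of the preceding character".
preg_match('/m+/', 'hammer', $plus);               // $plus[0] is 'mm'

// {2,4} means "between two and four of the preceding character".
$two = preg_match('/m{2,4}/', 'hammer');           // 1
$one = preg_match('/m{2,4}/', 'mysql');            // 0, only a single "m"

// ^ and $ anchor the match to the start and end of the string.
$exact  = preg_match('/^mysql$/', 'mysql');        // 1
$inside = preg_match('/^mysql$/', 'my mysql db');  // 0

// . matches any single character except a new line.
$dot     = preg_match('/m.sql/', 'mysql');         // 1
$newline = preg_match('/m.sql/', "m\nsql");        // 0

// ( ) groups part of the pattern and captures what it matched.
preg_match('/(\d+)-(\d+)/', 'pages 10-25', $parts);
// $parts[1] is '10' and $parts[2] is '25'
```

The third argument of preg_match() receives the matched text: index 0 holds the whole match and the following indexes hold the contents of each ( ) group in order.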