regex

Just wanted to share a regex reference site. You can check it out at http://www.grymoire.com/Unix/Regular.html for more details.

What is a Regular Expression?





A regular expression is a set of characters that specify a pattern. The term “regular” has nothing to do with a high-fiber diet. It comes from a term used to describe grammars and formal languages.

Regular expressions are used when you want to search for specific lines of text containing a particular pattern. Most of the UNIX utilities operate on ASCII files a line at a time. Regular expressions search for patterns on a single line, and not for patterns that start on one line and end on another.
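
To make the line-at-a-time behaviour concrete, here is a minimal Python sketch of a grep-like search. The helper name `grep_lines` and the sample text are made up for illustration, and Python's `re` module stands in for the UNIX utilities described above, so treat it as an approximation rather than how those tools are implemented.

```python
import re

def grep_lines(pattern, text):
    """Return (line_number, line) pairs for every line containing the pattern."""
    regex = re.compile(pattern)
    hits = []
    # splitlines() removes the newlines, so each search sees one line only.
    for number, line in enumerate(text.splitlines(), start=1):
        if regex.search(line):
            hits.append((number, line))
    return hits

# Hypothetical sample: "foo" ends line 1 and "bar" starts line 2.
sample = "first line ends with foo\nbar starts the second line\nfoo bar on one line"

# Searched line by line, only line 3 can match.
per_line = grep_lines(r"foo\s+bar", sample)

# Searched as one string, \s+ can cross the newline, so the match starts on line 1.
whole_text = re.search(r"foo\s+bar", sample)

print(per_line)
print(repr(whole_text.group()))
```

The per-line search never sees the "foo" and "bar" that straddle the first two lines, which is the behaviour the paragraph above describes.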

Pattern    Matches
^A         “A” at the beginning of a line
A$         “A” at the end of a line
A^         “A^” anywhere on a line
$A         “$A” anywhere on a line
^^         “^” at the beginning of a line
$$         “$” at the end of a line
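
The anchor table can be tried out with a short Python sketch. One assumption to keep in mind: Python's `re` module treats "^" and "$" as anchors wherever they appear, unlike the grep-style tools the table describes, so the literal cases are written with a backslash here. The sample lines and the helper `matching` are invented for the example.

```python
import re

# Hypothetical sample lines, one per case in the table.
lines = ["Apple", "banana A", "A^B", "cost: $A", "^caret first", "ends with $"]

def matching(pattern):
    """Return the sample lines in which the pattern is found."""
    return [line for line in lines if re.search(pattern, line)]

starts_with_a = matching(r"^A")     # "A" at the beginning of a line
ends_with_a = matching(r"A$")       # "A" at the end of a line
literal_a_caret = matching(r"A\^")  # "A^" anywhere (escaped for Python)
literal_dollar_a = matching(r"\$A") # "$A" anywhere (escaped for Python)
caret_first = matching(r"^\^")      # "^" at the beginning of a line
dollar_last = matching(r"\$$")      # "$" at the end of a line

for result in (starts_with_a, ends_with_a, literal_a_caret,
               literal_dollar_a, caret_first, dollar_last):
    print(result)
```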

You can easily search for all characters except those in square brackets by putting a “^” as the first character after the “[”. To match all characters except vowels, use “[^aeiou]”. Like the anchors in places where they can’t be considered an anchor, the characters “]” and “-” do not have a special meaning if they directly follow “[”. Here are some examples:

Regular Expression  Matches
[]                  The characters “[]”
[0]                 The character “0”
[0-9]               Any number
[^0-9]              Any character other than a number
[-0-9]              Any number or a “-”
[0-9-]              Any number or a “-”
[^-0-9]             Any character except a number or a “-”
[]0-9]              Any number or a “]”
[0-9]]              Any number followed by a “]”
[0-9-z]             Any number, or any character between “9” and “z”
[0-9\-a\]]          Any number, or a “-”, an “a”, or a “]”
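
A few of these character sets can be checked with Python's `re.findall`. This is only a sketch: the sample strings are made up, and Python's handling of the odder rows (such as “[0-9-z]”) may differ from the UNIX tools, so only the rows that behave the same way are shown.

```python
import re

# Hypothetical sample text containing digits, a hyphen and square brackets.
text = "room 42-b, item [7]"

digits = re.findall(r"[0-9]", text)                # any number
digit_or_hyphen = re.findall(r"[-0-9]", text)      # "-" right after "[" is literal
hyphen_last = re.findall(r"[0-9-]", text)          # "-" just before "]" is literal too
digit_or_bracket = re.findall(r"[]0-9]", text)     # "]" right after "[" is literal
digit_then_bracket = re.findall(r"[0-9]]", text)   # a number followed by "]"
non_vowels = re.findall(r"[^aeiou]", "regular")    # everything except vowels
not_digit_or_hyphen = re.findall(r"[^-0-9]", "4-2=x")

print(digits, digit_or_hyphen, digit_or_bracket, digit_then_bracket)
print(non_vowels, not_digit_or_hyphen)
```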

You must remember that modifiers like “*” and “\{1,5\}” only act as modifiers if they follow a character set. If they were at the beginning of a pattern, they would not be a modifier. Here is a list of examples, and the exceptions:

Regular Expression  Matches
*                   Any line with an asterisk
\*                  Any line with an asterisk
\\                  Any line with a backslash
^*                  Any line starting with an asterisk
^A*                 Any line
^A\*                Any line starting with an “A*”
^AA*                Any line that starts with at least one “A”
^AA*B               Any line starting with one or more “A”s followed by a “B”
^A\{4,8\}B          Any line starting with 4, 5, 6, 7 or 8 “A”s followed by a “B”
^A\{4,\}B           Any line starting with 4 or more “A”s followed by a “B”
^A\{4\}B            Any line starting with “AAAAB”
\{4,8\}             Any line with “{4,8}”
A{4,8}              Any line with “A{4,8}”
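
The modifier rules can be explored in Python, with two caveats that are assumptions of this sketch rather than part of the table: Python writes the repeat count as `{4,8}` without backslashes (so `\{4,8\}` is the literal text there), and it rejects a pattern that begins with a bare "*" instead of treating it as a literal asterisk. The helper `line_matches` is a made-up name.

```python
import re

def line_matches(pattern, line):
    """True if the pattern is found somewhere in the line."""
    return re.search(pattern, line) is not None

any_line = line_matches(r"^A*", "no leading A")      # zero "A"s is still a match
one_or_more = line_matches(r"^AA*B", "AAAB")         # one or more "A"s, then "B"
needs_an_a = line_matches(r"^AA*B", "B")             # no "A" at all, so no match
four_ok = line_matches(r"^A{4,8}B", "AAAAB")         # 4 "A"s is within 4..8
three_fail = line_matches(r"^A{4,8}B", "AAAB")       # 3 "A"s is too few
nine_fail = line_matches(r"^A{4,8}B", "A" * 9 + "B") # 9 "A"s is too many
literal_star = line_matches(r"\*", "2 * 3")          # escaped asterisk
literal_braces = line_matches(r"\{4,8\}", "A{4,8}")  # literal braces in Python

# A bare leading "*" has nothing to repeat, so Python refuses to compile it.
try:
    re.compile("*")
    leading_star_compiles = True
except re.error:
    leading_star_compiles = False

print(any_line, one_or_more, needs_an_a, four_ok, three_fail, nine_fail)
print(literal_star, literal_braces, leading_star_compiles)
```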

Regular Expression  Class     Type           Meaning
.                   all       Character Set  A single character (except newline)
^                   all       Anchor         Beginning of line
$                   all       Anchor         End of line
[...]               all       Character Set  Range of characters
*                   all       Modifier       Zero or more duplicates
\<                  Basic     Anchor         Beginning of word
\>                  Basic     Anchor         End of word
\(..\)              Basic     Backreference  Remembers pattern
\1..\9              Basic     Reference      Recalls pattern
+                   Extended  Modifier       One or more duplicates
?                   Extended  Modifier       Zero or one duplicate
\{M,N\}             Extended  Modifier       M to N duplicates
(…|…)               Extended  Anchor         Shows alternation
\(…\|…\)            EMACS     Anchor         Shows alternation
\w                  EMACS     Character set  Matches a letter in a word
\W                  EMACS     Character set  Opposite of \w
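
As a rough illustration of the "remember" and "recall" rows, the word anchors, and alternation, here is a Python sketch. Python spells these differently from the basic syntax in the table: `(..)` instead of `\(..\)`, and `\b` instead of `\<` and `\>`. The sample strings are invented for the example.

```python
import re

# (\w+) remembers a word and \1 recalls it, so this finds a doubled word.
doubled_word = re.compile(r"\b(\w+) \1\b")
found = doubled_word.search("this is is a test")

# Alternation: either header name at the beginning of the line.
header = re.compile(r"^(From|Subject): ")
is_header = header.search("Subject: hello") is not None
not_header = header.search("To: someone") is not None

# "?" makes the preceding character optional (zero or one duplicate).
spellings = re.findall(r"colou?r", "color and colour")

# "+" requires one or more duplicates of the preceding character.
runs_of_a = re.findall(r"a+", "baaad cat")

print(found.group(), found.group(1))
print(is_header, not_header, spellings, runs_of_a)
```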

Perl Extensions

Regular Expression  Type           Meaning
\t                  Character Set  tab
\n                  Character Set  newline
\r                  Character Set  return
\f                  Character Set  form feed
\a                  Character Set  alarm (bell)
\e                  Character Set  escape
\033                Character Set  octal character
\x1B                Character Set  hex character
\c[                 Character Set  control character
\l                  Character Set  lowercase next character
\u                  Character Set  uppercase next character
\L                  Character Set  lowercase until \E
\U                  Character Set  uppercase until \E
\E                  Character Set  end of \L, \U or \Q
\Q                  Character Set  quote (disable) pattern metacharacters until \E
\w                  Character Set  Match a “word” character
\W                  Character Set  Match a non-word character
\s                  Character Set  Match a whitespace character
\S                  Character Set  Match a non-whitespace character
\d                  Character Set  Match a digit character
\D                  Character Set  Match a non-digit character
\b                  Anchor         Match a word boundary
\B                  Anchor         Match a non-(word boundary)
\A                  Anchor         Match only at beginning of string
\Z                  Anchor         Match only at end of string, or before a final newline
\z                  Anchor         Match only at end of string
\G                  Anchor         Match only where previous m//g left off
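
Several of these Perl-style escapes also exist in Python's `re` module, so they can be tried there. This is a sketch, not a Perl equivalent: Python has no `\l`, `\u`, `\L`, `\U`, `\Q`, `\E` or `\G`, and its `\Z` matches only at the very end of the string (like Perl's `\z`). The sample text is made up.

```python
import re

# Hypothetical two-line sample.
log = "error 404 at 10:32\nretry  scheduled"

numbers = re.findall(r"\d+", log)                    # runs of digit characters
words = re.split(r"\s+", "retry  scheduled now")     # split on runs of whitespace
whole_cats = re.findall(r"\bcat\b", "cat concat cat.")  # "cat" as a whole word only

# \A matches only at the start of the string, even in multi-line mode,
# while ^ with re.MULTILINE also matches at the start of each line.
string_start = re.search(r"\Aretry", log, re.MULTILINE) is not None
line_start = re.search(r"^retry", log, re.MULTILINE) is not None

# In Python, $ tolerates one trailing newline but \Z does not.
dollar_end = re.search(r"scheduled$", "scheduled\n") is not None
strict_end = re.search(r"scheduled\Z", "scheduled\n") is not None

print(numbers, words, whole_cats)
print(string_start, line_start, dollar_end, strict_end)
```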