Published: Wednesday, September 01, 1999
An Introduction to Regular Expression with VBScript
By Scott Mitchell
Introduction:
Let me start out by saying that I am no expert when it comes to regular expression! I have used regular
expression on only a few occasions, and that was when writing some small Perl utilities for my Linux
box. However, I've decided that I'd like to improve my regular
expression skills, so I started studying up, and have decided to document my education in the form of
articles to help others who are interested in learning regular expression!
Regular Expression's Roots:
Regular expression used to be a thing that only UNIX users knew about. Text editors like
vi allowed regular expression-formatted searches (which is so cool and powerful... it's
a shame Windows-based editors don't support this...).
Correction! Alert readers Glen C. and Soeren P. pointed out that there are a
number of Windows editors that allow regular expression searches:
Ultra-Edit 32, SlickEdit, the Visual Studio built-in editors, and a slew of others... Thanks for
the correction!
There were also a couple of powerful text
processing utilities that made use of regular expression: sed and awk.
Then along came Perl, which has been described as the next step from sed and awk
since it combines the functionality of these two programs, plus allows the use of C and
Fortran libraries.
Of course these are all UNIX programs (well, Perl is available for Windows, but it is meant for
UNIX, in my opinion)... However, when Microsoft started creating scripting languages
for the Windows platform, only JScript contained regular expression, leaving VBScript alone in the dark.
That has changed, though, with version 5 of the VBScript Engine.
No, I don't want to hear any excuses that sound like, "I don't have the VBScript 5 Engine, so I don't
need to learn regular expression." You do need to learn regular expression; it is a neat and powerful
tool! And you can get the latest version of the VBScript engine for free! If you don't already
have it installed, go ahead and download it
now! I'll wait!
So What the Heck is Regular Expression?
Regular expression is, technically, a defined grammar for use in complex pattern searching. Last year
I took an interesting computer science class titled Formal Languages & Automata Theory which
discussed the linguistics behind regular expressions, and defined them in an almost mathematical sense
so proofs could be performed upon various expressions. Very, very neat stuff. Anyway, I won't delve
into such technical detail (unless you guys are interested... if so,
let me know).
Regular expression allows you to quickly search (and replace, if you like) for strings within another
string. There are a few basic types of matching that you will need to acquaint yourself with:
character matching, repetition matching, and position matching.
Character Matching:
Character matching is the easiest, so
let's start there. Let us say that you want to search a string for all occurrences of the string "4Guys".
Your regular expression would simply be:
4Guys
Let's look at another example:
4.uys
Note that the period is a special character in regular expression: it stands for any single character
(except newline characters). So, 4.uys would return strings like:
4Luys, 4suys, 44uys, etc.
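To try the two patterns above, here is a minimal sketch. It uses Python's re module purely as a convenient test bench (an assumption on my part; the VBScript way of doing this is the subject of Part 2), and the sample text is made up for illustration.

```python
import re

# Made-up sample text; the last "4" is followed by a newline on purpose.
text = "4Guys, 4Luys, 4suys, 44uys and 4\nuys"

# A literal pattern matches exactly the characters written.
literal_hits = re.findall(r"4Guys", text)

# The period stands in for any single character except a newline,
# so "4\nuys" at the end of the text is not a match.
dot_hits = re.findall(r"4.uys", text)

print(literal_hits)  # ['4Guys']
print(dot_hits)      # ['4Guys', '4Luys', '4suys', '44uys']
```

Note that 4.uys also matches 4Guys itself, since G is as good as any other single character.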
You can also search for strings which contain characters which fall in a set of characters. Let's
say that you wanted to find strings like 1Guys, 2Guys, 3Guys, or 4Guys. You could do:
[1234]Guys
Which says, if you find the character 1, 2, 3 or 4 preceding Guys, return it! You can also use the
dash as a range value. So, if we wanted any digit (0 through 9) to precede Guys, we could do:
[0-9]Guys
Pretty neat, eh? You can also do ranges of characters, like [a-m], to represent all
characters between lowercase A and lowercase M.
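A quick way to see character sets and ranges in action is sketched below. Again this is Python rather than VBScript, and the candidate strings are invented for the example.

```python
import re

# Invented candidates: two in the 1-4 set, one other digit, one letter.
candidates = ["1Guys", "4Guys", "7Guys", "xGuys"]

# An explicit set: only 1, 2, 3 or 4 may come before "Guys".
one_to_four = [s for s in candidates if re.search(r"[1234]Guys", s)]

# A range: any digit from 0 through 9 may come before "Guys".
any_digit = [s for s in candidates if re.search(r"[0-9]Guys", s)]

# A letter range: every character of "regex" that falls between a and m.
a_to_m = re.findall(r"[a-m]", "regex")

print(one_to_four)  # ['1Guys', '4Guys']
print(any_digit)    # ['1Guys', '4Guys', '7Guys']
print(a_to_m)       # ['e', 'g', 'e']
```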
Repetition Matching:
Regular expression's true power starts to reveal itself once we delve into repetition matching. Let's
say that we want to find all plural or singular cases of 4Guys. For example, we want to return the
string 4Guy and 4Guys. To do this we can use another special character, the question mark. The
question mark means to match zero or one instance of the previous character. So:
4Guys?
Would return all strings like 4Guys and 4Guy. You may be wondering what to do if you want to find the
literal string "4Guys?" or "4Guys." (question mark or period included). Well, to literally search for any special character
all you need to do is simply prefix the special character with a backslash (\). So,
if we wanted to find "4Guys?" we could do:
4Guys\?
That would return only "4Guys?". Another powerful repetition character is the asterisk.
The asterisk corresponds to "match zero or more of the preceding character." So:
4Guys*
would return strings like 4Guy, 4Guys, 4Guyss, and
4Guysssssss. You can use parentheses to group characters. For example:
4(Guys)*
would return strings like 4, 4Guys, 4GuysGuys, and 4GuysGuysGuysGuys.
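The repetition characters above can be checked with a short sketch. It is written in Python, not VBScript, and the helper name matches_whole is hypothetical. One assumption to flag loudly: the helper requires the pattern to account for the entire string, which makes the differences easy to see. A plain search would also find 4Guys? inside a longer string such as 4Guyss.

```python
import re

def matches_whole(pattern, s):
    """Hypothetical helper: True when the pattern covers the entire string."""
    return re.fullmatch(pattern, s) is not None

# ? makes the preceding character optional (zero or one "s").
optional_s = [s for s in ["4Guy", "4Guys", "4Guyss"] if matches_whole(r"4Guys?", s)]

# \? is a literal question mark, so the string must end in "?".
literal_q = [s for s in ["4Guys", "4Guys?"] if matches_whole(r"4Guys\?", s)]

# * allows zero or more of the preceding character.
star = [s for s in ["4Gu", "4Guy", "4Guys", "4Guysss"] if matches_whole(r"4Guys*", s)]

# Parentheses group characters, so * now repeats the whole "Guys".
grouped = [s for s in ["4", "4Guys", "4GuysGuys", "4Guy"] if matches_whole(r"4(Guys)*", s)]

print(optional_s)  # ['4Guy', '4Guys']
print(literal_q)   # ['4Guys?']
print(star)        # ['4Guy', '4Guys', '4Guysss']
print(grouped)     # ['4', '4Guys', '4GuysGuys']
```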
We can also specify an exact number of repetitions using the braces. Let's say
that we want to return strings that look like 444Guys. We can set our regular expression to:
4{3}Guys
The {3} means that we want exactly three occurrences of the previous character.
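Here is a small sketch of the braces at work, once more in Python as a stand-in test bench and with made-up sample strings. The second half shows a subtlety worth knowing: when searching inside a longer string, the pattern can still be found within a run of four 4s.

```python
import re

samples = ["44Guys", "444Guys", "4444Guys"]

# Requiring the pattern to cover the whole string keeps only exactly three 4s.
exactly_three = [s for s in samples if re.fullmatch(r"4{3}Guys", s)]

# Searching inside "4444Guys" still succeeds: the match starts at the second 4.
found_inside = re.search(r"4{3}Guys", "4444Guys").group()

print(exactly_three)  # ['444Guys']
print(found_inside)   # 444Guys
```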
Position Matching:
The last type of matching is position matching. Let's look at two special characters that can be used
to force the position of a substring within a string. Let's say that you wanted to search a string
for 4Guys, but you wanted to only return the string 4Guys if 4Guys were the first five characters in
the string. You'd use the caret symbol, and your regular expression would look like:
^4Guys
If you wanted to only match 4Guys if 4Guys were the last five characters, you'd simply use the $ symbol
like so:
4Guys$
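The two anchors can be compared side by side with the following sketch. It is Python rather than VBScript, and the sample strings are invented for the example.

```python
import re

samples = ["4Guys rule", "we are 4Guys", "4Guys"]

# ^ pins the match to the start of the string.
starts_with = [s for s in samples if re.search(r"^4Guys", s)]

# $ pins the match to the end of the string.
ends_with = [s for s in samples if re.search(r"4Guys$", s)]

print(starts_with)  # ['4Guys rule', '4Guys']
print(ends_with)    # ['we are 4Guys', '4Guys']
```

The string that is exactly 4Guys shows up in both lists, since it both starts and ends with the pattern.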
Pretty neat, eh? Well, that about wraps up the lesson in regular expression. We now need to show how to
use regular expressions with VBScript. That lesson is available in Part 2!
Read Part 2