Examples
The example patterns in this section describe some common character classes and shortcuts used for constructing grep patterns, and address some common tasks that you might find useful in your work.
Matching Identifiers
One of the most common things you'll use grep patterns for is to find and modify identifiers, such as variables in computer source code or object names in HTML source documents. To match an arbitrary identifier in most programming languages, you might use this search pattern:
[a-z][a-zA-Z0-9]*
This pattern matches any sequence that begins with a lowercase letter and is followed by zero or more alphanumeric characters. If other characters are allowed in the identifier, add them to the pattern. This pattern allows underscores in only the first character of the identifier:
[a-z_][a-zA-Z0-9]*
The following pattern allows underscores anywhere except the first character, and allows identifiers to begin with either an uppercase or a lowercase letter:
[a-zA-Z][a-zA-Z0-9_]*
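If you want to try these patterns outside BBEdit, the following sketch runs all three through Python's re module. It is only an illustration: the PATTERNS table, the is_identifier helper, and the sample names are hypothetical. Python's re is case-sensitive unless you pass re.IGNORECASE, so the sketch assumes case-sensitive matching. re.fullmatch succeeds only when the entire name matches, which mimics testing one identifier at a time.

```python
import re

# The three identifier patterns from the text, under hypothetical labels.
PATTERNS = {
    "lower_first": r"[a-z][a-zA-Z0-9]*",
    "underscore_first_only": r"[a-z_][a-zA-Z0-9]*",
    "underscore_after_first": r"[a-zA-Z][a-zA-Z0-9_]*",
}


def is_identifier(kind: str, name: str) -> bool:
    """Return True if the whole of `name` matches the chosen pattern (case-sensitive)."""
    return re.fullmatch(PATTERNS[kind], name) is not None


if __name__ == "__main__":
    for name in ["count1", "_tmp", "Total", "my_var"]:
        results = {kind: is_identifier(kind, name) for kind in PATTERNS}
        print(name, results)
```

With these rules, "_tmp" passes only the second pattern, and "my_var" passes only the third.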
Matching White Space
Often you will want to match two sequences of data that are separated by tabs or spaces, whether to simply identify them, or to rearrange them.
For example, suppose you have a list of formatted label-data pairs like this:
User name: Bernard Rubble
Occupation: Actor
Spouse: Betty
You can see that there are tabs or spaces between the labels on the left and the data on the right, but you have no way of knowing how many spaces or tabs there will be on any given line. Here is a pattern that means "match one or more white space characters (spaces or tabs)":
[ \t]+
So, if you wanted to transform the list above to look like this:
User name("Bernard Rubble")
Occupation("Actor")
Spouse("Betty")
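You would use this search pattern (with case-sensitive matching turned off, so that [a-z] also matches the capital letters in the labels and data):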
([a-z ]+):[ \t]+([a-z ]+)
and this replacement pattern:
\1\("\2"\)
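The same search and replacement can be sketched with Python's re module, assuming the list is held in a string with tabs and spaces after the colons. Two differences from BBEdit are worth noting: the sketch passes re.IGNORECASE so that [a-z] matches the capital letters, and it writes the replacement as \1("\2") because Python's re.sub keeps a backslash that precedes ( or ) in its output instead of dropping it. LISTING, SEARCH, REPLACE, and to_calls are hypothetical names.

```python
import re

# Sample label-data pairs; the tabs and spaces after each colon vary on purpose.
LISTING = "User name:\tBernard Rubble\nOccupation:  Actor\nSpouse:\t\tBetty"

# Search pattern from the text. re.IGNORECASE lets [a-z] match capital letters.
SEARCH = re.compile(r"([a-z ]+):[ \t]+([a-z ]+)", re.IGNORECASE)

# In Python's replacement syntax, the parentheses need no backslashes.
REPLACE = r'\1("\2")'


def to_calls(text: str) -> str:
    """Rewrite each 'Label: data' line as 'Label("data")'."""
    return SEARCH.sub(REPLACE, text)


if __name__ == "__main__":
    print(to_calls(LISTING))
```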
Matching Delimited Strings
In some cases, you may want to match all the text that appears between a pair of delimiters. One way to do this is to bracket the search pattern with the delimiters, like this:
".*"
This works well if you have only one delimited string on the line. But suppose the line looked like this:
"apples", "oranges, kiwis, mangos", "penguins"
The search string above would match the entire line. (This is another instance of the "longest match" behavior of BBEdit's grep engine, which was discussed previously.)
Although you can specify which characters are allowed in delimited strings, it's much easier to allow everything in the string except the delimiter.
The following pattern is much more effective for delimited strings:
"[^"]+"
This pattern allows anything except the delimiter (a double quote in this case) inside the match. (Note that if the string itself contains an escaped quote (\"), the match will end at that quote.)
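You can see the difference between the two approaches with a short Python sketch. Python's * is greedy, so it reproduces the longest-match behavior described above. The variable names are hypothetical; re.findall returns every non-overlapping match on the line.

```python
import re

LINE = '"apples", "oranges, kiwis, mangos", "penguins"'

# ".*" is greedy: it runs from the first quote to the last quote on the line.
GREEDY = r'".*"'

# "[^"]+" cannot cross a quote, so each match stops at the next delimiter.
NEGATED = r'"[^"]+"'

if __name__ == "__main__":
    print(re.findall(GREEDY, LINE))   # one match: the entire line
    print(re.findall(NEGATED, LINE))  # three matches: one per quoted string
```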
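Note, however, that this negated-class technique works only for strings with single-character delimiters. If your strings use multi-character delimiters, you'll need to use the first method. For example, this pattern matches C comments, whose contents may themselves contain the * character:
/\*.*\*/
As with the first method, if a line contains more than one comment, this pattern matches from the start of the first comment to the end of the last one.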
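A quick Python check shows this caveat, using a hypothetical line of C that contains two comments. Python's re also offers a non-greedy quantifier, *?, which stops at the first */ it reaches. It is included here only as a Python comparison.

```python
import re

# A hypothetical line of C source containing two comments.
LINE = "x = 1; /* first */ y = 2; /* second */"

# The pattern from the text: "/*", then anything, then "*/".
COMMENT = r"/\*.*\*/"

# Python-only comparison: *? stops at the first "*/" it can reach.
LAZY_COMMENT = r"/\*.*?\*/"

if __name__ == "__main__":
    # Longest match: one hit covering both comments and the code between them.
    print(re.findall(COMMENT, LINE))
    # Non-greedy: one hit per comment.
    print(re.findall(LAZY_COMMENT, LINE))
```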
Marking Structured Text
Suppose you're reading a long text document that doesn't have a table of contents, but you notice that all the sections are numbered like this:
3.2.7 Prehistoric Cartoon Communities
5.19.001 Restaurants of the Mesozoic
You can use a grep pattern to create marks for these headings, which will appear in the Marks pop-up menu.
First, decide how many levels you want to mark. In this example, the section numbers always have at least two levels and at most four.
Use this pattern to find the headings:
^(#+\.#+\.?#*\.?#*)[ \t]+([a-z ]+)
and this pattern to make the file marks:
\1 \2
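The ^ before the first search group ensures that BBEdit matches the numeric string only at the beginning of a line. The pattern
\.?#*
matches an optional period followed by zero or more digits, which makes the third and fourth levels optional. The rest of the pattern uses the white space idiom ([ \t]+) followed by a subpattern of letters and spaces ([a-z ]+) similar to the identifier idiom.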
You can use a similar technique to mark any section whose heading can be described with a grep pattern.
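The following Python sketch imitates the marking step, assuming the document is a string with \n line breaks. Python's re has no # digit shortcut, so the sketch writes each # as \d. re.MULTILINE makes ^ match at the start of every line, and re.IGNORECASE lets [a-z ] match the capitalized titles. The heading_marks function and the sample text are hypothetical, and the function simply returns (offset, name) pairs rather than creating real marks.

```python
import re

# Hypothetical document text with two numbered headings.
DOCUMENT = (
    "3.2.7 Prehistoric Cartoon Communities\n"
    "Some body text about the communities.\n"
    "5.19.001 Restaurants of the Mesozoic\n"
)

# The heading pattern from the text, with BBEdit's # (digit) written as \d.
# MULTILINE: ^ matches at every line start. IGNORECASE: [a-z ] matches capitals.
HEADING = re.compile(
    r"^(\d+\.\d+\.?\d*\.?\d*)[ \t]+([a-z ]+)", re.MULTILINE | re.IGNORECASE
)


def heading_marks(text: str) -> list[tuple[int, str]]:
    """Return (offset, mark name) pairs, joining the two groups with a space."""
    return [(m.start(), m.expand(r"\1 \2")) for m in HEADING.finditer(text)]


if __name__ == "__main__":
    for offset, name in heading_marks(DOCUMENT):
        print(offset, name)
```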
Marking a Mail Digest
You can extend the structured text technique to create marks for mail digests. Assume that each message in the digest begins with header lines like these:
From: Sadie Burke <sadie@burke.com>
Date: Sun, 16 Jul 1995 13:17:45 -0700
Subject: Fishing with the judge
Suppose you want the marker text to list the subject and the sender. You would use the following search string:
^From:[ \t]+(.*)\r.*\rSubject:[ \t]+(.*)
And mark the text with this replacement string:
\2 \1
In the sequence \r.*\r in the middle of the search string, the \r before "Subject" is necessary because, as discussed previously, the special character . does not match carriage returns. Without it, .* could not get past the end of the Date line.
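Here is a comparable Python sketch for a digest held in a string. It assumes \n line breaks, so it uses \n wherever the BBEdit pattern uses \r. Python's . likewise does not match a line break unless re.DOTALL is given. The second message and the digest_marks helper are invented for illustration.

```python
import re

# A hypothetical two-message digest with \n line breaks.
DIGEST = (
    "From: Sadie Burke <sadie@burke.com>\n"
    "Date: Sun, 16 Jul 1995 13:17:45 -0700\n"
    "Subject: Fishing with the judge\n"
    "\n"
    "Message body text.\n"
    "From: A. Sender <sender@example.com>\n"
    "Date: Mon, 17 Jul 1995 09:00:00 -0700\n"
    "Subject: Re: Fishing with the judge\n"
)

# . does not match \n, so each (.*) stops at the end of its own line.
HEADER = re.compile(r"^From:[ \t]+(.*)\n.*\nSubject:[ \t]+(.*)", re.MULTILINE)


def digest_marks(text: str) -> list[str]:
    """Build one mark name per message: subject first, then sender."""
    return [m.expand(r"\2 \1") for m in HEADER.finditer(text)]


if __name__ == "__main__":
    for mark in digest_marks(DIGEST):
        print(mark)
```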
Rearranging Name Lists
You can use grep patterns to transform a list of names from first-name-first to last-name-first order (for later sorting, for instance). Assume that the names are in this form:
Junior X. Potter
Jill Safai
Dylan Schuyler Goode
Walter Wang
If you use this search pattern:
^(.*) ([^ ]+)$
And this replacement string:
\2, \1
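Because .* takes the longest possible match, the first subpattern extends up to the last space on the line, leaving only the final word for the second subpattern. The transformed list becomes: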
Potter, Junior X.
Safai, Jill
Goode, Dylan Schuyler
Wang, Walter
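In Python, the same transformation could look like the sketch below. It applies the pattern to each name separately, so ^ and $ stay anchored to a single name without needing re.MULTILINE. NAMES and last_name_first are hypothetical.

```python
import re

NAMES = ["Junior X. Potter", "Jill Safai", "Dylan Schuyler Goode", "Walter Wang"]

# (.*) is greedy, so it extends to the last space; ([^ ]+) gets the final word.
SEARCH = r"^(.*) ([^ ]+)$"
REPLACE = r"\2, \1"


def last_name_first(names: list[str]) -> list[str]:
    """Apply the search and replace patterns to each name, one at a time."""
    return [re.sub(SEARCH, REPLACE, name) for name in names]


if __name__ == "__main__":
    # Sorting the rewritten names orders the list by last name.
    for name in sorted(last_name_first(NAMES)):
        print(name)
```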
Modifying HTML Tags
When updating or editing Web page content, you may often want to replace or modify some of the existing markup, without changing the associated text content.
For instance, say you want to change every instance of the following markup, containing some arbitrary content:
<CENTER><H2><FONT COLOR=#0000FF> text</FONT></H2></CENTER><P>
by changing the heading level to <H3> and modifying the font color to #FF00FF, to produce the following result:
<CENTER><H3><FONT COLOR=#FF00FF> text</FONT></H3></CENTER><P>
Here is one way to approach the task. Start by searching for:
<CENTER><H2><FONT COLOR=\#0000FF>([^<]*)</FONT></H2></CENTER><P>
and do a replacement with:
<CENTER><H3><FONT COLOR=#FF00FF>\1</FONT></H3></CENTER><P>
In this case, you first use literal strings to specify the exact tag format that you want to match, and then use a character range for the variable text. You must also check the tag strings for any special grep characters and, if you find any, escape them.
One of the target tags contains the color value #0000FF, and as you know from previous sections, the number sign # is a special character in grep. So you must escape it by placing a backslash in front of it (\#) so that it is interpreted literally. (This is not necessary in the replace string, since the replace string does not perform a match, but the escaped form would work properly there as well.)
After the tag string, you create a character range, [^<]*, to hold the variable text content. It matches zero or more instances of any character except a left angle bracket. You then enclose this expression in parentheses so that you can refer to it in the replace string.
The replace string consists of the revised tags, and a \1 to insert the variable text content which was remembered by the subpattern that you defined above.
Keep in mind that this search pattern will not match if the variable text itself contains markup, since you have specified that it can contain any character except a tag-opening character (left angle bracket).
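The following Python sketch applies the same search and replace strings with re.sub. In Python's re the number sign is not special (outside verbose mode), but \# still matches a literal #, so the sketch keeps the escaped form. The restyle helper and the sample headings are hypothetical. The second sample shows that markup inside the heading text prevents a match.

```python
import re

# Search pattern from the text; \# matches a literal # in Python as well.
SEARCH = r"<CENTER><H2><FONT COLOR=\#0000FF>([^<]*)</FONT></H2></CENTER><P>"
REPLACE = r"<CENTER><H3><FONT COLOR=#FF00FF>\1</FONT></H3></CENTER><P>"


def restyle(html: str) -> str:
    """Change matching H2 headings to H3 and swap the font color."""
    return re.sub(SEARCH, REPLACE, html)


if __name__ == "__main__":
    print(restyle("<CENTER><H2><FONT COLOR=#0000FF>Welcome</FONT></H2></CENTER><P>"))
    # Markup inside the text: [^<]* cannot cross <B>, so nothing changes.
    print(restyle("<CENTER><H2><FONT COLOR=#0000FF>Hi <B>there</B></FONT></H2></CENTER><P>"))
```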
The simpler expression:
<CENTER><H2><FONT COLOR=\#0000FF>(.*)</FONT></H2></CENTER><P>
will avoid that issue. However, it does not suppress BBEdit's longest match behavior, so if a line contains more than one such heading, a single match will extend from the first heading's opening tags to the last heading's closing tags.
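To see the longest-match problem concretely, this Python sketch (hypothetical names and sample text) runs both search patterns over one line that holds two headings. Python's * is greedy too, so the (.*) version produces a single oversized match and leaves the inner tags unchanged.

```python
import re

HEADING = "<CENTER><H2><FONT COLOR=#0000FF>{}</FONT></H2></CENTER><P>"
LINE = HEADING.format("First") + HEADING.format("Second")  # two headings, one line

REPLACE = r"<CENTER><H3><FONT COLOR=#FF00FF>\1</FONT></H3></CENTER><P>"
STRICT = r"<CENTER><H2><FONT COLOR=\#0000FF>([^<]*)</FONT></H2></CENTER><P>"
LOOSE = r"<CENTER><H2><FONT COLOR=\#0000FF>(.*)</FONT></H2></CENTER><P>"

if __name__ == "__main__":
    print(re.sub(STRICT, REPLACE, LINE))  # both headings converted
    print(re.sub(LOOSE, REPLACE, LINE))   # one match spanning both headings
```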
As a second example, say that you want to locate every web URL in a file and turn it into a link. (To keep things simple for now, let's assume that each URL begins with "http://" and is surrounded only by whitespace, with no parentheses, quotes, or other markers.) You can use the "http://" prefix as the basis for a search pattern:
(http://[\S]+)
Since the main part of a URL can contain almost anything except unencoded whitespace, you then specify the character range [\S]+, which matches one or more instances of any character that is not whitespace. You then place parentheses around the pattern so that you can use the URL you've found as part of the replace pattern.
To complete the task, you can use a replacement pattern like the following:
<a href="\1">\1</a>
which opens an anchor tag, inserts the URL you just found as its target, and then inserts the same URL again as the visible link text between the opening and closing parts of the tag.
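As a last sketch, here is the URL-linking replacement in Python, with a hypothetical link_urls helper and sample sentence. It shares the assumption stated above that URLs are delimited by whitespace. A URL followed directly by punctuation would carry that punctuation into the link. As written, the pattern also leaves https:// addresses untouched.

```python
import re

# Pattern and replacement from the text; [\S]+ is one or more non-whitespace characters.
URL = r"(http://[\S]+)"
LINK = r'<a href="\1">\1</a>'


def link_urls(text: str) -> str:
    """Wrap every whitespace-delimited http:// URL in an anchor tag."""
    return re.sub(URL, LINK, text)


if __name__ == "__main__":
    print(link_urls("See http://www.example.com/docs for details."))
```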