Tuesday, April 07, 2009 | 12:56 PM
What are Regular Expressions and Why Use Them?
Regular Expressions (RegEx) are sets of characters you can use to match one or more strings of text. The main reason to use Regular Expressions is that they support wildcard matching, letting you capture a lot of variations (in URLs, for example) using a single string of characters.
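As a rough illustration of that idea, here is a small Python sketch. The page paths and the pattern are invented for this example, and Python's re module is only standing in for the matching that Google Analytics does, so details of the two engines may differ.

```python
import re

# Hypothetical page paths, invented for this illustration.
pages = [
    "/products/shoes.html",
    "/products/shirts.html",
    "/products/hats/index.html",
    "/about.html",
]

# One pattern covers every path that starts with /products/ and ends in .html.
# The .* in the middle is the wildcard; \. is a literal dot.
product_pages = re.compile(r"^/products/.*\.html$")

matched = [page for page in pages if product_pages.search(page)]
print(matched)
```

A single expression picks up all three product pages and leaves /about.html alone.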
Here are a few examples of when Regular Expressions are useful in Google Analytics:
- Matching multiple pages when defining a goal or funnel page
- Excluding a range of IP addresses when defining a filter
- Defining complex advanced segments
- Including and excluding multiple URLs from reports such as the Top Content report
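To make the IP range case a little more concrete, here is a minimal Python sketch. The range (203.0.113.0 through 203.0.113.25) and the names office_range and is_excluded are assumptions made up for this example; the pattern is one plausible way to write such a range, and Python's re module may not behave identically to the filter field in Google Analytics.

```python
import re

# Hypothetical range to exclude: 203.0.113.0 through 203.0.113.25.
# Last octet: a single digit (0-9), or 10-19, or 20-25.
office_range = re.compile(r"^203\.0\.113\.([0-9]|1[0-9]|2[0-5])$")

def is_excluded(ip):
    """Return True if the address falls inside the hypothetical range."""
    return office_range.search(ip) is not None

for ip in ["203.0.113.0", "203.0.113.19", "203.0.113.25",
           "203.0.113.26", "203.0.113.100"]:
    print(ip, is_excluded(ip))
```

The beginning and ending anchors (^ and $) keep the pattern from matching longer addresses that merely contain the range.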
Check out this help center article for some basic definitions of Regular Expressions and how they work.
Tips and Tricks
Here are some tips, tricks and flourishes to make your RegEx sing.
- USE TRIAL AND ERROR: There is only one really, really good way to write Regular Expressions. You can use all the testing tools in the world, but the only good way is to get them wrong, and then rewrite them and rewrite them until you are sure that they are right. So... be sure to have a profile that you can use just for testing.
- KEEP IT SIMPLE: If you need to write an expression to match "new visits", and the only options that you will be matching against are "new visits" and "repeat visits," just the word "new" is good enough.
- REGULAR EXPRESSIONS ARE GREEDY: They will match everything they possibly can, unless you force them not to. If your expression is "visits", it will match "new visits" and "repeat visits." After all, they both include the expression "visits." To make them less greedy, you have to make them more specific.
- DON'T OVERDO IT: (See the tip on greediness above.) For example, many people use a Regular Expression only when creating an IP address filter. If the IP address is 6.255.255.255, they create an expression like this: 6\.255\.255\.255 -- and forget that that will also match 16.255.255.255, etc. So in a situation like this, you really do need to start with a beginning anchor, ^6\.255\.255\.255 . A beginning anchor (called a caret) says, "To be a match, it has to start here."
- MATCH EVERYTHING WITH .*: Some combinations of Regular Expressions are very special. Perhaps the most useful combination is a dot followed by a star, like this: .* And don't forget about a dot followed by a star, but in parentheses, like this: (.*) The first one means, get everything. It is your ultimate wildcard. On the other hand, (.*) means, get everything and put it in a variable. You'll find that (.*) is very helpful when you are creating custom advanced filters.
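- BACKSLASH TO ESCAPE: Backslash is probably the RegEx character you will use most often. It means, take this special character and turn it into an everyday character. So if you are trying to match "www.mysite.com?pid=123," you have a problem -- unless you use your backslash. The question mark is a special character in Regular Expressions, and only by using a backslash, like this: "www.mysite.com\?pid=123" can you take away its special powers. (The dots are special characters too, so a stricter version is "www\.mysite\.com\?pid=123".) If you aren't sure whether a punctuation character is special or not, go ahead and use that backslash -- it won't do any harm. Don't do the same with letters or digits, though: in many RegEx flavors a backslash in front of one of those creates a special meaning of its own, such as \d for any digit.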
- WHITESPACE IS WHITESPACE: The most frequent question you might ask is, "How do I create a white space with Regular Expressions?" The answer is usually, just use white space. So if you need to match to "Google Analytics," you can make your Regular Expression be "Google Analytics."
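Several of the tips above (anchors, backslashes, (.*) and white space) can be tried out in a few lines of Python. This is only a sketch: the address 16.255.255.255 and the /category/ path are illustrative values added here, and Python's re module is used as a stand-in, so behavior inside Google Analytics may differ in places.

```python
import re

# Anchors: without ^ the pattern matches anywhere in the string.
unanchored = re.compile(r"6\.255\.255\.255")
anchored = re.compile(r"^6\.255\.255\.255")
other_ip = "16.255.255.255"  # illustrative address, not one we want to match
unanchored_matches_other = unanchored.search(other_ip) is not None
anchored_matches_other = anchored.search(other_ip) is not None

# Escaping: an unescaped ? makes the preceding "m" optional,
# so the pattern no longer matches the literal URL.
url = "www.mysite.com?pid=123"
unescaped_matches = re.search(r"www.mysite.com?pid=123", url) is not None
escaped_matches = re.search(r"www\.mysite\.com\?pid=123", url) is not None

# Capturing: (.*) grabs everything after the prefix and stores it in group 1.
# The path is a hypothetical example.
captured = re.search(r"^/category/(.*)", "/category/shoes/red").group(1)

# White space: a plain space in the pattern matches a plain space in the text.
space_matches = re.search(r"Google Analytics", "I use Google Analytics daily") is not None

print(unanchored_matches_other, anchored_matches_other)
print(unescaped_matches, escaped_matches)
print(captured)
print(space_matches)
```

Running small checks like these against strings you expect to match, and strings you expect not to match, fits the trial-and-error approach from the first tip.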
Most basic Regular Expression (RegEx) needs are covered in the Google Analytics documentation (you should start there if you want to learn those basics). Watch out though: just because something is not in there doesn't mean Google Analytics doesn't support it.
Other good sources are regular-expressions.info and RegEx Coach, an interactive tool for testing Regular Expressions.
How do you use Regular Expressions? Leave a comment and let us know!
Posted by Robbin Steif of LunaMetrics , a Google Analytics Authorized Consultant