RegEx: Regular Expressions
Regular Expressions (RegEx) are used to identify patterns of text. This article is a basic introduction to RegEx and addresses common RegEx you might use in Angelfish.
RegEx is used throughout the entire Angelfish application to perform tasks like:
- matching a group of log files in a Datasource
- identifying hits to exclude from processing via a Filter
- isolating Pages in a subdirectory
- specifying the Hostname for a Profile
RegEx provides a flexible way to describe what the pattern looks like, using a combination of characters.
- ^ Caret: Match from the beginning of the field
- $ Dollar: Match to the end of the field
- . Period: Match any single character
- | Pipe: OR
- * Asterisk: Match zero or more of the previous item
- ? Question Mark: Match zero or one of the previous item
- [] Brackets: Match one item in this list
- () Parentheses: Match contents of parentheses as item
- + Plus Sign: Match one or more of the previous item
- \ Backslash: Escape symbol for any of the above characters
- .* Wildcard: select all
- .+ Wildcard: only select non-empty string
ANCHORS: ^ $
Anchors match a specified pattern from the beginning or from the end of a field. The caret and dollar symbols are anchors.
The caret ^ matches a pattern from the beginning of the field. For example:
^/page/ matches the following:
The following patterns are not matched:
- /subsite/page/default.aspx (/page/ not at beginning)
- /pages/contact.html (begins with /pages/, so /page/ does not appear at all)
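The caret behavior described above can be checked with a short script. This is a minimal sketch that uses Python's `re` module as a stand-in; the engine inside Angelfish may differ in details. The first path in the list is a hypothetical example, and the other two are the non-matching paths listed above.

```python
import re

# Pattern from the article: /page/ must appear at the start of the field.
caret_pattern = re.compile(r"^/page/")

paths = [
    "/page/index.html",            # hypothetical path, used for illustration
    "/subsite/page/default.aspx",  # from the list above
    "/pages/contact.html",         # from the list above
]

# re.search looks anywhere in the string, so only the caret pins the match
# to the beginning of the field.
caret_results = {path: bool(caret_pattern.search(path)) for path in paths}

for path, matched in caret_results.items():
    print(f"{path}: {'match' if matched else 'no match'}")
```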
The dollar symbol $ matches a pattern to the end of the field. For example:
internal.corp$ matches the following:
The following patterns will not be matched:
- finance.internal.dev.corp (does not end in internal.corp)
- home.internal.com (does not end in internal.corp)
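The same kind of check works for the dollar anchor. This sketch again assumes Python's `re` module as a stand-in for the Angelfish engine. The hostname `www.internal.corp` is a hypothetical value, and the other two hostnames come from the list above.

```python
import re

# Pattern from the article: the field must end in internal.corp.
dollar_pattern = re.compile(r"internal.corp$")

hosts = [
    "www.internal.corp",          # hypothetical hostname, used for illustration
    "finance.internal.dev.corp",  # from the list above
    "home.internal.com",          # from the list above
]

dollar_results = {host: bool(dollar_pattern.search(host)) for host in hosts}

for host, matched in dollar_results.items():
    print(f"{host}: {'match' if matched else 'no match'}")
```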
You can combine anchors in a single pattern - here's an example of a match pattern for a specific Username:
Common Use Cases for Anchors
- Profile Config: Hostname(s), Results Page Stem
- Report Search Field
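When both anchors are combined, the pattern has to cover the whole field. The sketch below assumes Python's `re` module and a hypothetical username, `jsmith`, which does not come from the article.

```python
import re

# "jsmith" is a hypothetical username chosen for illustration.
username_pattern = re.compile(r"^jsmith$")

candidates = ["jsmith", "jsmith2", "ajsmith"]

# With ^ and $ together, extra characters before or after the name
# prevent a match.
username_results = {
    name: bool(username_pattern.search(name)) for name in candidates
}

for name, matched in username_results.items():
    print(f"{name}: {'match' if matched else 'no match'}")
```

Without the anchors, the pattern `jsmith` would also match `jsmith2` and `ajsmith`, because it would be found inside each of them.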
RANGES: [] ()
RegEx is used to match individual characters, combinations of characters, and ranges of characters.
Brackets [] allow you to specify individual characters that appear in the string. Brackets look at each individual character, not whole strings.
- [agf] matches a or g or f
- [0123] matches 0 or 1 or 2 or 3
Rather than typing individual characters, you can type a range in a bracket. For example:
- [a-z] matches any lowercase letter
- [A-Z] matches any uppercase letter
- [0-9] matches any single digit
- [a-z0-9] matches any lowercase letter or digit
- [a-zA-Z0-9] matches any letter or digit
- [2-4x-z] matches 2 or 3 or 4 or x or y or z
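The bracket examples above can be tried out directly. This is a minimal sketch that assumes Python's `re` module. The test word and the candidate characters are arbitrary choices for illustration.

```python
import re

# Each bracket expression matches exactly one character from its set.
# findall returns every single-character hit in the word, in order.
letters_found = re.findall(r"[agf]", "angelfish")

# A bracket can hold more than one range: 2-4 and x-z.
range_pattern = re.compile(r"[2-4x-z]")
candidates = ["1", "2", "3", "4", "5", "w", "x", "y", "z"]
range_matches = [c for c in candidates if range_pattern.fullmatch(c)]

print(letters_found)  # ['a', 'g', 'f']
print(range_matches)  # ['2', '3', '4', 'x', 'y', 'z']
```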
Parentheses () allow you to match a string of characters in a specific order, like (blue) and (green).
To match multiple strings, enclose them in parentheses and use a pipe | between each string.
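The difference between parentheses with a pipe and brackets can be seen in a small test. This sketch assumes Python's `re` module, and the file names are hypothetical.

```python
import re

# Parentheses with a pipe: match the whole string "blue" OR the whole
# string "green".
color_pattern = re.compile(r"(blue|green)")

# Hypothetical values, used for illustration.
values = ["blue-theme.css", "green-theme.css", "red-theme.css"]
color_results = {v: bool(color_pattern.search(v)) for v in values}

# Brackets behave differently: [blue] matches any ONE of b, l, u, or e,
# so it finds the "e" in "red-theme.css".
bracket_hit = bool(re.search(r"[blue]", "red-theme.css"))

print(color_results)
print(bracket_hit)  # True
```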
Common Use Cases for Ranges
- Report Search Field
QUANTIFIERS: ? + *
With RegEx, you can specify the number of times a pattern should occur.
A question mark ? after a character matches zero or one of the previous item, which makes the item optional.
^crawl? matches the following:
- crawfish (the l is optional, making ^craw the match pattern)
(www\.)?website\.com$ matches the following:
A plus sign + matches one or more of the previous item.
/+ matches the following slash patterns:
.+ is a wildcard that only matches a non-empty string
An asterisk * matches zero or more of the previous item.
.* is a wildcard that matches an empty or non-empty field.
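The three quantifiers can be compared side by side. This is a sketch under the assumption that Python's `re` module behaves like the Angelfish engine for these basic patterns. The helper `matches` and every test string other than `crawfish` are hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# ? makes the previous item optional.
optional_l = [matches(r"^crawl?", s) for s in ["crawfish", "crawler", "crab"]]
optional_www = [
    matches(r"(www\.)?website\.com$", s)
    for s in ["www.website.com", "website.com", "website.com.au"]
]

# + needs at least one occurrence; each run of slashes is one match.
slash_runs = re.findall(r"/+", "a/b//c///d")

# .+ rejects an empty field, while .* accepts it.
plus_on_empty = matches(r".+", "")
star_on_empty = matches(r".*", "")

print(optional_l)    # [True, True, False]
print(optional_www)  # [True, True, False]
print(slash_runs)    # ['/', '//', '///']
print(plus_on_empty, star_on_empty)  # False True
```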
Common Use Cases for Quantifiers
- Datasources e.g. /logs/2022/u_ex220[1-6].*
- Report Search Field
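The Datasource pattern listed above combines a range with a wildcard. The sketch below assumes Python's `re` module and hypothetical log file names. How Angelfish applies the pattern to file paths is not described here, so the use of `search` is an assumption.

```python
import re

# Pattern from the article: "u_ex220", then one digit from 1 to 6, then
# anything.
datasource_pattern = re.compile(r"/logs/2022/u_ex220[1-6].*")

# Hypothetical log file names, used for illustration.
log_files = [
    "/logs/2022/u_ex220101.log",
    "/logs/2022/u_ex220630.log",
    "/logs/2022/u_ex220701.log",
    "/logs/2022/u_ex221215.log",
]

selected_logs = [f for f in log_files if datasource_pattern.search(f)]

for name in selected_logs:
    print(name)
```

If the digits after `u_ex` encode a two-digit year, month, and day, this pattern would select the files for months 01 through 06 of 2022.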
ESCAPE SYMBOL: \
Occasionally you'll want to match a character that has a RegEx value. For example:
.com matches the following:
- marcom.net (the r is matched by the .)
The backslash \ allows you to escape the value of a RegEx character.
Using the above example, you can escape the RegEx value of the period by adding a backslash, like this: \.com
This forces a match pattern of ".com" (dot com) instead of "any single character followed by com"
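The effect of escaping the period can be verified with a quick comparison. This sketch assumes Python's `re` module, and `example.com` is a hypothetical hostname.

```python
import re

hosts = ["marcom.net", "example.com"]

# Unescaped: the period matches any single character, so "rcom" in
# "marcom.net" counts as a hit.
unescaped = {h: bool(re.search(r".com", h)) for h in hosts}

# Escaped: the period must be a literal dot.
escaped = {h: bool(re.search(r"\.com", h)) for h in hosts}

print(unescaped)  # {'marcom.net': True, 'example.com': True}
print(escaped)    # {'marcom.net': False, 'example.com': True}
```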
To match a series of special characters in a row, escape each character individually.
- $? is matched by \$\?
To match a single literal backslash, type two backslashes: the first backslash escapes the RegEx value of the second.
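If you're unsure whether a punctuation character has a RegEx value, you can safely escape it. Avoid escaping letters and digits, because sequences such as \d or \b have their own meaning in most RegEx engines.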
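The escaping rules above can be sketched in Python, assuming its `re` module as a stand-in for the Angelfish engine. The test strings are hypothetical. `re.escape` is a Python helper that adds the backslashes for you; it is shown only as a way to check a hand-written pattern and is not an Angelfish feature.

```python
import re

# A series of special characters: escape each one individually.
special_run_found = re.search(r"\$\?", "total=$?") is not None

# A literal backslash is written as two backslashes in the pattern.
backslash_found = re.search(r"\\", r"C:\logs\2022") is not None

# re.escape builds the escaped form of a literal string.
escaped_run = re.escape("$?")
escaped_dot = re.escape(".com")

print(special_run_found, backslash_found)  # True True
print(escaped_run)  # \$\?
print(escaped_dot)  # \.com
```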