The search mechanism supports a large variety of patterns, including simple strings, strings with classes of characters, sets of strings, wild cards, and regular expressions.
Summary
Rule | Explanation | To search for... | Enter... |
---|---|---|---|
Boolean AND | To find lines containing all of several terms, separate them with semicolons | larry AND moe AND curly | larry;moe;curly |
Boolean OR | To find lines containing any of several terms, separate them with commas | larry OR moe OR curly | larry,moe,curly |
Strings
A string is any sequence of characters, and may include
the special symbols `^' for beginning of line and `$'
for end of line. The special characters `$', `^', `*',
`[', `|', `(', `)', `!', and `\', as well as the
meta characters specific to the search, `;', `,', `#',
`<', `>', `-', and `.', must be preceded by `\' to be
matched as regular characters. For example, \^abc\\
matches the string ^abc\, whereas ^abc matches
the string abc at the beginning of a line.
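The escaping rule can be applied mechanically. The following is a minimal Python sketch, assuming that exactly the characters listed above are special; the helper escape_literal() is hypothetical and not part of the search tool.

```python
# Characters that the text above says must be preceded by a backslash.
SPECIAL = set("$^*[|()!\\")   # regular-expression special characters
META = set(";,#<>-.")         # meta characters specific to the search


def escape_literal(text):
    """Hypothetical helper: escape `text` so every character is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL or ch in META else ch for ch in text)


print(escape_literal("^abc\\"))  # the literal string ^abc\ becomes \^abc\\
```

This reproduces the example from the text: the string ^abc\ must be entered as \^abc\\.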
Classes of characters
A list of characters inside [] matches any single character
from the list, and a range such as a-h (given in increasing
order) stands for every character between its endpoints.
For example, [a-ho-z] is any character between a and h
or between o and z. The symbol `^' inside [] complements
the list. For example, [^i-n] denotes any character
except the characters `i' through `n'. The symbol `^'
thus has two meanings (beginning of line, and complement
inside []), which is consistent with egrep.
The symbol `.' stands for any character except
newline.
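To make the class semantics concrete, here is a small Python sketch that evaluates a bracket class against a single character. The names parse_class(), class_matches(), and dot_matches() are hypothetical, and the sketch deliberately ignores escapes and a literal `]' inside the list.

```python
def parse_class(body):
    """Split the inside of [...] into a negation flag and (low, high) ranges."""
    negate = body.startswith("^")   # a leading ^ complements the list
    if negate:
        body = body[1:]
    ranges = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            ranges.append((body[i], body[i + 2]))   # a range such as a-h
            i += 3
        else:
            ranges.append((body[i], body[i]))       # a single character
            i += 1
    return negate, ranges


def class_matches(cls, ch):
    """True if the one-character string `ch` is matched by a class like '[a-ho-z]'."""
    assert cls.startswith("[") and cls.endswith("]")
    negate, ranges = parse_class(cls[1:-1])
    inside = any(lo <= ch <= hi for lo, hi in ranges)
    return inside != negate


def dot_matches(ch):
    """The `.' symbol matches any character except newline."""
    return ch != "\n"


print(class_matches("[a-ho-z]", "k"))  # k lies in neither range
```

Ranges are compared in character-code order, which is why a range must be written low-to-high.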
Boolean operations
The search supports an `AND' operation, denoted
by `;', and an `OR' operation, denoted by `,',
as well as any combination of the two.
For example, `pizza;cheeseburger' outputs all
lines containing both patterns.
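A minimal Python sketch of the two operators on a single line, assuming each term is a plain substring. Because the precedence rules for combining `;' and `,' are not spelled out here, this hypothetical boolean_match() handles only one operator per pattern and rejects mixtures; it also ignores escaped `\;' and `\,'.

```python
def boolean_match(pattern, line):
    """Hypothetical sketch: ';' means AND, ',' means OR, terms are plain substrings."""
    if ";" in pattern and "," in pattern:
        # Mixed operators: precedence is not described, so this sketch refuses them.
        raise ValueError("mixing ';' and ',' is not handled by this sketch")
    if ";" in pattern:
        return all(term in line for term in pattern.split(";"))
    # A single term with no operator falls through to here as a one-term OR.
    return any(term in line for term in pattern.split(","))


print(boolean_match("pizza;cheeseburger", "a pizza and a cheeseburger"))
```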
Wild cards
The symbol `#' denotes a sequence of any
number (including 0) of arbitrary characters,
so # is equivalent to .* in egrep. In fact,
.* also works, because it is a valid regular expression
(see below), but unless it is part of an actual regular
expression, # is faster.
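The equivalence between # and .* can be shown by translating a wildcard pattern into a Python regular expression. This sketch assumes every character other than `#' is literal (it ignores the other operators and escapes); wildcard_to_regex() and wildcard_search() are hypothetical names, and the translation only illustrates meaning, not how the search achieves its speed.

```python
import re


def wildcard_to_regex(pattern):
    """Replace each '#' with '.*' and treat every other character literally."""
    return ".*".join(re.escape(piece) for piece in pattern.split("#"))


def wildcard_search(pattern, line):
    """True if the wildcard pattern occurs somewhere in the line."""
    return re.search(wildcard_to_regex(pattern), line) is not None


print(wildcard_to_regex("abc#xyz"))  # abc.*xyz
```

Because `#' may stand for zero characters, ab# matches a line containing just ab.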
Combination of exact and approximate matching
Any pattern inside angle brackets <> must match the text exactly, even when errors are allowed in the rest of the pattern. For example, <mathemat>ics matches mathematical with one error (replacing the last s with an a), but mathe<matics> does not match mathematical no matter how many errors are allowed.
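How approximate matching is implemented is not described here, so the following Python sketch swaps in a plain dynamic-programming edit-distance matcher (approximate substring search), which is not necessarily the method the search uses. It assumes an error is a single-character substitution, insertion, or deletion, and that characters inside <> may not be substituted, deleted, or separated by inserted characters. The names min_errors() and matches() are hypothetical.

```python
import math


def parse(pattern):
    """Return (char, region) pairs; region is None outside <>, else an id per <...> group."""
    chars, region, next_id = [], None, 0
    for ch in pattern:
        if ch == "<":
            region, next_id = next_id, next_id + 1
        elif ch == ">":
            region = None
        else:
            chars.append((ch, region))
    return chars


def min_errors(pattern, text):
    """Fewest errors needed to match `pattern` somewhere in `text` (math.inf if impossible)."""
    p, inf = parse(pattern), math.inf
    m = len(p)
    prev = [0] * (len(text) + 1)      # empty pattern prefix matches at any position
    for i in range(1, m + 1):
        ch, reg = p[i - 1]
        exact = reg is not None
        cost = inf if exact else 1    # exact characters cannot be edited
        # A text character may be inserted after pattern char i unless chars i and
        # i+1 belong to the same <...> group.
        ins_ok = not (exact and i < m and p[i][1] == reg)
        cur = [prev[0] + cost] + [inf] * len(text)
        for j in range(1, len(text) + 1):
            sub = prev[j - 1] + (0 if ch == text[j - 1] else cost)
            dele = prev[j] + cost
            ins = cur[j - 1] + 1 if ins_ok else inf
            cur[j] = min(sub, dele, ins)
        prev = cur
    return min(prev)


def matches(pattern, text, k):
    """True if `pattern` matches `text` with at most k errors."""
    return min_errors(pattern, text) <= k


print(min_errors("<mathemat>ics", "mathematical"))  # one substitution: s -> a
```

With this model, mathe<matics> can never match mathematical because the text contains no exact occurrence of matics, which agrees with the example above.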
Regular expressions
Since the index is word based, a regular expression
must match words that appear in the index for the search
to find it. The search first strips all non-alphabetic
characters from the regular expression and searches the
index for all remaining words. It then applies the
regular expression matching algorithm to the files
found in the index. For example, for `abc.*xyz' the
search looks up all files that contain both `abc' and
`xyz' in the index, and then searches directly for
`abc.*xyz' in those files. The union operation `|',
Kleene closure `*', and parentheses () are all
supported; `+' is currently not supported.
Regular expressions are currently limited to approximately
30 characters (generally excluding meta characters), and
the maximal number of errors for regular expressions
that use `*' or `|' is 4.
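The two-stage strategy (index lookup, then a direct regular expression scan) can be sketched in Python with a small in-memory word index. This is a deliberately conservative sketch: it uses the index only when the pattern is a sequence of plain words joined by .*, as in the `abc.*xyz' example, and otherwise scans every file, because stripping non-alphabetic characters from a pattern with `|' or `*' on a letter can produce words a matching line need not contain. The names build_index() and search(), and the dict-of-strings file store, are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z]+")
# Patterns like abc.*xyz: plain words separated by '.*'.
SIMPLE = re.compile(r"[A-Za-z]+(?:\.\*[A-Za-z]+)*")


def build_index(files):
    """Map each word to the set of file names containing it (hypothetical in-memory index)."""
    index = {}
    for name, text in files.items():
        for word in WORD.findall(text):
            index.setdefault(word, set()).add(name)
    return index


def search(regex, files, index):
    """Return (file, line) pairs whose line matches `regex`."""
    if SIMPLE.fullmatch(regex):
        # Stage 1: keep only files that contain every word left after stripping.
        words = WORD.findall(regex)
        candidates = set.intersection(*(index.get(w, set()) for w in words))
    else:
        candidates = set(files)       # fall back to scanning every file
    # Stage 2: run the real regular expression over the candidate files.
    compiled = re.compile(regex)
    hits = []
    for name in sorted(candidates):
        for line in files[name].splitlines():
            if compiled.search(line):
                hits.append((name, line))
    return hits


files = {"a.txt": "abc then xyz\nnothing", "b.txt": "only abc here", "c.txt": "xyz before abc"}
print(search("abc.*xyz", files, build_index(files)))
```

In the usage example, the index narrows abc.*xyz to a.txt and c.txt (b.txt lacks xyz), and the direct scan then keeps only the line in a.txt where abc precedes xyz.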