Extractor (Matcher)

The main component of rare is the extractor (or matcher). It is built around a few fundamental concepts:

  • Each line of input (separated by \n) is tested against the matcher
  • The matcher parses a line into a match (and optionally, groups)
  • An expression (see: expression) formats the output from the matched groups
  • Optionally, one or more ignore expressions can be applied to silence matches for which they evaluate to truthy
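
The flow described in the list above can be sketched in a few lines of Python. This is only an illustration of the concepts, not rare's implementation: the function name `process`, the `ignore` and `fmt` callables, and the sample lines are all assumptions made for the example.

```python
import re

def process(lines, pattern, ignore=None, fmt=None):
    """Match each line, drop ignored matches, and format the rest."""
    matcher = re.compile(pattern)
    out = []
    for line in lines:
        m = matcher.search(line)
        if m is None:
            continue  # the line did not match, so nothing is emitted
        # Index 0 is the whole match; 1..n are the capture groups.
        groups = [m.group(0), *m.groups()]
        if ignore is not None and ignore(groups):
            continue  # a truthy ignore result silences the match
        # Without a formatter, pass the entire line through.
        out.append(fmt(groups) if fmt else line)
    return out

# Hypothetical sample input, modelled on an nginx access log.
sample = [
    '10.20.30.40 - - [14/Apr/2016:18:13:29 +0200] "GET / HTTP/1.1" 206 101 "-" "curl/7.43.0"',
    '10.20.30.40 - - [14/Apr/2016:18:13:30 +0200] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.43.0"',
    'not an access log line',
]

result = process(
    sample,
    r'"(\w{3,4}) ([A-Za-z0-9/.@_-]+)',
    ignore=lambda g: g[2] == "/",    # silence requests for "/"
    fmt=lambda g: f"{g[1]} {g[2]}",  # output "<method> <path>"
)
print(result)
```

Here the first line matches but is silenced by the ignore check, the second is formatted and kept, and the third never matches.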

Matcher Types

If no matcher is specified, the entire line is matched by default and passed through to the expression stage.

Only one matcher can be specified at a time.

Regex

A regular expression is specified with --match or -m, and follows common regex syntax.

When matching a regex, groups are extracted by index and, if names are specified, by name as well.

Set ignore-case with -I or --ignore-case.

Example:

rare filter -m '"(\w{3,4}) ([A-Za-z0-9/.@_-]+)' access.log
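
To illustrate what extraction by index and by name means, here is a small Python sketch using the standard `re` module. The group names `method` and `path` are assumptions added for the example, and the named-group syntax that rare accepts depends on its regex engine, so treat this as a conceptual model rather than a description of rare itself.

```python
import re

line = '10.20.30.40 - - [14/Apr/2016:18:13:29 +0200] "GET / HTTP/1.1" 206 101 "-" "curl/7.43.0"'

# Same pattern as the example above, with hypothetical names on the groups.
pattern = re.compile(r'"(?P<method>\w{3,4}) (?P<path>[A-Za-z0-9/.@_-]+)')
m = pattern.search(line)

# Index 0 is the whole match; 1 and 2 are the capture groups.
by_index = {i: m.group(i) for i in range(pattern.groups + 1)}
# Named groups expose the same values under their names.
by_name = m.groupdict()

print(by_index)
print(by_name)
```

Both views refer to the same captured text: index 1 and `method` hold the HTTP method, and index 2 and `path` hold the request path.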

Dissect

A dissect expression is specified with --dissect or -d, and follows dissect syntax.

As with regex, groups are extracted by both index and name. Dissect can be significantly faster than regex.

Set ignore-case with -I or --ignore-case.

Example:

rare filter -d 'HTTP/1.1" %{code} %{size} ' -e '{code}' access.log
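
One way to see why a dissect-style pattern can be cheaper than a regex is that it only needs plain substring searches between literal delimiters. The Python sketch below is a deliberately simplified model written for this page, not rare's dissect implementation: the function name `dissect` is hypothetical, and it handles only named `%{field}` tokens separated by literal text (adjacent fields and any other dissect features are not covered).

```python
import re

def dissect(pattern, line):
    """Return a dict of named fields, or None if the line does not match."""
    # Split into alternating literals (even indexes) and field names (odd).
    parts = re.split(r"%\{(\w+)\}", pattern)
    literals, names = parts[0::2], parts[1::2]

    # Locate the leading literal; the fields are read in order after it.
    start = line.find(literals[0])
    if start == -1:
        return None
    pos = start + len(literals[0])

    fields = {}
    for name, literal in zip(names, literals[1:]):
        if literal == "":
            # Field at the end of the pattern: take the rest of the line.
            end = len(line)
        else:
            end = line.find(literal, pos)
            if end == -1:
                return None
        fields[name] = line[pos:end]
        pos = end + len(literal)
    return fields

line = '10.20.30.40 - - [14/Apr/2016:18:13:29 +0200] "GET / HTTP/1.1" 206 101 "-" "curl/7.43.0"'
fields = dissect('HTTP/1.1" %{code} %{size} ', line)
print(fields)
```

For the sample line, `code` is read up to the next space and `size` up to the space after it; a line without the leading `HTTP/1.1" ` literal yields no match.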

Ignore

You can provide one or more expressions via --ignore (-i). If the statement evaluates to truthy (non-empty), the matched line will be ignored.

Example:

To ignore all non-200 HTTP status codes:

rare filter -d 'HTTP/1.1" %{code} %{size} ' -i '{neq {code} 200}' -e '{code} {size}' access.log

Examples

Decomposing a Matcher

The most primitive way to use rare is to filter lines in an input. We'll use a sample nginx log for our example.

An nginx log line looks like this:

10.20.30.40 - - [14/Apr/2016:18:13:29 +0200] "GET / HTTP/1.1" 206 101 "-" "curl/7.43.0"

So, to parse this we may want to match the method and path with a regex:

rare filter -m '"(\w{3,4}) ([A-Za-z0-9/.@_-]+)' access.log

This extracts the method and URL, and prints the entire line to the screen if it matches.

To output only the matched portion, add -e "{0}"

Lastly, let's say we want to ignore all paths that equal "/". The path is the second capture group, so we can do that by adding an ignore pattern: -i '{eq {2} /}'

Histograms

Histograms are like filters, but rather than outputting every match, they produce an aggregated count keyed by the extracted expression.

So, with the same example as above, if we extract the method and URL, we get a count for each distinct method and URL pair.

rare histogram -m '"(\w{3,4}) ([A-Za-z0-9/.@_-]+)' -e '{1} {2}' -b access.log
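
Conceptually, the aggregation amounts to counting how often each formatted key occurs. The Python sketch below models that idea with `collections.Counter`; it is not how rare is implemented, and the sample lines are made up for the example.

```python
import re
from collections import Counter

# Hypothetical sample input, modelled on an nginx access log.
lines = [
    '10.20.30.40 - - [14/Apr/2016:18:13:29 +0200] "GET / HTTP/1.1" 206 101 "-" "curl/7.43.0"',
    '10.20.30.41 - - [14/Apr/2016:18:13:30 +0200] "GET / HTTP/1.1" 200 101 "-" "curl/7.43.0"',
    '10.20.30.42 - - [14/Apr/2016:18:13:31 +0200] "POST /login HTTP/1.1" 302 0 "-" "curl/7.43.0"',
]

pattern = re.compile(r'"(\w{3,4}) ([A-Za-z0-9/.@_-]+)')

counts = Counter()
for line in lines:
    m = pattern.search(line)
    if m:
        # The key plays the role of the '{1} {2}' expression.
        counts[f"{m.group(1)} {m.group(2)}"] += 1

# Print keys from most to least frequent.
for key, n in counts.most_common():
    print(f"{key:<20} {n}")
```

With this input, the key `GET /` is counted twice and `POST /login` once.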

See Also