Pipapo

Overview

The RegexPolicyDaemon (rxpd) can be used to efficiently check data against different lists of regular expressions. This can be used to build whitelists/blacklists to protect many kinds of Internet services. It uses a simple textual protocol that is easily implementable in scripting languages. Example usages are access and content control (spam filtering) for CGI scripts, wikis, email, revision control systems, IRC servers and clients, and so on.

Rxpd encourages users to distribute their lists in a friend-to-friend network. It has features to fetch, update, filter and merge lists. The idea is that users and administrators maintain manageable lists which each cover a single topic and then merge them together.

Concepts

Rxpd aims to be simple and to validate data against regular expressions efficiently. The daemon has no configuration file (yet) and is controlled by command-line options. Most management of regular expression lists can be done remotely over a simple protocol. Rxpd itself has no authentication, but there is a policy check which validates incoming requests against a special regex list; that list defines whether the client is allowed to perform a given task. Any further management, such as distributing the lists or authenticating sessions more strongly, should be done by other means and is not planned to be included in rxpd.

The goal is to create a common place which applications can use to validate any kind of data. This works efficiently because short-lived programs such as CGI scripts benefit from regular expressions that the daemon keeps precompiled in memory, and because such lists can be shared between different applications.

Release Tarballs

Release tarballs are attached to the wiki at: http://www.pipapo.org/pipawiki/RegexPolicyDaemon?action=AttachFile

I distribute gpg-signed tarballs. As a first step, check the signature:

$ gpg rxpd-X.Y.tar.gz.gpg

This will produce a rxpd-X.Y.tar.gz and report if the signature could be validated.

Since the package is built with gnu autotools, the usual build and install procedure works:

$ tar xzvf rxpd-X.Y.tar.gz
$ cd rxpd-X.Y
$ mkdir build     # using a build directory is optional
$ cd build
$ ../configure
$ make
$ make install

Development Version via git

The development version is available via git from git://git.pipapo.org/rxpd or mirrored at repo.or.cz git://repo.or.cz/rxpd.git.

After cloning the repository, bootstrap the autotools first:

$ autoreconf -i

Then the usual configure / make will work.

There is a special makefile target, make meta, which brings several files (README, AUTHORS, NEWS, TODO) in sync with the Rxpd Documentation wiki and updates the ChangeLog.

Dependencies

Rxpd requires gnu-pth and its development headers.

What gets installed

A single executable called rxpd will be installed in $prefix/bin.

Access Policies

One list of rules can be used to define access policies for rxpd itself (-p option). Each command is extended with the access protocol (one of tcp4, tcp6 or unix) and the peer address and then checked against this policy list. When this check yields an ACCEPT:.. rule, the command is allowed; anything else results in an error and the connection is dropped.

For example, if -p policy is used, the list policy could contain:

:ACCEPT:DUMP:policy
:ACCEPT:.*:tcp.:10\..*$
:REJECT:.*:policy
:ACCEPT:.*
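
How such a list is evaluated can be sketched in a few lines of Python. This is an illustration only, not rxpd's code. It assumes that the checked string has the layout COMMAND:list:protocol:address (the exact layout is not specified above), that rules are tried from top to bottom with the first match winning, and it uses Python's re module as a stand-in for POSIX extended regular expressions. The name policy_allows is made up for this example.

```python
import re

# The example policy from above as (name, regex) pairs, in file order.
POLICY = [
    ("ACCEPT", r"DUMP:policy"),
    ("ACCEPT", r".*:tcp.:10\..*$"),
    ("REJECT", r".*:policy"),
    ("ACCEPT", r".*"),
]

def policy_allows(command, proto, peer, policy=POLICY):
    """Return True when the first matching rule is an ACCEPT rule."""
    # Assumed layout of the checked string: "COMMAND:list:proto:peer".
    subject = f"{command}:{proto}:{peer}"
    for name, regex in policy:
        # Case-insensitive search anywhere in the string.
        if re.search(regex, subject, re.IGNORECASE):
            return name == "ACCEPT"
    return False  # nothing matched: treated as "not allowed"

print(policy_allows("DUMP:policy", "tcp4", "192.0.2.7"))      # True
print(policy_allows("APPEND:policy", "tcp4", "10.1.2.3"))     # True
print(policy_allows("APPEND:policy", "tcp4", "192.0.2.7"))    # False
print(policy_allows("CHECK:hosts", "unix", "/var/run/rxpd"))  # True
```

Under these assumptions the example policy lets anyone dump the policy list, lets TCP peers in 10.* do anything, rejects all other access to the policy list and accepts everything else.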

Example

We want to protect a wiki or similar against vandalism. The blacklists are kept in $blacklists.d/, let's say /etc/blacklists.d/.

The wiki engine builds a tuple hostname;ip which is checked against a blacklist that classifies the hosts.

This is /etc/blacklists.d/hosts:

:allow:localhost;127.0.0.1
:allow:mydomain.org;10.10.
:deny:.*aol.com;
:check:

So sending printf("CHECK:hosts\n%s;%s\n", hostname, ipaddr) to the blacklist daemon results in either allow, deny or check being sent back. The first two results (allow/deny) are handled in the obvious way. With the check result, the edited content is filtered against another list, /etc/blacklists.d/content:

:deny:sex.com
:deny:warez

Rxpd acts on the following signals:

SIGHUP

Reload all files from disk.

SIGTERM

Save all files which already exist on disk and exit.

SIGINT

Exit immediately without saving.

SIGALRM

Save all files which already exist on disk and continue running.

There are only 2 things allowed in a list file:

Comments

Beginning with a # in the first column, followed by arbitrary text. Comments are preserved and have semantic meaning, as they can be used to organize the data. Comments of the form '#UPPERCASE: ' are special/reserved: the engine uses them to disable rules when they expire or to flag erroneous rules. Comments of the form '#lowercase: ' can be used for custom enabling/disabling of rules, see the FILTER command.

Rules

Starting with an optional access time (atime) entry, then a name, followed by a regex. These three parts are delimited by colons.

  • atime is maintained by the daemon to reflect the last time the rule matched some data. It is the time in seconds since the epoch, in UTC.

  • name is an arbitrary string which has no special meaning for rxpd but is sent back to the calling application and can be used there to classify results.

    • the name may start with a >, which is used to jump into a sublist whose name is the name of the current list with whatever follows the > appended.

  • the regex is a POSIX extended regular expression. Regexes are currently case-insensitive; this will become configurable later.

Lines can be at most 4095 bytes long.

Example list file, let’s name this example:

:accept:GNU|Linux
0:accept:FreeBSD
0:reject:M.*soft
:>sub:blah

Matches will later report the line matched, without the atime and first colon part. "Macrosoft" matches "M.*soft" thus "reject:M.*soft" will be returned.

Note that the first accept rule has no atime. To enable atime tracking for a rule, initialize its atime with 0; the daemon will update it on access and rewrite the list files on the SAVE command or when it receives a SIGTERM.

When there is an error in a regular expression, the rule is replaced with #ERROR:, followed by the cause of the error, followed by the rule's string in quotes.
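
The line format and the "first match wins" reporting described above can be illustrated with a small Python sketch. It is not the daemon's implementation: parse_line and first_match are hypothetical names, Python's re module only approximates POSIX extended regular expressions, and jumps into sublists (names starting with >) are not followed.

```python
import re

def parse_line(line):
    """Split one list-file line into (atime, name, regex); None for comments."""
    if not line or line.startswith("#"):
        return None
    # Split on the first two colons only, the regex may contain colons itself.
    atime, name, regex = line.split(":", 2)
    return (int(atime) if atime else None, name, regex)

def first_match(lines, data):
    """Return "name:regex" of the first rule matching data, else None."""
    for line in lines:
        rule = parse_line(line)
        if rule is None:
            continue
        _atime, name, regex = rule
        # Case-insensitive search anywhere in the data.
        if re.search(regex, data, re.IGNORECASE):
            return f"{name}:{regex}"
    return None

EXAMPLE = [
    ":accept:GNU|Linux",
    "0:accept:FreeBSD",
    "0:reject:M.*soft",
    ":>sub:blah",
]

print(parse_line("0:reject:M.*soft"))     # (0, 'reject', 'M.*soft')
print(first_match(EXAMPLE, "Macrosoft"))  # reject:M.*soft
```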

A client can indicate that it is finished by sending !EXIT on a single line; this works for all commands which take multi-line input except CHECK:.

Namespaces

Rxpd is intended to be used as a distributed system, so data put into it has to follow some rules to avoid clashing with foreign data. While rxpd doesn’t enforce any naming rules, here are some suggestions/proposals.

Rule lists in rxpd are files; as of 0.3 they are organized as a directory hierarchy under the rules-basedir.

Most importantly, we need to distinguish the authority for a rules list, which can be either a specific user or some server. For users we use the user's email address as a unique identifier; for servers the hostname will suffice. Further, we need to state the purpose or protocol for which a list is primarily used, and finally the list shall have a descriptive name.

This triple forms a unique identifier for any list.

Protocol

Rxpd uses a simple line-based text protocol. The first line is always the command and the list which will be used on the following data; it is not possible to change the command during a session. Each session will generate at least one line of response. When no other output is available, #OK: is sent; in case of an error, a line starting with #ERROR: is sent.

Lines end with any combination of the newline and/or carriage return character.

The protocol is line based; lines longer than 4095 characters are broken (in future they may be word-wrapped at the last whitespace character in the line). Many commands take multiple lines as input; all these commands except CHECK: can be exited by sending a !EXIT statement.

Lists are autoloaded on demand and automatically saved when they already exist on disk.
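
A client for this protocol fits in a few lines of a scripting language. The following Python sketch is only an illustration under stated assumptions: the helper names build_request, classify and check are made up, the address to connect to depends on how the daemon was started, and check relies on the documented CHECK behaviour of answering an empty line with #OK:, which it uses as an end marker.

```python
import socket

def build_request(command, listname, lines=()):
    """Encode the command line plus payload lines, each ended by a newline."""
    out = [f"{command}:{listname}"]
    out.extend(lines)
    return ("\n".join(out) + "\n").encode()

def classify(response_line):
    """Sort one response line into 'ok', 'error' or 'data'."""
    line = response_line.rstrip("\r\n")
    if line.startswith("#OK:"):
        return "ok"
    if line.startswith("#ERROR:"):
        return "error"
    return "data"

def check(address, listname, data):
    """Send one CHECK query; return the rule lines received before #OK:."""
    # The trailing empty line asks the daemon to answer with "#OK:".
    request = build_request("CHECK", listname, [data, ""])
    matches = []
    with socket.create_connection(address) as sock:
        sock.sendall(request)
        with sock.makefile("r") as reply:
            for line in reply:
                # Stop at the #OK: marker or at an #ERROR: line.
                if classify(line) != "data":
                    break
                matches.append(line.rstrip("\r\n"))
    return matches
```

With a daemon listening on some host and port, check((host, port), "hosts", "localhost;127.0.0.1") would return the matching rule lines, or an empty list when nothing matched.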

Commands:

CHECK:list\n..data..

check all following data against the list. Returns the first matching rule (excluding the atime field), if any. When an empty line is sent, the daemon answers with "#OK:". This can be used to synchronize the queries before sending new data.

APPEND:list\n..rules..

append the following lines to list.

PREPEND:list\n..rules..

prepend the following lines to list.

REMOVE:list\n..rules..

remove all matching lines from list.

REPLACE:list\nrule\n..replacements..

finds the position matching the first line, which can be a rule or a comment, and replaces it with the following rules. Updates are atomic and are applied when either an empty line is sent or the connection is closed.

CLEAR:list\n

Removes all rules from a list.

DELETE:list\n

Deletes list completely including removing it from disk.

LOAD:list\n

reload list from disk, this resets the atime to the values stored on disk. Existing lists will be autoloaded when first referenced.

SAVE:list\n

save list to disk, including new atime records. Lists have to be saved at least once to be subject to automatic saving.

EXPIRE:list\nseconds

marks all rules from list which are subject to atime updates and were not touched for more than the given number of seconds with a #EXPIRED: comment, effectively disabling them.
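
The effect of EXPIRE on a list can be sketched as follows. This is a rough Python illustration, not the daemon's code: the function name expire is hypothetical, and the exact text written for a disabled rule is an assumption (here the original line is simply prefixed with "#EXPIRED: "). Rules without an atime field are not subject to atime updates and are therefore left alone.

```python
import time

def expire(lines, max_age, now=None):
    """Comment out rules whose atime is more than max_age seconds old."""
    now = int(time.time()) if now is None else now
    out = []
    for line in lines:
        atime = line.split(":", 1)[0]
        if line.startswith("#") or not atime.isdigit():
            # Comments and rules without an atime stay untouched.
            out.append(line)
        elif now - int(atime) > max_age:
            # Assumed output format for an expired rule.
            out.append("#EXPIRED: " + line)
        else:
            out.append(line)
    return out

rules = [":accept:GNU|Linux", "100:accept:FreeBSD", "990:reject:M.*soft"]
for line in expire(rules, max_age=60, now=1000):
    print(line)
```

Note that in this sketch a rule whose atime is still 0 counts as very old and is expired on the first run; whether rxpd treats such rules specially is not stated here.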

FILTER:list\nfilters…

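runs filters.. (which are themselves rule lists) over list and takes action on matches. The action is selected by the name of the matching filter rule. There are 2 special actions defined, any other name is used as a comment: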

  • DELETE removes the matching rule completely from the list

  • ACTIVATE tries to reactivate a previously commented out rule (#something: comment)

  • any other name will just comment out the matching rule using the name itself

FETCH:list\nremote

fetches a list from remote storing it in local list. remote has the form address/listname where address is either ip:port or a path to a unix domain socket.

Idea: do we want 'FETCH:list\nremote:policylist', which names a local list that filters the remote one first?

UPDATE:list\nsource…

updates atimes in list from sources using an efficient forward-looking algorithm. Rule reordering in sources is not supported (adding/removing rules works). sources have to be other local lists.

MERGE:list\nsources..

Adds new lines from other sources.. to list, order will be preserved. Does NOT delete removed lines.

DUMP:list\n

dump the content of list.

LIST:\n

list all loaded lists.

SHUTDOWN:\n

exits the daemon gracefully, pending connections will still be served but no new connections are accepted.

VERSION:\n

prints package and version information.

HELP:\n

gives a short list of available commands.

Things to do

Contributions welcome! If anyone out there needs one of these features, drop me a note, or implement it yourself if you don't want to wait until I do it.

  • use SO_KEEPALIVE for client connections

  • more robust saving with backup files

  • unix sockets

  • -4 -6 IPv4/6 flags

  • benchmark/profile other regex engines (pcreposix, tre, …); we need some precaution so that regexes cannot be used to DoS the system

  • There are countless possible optimizations which will be implemented over time, for example:

    • share rules over lists

    • compile regex on the first use, not at rule construction

    • optimize the fetch protocol by only fetching things newer than a certain atime