A deep cluster provides one administrative interface for several conventional clusters and/or standalone Zeek nodes at once. It eases the monitoring of several links at once and can interconnect different Zeek instances and different Zeek clusters in different places. Due to its better scalability, it can extend monitoring from the edge of the monitored network into its depth (hence "deep cluster"). Moreover, it enables and facilitates information exchange between different Zeek instances and different Zeek clusters. In essence, a deep cluster can be seen as a P2P overlay network of Zeek nodes in which all Zeeks can communicate with each other.
Such a deep cluster requires a configuration mechanism that goes beyond what BroControl currently provides. Hence, the goal is to set up large numbers of Zeek instances that might be deployed in different parts of the network (or in different networks). Afterwards, these instances need to communicate with each other to share data and to provide security operators with a common view on their networks.
Consider, for example, a large company operating across the US that hosts several production sites on the east and the west coast. Currently, you would monitor each production site individually with its own Zeek cluster. With a deep cluster you are able to monitor, and to configure the monitoring for, all production sites at once.
After setting up a deep cluster, every Zeek instance is able to communicate with every other instance in the deep cluster via the underlying publish-subscribe system, which routes messages along the established deep cluster structure.
Broker is now integrated into the cluster and communication frameworks as well as into all other Zeek frameworks that make use of them, e.g., the logging and sumstats frameworks.
Zeek nodes no longer have fixed types (like manager, proxy, worker), but roles instead. Roles can be changed on the fly, and more than one role can be assigned to a node at the same time. In doing that, I also removed the proxy type: with the new communication framework (Broker), proxies are no longer required for maintaining common state. Instead, I introduced a new node role "datanode" that is responsible for maintaining key-value pairs via Broker (not fully implemented yet) and a new role "lognode" that designates the node in a cluster that is responsible for logging.
The currently supported roles are manager, worker, datanode, lognode, peer (a standalone Zeek instance), and cluster (a conventional Zeek cluster defined as a group of member nodes; see the configuration example below).
Broker was extended significantly to support publish-subscribe-based communication between nodes that are not connected directly.
A specific challenge here is the routing of publications to all interested subscribers. For that, routing tables need to be established among all nodes in a deep cluster. These routing tables are established by flooding subscriptions through the deep cluster. Afterwards, publications can be routed on the shortest paths to all interested subscribers. In that context, two issues arise, namely loop detection and avoidance as well as limiting the scope of subscriptions for rudimentary access control. Both issues are described in detail in the following.
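To illustrate how such routing tables could come about, here is a minimal Python sketch (not the actual Broker implementation): a subscription is flooded breadth-first through a small peering topology so that every node learns the neighbor and hop count on the shortest path back to the subscriber, and publications then follow these entries. All node names, topics, and data structures are purely illustrative, and the sketch assumes a single subscriber per topic for simplicity.

  from collections import deque

  # Illustrative peering topology between broker endpoints (node -> direct peers).
  peerings = {
      "worker-1": ["manager"],
      "worker-2": ["manager"],
      "manager":  ["worker-1", "worker-2", "icsi-3"],
      "icsi-3":   ["manager"],
  }

  # routes[node][topic] = (next hop toward the subscriber, distance in hops)
  routes = {node: {} for node in peerings}

  def flood_subscription(subscriber, topic):
      """Flood a subscription breadth-first so that every node learns the
      shortest path back to the subscriber of the topic."""
      queue = deque([(subscriber, subscriber, 0)])
      while queue:
          node, came_from, hops = queue.popleft()
          if topic in routes[node]:
              continue                      # already reached on a path at least as short
          routes[node][topic] = (came_from, hops)
          for neighbor in peerings[node]:
              queue.append((neighbor, node, hops + 1))

  def publish(origin, topic, payload):
      """Route a publication along the recorded shortest path to the subscriber."""
      node = origin
      while routes[node][topic][1] > 0:
          node = routes[node][topic][0]
          print(f"{topic}: forwarded to {node}")
      print(f"{topic}: delivered {payload!r} at {node}")

  flood_subscription("worker-2", "/bro/sumstats")
  publish("icsi-3", "/bro/sumstats", "partial result")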
With publish-subscribe there is no longer a unique identifier (like an IP address) on whose basis information can be forwarded. There might be only one recipient for a publish operation, but there can also be many of them. This can result in routing loops, in which messages are forwarded endlessly in the broker topology that results from the peerings between broker endpoints. Such loops have to be avoided as they would falsify results, e.g., results stored in data stores.
There are basically two options here:
1. Loop avoidance: During the setup phase of the deep cluster it needs to be ensured that the topology does not contain loops.
2. Loop detection: Detect loops and drop duplicate packets. This requires either storing each forwarded message locally to detect duplicates or, more lightweight, attaching a TTL value to every broker message. When the TTL reaches 0, the message gets dropped. However, the TTL does not prevent duplicates completely.
For multi-hop broker we chose a hybrid of the two options. Loops in the broker topology need to be avoided during the initial configuration of the deep cluster. In addition, a TTL attached to every broker message allows detecting routing loops and results in an error output. The TTL value can be configured; its default value is 32. However, there are certain configurations, e.g., a conventional Zeek cluster, that require a dense interconnection of nodes (all workers are connected to manager and datanode; manager and datanode are connected as well -> loop!). To avoid routing loops in such settings we introduced an additional endpoint flag AUTO_ROUTING. It indicates whether the respective endpoint is allowed to route message topics on behalf of other nodes. Multi-hop topics are only stored locally and propagated if this flag is set. If an auto-routing endpoint is coupled with an ordinary endpoint, only the auto-routing endpoint will forward messages on behalf of the other endpoint. As a result, not every node will forward subscriptions received from others, so that loops are prevented even though the interconnection of nodes in the deep cluster contains cycles.
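The following Python sketch illustrates the two mechanisms conceptually, assuming a dense conventional-cluster interconnection as described above. It is not Broker code; only the flag name AUTO_ROUTING and the default TTL of 32 are taken from the text, everything else is illustrative. Only the auto-routing manager propagates the datanode's multi-hop subscription, and every forwarded publication carries a TTL that would flag a routing loop if it ever expired.

  DEFAULT_TTL = 32      # default hop limit; configurable as described above

  class Endpoint:
      """Illustrative broker-like endpoint; heavily simplified."""

      def __init__(self, name, auto_routing=False):
          self.name = name
          self.auto_routing = auto_routing  # may this endpoint route topics for others?
          self.peers = []                   # directly peered endpoints
          self.topics = {}                  # topic -> next hop toward the subscriber

      def peer_with(self, other):
          self.peers.append(other)
          other.peers.append(self)

      def advertise(self, topic, origin=None):
          """Receive a multi-hop subscription and decide whether to propagate it."""
          self.topics[topic] = origin or self          # remember where to route the topic
          if origin is not None and not self.auto_routing:
              return                                   # ordinary endpoints only store it locally
          for peer in self.peers:
              if peer is not origin and topic not in peer.topics:
                  peer.advertise(topic, origin=self)

      def forward(self, topic, payload, ttl=DEFAULT_TTL):
          """Forward a publication toward the subscriber, decrementing the TTL per hop."""
          if ttl == 0:
              print(f"{self.name}: TTL expired for {topic} -- possible routing loop")
              return
          next_hop = self.topics.get(topic)
          if next_hop is None:
              print(f"{self.name}: no route for {topic}, dropping")
          elif next_hop is self:
              print(f"{self.name}: delivering {payload!r} on {topic}")
          else:
              print(f"{self.name}: forwarding {topic} to {next_hop.name} (ttl={ttl})")
              next_hop.forward(topic, payload, ttl - 1)

  # Dense conventional-cluster interconnection: worker, manager and datanode all
  # peer with each other; only the manager is allowed to auto-route.
  manager  = Endpoint("manager", auto_routing=True)
  datanode = Endpoint("datanode")
  worker   = Endpoint("worker")
  worker.peer_with(datanode)
  worker.peer_with(manager)
  datanode.peer_with(manager)

  datanode.advertise("/bro/data/store")      # datanode subscribes to a multi-hop topic
  worker.forward("/bro/data/store", "key-value update")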
TODO figure needed
To prevent subscriptions from being disseminated across the whole deep cluster, single-hop (=local) and multi-hop (=global) subscriptions are introduced. Single-hop subscriptions are shared with the direct neighbors only and are thus visible only within the one-hop neighborhood. In contrast, multi-hop subscriptions get flooded through the whole deep cluster. The distinction between subscriptions with local (LOCAL_SCOPE) and global scope (GLOBAL_SCOPE) is intended to provide better efficiency and is configured as an additional parameter when creating a broker message_queue. The default setting is always LOCAL_SCOPE.
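As a conceptual illustration of the two scopes (again plain Python, not the Broker message_queue API; only the scope names are taken from the text, the rest is made up), the sketch below shares a LOCAL_SCOPE subscription with direct neighbors only, while a GLOBAL_SCOPE subscription is flooded through the entire topology.

  from enum import Enum

  class Scope(Enum):
      LOCAL_SCOPE = 1     # default: shared with direct neighbors only
      GLOBAL_SCOPE = 2    # flooded through the whole deep cluster

  # Simple topology: worker -- manager -- remote peer icsi-3
  topology = {
      "worker":  ["manager"],
      "manager": ["worker", "icsi-3"],
      "icsi-3":  ["manager"],
  }

  # known_subscriptions[n] = topics for which node n knows a subscriber
  known_subscriptions = {node: set() for node in topology}

  def subscribe(node, topic, scope=Scope.LOCAL_SCOPE):
      """Register a subscription and disseminate it according to its scope."""
      known_subscriptions[node].add(topic)
      if scope is Scope.LOCAL_SCOPE:
          for neighbor in topology[node]:          # one-hop neighborhood only
              known_subscriptions[neighbor].add(topic)
      else:
          pending, seen = [node], {node}
          while pending:                           # flood through the entire deep cluster
              current = pending.pop()
              known_subscriptions[current].add(topic)
              for neighbor in topology[current]:
                  if neighbor not in seen:
                      seen.add(neighbor)
                      pending.append(neighbor)

  subscribe("worker", "/bro/local/stats")                              # stays local
  subscribe("worker", "/bro/sumstats/port-scan", Scope.GLOBAL_SCOPE)   # known everywhere
  print({node: sorted(topics) for node, topics in known_subscriptions.items()})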
One of the next Zeek versions will ship with Broctld, a daemon application running on the host that provides the control interface to a Zeek cluster. The Broctld daemon supervises the Zeek processes in a cluster and provides an API to the outside, including an easy-to-use local web frontend to issue commands.
For the deep cluster, Broctld was significantly enhanced and thus renamed to Distributed Broctl Daemon (DBroctlD). In the deep cluster you can have multiple daemons, namely one daemon per standalone node and per conventional cluster in the overall deep cluster. These daemons communicate with each other and are interconnected in a tree hierarchy. Each daemon receives commands from its predecessor in the tree and forwards them to its successors. Moreover, each daemon receives results from its successors and forwards them to its predecessor. In the long run, dbroctld will be merged into broctld.
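The command and result flow in the daemon tree might look like the following Python sketch; it is purely illustrative and not taken from the dbroctld code. Commands travel from a daemon to its successors, and each daemon returns its own result together with those collected from its subtree.

  class DBroctlD:
      """Sketch of one dbroctld instance in the control tree (illustrative only)."""

      def __init__(self, name):
          self.name = name
          self.successors = []

      def add_successor(self, child):
          self.successors.append(child)

      def handle_command(self, command):
          """Run a command locally, push it down the tree, and return all results
          so that the predecessor can collect them."""
          results = [f"{self.name}: {command} ok"]   # pretend local execution succeeded
          for successor in self.successors:
              results.extend(successor.handle_command(command))
          return results

  # Tree of daemons: the head controls a conventional cluster and a standalone peer.
  root = DBroctlD("icsi-1")
  root.add_successor(DBroctlD("icsi-2-cluster"))
  root.add_successor(DBroctlD("icsi-3"))

  # A command issued at the head is pushed down and the results bubble back up.
  for line in root.handle_command("status"):
      print(line)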
The Broctl cluster configuration file <prefix>/etc/node.cfg has been replaced with a new file in JSON format that allows defining a deep cluster.
To run a deep cluster you need the following git branches
Compile Zeek as usual.
The deep cluster can be set up by editing the JSON configuration file <prefix>/etc/node.json. After that, the main DBroctld daemon needs to be started; from there on, the rest of the deep cluster is started recursively, i.e., the main DBroctld starts its successors and copies all configuration and policy files to them, and then these successors start their successors, respectively. Each node on each level of the hierarchy obtains an individual node.json (derived from the initial node.json) that contains a description of its predecessor in the hierarchy, information on all subsequent nodes, and a description of the interconnection between all subsequent nodes.
An example deep cluster setup with peers (icsi-1, icsi-3) and one conventional cluster (icsi-2-cluster) is given below:
{ "head": { "id": "icsi-1" }, "nodes" : [ { "id": "icsi-1", "roles": ["peer"], "host": "172.17.0.2", "port": "9990" }, { "id": "icsi-2-cluster", "roles": ["cluster"], "members": [ { "id": "manager-1", "roles": ["manager"], "host": "172.17.0.3", "port": "9990" }, { "id": "datanode-1", "roles": ["datanode", "lognode"], "host": "172.17.0.3" }, { "id": "worker-1", "roles": ["worker"], "host": "172.17.0.3" } ] }, { "id": "icsi-3", "roles": ["peer"], "host": "172.17.0.4", "port": "9990" } ], "connections": [ {"from": "icsi-1", "to": "icsi-2-cluster"}, {"from": "icsi-1", "to": "icsi-3"} ] }
The configuration file basically follows a graph notation with nodes (nodes) and edges (connections). In addition, the head section specifies the head node (=manager) of the respective hierarchy. The three sections of the file are described in more detail in the following subsections. More example JSON configuration files can be found in broctl/testing/Cfg/etc/.
The head entry specifies the node responsible for this subtree of the hierarchy. During startup, the initiating node has this role.
This section contains a description of all nodes in the deep cluster, including their id and network configuration. Furthermore, this is where you specify conventional Zeek clusters (an entry with roles: ["cluster"] followed by a list of member nodes; compare icsi-2-cluster in the example).
In the given example, the node datanode-1 has two roles, datanode and lognode.
This section specifies, in edge notation, the hierarchy connections between the nodes and clusters defined in the nodes section. The result needs to be a tree-like interconnection of all nodes in that section; otherwise Broctl will output an error, as that would represent an invalid configuration.
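The tree requirement can be stated compactly: the connections must form a connected, acyclic graph over all entries of the nodes section. The following Python sketch shows one way such a check could look; it is not Broctl's actual validation code.

  import json
  from collections import defaultdict

  def is_tree(config_text):
      """Check that the 'connections' section interconnects all entries of the
      'nodes' section as a tree, i.e. connected and without cycles."""
      config = json.loads(config_text)
      ids = [node["id"] for node in config["nodes"]]
      adjacency = defaultdict(list)
      for edge in config["connections"]:
          adjacency[edge["from"]].append(edge["to"])
          adjacency[edge["to"]].append(edge["from"])

      # A tree over n nodes has exactly n - 1 edges and is fully connected.
      if len(config["connections"]) != len(ids) - 1:
          return False
      seen, stack = {ids[0]}, [ids[0]]
      while stack:
          for neighbor in adjacency[stack.pop()]:
              if neighbor not in seen:
                  seen.add(neighbor)
                  stack.append(neighbor)
      return len(seen) == len(ids)

  with open("node.json") as handle:        # e.g. <prefix>/etc/node.json
      print("valid tree" if is_tree(handle.read()) else "invalid configuration")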
For now, it is only possible to start one dbroctld per host. Virtual machines or Docker containers are a good way to set up a small test environment for a deep cluster.
After setting up node.json, you have to start <prefix>/bin/dbroctld. This will start up a local daemon for the deep cluster that takes care of all the remaining configuration.
We will also implement functionality that allows restarting/starting/stopping only selected parts of the deep cluster. For example, a security operator in a company should only need to type "restart east-coast" and all Zeek instances on the east coast restart. That can be easily implemented on top of the group concept.
Apart from clusters, which are defined as sets of nodes that collaboratively monitor a single link, we introduce the new concept of a group in BroControl, which requires the aforementioned multi-hop routing. A group of Zeek nodes is a subset of Zeek instances across the whole hierarchy that can exchange data with each other. This is useful when you want to interconnect datanodes from different conventional Zeek clusters to share global state. Another example would be to establish small subgroups of Zeek instances within the deep cluster that experience similar traffic on their links, e.g., all Zeek instances that monitor in front of mail servers. Moreover, the formation of dynamic groups is envisioned, in which a Zeek instance joins a port-scan group dynamically after it has experienced a local port-scan attempt. As a result, it becomes possible to obtain a global picture of attacks that would have remained unnoticed with separate clusters and isolated monitoring of links.
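A minimal Python sketch of the dynamic-group idea (purely illustrative, not BroControl or Broker code): an instance joins the port-scan group the first time it observes a local scan and from then on exchanges observations with the other group members, which is how the global picture emerges. The group dictionary merely stands in for the multi-hop routing that would connect the members in practice.

  GROUP_TOPIC = "/bro/sumstats/port-scan-detected"
  group = {}      # topic -> instances that joined; stands in for multi-hop broker routing

  class Instance:
      """Sketch of a Zeek instance that joins a group dynamically (illustrative)."""

      def __init__(self, name):
          self.name = name

      def on_local_portscan(self, scanner):
          """Join the group on the first locally observed scan, then share it."""
          members = group.setdefault(GROUP_TOPIC, [])
          if self not in members:
              members.append(self)
          for member in members:
              if member is not self:
                  member.receive(GROUP_TOPIC, f"{self.name} saw scans from {scanner}")

      def receive(self, topic, message):
          print(f"{self.name} <- {topic}: {message}")

  east = Instance("east-coast-worker")
  west = Instance("west-coast-worker")
  east.on_local_portscan("198.51.100.7")   # east joins; no other member listens yet
  west.on_local_portscan("198.51.100.7")   # west joins and east learns about the same scanner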
For the future, it is envisioned to reverse the startup process. Each dbroctld starts on its own, e.g., after the machine has booted up, and is then contacted by the dbroctld above it in the hierarchical deep cluster, which equips the node with all necessary configuration and policy files. This will come with a change in the deployment model for Zeek and BroControl, which will be pre-installed on each host (e.g., via .deb or .rpm packages). Currently, you only install the manager, which then pushes out everything. After the switch to the new model, the manager will only have to push out configs and policy files.
The intention is to extend sumstats to be used within a deep cluster to aggregate results at large scale, but also to form sumstats groups on the fly, e.g., as a result of detected events. In the original sumstats, only directly connected nodes in a cluster setup exchanged messages. By using multi-hop broker, we can extend this to the complete deep cluster. We can form small groups of nodes that are not directly connected to each other, but that are instead connected indirectly by their subscriptions to a group id (e.g., "/bro/sumstats/port-scan-detected").
To adapt sumstats to the deep cluster, two basic approaches are feasible:
1. Sumstats groups: Instead of a cluster, we apply sumstats to a group of nodes in the deep cluster. This means that we keep the basic structure and functioning of the current sumstats and only replace direct links with multi-hop links via multi-hop broker. However, we need a coordinator per group (in the original sumstats, the manager took over this task). This manager will initiate queries and retrieve all results via the routing mechanisms of multi-hop broker. There will be no processing or aggregation of information directly in the deep cluster. Only nodes in the group, and foremost the manager, will be able to process and aggregate information. The deep cluster will only provide a routing service between all members of the group.
2. Sumstats and deep cluster become one: We integrate data forwarding and data storage with each other. The deep cluster is used to aggregate and process results in a completely distributed manner while forwarding data to its destination. This means that all members of a sumstats group get interconnected by the deep cluster (and thus multi-hop broker) as in option 1, but now information is additionally processed and aggregated while it is forwarded towards the manager (see the sketch below). That is definitely the most challenging option, but in the long term probably the most valuable one.
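A rough Python sketch of option 2 (illustrative only, not sumstats code): every node on the path toward the group's coordinator merges the partial results that pass through it, so aggregation happens inside the deep cluster rather than only at the coordinator. The names and the simple counter-based aggregation are assumptions made for the example.

  class GroupNode:
      """Sketch of in-network aggregation on the way to the group's coordinator."""

      def __init__(self, name, parent=None):
          self.name = name
          self.parent = parent      # next hop toward the coordinator (None at the top)
          self.totals = {}          # key -> running aggregate seen by this node

      def report(self, key, count):
          """Merge a partial result into the local aggregate and push it upstream."""
          self.totals[key] = self.totals.get(key, 0) + count
          if self.parent is None:
              print(f"{self.name}: global total for {key} = {self.totals[key]}")
          else:
              self.parent.report(key, count)

  # Two workers report through an intermediate node that aggregates on the path
  # to the coordinator, so any node on the path can answer intermediate queries.
  coordinator = GroupNode("coordinator")
  intermediate = GroupNode("intermediate", parent=coordinator)
  worker_a = GroupNode("worker-a", parent=intermediate)
  worker_b = GroupNode("worker-b", parent=intermediate)

  worker_a.report("scanner 10.0.0.1", 40)
  worker_b.report("scanner 10.0.0.1", 25)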