Note: Our current website is at www.zeek.org. This website is unmaintained and contains outdated information.

Replacing Zeek’s Communication Layer With 0MQ

Current State

Zeek’s current communication system is quite brittle under heavy load. In a cluster in particular, high communication volume sometimes causes failures that, once one node crashes, can spread throughout the setup and take down other nodes as well.

Specifically, there are the following problems:

If a communication channel gets overloaded (i.e., the sender sends more data than the receiver can handle for an extended period of time), it’s pretty much undefined what happens. There’s a mechanism in place for one of the two sides to notice and then shut down the communication between the two systems in an orderly fashion; however, that doesn’t always seem to work as intended.
If a communication channel gets shut down as described above, problems may spill over to connections with other nodes. I believe the problem here is that all communication is multiplexed over a single pipe between parent and child process (see below), and something can get mixed up there.
It’s also not fully clear why communication channels get overloaded in the first place. Generally, that situation will always happen if too much state is shared/updated, and there’s not much we can do about that other than asking the user to configure Zeek to share less. However, what’s not clear is whether the current implementation indeed gets the maximum performance out of the communication. In some past cases it looked like the data was just not transmitted sufficiently fast, and that may then have been the reason for the pipe between sender and receiver getting full (or alternatively, it may have been sent fast enough, but the receiver didn’t process it as quickly as it could have). In other words, it’s not clear that the pipe overflowing is really due to sharing too much state.
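To make the overload scenario above concrete, here is a minimal sketch of a bounded channel in which the sender is slightly faster than the receiver. It is a toy model using only the Python standard library, not Zeek’s or 0MQ’s actual buffering; the function name `simulate` and all rates and capacities are assumptions chosen for illustration.

```python
from collections import deque


def simulate(send_rate, recv_rate, capacity, ticks):
    """Simulate a bounded channel tick by tick (hypothetical toy model).

    Each tick the sender tries to enqueue send_rate messages, then the
    receiver dequeues up to recv_rate messages. Returns a tuple of
    (tick of the first overflow or None, number of messages that did not fit).
    """
    buffer = deque()
    first_overflow = None
    rejected = 0
    for tick in range(ticks):
        # Sender side: enqueue until the buffer is full.
        for _ in range(send_rate):
            if len(buffer) < capacity:
                buffer.append(tick)
            else:
                rejected += 1
                if first_overflow is None:
                    first_overflow = tick
        # Receiver side: drain as much as it can handle this tick.
        for _ in range(min(recv_rate, len(buffer))):
            buffer.popleft()
    return first_overflow, rejected
```

With these toy numbers, `simulate(5, 5, 10, 100)` never overflows, while `simulate(6, 5, 10, 100)` first overflows at tick 5 and then rejects one message on every following tick. The point is only that a small sustained rate mismatch is enough to fill any fixed-size buffer; what should happen to the messages that no longer fit (block, drop, disconnect) is exactly the policy that needs to be well defined.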

Proposed Solution

The proposed way forward is to remove Zeek’s own communication code and replace it with a more robust and efficient library that takes care of all this low-level socket stuff for us. 0MQ seems to be a great candidate for that, although we need to take a closer look to fully understand whether it’s what we need.

The communication code is part of Zeek’s "independent state" framework (which is the original name for the state sharing functionality; see the paper for more information on concepts and approach). This framework consists of two main parts: (1) serialization code that converts Zeek’s internal state (primarily script-level globals) into chunks of binary data and back; (2) the communication code that sends and receives these chunks over the network. Integrating 0MQ will replace the latter with the use of the library, but leave the former untouched (i.e., we keep all the serialization code and still send binary blobs back and forth).

Replacing Zeek’s communication involves at least dealing with the following (and probably more …):

The main communication class is the RemoteSerializer. It’s derived from Serializer, which is the entry point to serialization/deserialization of state. The RemoteSerializer is responsible for maintaining connections to remote peers and sending/receiving the serialized data. Integrating 0MQ would get rid of most of the code in RemoteSerializer and replace it with usage of the library.
There’s another helper class, ChunkedIO (with derived classes for plain and encrypted communication), that buffers an incrementally built block of raw data in memory and then sends it out once complete (and the other way as well: take a received chunk and provide access to it). We’d get rid of the direct I/O in there, but probably want to keep the class’ interface as that’s what the serialization code is using.
To decouple the I/O from the main Zeek process, Zeek currently spawns a child process that does the actual socket communication. This child process is implemented by the class SocketComm, likewise in RemoteSerializer.cc. Parent and child process communicate via a pipe, and they multiplex all data over this single channel. The child then demultiplexes what it reads from the pipe and sends it out over the right socket. If I understand 0MQ correctly, it already spawns background threads, which means we can get rid of SocketComm altogether. That by itself will greatly simplify things.
The communication between two peers follows a simple message-based protocol. But note that in addition to sending serializations, there are a number of further message types for various other things; see the beginning of RemoteSerializer.cc (and ask about what’s not clear; there’s quite a bit that’s not that well documented). I can’t immediately say to which degree we would keep this protocol structure with 0MQ. My understanding is that it’s also message-based and thus it might make sense to stick with the basic structure. But I’m not sure; we can do whatever works best here.
Mainly as background: the communication between parent and child process uses the same protocol, but with a slightly different set of message types. Not all message types apply to this pipe, and likewise some message types are used only between parent and child but not between Zeek peers (i.e., not between two SocketComms).
Note that with any incompatible change to the communication protocol, Broccoli needs to be adapted. Using 0MQ will likely mean that we need to switch Broccoli over to using it too; I’m guessing 0MQ uses its own internal low-level protocol that we’d otherwise need to emulate.
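The multiplexing of all peer traffic over a single parent/child pipe, as described above, can be sketched roughly as follows. This is an illustration only: the header layout (peer ID, message type, payload length) and the names `HEADER`, `mux`, and `demux` are assumptions and do not reflect the actual wire format used in RemoteSerializer.cc.

```python
import struct
from collections import defaultdict

# Hypothetical frame header: 4-byte peer ID, 1-byte message type,
# 4-byte payload length, all in network byte order.
HEADER = struct.Struct("!IBI")


def mux(messages):
    """Pack (peer_id, msg_type, payload) tuples into one byte stream."""
    out = bytearray()
    for peer_id, msg_type, payload in messages:
        out += HEADER.pack(peer_id, msg_type, len(payload))
        out += payload
    return bytes(out)


def demux(stream):
    """Split a byte stream back into per-peer lists of (msg_type, payload)."""
    per_peer = defaultdict(list)
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < HEADER.size:
            raise ValueError("truncated header")
        peer_id, msg_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        if len(stream) - offset < length:
            raise ValueError("truncated payload")
        per_peer[peer_id].append((msg_type, stream[offset:offset + length]))
        offset += length
    return dict(per_peer)
```

Because every peer’s frames share one stream, a single frame with a wrong length field shifts the read offset for all frames after it, regardless of which peer they belong to. That is one plausible way for trouble on one connection to spill over to others, and a reason why giving each peer its own channel would be more robust.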

Broccoli currently uses OpenSSL’s BIOs for its communication, even for unencrypted sessions. We’d replace that with 0MQ I presume, which is a Good Thing: OpenSSL’s API is pretty bad and the more we can avoid it the better.
However, for both Zeek and Broccoli we still need to support SSL connections, which likely means we’ll need to keep OpenSSL at least for that part.

Does 0MQ perhaps already support SSL natively? That would be perfect.
Note that for the time being we will not be able to use 0MQ’s multicast support for serializations, not even if we’re sending the same information to a set of peers. That’s a pity but the reason lies in how the serialization works: because it needs to serialize pointers to objects, it keeps a cache of already transmitted objects. Then, when a pointer is serialized, it transmits a corresponding ID associated with a particular cache object. The receiver will locate the ID in its own cache and thereby restore the right pointer on its side. It’s crucial for this scheme that both sides keep their caches synchronized, and that currently only works between two peers, not between more than two. As a result, the IDs exchanged with each peer differ even if semantically the same data is sent, and so the binary data transmitted differs as well. Hence, we can’t multicast the same binary data to multiple recipients. (Hope that’s not too confusing …)

Footnote to the above: eventually, we could revisit the serialization cache approach and see if there is a way to use real multicasts by keeping the cache IDs in sync across all peers. However, I’m not sure that’s really feasible, and even if it were, I don’t really want to tackle it at the same time as redoing the socket code. But let’s keep it in mind in case there’s any design decision to be made that would later help with doing real multicasts here.

Second footnote: this whole we-can’t-really-do-multicasts issue is the main reason why we have proxies in the cluster.
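The cache-ID problem behind the multicast limitation can be illustrated with a small sketch. The classes `SenderCache` and `ReceiverCache` and the tuple-based message format are hypothetical stand-ins for the real serialization cache; they only model the idea that IDs are assigned per connection in the order objects are first sent.

```python
class SenderCache:
    """Hypothetical per-connection cache on the sending side."""

    def __init__(self):
        self._ids = {}

    def encode(self, obj):
        """Send an object in full the first time, as an ID reference afterwards."""
        if obj in self._ids:
            return ("ref", self._ids[obj])
        new_id = len(self._ids)  # IDs depend on what this peer has seen so far
        self._ids[obj] = new_id
        return ("full", new_id, obj)


class ReceiverCache:
    """Hypothetical per-connection cache on the receiving side."""

    def __init__(self):
        self._objs = {}

    def decode(self, msg):
        """Restore an object from a full message or from a cached ID."""
        if msg[0] == "full":
            _, obj_id, obj = msg
            self._objs[obj_id] = obj
            return obj
        return self._objs[msg[1]]  # raises KeyError if the caches are out of sync


# One sender talking to two peers, A and B. Peer A has already received
# another object earlier, so its cache is one entry ahead of peer B's.
to_a, to_b = SenderCache(), SenderCache()
to_a.encode("conn_table")
msg_a = to_a.encode("dns_table")
msg_b = to_b.encode("dns_table")
```

Here `msg_a` and `msg_b` describe the same object but carry different IDs (1 versus 0), so the encoded data differs per peer. If peer B were handed the reference that was meant for peer A, its cache lookup would fail or, worse, resolve to a different object. Keeping IDs identical across all peers would require every peer to see the same objects in the same order.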

Things to Keep in Mind

When exploring 0MQ, let’s keep an eye on whether we will be able to use it for other inter-thread communication in the future as well. Specifically, I’m hoping that the logging framework will benefit from it. Currently, in the framework’s first iteration, all logging I/O is still done by the main thread. However, I’m planning to move the I/O into separate sub-threads to decouple it from the main process. I’d hope that 0MQ can take over this task for us. (See also the project page on binary logging for more about this.)
Depending on the degree to which we change the message-based protocol, let’s make sure the protocol remains extensible so that we can add other types of Zeek-to-Zeek (or Broccoli) communication later. I still have the hope that at some point we’ll add a routing layer to the protocol for exchange of state via multiple hops. But I don’t really know yet what that will look like.
Independent of the question of whether we can use 0MQ multicasts for serializations (see footnote above), it would be nice to generally support that multicast functionality for message types that do send the same information to multiple peers. There’s not really much of an application for that right now, but if we can provide a built-in multicast interface, that might be quite helpful later.
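One common way to keep a message-based protocol extensible, as asked for above, is to prefix every message with a type and a length so that a receiver can skip message types it does not know yet. The following sketch shows that idea under assumed names (`HEADER`, `encode`, `dispatch`) and an assumed header layout; it is not the existing Zeek protocol and does not use the 0MQ API.

```python
import struct

# Hypothetical message header: 2-byte message type, 4-byte payload length,
# both in network byte order.
HEADER = struct.Struct("!HI")


def encode(msg_type, payload):
    """Frame one message as header plus payload."""
    return HEADER.pack(msg_type, len(payload)) + payload


def dispatch(stream, handlers):
    """Call handlers[msg_type](payload) for every known message in stream.

    Unknown message types are skipped using the length field. Returns the
    list of skipped type codes so the caller can log them.
    """
    skipped = []
    offset = 0
    while offset < len(stream):
        msg_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        offset += length
        handler = handlers.get(msg_type)
        if handler is None:
            skipped.append(msg_type)
        else:
            handler(payload)
    return skipped
```

An older peer with handlers only for type 1 can then still process a stream that also contains a newer type, such as a hypothetical routing message, without losing its place in the stream. New message types can be introduced later without breaking existing receivers.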