From the user’s perspective, broctld is a single, persistent system service. The service gets started/stopped through the OS’ standard mechanism for managing system services (e.g., init.d scripts, systemd, launchctl). When the service is running, it manages all the Zeek processes running on the system; and it offers an API to the broctl client application that allows the user to control the setup. That new client application offers the user functionality very similar to the current monolithic broctl script.
In cluster settings, we run one of these broctld services per system. The broctld on the master node controls all the other nodes through connections to their broctld instances. broctl connects only to the master’s broctld, which passes commands on to the other nodes as necessary. For more distributed settings, we extend this scheme to a tree of broctld services forming a hierarchical set of nodes monitoring different parts of a network. Commands gets pushed down the tree from the master broctld, which’s children pass them on to their successors. Likewise, any feedback from child nodes (e.g., status information) gets propagated back up the tree.
We no longer push any of broctld‘s code from the master node to other nodes. Each node must set up broctld on its own, e.g., through the standard OS packaging system. Likewise, each node must ensure the broctld service is running configured appropriately to find and connect to its upstream broctld.
We divide the broctld service into two parts that run in two separate processes. Most of the logic resides inside the "Main Process"; that one implements configuration, API, logging, inter-node communication., and almost everything else. The one thing that the Main Process doesn’t do is the supervision of the Zeek processes: that’s performed by a separate "Supervisor Process", which the Main Process controls.
When the broctld service starts up, the Main Process receives control first. Initially, no Supervisor Process will be running yet. The Main Process then spawns up a Supervisor Process and sets up communication between the two. A freshly started Supervisor Process will initially not yet be controlling any Zeek processes. To start them, the Main Process instructs the Supervisor Process to spawn them; the supervisor Process then keeps the Zeek processes running until the Main Process explicitly signals to stop some, or all.
The Main Process can shutdown its operation without stopping the Supervisor Process process, allowing for example to upgrade configuration or code without impacting the Zeek processes. Once the Main Process starts up again, it will notice that a Supervisor Process is already running and will connect to it through the communication channel already established by the previous instance.
When the broctld system service gets shuts down through the OS, that means both broctld and the Zeek processes need to terminate. The Main Process first instructs the Supervisor Process to stop the Zeek processes. It then terminate the Supervisor Process, and finally shuts itself down.
The advantage of the separate Supervisor Process is that usually Zeek processes will be able to keep running unaffected even if the Main Process terminates, or otherwise gets into trouble. To fully exploit that potential, we need to keep the Supervisor Process as minimal and robust as possible, and ideally reuse an existing, stable supervisor implementation. A good candidate seems to be leveraging supervisord, as it’s established & robust, and written in Python. While we won’t need a lot of its features, reusing the existing project would avoid the effort of creating a robust supervisor from scratch. We need to investigate if reusing supervisord is a viable route.
We will have two APIs: one between the broctl client and the broctld service; and one between broctld services running on different nodes.
For the client-side API, there are two options:
Use Broker to exchange control events:
- Advantage: As we’ll need Broker anyways, this will give us much of the API for free. It also generally fits well with our overall "Brokerization" strategy.
- Disadvantage: Non-standard, and cannot benefit from existing HTTP-based clients.
Create a REST-style HTTP-based API.
- Advantage: It’s easy to build further clients beyond broctl with standard tools, including just pointing a browser to broctld.
- Disadvantage: Need to build a completely new API from scratch that won’t fit nicely into our Broker-based ecosystem.
For broctld-to-broctld connectivity, in principle we have the same two options as for the client-side API. However, using Broker seems quite natural here, as there’s no concern about other clients.
Getting all the pieces together is a complex task. Here’s a proposed break-down into smaller parts.
© 2014 The Bro Project.