Note: Our current website is at www.zeek.org. This website is unmaintained and contains outdated information.

osquery Integration

Overview

Facebook’s osquery provides access to host-level system information through an SQL-style interface. We want to make that information available to Zeek scripts in real time.

The plan is to provide the information as a continuous stream of events that reflect host-level changes, conceptually resembling the events that Zeek generates from network traffic. For example, osquery could send events like the following to Zeek:

event new_user(h: Host, name: string, realname: string, uid: count, gid: count, home: string);

event new_process(h: Host, name: string, cmdline: string, pid: count, ppid: count, uid: count);

event process_finished(h: Host, pid: count);

event filesystem_mounted(h: Host, device: string, path: string, uid: count);

event socket_opened(h: Host, p: port, pid: count);

event socket_closed(h: Host, p: port);

The general style of events is update-based: something has just changed. We are not asking: "what are all the processes currently running on the system?"

h: Host represents a common record structure that identifies the particular host that the activity comes from. It’ll contain things like IP address and hostname, and maybe some kind of unique identifier if osquery has that.
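To illustrate the update-based style, here is a minimal Python sketch that derives new_process and process_finished events from two consecutive snapshots of a process table. The snapshot format, the diff_processes helper, and the use of a plain string for the host are assumptions made for illustration only; they are not part of osquery or Zeek.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: pid -> row with the columns used below.
Snapshot = Dict[int, Dict[str, object]]
Event = Tuple[str, tuple]


def diff_processes(host: str, old: Snapshot, new: Snapshot) -> List[Event]:
    """Turn two consecutive snapshots into update-style events."""
    events: List[Event] = []
    # PIDs present only in the newer snapshot are newly started processes.
    for pid in sorted(new.keys() - old.keys()):
        row = new[pid]
        events.append(
            ("new_process",
             (host, row["name"], row["cmdline"], pid, row["ppid"], row["uid"]))
        )
    # PIDs that disappeared belong to processes that have finished.
    for pid in sorted(old.keys() - new.keys()):
        events.append(("process_finished", (host, pid)))
    return events


before = {1: {"name": "init", "cmdline": "/sbin/init", "ppid": 0, "uid": 0}}
after = {
    1: {"name": "init", "cmdline": "/sbin/init", "ppid": 0, "uid": 0},
    42: {"name": "sshd", "cmdline": "/usr/sbin/sshd -D", "ppid": 1, "uid": 0},
}
started = diff_processes("10.0.0.5", before, after)
stopped = diff_processes("10.0.0.5", after, before)
```

Only the differences between snapshots are reported, never the full process list. A real implementation would also have to deal with PID reuse between snapshots, which this sketch ignores.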

Approach

Note

This is still a bit of guesswork based on our current understanding of osquery. We can adapt as things become clearer.

We extend osquery with a plugin that communicates with Zeek, sending an event for all interesting updates. Zeek registers corresponding queries with the plugin, and the plugin subscribes to the matching activity with osquery and passes that on to Zeek.

This scheme includes the following pieces:

We use Broker for communication in both directions. Zeek publishes queries as events, to which all osquery instances subscribe. In turn, osquery sends matching events back to Zeek through Broker.

(In practice, we probably want to allow defining separate subgroups of hosts, so that we can query them for different things. Broker’s topics should enable that, but we can postpone that to later.)
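As a rough illustration of how topics could separate host groups, the following Python sketch models prefix-style topic subscriptions. The topic names, the host names, and the matching_hosts helper are all hypothetical, and the sketch assumes plain string-prefix matching; Broker's actual matching rules may differ in detail.

```python
from typing import Dict, List


def matching_hosts(topic: str, subscriptions: Dict[str, List[str]]) -> List[str]:
    """Return the hosts that have a subscription which is a prefix of the topic."""
    return sorted(
        host
        for host, prefixes in subscriptions.items()
        if any(topic.startswith(prefix) for prefix in prefixes)
    )


# Hypothetical layout: every host listens on a shared topic plus its group topic.
subscriptions = {
    "web-1": ["/zeek/osquery/all", "/zeek/osquery/group/web"],
    "db-1": ["/zeek/osquery/all", "/zeek/osquery/group/db"],
}

web_only = matching_hosts("/zeek/osquery/group/web/queries", subscriptions)
everyone = matching_hosts("/zeek/osquery/all/queries", subscriptions)
```

Under this layout, a query published below the web group's topic reaches only web-1, while a query published below the shared topic reaches both hosts.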
On the Zeek side, we provide a user interface for defining queries. Zeek scripts use that to install individual queries, and they define prototypes for the events that they expect to receive in return. An off-the-cuff example:
# Event prototype.
global new_process: event(h: osquery::Host, name: string, path: string, pid: count);

event zeek_init()
    {
    # Register query (osquery::subscribe is the proposed interface, not an existing one).
    osquery::subscribe(new_process, "SELECT name, path, pid FROM processes;");
    }

# Example handler.
event new_process(h: osquery::Host, name: string, path: string, pid: count)
    {
    print fmt("New process %s (pid %d) on host %s", name, pid, h$ip);
    }
Note

Not quite sure if/how queries map into subscriptions in this way.
On the osquery side, we need to assemble the event for sending to Broker. Generally, the columns returned by the SELECT will turn into the event’s arguments. In addition, we add an always-present h: Host argument. The event arguments’ types need to be mapped from what osquery returns to Broker types (which, in turn, correspond to Zeek types); see next bullet.

It seems there are two possible ways of doing the type conversion:

(1) Hardcoding: The osquery plugin retrieves the query response, iterates through its columns and builds up a Broker event to then send out.

Note

I’m not quite sure what interface(s) osquery provides for extracting results. On the web page, I see JSON; not sure if there’s something more direct.

(2) Leveraging JSON: We can also extend Broker with a JSON interface, so that the osquery plugin can forward a JSON response directly. For this, we would:

Extend Broker’s API with a function that builds an event from JSON, using some predefined mapping of how JSON values turn into Broker values.

Then call that function from the osquery plugin.

Option (2) would actually be a nice interface for Broker to have anyway, as it opens it up to ingesting input from a variety of other JSON sources as well (we could write an ingestion daemon that opens up a socket to which web applications can post JSON; but that’s a different topic :).
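The following Python sketch shows what such a predefined mapping from a JSON row to typed event arguments might look like, including the always-present host argument. It assumes that osquery delivers column values as JSON strings and that the expected type of each column is known from the event prototype. The CONVERTERS table and the event_from_json function are hypothetical names for illustration; they are not part of Broker's API.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical converters from JSON values to Zeek-like types.
CONVERTERS: Dict[str, Callable[[object], object]] = {
    "string": str,
    "count": int,
    "double": float,
    "bool": lambda v: str(v).lower() in ("1", "true"),
}


def event_from_json(
    name: str, host: str, columns: List[Tuple[str, str]], row_json: str
) -> Tuple[str, tuple]:
    """Build (event name, arguments) from one JSON-encoded result row."""
    row = json.loads(row_json)
    # The host argument always comes first, followed by the SELECT columns.
    args: List[object] = [host]
    for column, type_name in columns:
        value = CONVERTERS[type_name](row[column])
        # A Zeek count is unsigned, so negative numbers are rejected.
        if type_name == "count" and value < 0:
            raise ValueError(f"column {column!r} is not a valid count: {value}")
        args.append(value)
    return name, tuple(args)


columns = [("name", "string"), ("path", "string"), ("pid", "count")]
row = '{"name": "sshd", "path": "/usr/sbin/sshd", "pid": "4242"}'
event = event_from_json("new_process", "10.0.0.5", columns, row)
```

The column order follows the SELECT statement, so the resulting argument list lines up with the event prototype declared on the Zeek side.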