Here we summarize some steps to follow when you see Zeek doing something it shouldn’t. To provide help, it is often crucial for us to have a way of reliably reproducing the effect you’re seeing. Unfortunately, reproducing problems with Zeek can be rather tricky: more often than not, they occur only in very rare situations or only after Zeek has been running for some time. In particular, getting a small trace that shows a specific effect can be a real challenge. In the following, we summarize some strategies to this end.
Generally, when you encounter a problem with Zeek, the best thing to do is to open a new issue on Zeek’s GitHub issue tracker and include information on how to reproduce it. Ideally, your ticket should come with the following:
Note that when you use BroControl, it normally sends you a “crash report” when a node dies. If you want, you can forward that mail to us at reports@zeek.org; it already contains a lot of information that may help us track down what happened. Before sending it, however, please look it over and make sure there’s nothing in it you’d rather not send offsite.
As Zeek is usually running live, coming up with a small trace file that reproduces a problem can turn out to be quite a challenge. Often it works best to start with a large trace that triggers the problem, and then successively thin it out as much as possible.
To get to the initial large trace, here are a few things you can try:
Once you have a trace that demonstrates the effect, you will often notice that it’s pretty big, in particular if recorded from the link you’re monitoring. Therefore, the next step is to shrink its size as much as possible. Here are a few things you can try to this end:
Very often, a single connection suffices to demonstrate the problem. If you can identify which one it is (e.g., from one of Zeek’s *.log files), you can extract the connection’s packets from the trace using tcpdump by filtering for the corresponding 4-tuple of addresses and ports:
> tcpdump -r large.trace -w small.trace host <ip1> and port <port1> and host <ip2> and port <port2>
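For example, if the offending connection from conn.log ran between 192.168.1.10 on port 49152 and 10.0.0.1 on port 80 (hypothetical values), the command would look like this:

> tcpdump -r large.trace -w small.trace host 192.168.1.10 and port 49152 and host 10.0.0.1 and port 80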
If you can’t reduce the problem to a connection, try to identify either a host pair or a single host triggering it, and filter down the trace accordingly.
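For example, to keep only the traffic between a pair of hosts, or from a single host (again with made-up addresses):

> tcpdump -r large.trace -w small.trace host 192.168.1.10 and host 10.0.0.1
> tcpdump -r large.trace -w small.trace host 192.168.1.10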
You can try to extract a smaller time slice from the trace using TCPslice. For example, to extract the first 100 seconds from the trace:
> tcpslice +100 <in >out
Alternatively, tcpdump extracts the first n packets with its option -c <n>.
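For example, to keep only the first 1000 packets of the trace:

> tcpdump -r large.trace -w small.trace -c 1000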
If Zeek crashes, a core dump can be very helpful to nail down the problem. When using BroControl, the crash report it sends out already contains some information from the core, and forwarding that to us may already shed some light on what’s going on. Examining a core in more detail is not for the faint of heart but can reveal extremely useful information on top of that, so we summarize that in the following.
First, you should configure Zeek with the option --enable-debug and recompile; this will disable all compiler optimizations and thus make the core dump more useful (don’t expect great performance with this version though; compiling Zeek without optimization has a noticeable impact on its CPU usage). Then enable core dumps if you haven’t already (e.g., ulimit -c unlimited if you’re using bash).
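A minimal sequence could look like this, assuming a standard source build and a bash shell; your configure options and build directory may differ:

> ./configure --enable-debug
> make
> ulimit -c unlimited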
Once Zeek has crashed, start gdb with the Zeek binary and the file containing the core dump. (Alternatively, you can also run Zeek directly inside gdb instead of working from a core file.) The first helpful information to include with your tracker ticket is a stack backtrace, which you get with gdb’s bt command:
> gdb bro core
[...]
> bt
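Alternatively, to run Zeek directly inside gdb on, say, the small trace extracted above (the trace and script names here are just placeholders):

> gdb --args bro -r small.trace local
[in gdb]
> run
[...]
> bt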
If the crash occurs inside Zeek’s script interpreter, the next thing to do is to identify the line of script code that was being processed just before the abnormal termination. Look for methods in the stack backtrace that belong to any of the script interpreter’s classes. Roughly speaking, these are all classes with names ending in Expr, Stmt, or Val. Then climb up the stack with up until you reach the first of these methods. The object that this points to will have a Location object, which in turn contains the file name and line number of the corresponding piece of script code. Continuing the example from above, here’s how to get that information:
[in gdb]
> up
...
> up
> print this->location->filename
> print this->location->first_line
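If the backtrace already tells you which frame belongs to the interpreter, gdb’s frame command lets you jump there directly instead of stepping up repeatedly (replace 5 with the frame number shown by bt):

[in gdb]
> frame 5
> print this->location->filename
> print this->location->first_line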
If the crash occurs while processing input packets but you cannot directly tell which connection is responsible (and thus not extract its packets from the trace as suggested above), try getting the 4-tuple of the connection currently being processed from the core dump by again examining the stack backtrace, this time looking for methods belonging to the Connection class. That class has members orig_addr/resp_addr and orig_port/resp_port storing (pointers to) the IP addresses and ports respectively:
[in gdb]
> up
...
> up
> printf "%08x:%04x %08x:%04x\n", *this->orig_addr, this->orig_port, *this->resp_addr, this->resp_port
Note that these values are stored in network byte order, so you will need to flip the bytes around if you are on a little-endian machine (which is why the above example prints them in hex). For example, if an IP address prints as 0100007f, that’s 127.0.0.1.
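The ports need the same treatment: if, say, the output reads 0100007f:1600 (hypothetical values), swapping the port’s two bytes gives 0x0016, i.e. 127.0.0.1 port 22.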