Note: Our current website is at www.zeek.org. This website is unmaintained and contains outdated information.

Dynamic Protocol Detection

Contents

Concept
Adding Analyzers
- Class Layout
- Relevant Methods
How-Tos

Concept

Rather than selecting which application protocol analyzer to use based on a connection’s server port, Zeek’s dynamic analyzer framework associates an analyzer tree with every connection. This tree can contain an arbitrary number of analyzers in various constellations and can be modified during the whole lifetime of a connection, i.e., we can enable/disable analyzers on the fly. Most importantly, this gives us two key capabilities:

We can perform protocol analysis independently of ports. By using a set of signatures which match typical protocol dialogues, Zeek is able to look at payload to find the correct analyzers. When such a signature matches, it turns on the corresponding analyzer.
We can turn off analyzers when it becomes obvious that they are parsing the wrong protocol. This allows us to use rather loose protocol signatures and, if in doubt, try multiple analyzers in parallel.

Adding Analyzers

Class Layout

All analyzers derive from the class Analyzer. We associate an analyzer tree with each connection, which reflects the data-flow during packet analysis, in terms of which analyzers get to perform their analysis. Each packet is first passed to the root node of the tree which passes its (potentially transformed) input on to all of its children. Each child in turn passes the data on to its successors.
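
The data flow described above can be sketched with a toy tree. This is a minimal illustration, not Zeek's code: `ToyAnalyzer`, `ToyTransport`, `RunToyTree`, and the 4-byte "header" are all hypothetical names and assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an analyzer node. This is NOT Zeek's Analyzer class.
class ToyAnalyzer {
public:
    explicit ToyAnalyzer(std::string name) : name_(std::move(name)) {}
    virtual ~ToyAnalyzer() = default;

    // Attach a child; the parent takes ownership.
    void AddChild(std::unique_ptr<ToyAnalyzer> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
    }

    // Record what this node sees, then pass its output on to all children.
    void Deliver(const std::string& data, std::vector<std::string>& trace) {
        trace.push_back(name_ + ":" + data);
        const std::string out = Transform(data);
        for (auto& child : children_)
            child->Deliver(out, trace);
    }

protected:
    // Default behavior: pass the input through unchanged.
    virtual std::string Transform(const std::string& data) { return data; }

private:
    std::string name_;
    std::vector<std::unique_ptr<ToyAnalyzer>> children_;
};

// Hypothetical root node that strips a fixed 4-byte "header" before forwarding.
class ToyTransport : public ToyAnalyzer {
public:
    ToyTransport() : ToyAnalyzer("transport") {}

protected:
    std::string Transform(const std::string& data) override {
        return data.size() > 4 ? data.substr(4) : std::string();
    }
};

// Build root -> {pia, http} and push one input through the tree.
std::vector<std::string> RunToyTree(const std::string& input) {
    ToyTransport root;
    root.AddChild(std::unique_ptr<ToyAnalyzer>(new ToyAnalyzer("pia")));
    root.AddChild(std::unique_ptr<ToyAnalyzer>(new ToyAnalyzer("http")));
    std::vector<std::string> trace;
    root.Deliver(input, trace);
    return trace;
}
```

Calling `RunToyTree("HDR:GET /")` records the root seeing the full input and both children seeing only the transformed payload.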

The root node must always be of type TransportLayerAnalyzer. There are such analyzers for TCP, UDP, and ICMP. Application-layer analyzers are either derived from TCP_ApplicationAnalyzer (for TCP protocols) or from the general Analyzer class (for all non-TCP protocols).

When a connection begins, the initial analyzer tree is instantiated by the global analyzer::Manager. The initial tree always contains a corresponding TransportLayerAnalyzer. For TCP and UDP it also contains an instance of class PIA_TCP or PIA_UDP, respectively. The PIAs are responsible for detecting protocols as the connection progresses. Most importantly, they perform the signature matching. Depending on whether any well-known port is in use, the initial tree may or may not contain any application-layer analyzers right away.

Analyzers can support one of two input methods (or both): packet-wise or stream-wise. An analyzer can accept input via one method (e.g., packet-wise) and pass it on to its children via the other (e.g., stream-wise). The TCP_Analyzer for example reassembles packets into a byte-stream and thus all TCP_ApplicationAnalyzers only see stream-wise input.

Relevant Methods

Any Analyzer-derived class can override the following virtual methods:

void Init()

Initialization of the analyzer. Called before any data processing is performed.

void Done()

Clean-up of the analyzer. Called just before the instance is destroyed.

void DeliverPacket(int len, const u_char* data, bool orig, uint64 seq, const IP_Hdr* ip, int caplen)

Interface for packet-wise input (or, more generally, chunk-wise in-order input as the parent analyzer does not necessarily need to pass full packets around).

len

Length of data.

data

Pointer to data.

orig

True if data is from connection originator, false for responder.

seq

>=0 if there’s a sequence number associated with the data. -1 if not.

ip

Pointer to packet header if there’s a packet associated with the data. 0 if not.

caplen

Length of the captured packet if ip is non-zero.

void DeliverStream(int len, const u_char* data, bool orig)

Interface for stream-wise input.

len

Length of data.

data

Pointer to data.

orig

True if data is from connection originator, false for responder.

void Undelivered(uint64 seq, int len, bool orig)

Interface for input which is supposed to be stream-wise but could not be fitted into a continuous stream (e.g., parts of a TCP stream which could not be reassembled).

seq

Sequence number of the non-contiguous chunk.

len

Length of the non-contiguous chunk.

orig

True if from connection originator, false for responder.
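
To illustrate how DeliverStream() and Undelivered() relate, here is a simplified one-direction reassembler. It is a sketch under stated assumptions, not Zeek's reassembly logic: `ToyStreamSink` and `ToyReassembler` are hypothetical, and the toy declares a hole lost as soon as a later segment arrives, whereas a real reassembler would buffer out-of-order data first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy consumer that mirrors the shape of DeliverStream()/Undelivered().
// It is a stand-in for illustration, not a Zeek class.
struct ToyStreamSink {
    std::string stream;                               // bytes delivered in order
    std::vector<std::pair<std::uint64_t, int>> gaps;  // (seq, len) of each hole

    void DeliverStream(int len, const unsigned char* data, bool /*orig*/) {
        assert(len >= 0);
        stream.append(reinterpret_cast<const char*>(data),
                      static_cast<std::size_t>(len));
    }

    void Undelivered(std::uint64_t seq, int len, bool /*orig*/) {
        gaps.emplace_back(seq, len);
    }
};

// Simplified reassembler for a single direction (assumption: no buffering of
// out-of-order segments; a hole is reported once later data shows up).
class ToyReassembler {
public:
    explicit ToyReassembler(ToyStreamSink& sink) : sink_(sink) {}

    void Segment(std::uint64_t seq, const std::string& payload, bool orig) {
        if (seq > next_seq_) {
            // Bytes [next_seq_, seq) never arrived: report the gap.
            sink_.Undelivered(next_seq_, static_cast<int>(seq - next_seq_), orig);
            next_seq_ = seq;
        }
        // Skip any prefix that was already delivered (retransmission overlap).
        const std::uint64_t skip = next_seq_ - seq;
        if (skip >= payload.size())
            return;
        const std::string fresh = payload.substr(static_cast<std::size_t>(skip));
        sink_.DeliverStream(static_cast<int>(fresh.size()),
                            reinterpret_cast<const unsigned char*>(fresh.data()),
                            orig);
        next_seq_ += fresh.size();
    }

private:
    ToyStreamSink& sink_;
    std::uint64_t next_seq_ = 0;  // next sequence number expected in order
};
```

Feeding segments at sequence numbers 0, 3, and 8 yields one contiguous stream plus a single reported gap covering the three missing bytes starting at 5.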

unsigned int MemoryAllocation() const

Returns number of memory bytes currently used by the analyzer.

In addition, analyzers need to have two static methods:

static Analyzer* InstantiateAnalyzer(Connection* conn): Returns new instance of the analyzer class.
static bool Available(): Returns false if the analyzer is completely disabled and not to be considered for any connections. (Typically, this is the case if there are no event handlers defined for the analyzer.)
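
The role of the two static methods can be sketched as follows. This is a toy model: `ToyConnection`, `ToyAnalyzerBase`, `ToyFoo`, `MaybeInstantiate`, and the `handlers_defined` switch are hypothetical stand-ins, and how Zeek itself consults these methods is not shown here.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for Connection and Analyzer.
struct ToyConnection { int id; };

class ToyAnalyzerBase {
public:
    virtual ~ToyAnalyzerBase() = default;
};

class ToyFoo : public ToyAnalyzerBase {
public:
    explicit ToyFoo(ToyConnection* conn) : conn_(conn) {}

    // Factory: returns a new instance of this analyzer class.
    static ToyAnalyzerBase* InstantiateAnalyzer(ToyConnection* conn) {
        return new ToyFoo(conn);
    }

    // False if the analyzer should not be considered for any connection.
    static bool Available() { return handlers_defined; }

    ToyConnection* Conn() const { return conn_; }

    // Assumption: a simple switch standing in for "event handlers exist".
    static bool handlers_defined;

private:
    ToyConnection* conn_;
};

bool ToyFoo::handlers_defined = true;

// Only instantiate analyzer type A if it reports itself as available.
template <class A>
std::unique_ptr<ToyAnalyzerBase> MaybeInstantiate(ToyConnection* conn) {
    assert(conn != nullptr);
    if (!A::Available())
        return nullptr;
    return std::unique_ptr<ToyAnalyzerBase>(A::InstantiateAnalyzer(conn));
}
```

Checking `Available()` before calling the factory avoids allocating analyzers that could never produce output.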

Any TCP_ApplicationAnalyzer-derived class can override the following virtual methods in addition to those of Analyzer:

void EndpointEOF(bool is_orig): The given endpoint’s data delivery is complete.
void ConnectionFinished(int half_finished): Called whenever an endpoint enters TCP_CLOSED or TCP_RESET.
void ConnectionReset(): Called when the connection is reset.
void PacketWithRST(): Called whenever a RST packet is seen (sometimes invocation of ConnectionReset is delayed).

Note: Whenever overriding one of these methods, call the parent class’s implementation first before doing anything else.
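
The calling convention from the note can be sketched like this. The classes are toy stand-ins (`ToyParentAnalyzer` and `ToyFooAnalyzer` are hypothetical), used only to show the order of calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy base class standing in for the parent analyzer class (not Zeek's own).
class ToyParentAnalyzer {
public:
    virtual ~ToyParentAnalyzer() = default;
    virtual void Init() { log_.push_back("parent:Init"); }
    virtual void Done() { log_.push_back("parent:Done"); }
    const std::vector<std::string>& Log() const { return log_; }

protected:
    std::vector<std::string> log_;  // records the order of calls
};

class ToyFooAnalyzer : public ToyParentAnalyzer {
public:
    void Init() override {
        ToyParentAnalyzer::Init();  // parent implementation first
        assert(!log_.empty());      // parent state exists before we use it
        log_.push_back("foo:Init");
    }

    void Done() override {
        ToyParentAnalyzer::Done();  // parent implementation first
        log_.push_back("foo:Done");
    }
};
```

Running `Init()` and then `Done()` on a `ToyFooAnalyzer` logs each parent entry immediately before the corresponding derived-class entry.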

The classes Analyzer, TCP_ApplicationAnalyzer, and analyzer::Manager provide a couple of methods for passing data on to child analyzers, manipulating the analyzer trees, generating events, etc. See the source. :-)

There is one more thing: SupportAnalyzers, which encapsulate common but protocol-independent tasks (e.g., line-splitting for line-based ASCII protocols). While also derived from Analyzer, support analyzers are conceptually different in the sense that

They are directly associated with a particular parent analyzer. If the parent gets destroyed, all its support analyzers are deleted as well.
They don’t have children.
They handle only one direction of the connection’s data, i.e., either the originator side or the responder side. If a parent analyzer wants to leverage a support analyzer for both directions, it needs to instantiate two of them.

All the support analyzers of a particular parent analyzer form a list (one list per direction). Every packet/stream-chunk which is handed to the parent first passes through this list. The output of the last support analyzer is then delivered via the parent’s Deliver{Packet,Stream}. The most important support analyzer currently is the ContentLine_Analyzer, which performs the line-splitting mentioned above for ASCII protocols. It ensures that only full lines are passed to the parent’s DeliverStream().
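
The idea behind line-splitting can be sketched with a small buffer that only forwards complete lines. This is an illustration, not ContentLine_Analyzer's implementation: `ToyLineSplitter` and `SplitChunks` are hypothetical, and the toy assumes lines end in LF or CRLF.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy line splitter for ONE direction of a connection. A parent that wants
// both directions would create two of these, as described above.
class ToyLineSplitter {
public:
    using LineHandler = std::function<void(const std::string&)>;

    explicit ToyLineSplitter(LineHandler handler) : handler_(std::move(handler)) {
        assert(static_cast<bool>(handler_));
    }

    // Accept an arbitrary stream chunk; forward only complete lines.
    void DeliverStream(const std::string& chunk) {
        buffer_ += chunk;
        std::size_t pos;
        while ((pos = buffer_.find('\n')) != std::string::npos) {
            std::string line = buffer_.substr(0, pos);
            if (!line.empty() && line.back() == '\r')
                line.pop_back();        // strip the CR of a CRLF pair
            handler_(line);
            buffer_.erase(0, pos + 1);  // keep any partial line for later
        }
    }

private:
    LineHandler handler_;
    std::string buffer_;  // bytes received since the last complete line
};

// Convenience helper: run a sequence of chunks through one splitter.
std::vector<std::string> SplitChunks(const std::vector<std::string>& chunks) {
    std::vector<std::string> lines;
    ToyLineSplitter splitter(
        [&lines](const std::string& line) { lines.push_back(line); });
    for (const auto& chunk : chunks)
        splitter.DeliverStream(chunk);
    return lines;
}
```

Chunk boundaries do not need to line up with line boundaries; a trailing partial line stays buffered until its terminator arrives.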

How-Tos

Implementing an Application Analyzer

These are the main steps to write an application analyzer for protocol Foo:

Create a directory under src/analyzer/protocol (e.g. named foo in this case).
Add your analyzer’s dir to src/analyzer/protocol/CMakeLists.txt.
Look at other existing protocols as an example for creating src/analyzer/protocol/foo/CMakeLists.txt, src/analyzer/protocol/foo/Plugin.cc, and any other BIF or BinPAC code that are part of the analyzer.
In the constructor, add support analyzers if required, e.g.:

Foo_Analyzer::Foo_Analyzer(Connection* conn)
    : TCP_ApplicationAnalyzer(AnalyzerTag::Foo, conn)
    {
    AddSupportAnalyzer(new ContentLine_Analyzer(conn));
    }

Add calls to Analyzer::ProtocolViolation() at points where the analyzer believes it is parsing the wrong protocol. Don’t be too strict, though: the analyzer will see plenty of crud even when it is parsing the right protocol.
Add calls to Analyzer::ProtocolConfirmation() at points where the analyzer can be reasonably sure that it is parsing the right protocol.
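
Where confirmation and violation calls might sit can be sketched with a toy checker. Everything here is assumed for illustration: the "Foo" protocol with lines starting with "FOO " is made up, and `ToyFooChecker` only borrows the method names from the text; it does not use Zeek's Analyzer class.

```cpp
#include <cassert>
#include <string>

// Toy checker for a made-up line-based protocol "Foo" whose lines all start
// with "FOO ". The reporting methods only mimic the names used in the text.
class ToyFooChecker {
public:
    void DeliverLine(const std::string& line) {
        if (line.empty())
            return;  // tolerate harmless crud instead of flagging it

        if (line.compare(0, 4, "FOO ") == 0) {
            if (confirmations_ == 0)
                ProtocolConfirmation();  // confirm once, on first good line
        } else {
            ProtocolViolation("line does not start with \"FOO \"");
        }
    }

    int Confirmations() const { return confirmations_; }
    int Violations() const { return violations_; }
    const std::string& LastReason() const { return last_reason_; }

private:
    void ProtocolConfirmation() { ++confirmations_; }

    void ProtocolViolation(const std::string& reason) {
        assert(!reason.empty());
        ++violations_;
        last_reason_ = reason;
    }

    int confirmations_ = 0;
    int violations_ = 0;
    std::string last_reason_;
};
```

The toy treats blank lines as noise rather than violations, which is one way to avoid being too strict.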

Determining Analyzer Activation

Analyzers can use one of three ways to be fed new connections:

Use a preconfigured set of ports, thus triggering on all connections using any of the registered ports.
Use content signatures, thus triggering on all connections that match the relevant signatures.
Hard-code the analyzer to trigger on all connections, when signatures won’t cut it and the protocol uses arbitrary ports. Avoid this whenever possible.

We now explain how to do each in turn.

If the analyzer is primarily supposed to work on a fixed set of ports, then make a call to Analyzer::register_for_ports:

global foo_ports: set[port] = { 12345/tcp, 54321/tcp } &redef;

event bro_init()
    {
    Analyzer::register_for_ports(Analyzer::ANALYZER_FOO, foo_ports);
    }

If you want to activate the analyzer via signatures (thus making it port-independent), create a signature file and load it in Zeek (e.g. via @load-sigs). Below is the signature pair used for HTTP as an example. It leverages the requires-reverse-signature condition to make the signature more reliable, and triggers the HTTP Analyzer via the enable "http" action. Here, "http" refers to the lower-case name the analyzer is registered under when an analyzer::Component is added via a plugin.

signature dpd_http_client {
    ip-proto == tcp
    payload /^ *(GET|HEAD|POST) */
    tcp-state originator
}

signature dpd_http_server {
    ip-proto == tcp
    payload /^HTTP\/[0-9]/
    tcp-state responder
    requires-reverse-signature dpd_http_client
    enable "http"
}
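
The effect of pairing the two signatures can be approximated in a few lines. This is a toy model, not Zeek's signature engine: `ToyHttpDetector` is hypothetical, it reuses the two payload patterns with `std::regex`, and it assumes each call carries the start of that direction's payload.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Toy model of the signature pair above: the server-side pattern only
// "enables" the analyzer if the client-side pattern matched earlier in the
// opposite direction of the same connection.
class ToyHttpDetector {
public:
    // data: payload seen at the start of one direction (assumption).
    // orig: true for the originator, false for the responder.
    void Payload(const std::string& data, bool orig) {
        static const std::regex client_re("^ *(GET|HEAD|POST) *");
        static const std::regex server_re("^HTTP/[0-9]");

        if (orig) {
            if (std::regex_search(data, client_re))
                client_matched_ = true;
        } else if (client_matched_ && std::regex_search(data, server_re)) {
            enabled_ = true;  // corresponds to the enable "http" action
        }
    }

    bool Enabled() const { return enabled_; }

private:
    bool client_matched_ = false;
    bool enabled_ = false;
};
```

Requiring both directions to match lets each individual pattern stay loose while still keeping false activations down.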

If you want to activate the analyzer on all connections, you manually need to hook the analyzer into the analyzer tree, in analyzer::Manager::BuildInitialAnalyzerTree. For example, for a stream-based TCP content analyzer, you might use this…

if ( tcp )
        {
        // ...
        if ( Foo_Analyzer::Available() )
                tcp->AddChildAnalyzer(new Foo_Analyzer(conn));
        }

… while for a packet-based one, you could use this:

if ( Foo_Analyzer::Available() )
        tcp->AddChildPacketAnalyzer(new Foo_Analyzer(conn));

What To Do If It Does Not Work

If your analyzer is not activated when you expect it to be, try any of the following:

If you are using signatures, make sure the signature actually matches. You can do so by adding an event "<explanation>" statement to the signature and running Zeek on your traffic with the signatures.bro policy.
Make sure Zeek actually processes your traffic. By adding print-filter at the end of your Zeek invocation, the resulting BPF filter for your configuration will be printed to the console at startup.
Build Zeek with debugging support (./configure --enable-debug), and run it with the DPD debug stream enabled (by passing -B dpd at the command line). After completion, have a look at the resulting debug.log file and see whether it provides any clues.

General Caveats

A TCP_ApplicationAnalyzer can access the state of the parent TCP_Analyzer by calling the method TCP(). However, such analyzers should be coded so that they also work without a TCP parent (i.e., when TCP() returns 0). That will later allow us to use them with decapsulated tunnels. If that is not possible, they should at least do an assert(TCP()) so that one notices if the analyzer is used in the wrong way.
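
The two options from this caveat can be sketched side by side. The types are toy stand-ins: `ToyTcpParent`, `ToyTcpApp`, and the `partial` flag are hypothetical and only illustrate the null-check pattern.

```cpp
#include <cassert>

// Hypothetical stand-in for the parent TCP analyzer's state.
struct ToyTcpParent {
    bool partial = false;
};

class ToyTcpApp {
public:
    explicit ToyTcpApp(ToyTcpParent* parent) : parent_(parent) {}

    // May return a null pointer, e.g. when there is no TCP parent.
    ToyTcpParent* TCP() const { return parent_; }

    // Preferred: degrade gracefully when there is no TCP parent.
    bool IsPartial() const { return TCP() ? TCP()->partial : false; }

    // Fallback for code that truly cannot work without a parent:
    // fail loudly instead of dereferencing a null pointer.
    bool IsPartialStrict() const {
        assert(TCP());
        return TCP()->partial;
    }

private:
    ToyTcpParent* parent_;
};
```

The graceful variant picks a default when no parent exists; which default is sensible depends on the analyzer.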