Note: Our current website is at www.zeek.org. This website is unmaintained and contains outdated information.

ProtocolViolation(), ProtocolConfirmation() semantics

This is a resurrection of a discussion from quite a while back.

Intro

The current protocol confirmation / violation events are not used consistently and their meaning also varies across different analyzers. We (Robin and Gregor after some discussions) propose the following change:

Policy-Layer

Get rid of the protocol_confirmation and protocol_violation events.
Add new event: protocol_classification(). protocol_classification() has a parameter indicating whether it is a confirmation, a violation or a don’t know. protocol_classification() is generated exactly once per instantiated (application) analyzer. It can either be generated while the analyzer is still active or when Analyzer::Done() is called. See below for what counts as confirmation and violation. The classification from protocol_classification() is final.
Add a protocol_parse_error(isFinal: bool) event. It is raised whenever the analyzer encounters a parse error. If isFinal==false, the analyzer is trying to resync to the protocol. If isFinal==true, the analyzer has given up and removed itself from the tree (this can be configured with a global config variable).

===> no need for dyn-disable.
Add a new event signature_analyzer_enabled(isInstantiated). This event is generated whenever a signature executes an enable action. The isInstantiated flag indicates whether an analyzer was actually instantiated.

This event could be used to set the service field for conn log based on signatures (if no analyzer was instantiated) or based on the protocol_classification() event (if we know that an analyzer was instantiated, we know that there will be a protocol_classification() event).
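
To make the service-field idea concrete, here is a minimal C++ sketch of the two handlers. It only mirrors the logic described above. The ConnInfo struct, the handler names and the Classification enum are hypothetical stand-ins, not existing Bro types, and the real handlers would presumably live in policy scripts rather than in C++.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for policy-layer types. Only the two event names
// mirrored by the handlers below come from the proposal.
enum class Classification { Confirm, Violation, DontKnow };

struct ConnInfo {
    std::string service;  // what would end up in the conn log's service field
};

// Mirrors signature_analyzer_enabled(isInstantiated): if no analyzer was
// instantiated, the signature match is the only evidence we will ever get.
void on_signature_analyzer_enabled(ConnInfo& c, const std::string& analyzer,
                                   bool isInstantiated) {
    assert(!analyzer.empty());  // assumption: analyzers are identified by name
    if (!isInstantiated)
        c.service = analyzer;
    // Otherwise wait: a protocol_classification() event is expected to follow.
}

// Mirrors protocol_classification(): in this sketch only a confirmation
// sets the service field; violation and dontknow leave it untouched.
void on_protocol_classification(ConnInfo& c, const std::string& analyzer,
                                Classification verdict) {
    assert(!analyzer.empty());
    if (verdict == Classification::Confirm)
        c.service = analyzer;
}
```

Whether a violation should also clear a service value set earlier by a signature is left open here, since the text does not say.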

Analyzer-Interface

Add the methods:

Analyzer::parseBeginOk()

Analyzer::parsePDUOk()

Analyzer::parseError(bool isFinal)

Method semantics:

parseBeginOk(): called early, e.g., when an HTTP request line is seen. Indicates that the analyzer has been able to start parsing the protocol.
parsePDUOk(): called when the analyzer has seen and parsed a significant chunk of the connection, i.e., we are very certain that this is indeed the protocol this analyzer handles! E.g., when an HTTP request/reply pair has been completely parsed. Calling parsePDUOk() should trigger a protocol_classification(confirm) event (see below). It’s up to the analyzer and the semantics of the protocol whether parsePDUOk() should consider both sides of a connection or only one side. For the HTTP example it should consider both.

If a parse error occurs after parsePDUOk(), then we probably just encountered a protocol implementation oddity or violation, but we are still sure that we are parsing the right protocol.
parseError(): The parser encountered a parse error. isFinal is set when the parser gives up. Depending on policy configuration, this should trigger removal of the analyzer. If isFinal is not set, the analyzer will try to resync to the data stream. If the resync is successful, the analyzer should call parseBeginOk() and parsePDUOk() as it proceeds; otherwise it should call parseError(true) to indicate it could not resync. Any call to parseError() always generates the protocol_parse_error event.
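
As an illustration of how a parser might drive these three methods, here is a minimal C++ sketch. Only the method names and the isFinal parameter come from the proposal. The AnalyzerSketch and ToyHttpAnalyzer classes, the line-based toy "protocol" and the error budget are assumptions made for the example, and the real Analyzer class is not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base class that just records which methods were called.
class AnalyzerSketch {
public:
    virtual ~AnalyzerSketch() = default;

    void parseBeginOk() { calls.push_back("beginOk"); }
    void parsePDUOk() { calls.push_back("pduOk"); }
    void parseError(bool isFinal) {
        calls.push_back(isFinal ? "error(final)" : "error(resync)");
        if (isFinal)
            removed = true;  // stands in for removing the analyzer from the tree
    }

    std::vector<std::string> calls;  // recorded call sequence, for inspection
    bool removed = false;
};

// Toy line-based analyzer loosely modelled on the HTTP example in the text.
class ToyHttpAnalyzer : public AnalyzerSketch {
public:
    explicit ToyHttpAnalyzer(int maxErrors) : errorsLeft(maxErrors) {
        assert(maxErrors > 0);  // assumption: a small budget of tolerated errors
    }

    void deliverLine(const std::string& line) {
        if (removed)
            return;  // a removed analyzer sees no more data
        if (line.rfind("GET ", 0) == 0) {
            awaitingReply = true;   // request line seen: parsing has started
            parseBeginOk();
        } else if (awaitingReply && line.rfind("HTTP/", 0) == 0) {
            awaitingReply = false;  // full request/reply pair parsed
            parsePDUOk();
        } else {
            awaitingReply = false;
            --errorsLeft;
            parseError(errorsLeft == 0);  // give up once the budget is spent
        }
    }

private:
    int errorsLeft;
    bool awaitingReply = false;
};
```

With a budget of two errors, feeding a garbage line, then a request line, then a status line yields the call sequence parseError(false), parseBeginOk(), parsePDUOk(): the analyzer resyncs and then confirms.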

Event Generation

Event generation is handled by the Analyzer class directly. However, subclasses should be able to change / override this behavior, so that the exact event generation policy is up to the actual analyzer. Most analyzers should not need to override the default behavior. We should make sure that even if an analyzer overrides event generation, we can still make use of the code in the base class.

Note, analyzers need to keep state in order to be able to generate events according to the semantics.
Any parseError() raises a protocol_parse_error event.
The signature engine raises the signature_analyzer_enabled event when it executes an enable action.
Per instantiated analyzer, generate one and only one protocol_classification event. (protocol_classification has a flag specifying whether the event is a confirmation, violation, or dontknow.)
The default semantics for generating protocol_classification depend on the sequence of parseBeginOk(), parsePDUOk(), and parseError() calls the analyzer has made.
- Any parsePDUOk() call: generate protocol_classification(confirm). I.e., we confirm that it’s the protocol.
- Any parseError(isFinal=true): violation (but recall: only if we didn’t have a parsePDUOk before).
- <possibly other calls> parseBeginOk(), Done() (and no prior classification event): confirmation (or maybe dontknow, maybe configurable?). The last call before Done() was parseBeginOk(), so we are in sync with the data stream and can parse the protocol, so we confirm that it is the protocol.
- <possibly other calls> parseError(*), Done() (and no prior classification event): violation.
- <no calls>, Done(): dontknow
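
The default rules above can be read as a small state machine. The following C++ sketch is one possible reading, not the proposed implementation. The ClassificationTracker class and its members are hypothetical, events are recorded in member variables instead of being raised, and the open question about parseBeginOk() followed by Done() is resolved as a confirmation.

```cpp
#include <cassert>

enum class Classification { Confirm, Violation, DontKnow };

// Hypothetical tracker for the default rules. Method names follow the
// proposal; the class and its members are illustrative only.
class ClassificationTracker {
public:
    void parseBeginOk() { last = Last::BeginOk; }

    void parsePDUOk() {
        last = Last::PduOk;
        emit(Classification::Confirm);        // any parsePDUOk() confirms
    }

    void parseError(bool isFinal) {
        last = Last::Error;
        ++parseErrors;                        // every call raises protocol_parse_error
        if (isFinal)
            emit(Classification::Violation);  // no-op if already classified
    }

    void Done() {
        if (last == Last::BeginOk)
            emit(Classification::Confirm);    // open question in the text: maybe dontknow
        else if (last == Last::Error)
            emit(Classification::Violation);
        else
            emit(Classification::DontKnow);   // no calls; no-op after a parsePDUOk()
        assert(classified);                   // Done() always leaves a verdict
    }

    bool classified = false;                  // true once the single event was generated
    Classification verdict = Classification::DontKnow;
    int parseErrors = 0;                      // number of protocol_parse_error events

private:
    enum class Last { None, BeginOk, PduOk, Error };
    Last last = Last::None;

    void emit(Classification c) {
        if (classified)
            return;                           // exactly one classification per analyzer
        classified = true;
        verdict = c;
    }
};
```

Keeping the "exactly once" check in a single emit() helper means that a subclass overriding individual rules would still get the once-only guarantee from the base class.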

TODOs

How to handle / distinguish between full parsing analyzers vs. partial parsing analyzers or protocol detectors.
How to handle analyzers that can parse several protocols. E.g., there are a ton of DNS-based protocols (including DNS, Netbios-NS, Zeroconf (Bonjour), LLMNR, DNS-SD). Ideally we would like to be able to distinguish those in conn.log.
Enable / run analyzers based on a per-analyzer global flag even if no event handlers are defined. E.g., if we are not interested in any events the analyzer generates but we still want the analyzer to do full protocol parsing, so we get accurate protocol_classification() events.

Comments on the mailing list

Vern

Gregor’s writeup looks good. I’m not quite sure we won’t find the approach needs refinements as we proceed, but it’s definitely a solid starting point. (One thing I wonder about is to what degree we want to have a distinction between full parsing vs. recognizing or partial parsing, which might need more fine-grained notions than these.)

(Note: Vern also mentioned that the event generation description was confusing, but I reworked it, so hopefully it is clearer now.)