A large fraction of Zeek’s classes are serializable, i.e. instances can be converted into a (machine-independent) binary format and back. This is a short summary of how to make a class support serialization.
Note that it’s quite important to add serialization support to new classes derived from already-serializable ancestors, and also to adapt the serialization methods of existing classes when their internal structure changes (e.g., when new data members are added). Otherwise, everything that relies on serialization will break quite horribly, specifically data persistence and remote communication.
The general layout of the serialization framework is as follows. A handful of base classes are serializable (e.g., Val, Expr, Connection, …). They provide Serialize()/Unserialize() methods that convert instances to and from a binary representation. These methods transparently handle all derived classes: to serialize a StringVal, which is derived from the base class Val, you call Val::Serialize(). In other words, only the base class’s serialization methods are directly accessible, while the serialization of derived classes remains internal.
(Note that in this text the term base class does not correspond to a top-most ancestor in the C++ class hierarchy but to a logically separate branch of Zeek’s class hierarchy).
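The dispatch described above can be pictured with a small standalone sketch. It does not use Zeek’s real classes: Val and StringVal here are simplified stand-ins, and the std::string buffer, the text format, and the DoSerialize() signature are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for a serialization base class (not Zeek's real Val).
class Val {
public:
    virtual ~Val() = default;

    // The only user-visible entry point: callers never invoke a
    // derived class's serialization code directly.
    bool Serialize(std::string* out) const { return DoSerialize(out); }

protected:
    // Each class in the hierarchy overrides this hook.
    virtual bool DoSerialize(std::string* out) const {
        out->append("Val;");
        return true;
    }
};

class StringVal : public Val {
public:
    explicit StringVal(std::string s) : str(std::move(s)) {}

protected:
    bool DoSerialize(std::string* out) const override {
        // Serialize the parent's part first, then our own members.
        if (!Val::DoSerialize(out))
            return false;
        out->append("StringVal:" + str + ";");
        return true;
    }

private:
    std::string str;
};

// Serializes a StringVal through a pointer to the base class.
std::string serialize_via_base() {
    StringVal sv("hello");
    const Val* v = &sv;
    std::string out;
    bool ok = v->Serialize(&out);
    return ok ? out : std::string();
}
```

Calling serialize_via_base() goes through Val::Serialize() only, yet the StringVal-specific part still ends up in the output because the virtual hook is resolved at run time.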
Let’s assume that class Foo is directly derived from class Bar which already includes serialization support (this case should be the most common one: most of Zeek’s classes already implement serialization). Bar itself is a (direct or indirect) descendant of a base class Base.
Follow these steps:
Add a constant SER_FOO to SerialTypes.h which represents Foo’s type:
const SerialType SER_FOO = n | SER_IS_BASE;
n is a number which is unique among all children of Base, and SER_IS_BASE is a type-mask defining Foo as being derived from Base.
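As a rough illustration of how such a constant combines a per-class number with a base-class mask, consider the sketch below. The concrete bit values, the SER_TYPE_MASK and SER_IS_OTHER names, and the is_derived_from() helper are hypothetical and are not taken from SerialTypes.h.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t SerialType;

// Hypothetical layout: the high byte identifies the base class, the low
// byte holds the number n that is unique among that base's children.
const SerialType SER_TYPE_MASK = 0xff00;
const SerialType SER_IS_BASE = 0x0100;
const SerialType SER_IS_OTHER = 0x0200;

const SerialType SER_BAR = 1 | SER_IS_BASE;
const SerialType SER_FOO = 2 | SER_IS_BASE;

// True if type t was declared as a descendant of the given base.
bool is_derived_from(SerialType t, SerialType base) {
    return (t & SER_TYPE_MASK) == base;
}
```

With this layout, SER_FOO and SER_BAR differ only in their low byte, so both pass is_derived_from(..., SER_IS_BASE) while remaining distinct values.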
Insert DECLARE_SERIAL(Foo) into the protected section of the class definition.
If it doesn’t already exist, add a (preferably protected) default constructor to Foo. If you can make sure that nothing other than the serialization code uses it, you do not need to initialize data members which the deserialization process sets (see below), but that’s not Good Style(tm) of course.
If Foo is not abstract, add IMPLEMENT_SERIAL(Foo, SER_FOO) to Foo’s implementation file.
Add two methods to Foo’s implementation (they are implicitly declared by DECLARE_SERIAL):
bool Foo::DoSerialize(SerialInfo* info) const
    {
    DO_SERIALIZE(SER_FOO, Bar);
    <...Foo specific serialization code...>
    return <boolean indicating success or failure>;
    }

bool Foo::DoUnserialize(UnserialInfo* info)
    {
    DO_UNSERIALIZE(Bar);
    <...Foo specific unserialization code...>
    return <boolean indicating success or failure>;
    }
(Don’t rename info!).
The two methods have to implement serialization/unserialization code for all data members of Foo that are to be stored/restored. When DoUnserialize finishes all members must be fully initialized.
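The symmetry between the two methods can be sketched outside of Zeek as follows. The Buffer type is a hypothetical stand-in for SerialInfo/UnserialInfo, the explicit calls to Bar’s methods play the role that DO_SERIALIZE/DO_UNSERIALIZE have in the real code, and the values are written in host byte order for brevity, so this is not the actual wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical byte buffer standing in for SerialInfo/UnserialInfo.
struct Buffer {
    std::vector<unsigned char> bytes;
    std::size_t pos = 0;

    bool Write(int32_t v) {
        unsigned char raw[sizeof v];
        std::memcpy(raw, &v, sizeof v);
        bytes.insert(bytes.end(), raw, raw + sizeof v);
        return true;
    }

    bool Read(int32_t* v) {
        if (bytes.size() - pos < sizeof *v)
            return false;  // not enough data left: report failure
        std::memcpy(v, bytes.data() + pos, sizeof *v);
        pos += sizeof *v;
        return true;
    }
};

class Bar {
public:
    Bar() = default;
    explicit Bar(int32_t a_val) : a(a_val) {}
    virtual ~Bar() = default;

    virtual bool DoSerialize(Buffer* info) const { return info->Write(a); }
    virtual bool DoUnserialize(Buffer* info) { return info->Read(&a); }

    int32_t a = 0;
};

class Foo : public Bar {
public:
    Foo() = default;  // used when restoring an instance
    Foo(int32_t a_val, int32_t b_val) : Bar(a_val), b(b_val) {}

    bool DoSerialize(Buffer* info) const override {
        if (!Bar::DoSerialize(info))  // parent's members first
            return false;
        return info->Write(b);        // then Foo's own members
    }

    bool DoUnserialize(Buffer* info) override {
        if (!Bar::DoUnserialize(info))  // same order as when writing
            return false;
        return info->Read(&b);
    }

    int32_t b = 0;
};

// Writes a Foo, reads it back, and checks that all members were restored.
bool round_trip(int32_t a, int32_t b) {
    Foo in(a, b);
    Buffer buf;
    if (!in.DoSerialize(&buf))
        return false;
    Foo out;
    if (!out.DoUnserialize(&buf))
        return false;
    return out.a == a && out.b == b;
}

// Feeds Foo a buffer that only contains Bar's part.
bool truncated_input_is_rejected() {
    Bar only_bar(1);
    Buffer buf;
    only_bar.DoSerialize(&buf);
    Foo out;
    return !out.DoUnserialize(&buf);  // Foo's own member is missing
}
```

The important property is that DoUnserialize() reads exactly what DoSerialize() wrote, in the same order, and that a failure at any step is passed up to the caller.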
Within the Foo-specific code, all of Foo’s data members are usually serialized/unserialized one after another. There are two types of data members: atomic values and objects which are instances of serializable classes themselves.
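For atomic values, a few macros are defined to handle their serialization/unserialization: SERIALIZE, SERIALIZE_STR and SERIALIZE_BIT for writing, and UNSERIALIZE, UNSERIALIZE_STR and UNSERIALIZE_BIT for reading.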
For objects which are themselves serializable, call their base class’s Serialize() and Unserialize() methods passing the info object as parameter. For object pointers which may be null, use SERIALIZE_OPTIONAL(ptr, Base::Serialize(info)) and UNSERIALIZE_OPTIONAL(ptr, Base::Unserialize(info)).
The macros SERIALIZE, SERIALIZE_STR, SERIALIZE_BIT, UNSERIALIZE, UNSERIALIZE_STR, UNSERIALIZE_BIT return false on failure. Similarly, on failure the ::Serialize() and ::Unserialize() methods return false or null respectively. Don’t forget to check the return codes and, if something goes wrong, pass the error up.
The OPTIONAL macros automatically return to the caller in case of error.
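One plausible way to handle a possibly-null pointer is to write a presence flag in front of the object. The sketch below illustrates that idea with plain functions. The Stream and Item types and the two helper functions are hypothetical, and the flag-based encoding is an assumption about how such macros might work, not a description of their actual expansion.

```cpp
#include <cassert>
#include <deque>
#include <memory>

// Hypothetical stream of ints standing in for the serialized data.
typedef std::deque<int> Stream;

struct Item {
    int value;

    bool Serialize(Stream* s) const {
        s->push_back(value);
        return true;
    }

    // Returns null on failure, mirroring the convention described above.
    static std::unique_ptr<Item> Unserialize(Stream* s) {
        if (s->empty())
            return nullptr;
        std::unique_ptr<Item> it(new Item{s->front()});
        s->pop_front();
        return it;
    }
};

// Writes a presence flag, then the object itself if there is one.
bool serialize_optional(const Item* ptr, Stream* s) {
    s->push_back(ptr ? 1 : 0);
    return ptr ? ptr->Serialize(s) : true;
}

// Returns false on malformed input; *out stays null if the pointer was absent.
bool unserialize_optional(std::unique_ptr<Item>* out, Stream* s) {
    out->reset();
    if (s->empty())
        return false;
    int present = s->front();
    s->pop_front();
    if (!present)
        return true;
    *out = Item::Unserialize(s);
    return *out != nullptr;
}

// Round-trips an optional Item; returns its value, -1 if it was null,
// or -2 if any step failed.
int round_trip_optional(bool present, int value) {
    Item item{value};
    Stream s;
    if (!serialize_optional(present ? &item : nullptr, &s))
        return -2;
    std::unique_ptr<Item> out;
    if (!unserialize_optional(&out, &s))
        return -2;
    return out ? out->value : -1;
}
```

Every step checks its return value and hands failures to the caller, which is the same discipline the non-OPTIONAL macros require you to apply by hand.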
Let’s assume that class Bar is to be a new serialization base class, i.e. it is going to get its own "user-visible" Serialize()/Unserialize() methods. Follow these steps:
Add a constant SER_IS_BAR to SerialTypes.h which will allow us to use Bar as a base type:
const SerialType SER_IS_BAR = 0xnn00 | type-specs;
nn is a number unique among all base types; type-specs is the bitwise OR of additional type properties. Currently there is one such property defined: SER_IS_BRO_OBJ must be set if Bar is derived from BroObj.
Add Serialize()/Unserialize() to Bar:
bool Bar::Serialize(SerialInfo* info) const
    {
    return SerialObj::Serialize(info);
    }

// Declared static in the class definition.
Bar* Bar::Unserialize(UnserialInfo* info)
    {
    return (Bar*) SerialObj::Unserialize(info);
    }
These methods may take additional parameters if required (there’s no standard signature for them). Also, they may be extended to perform additional work before/after the call to SerialObj::*. For example, the class BroType, when deserializing a base_type, discards the reconstructed object in Unserialize() and returns a reference to the current process’s corresponding base_type instance.
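The idea of post-processing the result in Unserialize() can be sketched as follows. All names here (Tag, Type, canonical(), generic_unserialize()) are hypothetical stand-ins rather than Zeek’s BroType API, and ownership handling is deliberately simplified.

```cpp
#include <cassert>

enum Tag { TYPE_INT = 0, TYPE_STRING = 1, TYPE_RECORD = 2 };

struct Type {
    explicit Type(Tag t) : tag(t) {}
    Tag tag;
};

// Hypothetical per-process canonical instances of the simple types.
// Returns null for types that have no canonical instance.
Type* canonical(Tag t) {
    static Type int_type(TYPE_INT);
    static Type string_type(TYPE_STRING);
    if (t == TYPE_INT)
        return &int_type;
    if (t == TYPE_STRING)
        return &string_type;
    return nullptr;
}

// Stand-in for the generic reconstruction step.
Type* generic_unserialize(int wire_tag) {
    return new Type(static_cast<Tag>(wire_tag));
}

// Extra work after the generic call: simple types are replaced by the
// process-wide instance, everything else is returned as reconstructed.
Type* Unserialize(int wire_tag) {
    Type* t = generic_unserialize(wire_tag);
    if (Type* c = canonical(t->tag)) {
        delete t;  // discard the freshly reconstructed copy
        return c;
    }
    return t;  // caller owns this object
}

// True if unserializing a record type yields a fresh, non-canonical object.
bool record_is_fresh_object() {
    Type* r = Unserialize(TYPE_RECORD);
    bool fresh = r != nullptr && r->tag == TYPE_RECORD &&
                 canonical(TYPE_RECORD) == nullptr;
    delete r;
    return fresh;
}
```

Two calls to Unserialize(TYPE_INT) return the same pointer, so identity comparisons against the local canonical instance keep working after a round trip.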
Now follow the steps described above for non-base classes.
© 2014 The Bro Project.