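Right now, using Broker data stores is pretty cumbersome. There are two main reasons for that: values have to be wrapped into, and unwrapped from, Broker::Data by hand, and every store query has to go through a when statement.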
Here’s a typical example of how things look with the current API: a producer stores a value into a store for which it is the master, and a consumer then retrieves the value through a clone of that store.
    local h = Broker::create_master("mystore");
    Broker::insert(h, Broker::data(42), Broker::data("fortytwo"));
That producer code generally is ok, except for having to wrap the key/value pair into Broker::data() calls; that’s quite verbose. Furthermore, it turns out that Broker::data itself isn’t exactly pretty. It is a (wrapped) BIF that returns a value of this type:
type Data: record { d: opaque of Broker::Data &optional; };
The reason for using a record here is that the &optional attribute lets one express that a Data instance does not have a value at all, which is needed for error cases. To check whether a Data instance v has a value, one would use v?$d. That’s again rather inelegant.
    local s: string;
    local h = Broker::create_clone("mystore");

    when ( local res = Broker::lookup(h, Broker::data(42)) )
        {
        # 'res' is of type Broker::QueryResult, which
        # has attributes 'status' saying if a value with
        # that index indeed exists in the store, and 'result'
        # with the resulting value if it does.
        if ( res$status == Broker::SUCCESS )
            {
            s = Broker::refine_to_string(res$result);
            print s;
            }
        else
            print "value not found";
        }
The consumer code is actually pretty bad. First, wrapping even such a basic lookup into when makes the whole operation really cumbersome. Just imagine needing a few of those in sequence: one ends up nesting the when statements, and the example above doesn’t even have a timeout branch yet. Also keep in mind that using when is relatively expensive, so doing large numbers of lookups this way will likely have a performance impact. Second, the result coming back needs to be checked for errors and then cast. Again, imagine more complex cases here, such as sets of other values, for which the casting needs to happen recursively through a series of refine_* BIFs.
The proposal is to add extensions to the language that make all this more natural, mostly hiding what’s going on internally. The proposal consists of a number of pieces working jointly, and they are discussed individually below. Taken together, this is how the example above could look with these extensions in place:
    local h = Broker::create_master("mystore");
    Broker::insert(h, 42, "fortytwo");
    local s: string;
    local h = Broker::create_clone("mystore");
    local v = async Broker::lookup(h, 42); # Continues to return 'opaque of Broker::Data'

    switch ( Broker::status(v) ) {
    case Broker::Success:
        s = (v as string); # Type-safe cast, w/ runtime error if v is not a string.
        print s;
        break;

    case Broker::NotFound:
        print "value not found";
        break;

    case Broker::Timeout:
        print "not found";
        break;
    }
Here, the lookup() function proceeds asynchronously: it will hold execution of the event handler until the result is available. More on that below.
In the following we discuss more details on the various pieces for making this (and more) happen.
We add support for type-safe casts on values that aren’t statically typed (Broker::Data here, but it can also include any, which should make Seth happy :-). We support direct casts for cases where the user knows for certain which type a value can be cast into, as well as a type-based dispatcher for cases where it can be one of a number of target types (e.g., one may not know what type a value has in a Broker store).
Ingredients:
We add a new dynamic cast operator v as T that turns an expression v of arbitrary type into a value of the specified target type T, if supported. If not supported, the operator generates a runtime error. T must be statically known at initialization time. All expressions will automatically support casting to their own types (a no-op). Other casts can be supported by implementing them on a per-type basis (e.g., Broker::Data would support casting into whatever the corresponding Zeek type of the Broker value is; and any would support casting to the value’s actual type).
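The runtime behavior described for v as T can be illustrated outside of Bro. The following Python sketch is only an analogy under stated assumptions, not the proposed implementation: cast_as is a hypothetical helper standing in for the operator, and CastError is a hypothetical stand-in for the runtime error.

```python
class CastError(TypeError):
    """Hypothetical stand-in for the runtime error of a failed cast."""


def cast_as(value, target_type):
    """Model of `value as target_type`.

    Returns the value unchanged if it already has the target type
    (casting to one's own type is a no-op); raises CastError otherwise.
    """
    # In Python, bool is a subclass of int. Reject that implicit
    # relationship so the model keeps the two types distinct.
    if isinstance(value, bool) and target_type is not bool:
        raise CastError(f"cannot cast bool to {target_type.__name__}")
    if not isinstance(value, target_type):
        raise CastError(
            f"cannot cast {type(value).__name__} to {target_type.__name__}"
        )
    return value
```

With this helper, cast_as("fortytwo", str) returns the string unchanged, while cast_as(42, str) raises CastError, mirroring the runtime error that the operator would generate for an unsupported cast.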
We enable comparison of types through a new is operator: if ( v is string ) ....
We add a new version of the switch statement that branches based on type, using is internally to select the target:
    switch ( v ) {
    case bool to b:
        # Make the boolean available as "b".
        print "bool", b;
        break;

    case set[int] to s:
        print "set", s;
        break;

    case some_record:
        print "a record, but I don't really care about the value";
        break;

    default:
        print "no clue";
    }
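The dispatch logic of such a type-based switch can be modeled as an ordered list of (type, body) pairs. The Python sketch below is an analogy only: describe and its case table are hypothetical names, a dict stands in for a record, and the element type of the set is not checked, which the real set[int] case would do.

```python
def describe(value):
    """Model of a type-based switch: try each case in order and hand the
    value to the first body whose type matches."""
    cases = [
        (bool, lambda b: f"bool {b}"),
        (set, lambda s: f"set {sorted(s)}"),
        (dict, lambda _: "a record, but I don't really care about the value"),
    ]
    for target_type, body in cases:
        # The exact type comparison plays the role of `v is T`.
        if type(value) is target_type:
            return body(value)
    # Corresponds to the default branch.
    return "no clue"
```

Each body receives the value under a local name (b, s), which corresponds to the T to id form binding an identifier of the matched type.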
Note
How about that T to id syntax for implicitly declaring an identifier to access the value with the right type? (@Matthias: I don’t believe the parser could handle the syntax you proposed well.)
To make working with Broker::Data values easier, we enable opaque types to support additional operations:
Because Broker’s functions operate asynchronously, one currently needs to wrap them into when statements, which is very cumbersome. The same actually applies to other uses of asynchronous operations as well. For example, when doing DNS lookups through lookup_hostname, the need for when also often breaks the flow unnaturally.
The original reason for introducing when was to have a means for spawning the operation into the background, since script execution cannot just block until a result becomes available. However, there’s a different way to address that: while we cannot hold the script interpreter, what we can hold is the current event handler. It would just suspend processing temporarily and let the interpreter work on other event handlers in the meantime (similar to how co-routines jump back and forth). For example, we would have lookup_hostname hold execution of the current handler until the DNS result is in. Once it is, execution picks up again right after the lookup operation and then continues normally. With that, DNS lookups (and Broker operations) would turn into code like this:
    local h = async lookup_hostname("www.icir.org");
    print h;
We use the async keyword to explicitly mark the call as asynchronous. async can only be used with functions that support it, and vice versa: functions that execute asynchronously require using async. The explicit keyword makes it clear that we’re changing the semantics of traditional function calls here: the event handler may no longer run to completion before other code gets executed.
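The idea of holding one event handler while the interpreter keeps working on others can be sketched with Python generators. This is an analogy under stated assumptions, not how Bro would implement it: run, dns_handler, other_handler, and the resolve callback are hypothetical names, and yield plays the role of the async call.

```python
from collections import deque


def dns_handler(log):
    log.append("dns: start")
    # `yield` stands in for `async lookup_hostname(...)`: the handler is
    # suspended here and later resumed with the lookup result.
    addr = yield "www.icir.org"
    log.append(f"dns: got {addr}")


def other_handler(log):
    log.append("other: ran to completion")
    return
    yield  # unreachable; only makes this function a generator too


def run(handlers, resolve):
    """Run all handlers; a suspended handler goes to the back of the queue
    and is resumed with its result after the other handlers had a turn."""
    log = []
    ready = deque((handler(log), None) for handler in handlers)
    while ready:
        gen, result = ready.popleft()
        try:
            request = gen.send(result)  # run until next suspension or end
        except StopIteration:
            continue  # handler finished
        # Handler is on hold; schedule it to resume with the answer.
        ready.append((gen, resolve(request)))
    return log
```

Calling run([dns_handler, other_handler], resolve) with a resolve function that returns a fixed address shows the interleaving: the DNS handler starts, the other handler runs to completion while the first one is on hold, and then the DNS handler picks up right after the lookup.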
On the implementation side, I believe such asynchronous function calls could be added relatively easily by reusing much of the when-machinery. These can be more efficient though: we don’t need to clone the stack here because execution of the event handler will hold anyways.
Note
I’ve removed the discussion of timeouts. I now believe that the functions need to take care of that themselves individually; the runtime won’t clean up. While I was earlier arguing that we need a fallback mechanism, I no longer believe that there’s a good one-size-fits-all solution here; how timeouts should be handled really depends on the function. Internally, there may also be further cleanup / state recovery required in such a case. I also don’t like having to specify &timeout with calls; that’s something a caller shouldn’t have to care about. For Broker, we could instead generally set a timeout on operations when opening a connection.
Initially, we’d probably limit this new functionality to BIFs, which would internally need to do some wrapping of their logic similar to (or even the same as) what’s currently necessary for supporting BIFs inside when statements. Eventually, we could go further and extend this to full co-routine support within the language, so that script functions can jump back and forth, as some other languages offer.
Note
I’ll forgo sketching out complete co-routine support for now. I think language-wise there’s nothing in this proposal that would prevent us from extending all this accordingly; the async keyword would just start accepting more targets. And implementation-wise, it would need to be a separate mechanism anyway, compared to what BIFs are doing.
© 2014 The Bro Project.