Files are constantly transmitted over HTTP on regular networks. These files belong to a specific category (e.g., executable, text, image) identified by a Multipurpose Internet Mail Extension (MIME). Although MIME was originally developed to identify the type of non-text attachments on email, it is also used by a web browser to identify the type of files transmitted and present them accordingly.
In this tutorial, we will demonstrate how to use the Sumstats Framework to collect statistical information based on MIME types; specifically, the total number of occurrences, size in bytes, and number of unique hosts transmitting files over HTTP per each type. For instructions on extracting and creating a local copy of these files, visit this tutorial.
When working with the Summary Statistics Framework, you need to define three different pieces: (i) Observations, where the event is observed and fed into the framework. (ii) Reducers, where observations are collected and measured. (iii) Sumstats, where the main functionality is implemented.
We start by defining our observation along with a record to store
all statistical values and an observation interval. We are conducting our
observation on the HTTP::log_http
event and are interested
in the MIME type, size of the file (“response_body_len”), and the
originator host (“orig_h”). We use the MIME type as our key and create
observers for the other two values.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | mimestats.bro
module MimeMetrics;
export {
redef enum Log::ID += { LOG };
type Info: record {
## Timestamp when the log line was finished and written.
ts: time &log;
## Time interval that the log line covers.
ts_delta: interval &log;
## The mime type
mtype: string &log;
## The number of unique local hosts that fetched this mime type
uniq_hosts: count &log;
## The number of hits to the mime type
hits: count &log;
## The total number of bytes received by this mime type
bytes: count &log;
};
## The frequency of logging the stats collected by this script.
const break_interval = 5mins &redef;
}
event HTTP::log_http(rec: HTTP::Info)
{
if ( Site::is_local_addr(rec$id$orig_h) && rec?$resp_mime_types )
{
local mime_type = rec$resp_mime_types[0];
SumStats::observe("mime.bytes", [$str=mime_type],
[$num=rec$response_body_len]);
SumStats::observe("mime.hits", [$str=mime_type],
[$str=cat(rec$id$orig_h)]);
}
}
|
Next, we create the reducers. The first will accumulate file sizes
and the second will make sure we only store a host ID once. Below is
the partial code from a bro_init
handler.
1 2 3 4 5 6 | mimestats.bro
local r1: SumStats::Reducer = [$stream="mime.bytes",
$apply=set(SumStats::SUM)];
local r2: SumStats::Reducer = [$stream="mime.hits",
$apply=set(SumStats::UNIQUE)];
|
In our final step, we create the SumStats where we check for the observation interval. Once it expires, we populate the record (defined above) with all the relevant data and write it to a log.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | mimestats.bro
SumStats::create([$name="mime-metrics",
$epoch=break_interval,
$reducers=set(r1, r2),
$epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
{
local l: Info;
l$ts = network_time();
l$ts_delta = break_interval;
l$mtype = key$str;
l$bytes = double_to_count(floor(result["mime.bytes"]$sum));
l$hits = result["mime.hits"]$num;
l$uniq_hosts = result["mime.hits"]$unique;
Log::write(MimeMetrics::LOG, l);
}]);
|
After putting the three pieces together we end up with the following final code for our script.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 | mimestats.bro
@load base/utils/site
@load base/frameworks/sumstats
redef Site::local_nets += { 10.0.0.0/8 };
module MimeMetrics;
export {
redef enum Log::ID += { LOG };
type Info: record {
## Timestamp when the log line was finished and written.
ts: time &log;
## Time interval that the log line covers.
ts_delta: interval &log;
## The mime type
mtype: string &log;
## The number of unique local hosts that fetched this mime type
uniq_hosts: count &log;
## The number of hits to the mime type
hits: count &log;
## The total number of bytes received by this mime type
bytes: count &log;
};
## The frequency of logging the stats collected by this script.
const break_interval = 5mins &redef;
}
event bro_init() &priority=3
{
Log::create_stream(MimeMetrics::LOG, [$columns=Info, $path="mime_metrics"]);
local r1: SumStats::Reducer = [$stream="mime.bytes",
$apply=set(SumStats::SUM)];
local r2: SumStats::Reducer = [$stream="mime.hits",
$apply=set(SumStats::UNIQUE)];
SumStats::create([$name="mime-metrics",
$epoch=break_interval,
$reducers=set(r1, r2),
$epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
{
local l: Info;
l$ts = network_time();
l$ts_delta = break_interval;
l$mtype = key$str;
l$bytes = double_to_count(floor(result["mime.bytes"]$sum));
l$hits = result["mime.hits"]$num;
l$uniq_hosts = result["mime.hits"]$unique;
Log::write(MimeMetrics::LOG, l);
}]);
}
event HTTP::log_http(rec: HTTP::Info)
{
if ( Site::is_local_addr(rec$id$orig_h) && rec?$resp_mime_types )
{
local mime_type = rec$resp_mime_types[0];
SumStats::observe("mime.bytes", [$str=mime_type],
[$num=rec$response_body_len]);
SumStats::observe("mime.hits", [$str=mime_type],
[$str=cat(rec$id$orig_h)]);
}
}
|
1 | # bro -r http/bro.org.pcap mimestats.bro
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | #separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path mime_metrics
#open 2018-12-19-16-55-50
#fields ts ts_delta mtype uniq_hosts hits bytes
#types time interval string count count count
1389719059.311698 300.000000 image/png 1 9 82176
1389719059.311698 300.000000 image/gif 1 1 172
1389719059.311698 300.000000 image/x-icon 1 2 2300
1389719059.311698 300.000000 text/html 1 2 42231
1389719059.311698 300.000000 text/plain 1 15 128001
1389719059.311698 300.000000 image/jpeg 1 1 186859
1389719059.311698 300.000000 application/pgp-signature 1 1 836
#close 2018-12-19-16-55-50
|
Note
The redefinition of Site::local_nets
is only done inside
this script to make it a self-contained example. It’s typically
redefined somewhere else.