MIME Type Statistics

Files are constantly transferred over HTTP on typical networks. Each file belongs to a category (e.g., executable, text, image) identified by its Multipurpose Internet Mail Extensions (MIME) type. Although MIME was originally developed to identify the type of non-text email attachments, web browsers also use it to identify the type of a transmitted file and present it accordingly.

In this tutorial, we demonstrate how to use the Sumstats Framework to collect statistics per MIME type: specifically, the number of occurrences, the total size in bytes, and the number of unique local hosts that fetched files of that type over HTTP. For instructions on extracting and creating a local copy of these files, visit this tutorial.

MIME Statistics with Sumstats

When working with the Summary Statistics (Sumstats) Framework, you need to define three pieces: (i) observations, where an event is observed and its values are fed into the framework; (ii) reducers, which define how the observations are collected and measured; and (iii) the SumStats itself, where the main functionality is implemented: it ties the reducers together and handles the results at the end of each interval.

We start by defining our observation, along with a record to store all statistical values and an observation interval. We observe the HTTP::log_http event and are interested in the MIME type, the size of the file (“response_body_len”), and the originator host (“orig_h”). Only requests from local originators are considered. We use the MIME type as our key and create one observation for each of the other two values.

mimestats.bro

module MimeMetrics;

export {

	redef enum Log::ID += { LOG };

	type Info: record {
		## Timestamp when the log line was finished and written.
		ts:         time   &log;
		## Time interval that the log line covers.
		ts_delta:   interval &log;
		## The mime type
		mtype:        string &log;
		## The number of unique local hosts that fetched this mime type
		uniq_hosts: count  &log;
		## The number of hits to the mime type
		hits:       count  &log;
		## The total number of bytes received by this mime type
		bytes:      count  &log;
	};

	## The frequency of logging the stats collected by this script.
	const break_interval = 5mins &redef;
}

event HTTP::log_http(rec: HTTP::Info)
	{
	if ( Site::is_local_addr(rec$id$orig_h) && rec?$resp_mime_types )
		{
		local mime_type = rec$resp_mime_types[0];
		SumStats::observe("mime.bytes", [$str=mime_type],
		                  [$num=rec$response_body_len]);
		SumStats::observe("mime.hits",  [$str=mime_type],
		                  [$str=cat(rec$id$orig_h)]);
		}
	}
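
The filtering and keying done by this handler can be modeled outside Bro. The following Python sketch is only an illustration under stated assumptions: the dictionary layout of a record and the helper names (`is_local_addr`, `observations`) are hypothetical stand-ins, not part of Bro or the SumStats API. It shows that each qualifying HTTP log record yields two observations, both keyed by the first MIME type.

```python
import ipaddress

# Assumption: mirrors `redef Site::local_nets += { 10.0.0.0/8 };` in the full script.
LOCAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]


def is_local_addr(addr):
    """Hypothetical stand-in for Site::is_local_addr."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in LOCAL_NETS)


def observations(rec):
    """Return the (stream, key, value) tuples that one HTTP log record yields.

    `rec` is a plain dict used for illustration only. A record without MIME
    types (or with an empty list) is skipped.
    """
    mime_types = rec.get("resp_mime_types")
    if not is_local_addr(rec["orig_h"]) or not mime_types:
        return []
    key = mime_types[0]  # the MIME type is the SumStats key
    return [
        ("mime.bytes", key, rec.get("response_body_len", 0)),
        ("mime.hits", key, rec["orig_h"]),
    ]


records = [
    {"orig_h": "10.0.0.5", "resp_mime_types": ["image/png"], "response_body_len": 2048},
    {"orig_h": "192.0.2.7", "resp_mime_types": ["text/html"], "response_body_len": 512},
    {"orig_h": "10.0.0.5", "response_body_len": 0},
]

for rec in records:
    print(observations(rec))
```

Only the first record produces observations: the second comes from a non-local originator and the third carries no MIME type.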

Next, we create the reducers. The first accumulates file sizes, and the second ensures that each host is counted only once. Below is the partial code from a bro_init handler.

mimestats.bro

	local r1: SumStats::Reducer = [$stream="mime.bytes",
	                               $apply=set(SumStats::SUM)];
	local r2: SumStats::Reducer = [$stream="mime.hits", 
	                               $apply=set(SumStats::UNIQUE)];
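
To make the two calculations concrete, here is a minimal Python model of what a SUM reducer and a UNIQUE reducer track for a single key. The class names and attributes are hypothetical and chosen to echo the `$sum`, `$num`, and `$unique` fields read later in the script; this is a sketch of the idea, not the framework's implementation.

```python
class SumReducer:
    """Models SumStats::SUM: a running total plus an observation count."""

    def __init__(self):
        self.num = 0    # number of observations seen
        self.sum = 0.0  # running total of the observed values

    def observe(self, value):
        self.num += 1
        self.sum += value


class UniqueReducer:
    """Models SumStats::UNIQUE: counts each distinct value only once."""

    def __init__(self):
        self.num = 0       # number of observations seen (the "hits")
        self._seen = set()

    def observe(self, value):
        self.num += 1
        self._seen.add(value)

    @property
    def unique(self):
        return len(self._seen)


# Three fetches of the same MIME type by two different hosts.
mime_bytes = SumReducer()
mime_hits = UniqueReducer()
for host, size in [("10.0.0.1", 100), ("10.0.0.2", 200), ("10.0.0.1", 300)]:
    mime_bytes.observe(size)
    mime_hits.observe(host)

print(mime_bytes.sum, mime_hits.num, mime_hits.unique)
```

The same stream therefore answers two questions: `num` gives the number of hits, while `unique` gives the number of distinct hosts behind them.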

In the final step, we create the SumStats and set its epoch to the observation interval. Each time the interval expires, we populate the record (defined above) with the relevant data for each MIME type and write it to a log.

mimestats.bro

	SumStats::create([$name="mime-metrics",
	                  $epoch=break_interval,
	                  $reducers=set(r1, r2),
	                  $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
	                        {
	                        local l: Info;
	                        l$ts         = network_time();
	                        l$ts_delta   = break_interval;
	                        l$mtype      = key$str;
	                        l$bytes      = double_to_count(floor(result["mime.bytes"]$sum));
	                        l$hits       = result["mime.hits"]$num;
	                        l$uniq_hosts = result["mime.hits"]$unique;
	                        Log::write(MimeMetrics::LOG, l);
	                        }]);
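
The epoch behaviour can be approximated in a few lines of Python. This sketch makes a simplifying assumption that is not taken from the framework: observations are grouped into fixed windows derived from their timestamps, whereas Bro drives epochs with timers. The function name `run_epochs` and the tuple layout are hypothetical. It shows how one log row per MIME type per interval is derived, including the floor-and-convert step applied to the byte total.

```python
import math
from collections import defaultdict

BREAK_INTERVAL = 300.0  # seconds; mirrors `break_interval = 5mins`


def run_epochs(observations, epoch=BREAK_INTERVAL):
    """Turn (ts, mime_type, host, nbytes) tuples into per-epoch log rows."""
    # window index -> MIME type -> accumulated state
    windows = defaultdict(
        lambda: defaultdict(lambda: {"sum": 0.0, "num": 0, "hosts": set()})
    )
    for ts, mtype, host, nbytes in observations:
        state = windows[int(ts // epoch)][mtype]
        state["sum"] += nbytes
        state["num"] += 1
        state["hosts"].add(host)

    rows = []
    for idx in sorted(windows):
        for mtype, state in sorted(windows[idx].items()):
            rows.append({
                "ts": (idx + 1) * epoch,  # end of the window
                "ts_delta": epoch,
                "mtype": mtype,
                "uniq_hosts": len(state["hosts"]),
                "hits": state["num"],
                # Same idea as double_to_count(floor(...)) in the script.
                "bytes": int(math.floor(state["sum"])),
            })
    return rows


sample = [
    (10.0, "image/png", "10.0.0.1", 100),
    (15.0, "text/html", "10.0.0.1", 1),
    (20.0, "image/png", "10.0.0.2", 50),
    (310.0, "image/png", "10.0.0.1", 7),
]
for row in run_epochs(sample):
    print(row)
```

Because state is kept per window, the observation at 310 seconds starts a fresh count for image/png instead of adding to the totals of the first interval.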

After putting the three pieces together, we end up with the following final code for our script.

mimestats.bro

@load base/utils/site
@load base/frameworks/sumstats

redef Site::local_nets += { 10.0.0.0/8 };

module MimeMetrics;

export {

	redef enum Log::ID += { LOG };

	type Info: record {
		## Timestamp when the log line was finished and written.
		ts:         time   &log;
		## Time interval that the log line covers.
		ts_delta:   interval &log;
		## The mime type
		mtype:        string &log;
		## The number of unique local hosts that fetched this mime type
		uniq_hosts: count  &log;
		## The number of hits to the mime type
		hits:       count  &log;
		## The total number of bytes received by this mime type
		bytes:      count  &log;
	};

	## The frequency of logging the stats collected by this script.
	const break_interval = 5mins &redef;
}

event bro_init() &priority=3
	{
	Log::create_stream(MimeMetrics::LOG, [$columns=Info, $path="mime_metrics"]);
	local r1: SumStats::Reducer = [$stream="mime.bytes",
	                               $apply=set(SumStats::SUM)];
	local r2: SumStats::Reducer = [$stream="mime.hits", 
	                               $apply=set(SumStats::UNIQUE)];
	SumStats::create([$name="mime-metrics",
	                  $epoch=break_interval,
	                  $reducers=set(r1, r2),
	                  $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
	                        {
	                        local l: Info;
	                        l$ts         = network_time();
	                        l$ts_delta   = break_interval;
	                        l$mtype      = key$str;
	                        l$bytes      = double_to_count(floor(result["mime.bytes"]$sum));
	                        l$hits       = result["mime.hits"]$num;
	                        l$uniq_hosts = result["mime.hits"]$unique;
	                        Log::write(MimeMetrics::LOG, l);
	                        }]);
	}

event HTTP::log_http(rec: HTTP::Info)
	{
	if ( Site::is_local_addr(rec$id$orig_h) && rec?$resp_mime_types )
		{
		local mime_type = rec$resp_mime_types[0];
		SumStats::observe("mime.bytes", [$str=mime_type],
		                  [$num=rec$response_body_len]);
		SumStats::observe("mime.hits",  [$str=mime_type],
		                  [$str=cat(rec$id$orig_h)]);
		}
	}
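
Running the script against a packet capture writes mime_metrics.log, with one line per MIME type for each interval:

# bro -r http/bro.org.pcap mimestats.bro
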
#separator \x09
#set_separator    ,
#empty_field      (empty)
#unset_field      -
#path     mime_metrics
#open     2018-12-19-16-55-50
#fields   ts      ts_delta        mtype   uniq_hosts      hits    bytes
#types    time    interval        string  count   count   count
1389719059.311698 300.000000      image/png       1       9       82176
1389719059.311698 300.000000      image/gif       1       1       172
1389719059.311698 300.000000      image/x-icon    1       2       2300
1389719059.311698 300.000000      text/html       1       2       42231
1389719059.311698 300.000000      text/plain      1       15      128001
1389719059.311698 300.000000      image/jpeg      1       1       186859
1389719059.311698 300.000000      application/pgp-signature       1       1       836
#close    2018-12-19-16-55-50
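
A log in this format is straightforward to post-process. The Python sketch below is one possible approach, not a Bro tool: the helper name `parse_bro_log` is hypothetical, and it assumes the default tab separator announced by the `#separator \x09` header. The sample rows are copied from the output above.

```python
def parse_bro_log(text):
    """Parse a tab-separated Bro log into a list of dicts keyed by field name."""
    fields = []
    rows = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


# Tabs are written explicitly so the sample survives copy and paste.
SAMPLE = "\n".join([
    "#fields\tts\tts_delta\tmtype\tuniq_hosts\thits\tbytes",
    "#types\ttime\tinterval\tstring\tcount\tcount\tcount",
    "1389719059.311698\t300.000000\timage/png\t1\t9\t82176",
    "1389719059.311698\t300.000000\timage/gif\t1\t1\t172",
    "1389719059.311698\t300.000000\timage/x-icon\t1\t2\t2300",
    "1389719059.311698\t300.000000\ttext/html\t1\t2\t42231",
    "1389719059.311698\t300.000000\ttext/plain\t1\t15\t128001",
    "1389719059.311698\t300.000000\timage/jpeg\t1\t1\t186859",
    "1389719059.311698\t300.000000\tapplication/pgp-signature\t1\t1\t836",
])

rows = parse_bro_log(SAMPLE)
total_bytes = sum(int(r["bytes"]) for r in rows)
total_hits = sum(int(r["hits"]) for r in rows)
by_bytes = sorted(rows, key=lambda r: int(r["bytes"]), reverse=True)

print("total bytes:", total_bytes, "total hits:", total_hits)
for r in by_bytes[:3]:
    print(r["mtype"], r["bytes"])
```

Sorting by the bytes column makes it easy to see which MIME types account for most of the transferred data in an interval.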

Note

Site::local_nets is redefined inside this script only to keep the example self-contained. In a real deployment it is normally set in the site-wide configuration rather than in an individual script.



Copyright 2016, The Bro Project. Last updated on December 19, 2018.