During normal operation, Bro produces a large volume of log files. This series of exercises examines the Bro log output format and highlights a few extremely useful utilities for extracting and processing the data in those logs.
Exercise
Run bro with the -r option, and provide the http.pcap file. For more information on bro options, please run bro with the -h option.
Note
Logs will be generated in the current working directory!
Solution
mkdir /tmp/bro-logs
cd /tmp/bro-logs
bro -r traces/http.pcap
Exercise
For this, you’ll need misc.pcap.
Run this command:
bro -r misc.pcap
and then interpret the fields in each of the resulting logs. Examine relevant records in the associated script files; be sure to look for the &log directive when examining those files.
Note
Record definitions can normally be found in $PREFIX/share/bro/base/protocols/<PROTO>/main.bro in the installation directory (or scripts/base/protocols/... in the Bro source tree).
Bro summarizes each TCP and UDP connection as a single line in the conn.log. Because these connection summaries are quite detailed, you can extract plenty of useful statistics from them. For the following two parts, use the log files generated from the trace 2009-M57-day11-18.trace.gz via bro -r 2009-M57-day11-18.trace.
Exercise
List the connections in increasing order of duration, i.e., with the longest connections at the end.
Solution
bro-cut < conn.log | sort -t$'\t' -k 9 -n
The duration field records the number of seconds per connection. By default it is field number 9. Because we want the whole line without the header comments, we run bro-cut without any column names, which prints every column and drops the header lines. Then we sort by that field (-k 9) numerically (-n).
1258531939.613071 Cxjrsb42dZ8mEXummh 192.168.1.102 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 229 0 0 (empty)
1258532644.128655 CpHy8E1QnlfZ1hXil7 192.168.1.1 5353 224.0.0.251 5353 udp - - - - S0 - 0 D 1 154 0 0 (empty)
1258532644.128680 CvTBQB2tVNu1LtxXKd fe80::219:e3ff:fee7:5d23 5353 ff02::fb 5353 udp - - - - S0 - 0 D 1 174 0 0 (empty)
1258532657.288677 CmE0Wf2ymuNfbg9Pea 192.168.1.102 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 229 0 0 (empty)
1258532683.876479 CYWCz93W1v6oGfCZea 192.168.1.103 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 240 0 0 (empty)
1258532824.338291 CtnlC93bf4OhDP8LKd 192.168.1.104 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 229 0 0 (empty)
1258533406.310783 CnBV5x3PLJVMbKyLyh 192.168.1.103 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 240 0 0 (empty)
...
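To experiment with the sort step without a full trace, the following is a minimal sketch that restricts the sort key to field 9 only (-k 9,9n) and builds the tab separator with printf so that it also works in shells without $'\t'. The function name sort_by_duration and the three sample records are made up for illustration. With a numeric sort, an unset duration (-) should be treated as zero and therefore appear first.

```shell
# Sort tab-separated conn.log records by the duration field (field 9) only.
# "-k 9,9n" ends the sort key at field 9 and compares it numerically.
sort_by_duration() {
  sort -t "$(printf '\t')" -k 9,9n
}

# Hypothetical sample records (not taken from the trace), truncated
# after the duration field.
printf '1\tC1\th1\t1\th2\t2\ttcp\thttp\t10.25\n2\tC2\th1\t1\th2\t2\tudp\t-\t-\n3\tC3\th1\t1\th2\t2\ttcp\tssl\t2.5\n' \
  | sort_by_duration
```

With a real log, the same function would sit at the end of the pipeline: bro-cut < conn.log | sort_by_duration.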
Exercise
Find all connections that last longer than one minute.
Solution
bro-cut < conn.log | awk -F$'\t' '$9 > 60'
We look again at field number 9, but this time add another filter to display only those lines whose duration is greater than 60 seconds.
1258535660.158200 CiDM4qbx9IvEiXgh2 192.168.1.104 1196 65.55.184.16 443 tcp ssl 67.887666 57041 8510 RSTR - 0 ShADdar 54 59209 26 9558 (empty)
1258543996.442969 CyCKbv4KeSk9H0MBql 192.168.1.103 138 192.168.1.255 138 udp - 60.629434 560 0 S0 - 0 D 3 644 0 0 (empty)
1258551306.134546 Cs1Lzs1Ys1eqLxYxO5 192.168.1.104 138 192.168.1.255 138 udp - 61.005932 549 0 S0 - 0 D 3 633 0 0 (empty)
1258561690.554162 Ck9ppw2oYvS7Evhqne fe80::2c23:b96c:78d:e116 546 ff02::1:2 547 udp - 63.129634 623 0 S0 - 0 D 7 959 0 0 (empty)
1258561885.476082 Cqy6Yc48njKTWbwRsh 192.168.1.105 49210 65.55.184.155 443 tcp ssl 66.419106 55531 7475 RSTR - 0 ShADdar 52 57623 21 8323 (empty)
1258562117.622113 CO3K1j5x3guiEhrm5 fe80::2c23:b96c:78d:e116 546 ff02::1:2 547 udp - 62.988974 623 0 S0 - 0 D 7 959 0 0 (empty)
1258562522.926514 CH7CgB2odJiRdZeqcb 192.168.1.104 1386 74.125.164.85 80 tcp http 63.735504 683 30772 SF - 0 ShADadfF 13 1211 28 31900 (empty)
1258562544.596174 CBJ8AD1MCPMNzM3Acb fe80::2c23:b96c:78d:e116 546 ff02::1:2 547 udp - 62.989215 623 0 S0 - 0 D 7 959 0 0 (empty)
1258562636.223671 CyzTBD1Fqnjuzy8xi7 192.168.1.104 1387 74.125.164.85 80 tcp http 65.450666 694 11708 SF - 0 ShADadfF 9 1062 14 12276 (empty)
1258562701.674828 CpzC0qs7BUkXE23Q4 192.168.1.104 1423 74.125.164.85 80 tcp http 65.169595 3467 60310 SF - 0 ShADadfF 21 4315 54 62478 (empty)
1258562522.748378 C0gI9a3e8GigpP7ENd 192.168.1.104 1385 74.125.19.102 80 tcp http 244.158006 950 1800 SF - 0 ShADadfF 6 1198 6 2048 (empty)
...
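One detail worth making explicit is how unset durations (-) are handled: in the filter above they are compared as strings rather than numbers, so whether they pass depends on string ordering. The following is a minimal sketch of a more defensive variant that skips unset values explicitly and forces a numeric comparison. The function name long_connections, its optional threshold argument, and the sample records are assumptions made for illustration.

```shell
# Keep only records whose duration (field 9) is set and exceeds a threshold
# in seconds (default: 60). Reads tab-separated records on stdin.
long_connections() {
  awk -F'\t' -v min="${1:-60}" '$9 != "-" && $9 + 0 > min + 0'
}

# Hypothetical sample records (not taken from the trace), truncated
# after the duration field.
printf '1\tC1\t10.0.0.1\t138\t10.0.0.255\t138\tudp\t-\t-\n2\tC2\t10.0.0.2\t1196\t10.0.0.9\t443\ttcp\tssl\t67.887666\n3\tC3\t10.0.0.2\t1197\t10.0.0.9\t80\ttcp\thttp\t12.5\n' \
  | long_connections 60
```

Only the record with uid C2 should pass the filter here. Against a real log, the call would be bro-cut < conn.log | long_connections 60.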
Exercise
Find all IP addresses of web servers that send more than 1 KB back to a client.
Solution
bro-cut service resp_bytes id.resp_h < conn.log \
  | awk -F$'\t' '$1 == "http" && $2 > 1024 { print $3 }' \
  | sort -u
First, we extract the relevant fields from the conn.log, which are id.resp_h, service, and resp_bytes. The idea is to filter all connections labeled as HTTP where the responder (i.e., the server) sent more than 1,024 bytes.
Recall awk's pattern-action statement, which looks like pattern { action }. The filter conditions appear in the pattern, whereas the print directives appear in the action. Here, we print only the third field that we extracted with bro-cut, namely id.resp_h. Finally, we weed out duplicates via sort -u.
130.59.10.36
137.226.34.227
151.207.243.129
193.1.193.64
198.189.255.73
198.189.255.74
198.189.255.82
208.111.128.122
208.111.129.48
208.111.129.62
65.54.95.201
65.54.95.209
65.54.95.7
68.142.123.21
68.142.123.31
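To see the pattern { action } structure at work on a handful of lines, here is a minimal sketch that wraps the same filter in a function and feeds it made-up input. The name big_http_servers, the optional byte threshold argument, and the sample addresses are all hypothetical. The input is assumed to have the three columns in the order service, resp_bytes, id.resp_h.

```shell
# Pattern: service is "http" and the response size exceeds the threshold.
# Action:  print the responder address (third column).
big_http_servers() {
  awk -F'\t' -v min="${1:-1024}" '$1 == "http" && $2 + 0 > min + 0 { print $3 }' \
    | sort -u
}

# Hypothetical sample input (not taken from the trace).
printf 'http\t2048\t10.0.0.5\nhttp\t512\t10.0.0.6\ndns\t4096\t10.0.0.7\nhttp\t-\t10.0.0.8\nhttp\t9000\t10.0.0.5\n' \
  | big_http_servers
```

Of the five sample lines, only the two for 10.0.0.5 satisfy the pattern, and sort -u collapses them into a single address.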
Exercise
Are there any web servers on non-standard ports (i.e., ports other than 80 and 8080)?
Solution
bro-cut service id.resp_p id.resp_h < conn.log \
  | awk -F$'\t' '$1 == "http" && ! ($2 == 80 || $2 == 8080) { print $3 }' \
  | sort -u
This awk exercise is similar to the above in terms of complexity, with the only difference being a different filter expression. The output is empty, meaning that Bro did not find any web servers on non-standard ports in this trace.
Exercise
Show a breakdown of the number of connections by service.
Solution
bro-cut service < conn.log | sort | uniq -c | sort -n
This is a typical aggregation question. The standard procedure almost always contains a combination of sort and uniq. The main idea is to massage the lines such that sorting and counting them yields a reasonable output. The advantage of this approach is that it does not accumulate any in-memory state and can rely on external sorting, which is imperative for large sets of logs.
One can also think about these aggregation tasks as a MapReduce job, where the first part of the pipeline is the map phase, sort the shuffle phase, and uniq a primitive reducer.
2 ftp
2 ftp-data
21 smtp
121 ssl
224 dhcp
1681 -
2386 http
4067 dns
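For contrast, the same breakdown can be computed with an in-memory table instead of sort and uniq. The following is a minimal sketch, with the hypothetical function name count_values and made-up input. It keeps one counter per distinct value in an awk array, which is convenient for small logs but is exactly the kind of in-memory state that the sort-based pipeline avoids.

```shell
# Count how often each input line occurs using an awk associative array,
# then order the result by count, smallest first.
count_values() {
  awk '{ n[$0]++ } END { for (v in n) print n[v], v }' | sort -n
}

# Hypothetical sample input (not taken from the trace).
printf 'http\ndns\nhttp\n-\ndns\ndns\n' | count_values
```

Unlike uniq -c, this variant does not pad the counts with leading spaces.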
Exercise
Show the top 10 destination ports in descending order.
Solution
bro-cut id.resp_p < conn.log | sort | uniq -c | sort -rn | head -n 10
In the same spirit as above, we aggregate the destination ports and sort the final output again to emit only the top 10 values.
3455 53
2742 80
776 138
553 137
224 67
189 139
165 5353
88 37
76 443
53 995
Exercise
What are the top 10 hosts (originators) that send the most traffic?
Solution
bro-cut id.orig_h orig_bytes < conn.log \
  | sort \
  | awk -F$'\t' '{ if (host != $1) {
                     # New host: emit the previous total, then start over
                     # with the byte count of the current line.
                     if (size != 0)
                       print host, size;
                     host = $1;
                     size = $2 + 0
                   } else
                     size += $2
                 }
                 END {
                   if (size != 0)
                     print host, size
                 }' \
  | sort -k 2 -rn \
  | head -n 10
This is a more involved example with a more complicated "reducer" function. The main idea is to order the output such that the traffic of one host is grouped together. Each group can then be processed with constant space in awk by only maintaining two variables host and size. Finally, once we have the per-host aggregate of the sent volume, we sort the second field (-k 2) and display the top 10 entries.
192.168.1.105 2050085
192.168.1.103 1333177
192.168.1.102 1079762
192.168.1.104 800378
192.168.1.1 196345
fe80::2c23:b96c:78d:e116 41465
fe80::219:e3ff:fee7:5d23 15191
fe80::5074:1b53:7e7:ad4d 9494
0.0.0.0 6116
169.254.225.22 2172
Exercise
What are the distinct browsers in this trace? What are the distinct MIME types of the downloaded URLs?
Solution
bro-cut user_agent < http.log | sort -u
bro-cut orig_mime_types < http.log | sort -u
First, we extract the relevant field with bro-cut and then restrict the output to the distinct values. The query is not very complicated, yet can still be quite insightful.
AVGDM-
AVGDM-WVSXX86 85 BUILD=39 LOC=1033 BRD=cnet-0-0
AVGDM-WVSXX86 85 BUILD=40 LOC=1033 PRD=US-F-AVF
AVGINET9-WVSXX86 90 AVI=270.14.71/2510 BUILD=707 LOC=1033 LIC=9I-ASXNN-X4WGW-M0XFR-T84VX-3VX02 DIAG=51E OPF=0 PCA=
AVGINET9-WVSXX86 90 AVI=270.14.72/2511 BUILD=707 LOC=1033 LIC=9I-ASXNN-X4WGW-M0XFR-T84VX-3VX02 DIAG=51E OPF=0 PCA=
AVGINET9-WVSXX86 90FREE AVI=270.14.73/2512 BUILD=707 LOC=1033 LIC=9AVFREE-VKPCB-6BWFM-TRLQR-BRUHP-CP86G DIAG=310 OPF=0 PCA=
AVGINET9-WXPPX86 90 AVI=270.14.71/2510 BUILD=707 LOC=1033 LIC=9I-ASXNN-X4WGW-M0XFR-T84VX-3VX02 DIAG=51E OPF=0 PCA=
AVGINET9-WXPPX86 90 AVI=270.14.72/2511 BUILD=707 LOC=1033 LIC=9I-ASXNN-X4WGW-M0XFR-T84VX-3VX02 DIAG=51E OPF=0 PCA=
AVGINET9-WXPPX86 90 AVI=270.14.73/2512 BUILD=707 LOC=1033 LIC=9I-ASXNN-X4WGW-M0XFR-T84VX-3VX02 DIAG=51E OPF=0 PCA=
clamav/0.92.1
Google Update/1.2.183.13;winhttp
Google Update/1.2.183.13;winhttp;cup
JNLP/6.0 javaws/1.6.0_16 (b01) Java/1.6.0_16
jupdate
live-client/2.0
Microsoft BITS/7.0
Microsoft-CryptoAPI/5.131.2600.5512
Microsoft-CryptoAPI/6.0
Microsoft NCSI
Microsoft-WebDAV-MiniRedir/6.0.6002
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/3.0.195.33 Safari/532.0
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
MSDW
SCSDK-6.0.0
Shockwave Flash
Windows-Update-Agent

-
application/octet-stream
application/xml
text/plain
text/plain,text/plain,text/plain,text/plain
text/plain,text/plain,text/plain,text/plain,text/plain,text/plain
text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain
text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain,text/plain
Exercise
What are the three most commonly accessed web sites?
Solution
bro-cut host < http.log | sort | uniq -c | sort -n | tail -n 3
In this case, we are interested in the Host header of the HTTP request, which the http.log provides in the host field. We interpret the "most commonly accessed" phrase as number of requests, i.e., number of lines in the log file. The aggregation is similar to what we have seen in the previous part.
231 safebrowsing-cache.google.com
259 scores.espn.go.com
421 download.windowsupdate.com
Exercise
What are the top 10 referred hosts?
Solution
bro-cut referrer < http.log \
  | awk -F$'\t' 'sub(/[[:alpha:]]+:\/\//, "", $1) {
                   split($1, s, /\//);
                   print s[1]
                 }' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 10
Although the value of the Referer (sic) header is readily available via the referrer field in the http.log, it may not be in the appropriate format. For example, sometimes we observe values containing a full URL path, and sometimes just the host. Therefore, we perform an extra sanitization step that strips the protocol part (sub) and then extracts only the host name of the referring URL. Because the sub call serves as the awk pattern, lines without a protocol prefix, such as the unset value -, are skipped.
275 adsatt.espn.go.com
234 espn.go.com
230 www.google.com
217 co108w.col108.mail.live.com
165 www.carmax.com
160 www.toysrus.com
139 support.dell.com
122 www.engadget.com
120 sports.espn.go.com
117 www.msn.com
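The sanitization step can be tried in isolation on a few made-up referrer values. The following is a minimal sketch: the function name referrer_hosts and the example.com URLs are hypothetical, and the character class is written as [A-Za-z] on the assumption that some awk implementations may not support [[:alpha:]].

```shell
# Reduce referrer values to their host part. Lines in which sub() finds no
# scheme prefix (for example the unset value "-") are not printed.
referrer_hosts() {
  awk -F'\t' 'sub(/[A-Za-z]+:\/\//, "", $1) { split($1, s, "/"); print s[1] }'
}

# Hypothetical sample input (not taken from the trace).
printf 'http://www.example.com/a/b.html\nhttps://mail.example.org\n-\nhttp://www.example.com/\n' \
  | referrer_hosts
```

The sample should yield www.example.com twice and mail.example.org once, ready for the sort | uniq -c stage.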
Exercise
Tell Bro to include the new_separator.bro script, and then re-process http.pcap. After verifying that the separator character has, in fact, changed, modify the separator character defined in new_separator.bro to be something slightly more interesting. Next, re-run Bro and verify that the separator character worked as expected and that the #separator field at the top of the file was updated appropriately. Now, add a line to new_separator.bro that will change the comment character used in the log file; consult base/frameworks/logging/writers/ascii.bro to determine the appropriate incantation.
Solution
Your new_separator.bro should look something like:
redef LogAscii::separator = ",";
redef LogAscii::header_prefix = "//";
Note
While bro may accept a two-character separator, keep in mind that some parsers may not understand how to correctly parse a CSV file that uses a string of characters to separate individual fields.
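Once the separator and comment prefix have changed, downstream tools need to be told about both. The following is a minimal sketch of reading one column from such a log with standard tools, assuming a comma separator and // as the header prefix as in the solution above. The function name csv_column and the three sample lines are hypothetical and do not reproduce real Bro output.

```shell
# Print one column (given by number) of a log that uses "," as separator
# and "//" as header prefix.
csv_column() {
  grep -v '^//' | awk -F',' -v col="$1" '{ print $col }'
}

# Hypothetical excerpt: one header line followed by two records.
printf '//fields,ts,host\n1258531939.6,www.example.com\n1258531940.1,example.org\n' \
  | csv_column 2
```

This naive split treats every comma as a field boundary, so it is only a starting point for logs whose values may themselves contain the separator.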
© 2014 The Bro Project.