While parsing logfiles on a Linux machine, several commands are useful for getting the appropriate results, e.g., when searching for specific events in firewall logs.
In this post, I list a few standard parsing commands such as grep, sort, uniq, or wc, and present a few examples of these small tools. However, building large command pipes is always a matter of trial and error.
Of course, the two most important functions are cat, for displaying a complete textfile on the screen (stdout), and the pipe |, which is used after every call to forward the output to the next tool. For live viewing of log files, tail -f is used. Note that not all of the following tools can be used with this type of live viewing, e.g., the sort commands. However, at least “grep” and “cut” can be used.
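For example, to watch matching entries of a growing firewall log live (hypothetical file name), grep can simply be appended to tail -f:
tail -f firewall.log | grep id=219
(When piping further, e.g., into cut, grep’s --line-buffered option helps to avoid delays caused by output buffering.)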
Filter, Replace, Omit, etc.
- grep [-v] <text>: Prints only the lines that contain the specified value. When using it with -v, it prints only the lines that do NOT have the specified value. Example:
cat file | grep 1234
or
cat file | grep -v 5678
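Both variants can of course be chained. A hypothetical example with the field names from the firewall logs used later in this post, showing permitted traffic while hiding a noisy destination:
cat file | grep action=Permit | grep -v dst=8.8.8.8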
- sort [-g] [-k <position>] [-r]: Sorts the input. -g sorts numbers according to their actual numerical value. With -k the start and stop fields of the sort key can be set precisely. -r reverses the order. Example: Sort by fields 23 to 24:
cat file | sort -k 23,24
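A small made-up example that also uses -g and -r, assuming a file with a numeric second column, sorted in descending numerical order:
cat file | sort -g -r -k 2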
- uniq [-f <number>] [-s <number>] [-w <number>] [-c]: Deletes duplicate adjacent entries (so the input should be sorted first). -f skips fields, -s skips chars, -w only compares the first n chars. Example: Delete duplicate lines, skipping the first 5 fields and comparing only the first 10 chars of the rest:
cat file | uniq -f 5 -w 10
-c can be used to print the number of occurrences for each line.
- wc -l: Simple word count. -l counts the lines (the most common use).
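Two quick examples for these: counting how often each line occurs (uniq only compares adjacent lines, hence the sort first), and simply counting all lines:
cat file | sort | uniq -c
cat file | wc -l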
- comm [-1] [-2] [-3]: Compares two files and prints three columns with the entries only present in file 1, only present in file 2, or present in both. These columns can be suppressed with the -1, etc. switches. Example: Print the lines that are unique to file2:
comm -13 file1 file2
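Note that comm expects both files to be sorted. If they are not, bash process substitution can sort them on the fly:
comm -13 <(sort file1) <(sort file2)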
- tr -s ' ': The tool “translate” can be used for many things. One of my default cases is to squeeze repeated spaces in logfiles into single ones with
tr -s ' '
But it can be used for other purposes as well, such as translating uppercase letters to lowercase with tr '[:upper:]' '[:lower:]', e.g., to make differently written IPv6 addresses look alike.
- cut -d ' ' -f <field>: Prints only the field specified with -f. The field separator is set with -d, here a space. Example: Print the 23rd field:
cat file | cut -d ' ' -f 23
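cut also accepts lists and ranges of fields. For example (using the firewall log format shown below), printing the timestamp (fields 1-3) together with the destination (field 23):
cat file | cut -d ' ' -f 1-3,23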
- head -n -<number>: With a leading minus, prints everything except the last n lines. E.g., to cut off the last three lines of the combined output:
cat * | head -n -3
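If instead the first lines should be omitted, e.g., when a file starts with three comment lines, tail with a “+” offset does the job (it starts printing at the given line number):
cat file | tail -n +4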
- sed s/regexp/replacement/: Replaces the part of each line that matches the regexp. E.g., to remove everything up to and including the keyword “hello”:
cat file | sed s/.*hello//
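The replacement does not have to be empty. A small sketch (regex kept deliberately simple) that masks the source IP addresses in the firewall logs shown below:
cat file | sed 's/src=[^ ]*/src=x.x.x.x/'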
A Few Examples
Here are a few examples from my daily business. Let’s grep through some firewall logs. The raw log format looks like this:
Jan 1 23:59:58 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2015-01-01 23:59:55" duration=3 policy_id=206 service=dns proto=17 src zone=Trust dst zone=Untrust action=Permit sent=93 rcvd=132 src=2003:51:6012:123:c24a:ff:fe09:5346 dst=2001:500:1::803f:235 src_port=56854 dst_port=53 src-xlated ip=2003:51:6012:123:c24a:ff:fe09:5346 port=56854 dst-xlated ip=2001:500:1::803f:235 port=53 session_id=3883 reason=Close - RESP
Connections from Host X to Whom?
Let’s say I want to know which destination IPs appeared in a certain policy rule. The relevant policy-id in my log files is “219” (grep id=219). To avoid problems with repeated spaces, I squeeze them into single ones (tr -s ' '). The destination IP address is the 23rd field in the log entries, and this is the only field I want to see. Since the delimiter in the log file is a space, I have to set it accordingly (cut -d ' ' -f 23). Finally, I sort this list (sort) and remove duplicate entries (uniq). Here is the result:
weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep id=219 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq
dst=77.0.74.170
dst=77.0.77.111
dst=79.204.238.115
dst=93.220.253.102
If I want to see the whole log lines (and not only the IP addresses), I can sort by the 23rd field (sort -k 23,24) and run uniq on the 23rd field (= skip the first 22 fields) while only comparing the following 20 chars (uniq -f 22 -w 20). This is the result:
weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep id=219 | tr -s ' ' | sort -k 23,24 | uniq -f 22 -w 20
Jan 1 02:17:11 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 04:56:53" duration=76818 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=15235078 rcvd=130943813 src=192.168.110.12 dst=77.0.74.170 src_port=49913 dst_port=30005 src-xlated ip=10.49.254.5 port=2364 dst-xlated ip=77.0.74.170 port=30005 session_id=4296 reason=Close - TCP RST
Jan 1 05:53:02 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2015-01-01 04:55:25" duration=3457 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=386518 rcvd=1532970 src=192.168.110.12 dst=77.0.77.111 src_port=50279 dst_port=30005 src-xlated ip=10.49.254.5 port=1701 dst-xlated ip=77.0.77.111 port=30005 session_id=7535 reason=Close - TCP RST
Jan 1 04:36:29 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 05:54:15" duration=81734 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=18559326 rcvd=63638696 src=192.168.110.12 dst=79.204.238.115 src_port=49925 dst_port=30005 src-xlated ip=10.49.254.5 port=2721 dst-xlated ip=79.204.238.115 port=30005 session_id=4147 reason=Close - TCP RST
Jan 1 05:53:04 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 05:54:18" duration=86326 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=24870176 rcvd=276776662 src=192.168.110.12 dst=93.220.253.102 src_port=49926 dst_port=30005 src-xlated ip=10.49.254.5 port=1858 dst-xlated ip=93.220.253.102 port=30005 session_id=4483 reason=Close - TCP RST
Count of Connections from Host Y
Another example is the count of connections from host Y, grouped by destination. The starting point is the source IP address (grep src=192.168.113.11). Repeated spaces are squeezed (tr -s ' '). Only the destination IP address is relevant, which is the 23rd field (cut -d ' ' -f 23). The output is sorted (sort) and counted per unique entry (uniq -c). To sort the counters by their numerical value, a second sort (sort -g -r) is used. This is it:
weberjoh@jw-nb10:~$ cat 2015-01-0* | grep src=192.168.113.11 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq -c | sort -g -r
209319 dst=8.8.8.8
2851 dst=88.198.52.243
230 dst=198.20.8.241
209 dst=224.0.0.251
159 dst=198.20.8.246
102 dst=192.168.5.1
50 dst=93.184.221.109
11 dst=172.16.1.5
9 dst=91.189.92.152
5 dst=91.189.95.36
4 dst=141.30.13.10
3 dst=192.168.9.6
2 dst=218.2.0.123
2 dst=103.41.124.53
1 dst=78.223.8.102
1 dst=77.0.138.150
1 dst=61.174.50.229
Summary of Session-End Reasons
Grep every log entry that contains the keyword “reason” (grep reason), then replace everything up to the last field, which is the reason itself. This is done with a regex that is replaced by nothing (sed s/.*reason.//); the trailing “.” matches the “=” sign after “reason”. Finally, similar to the examples above, the output is sorted, unique entries are counted, and the counts are sorted. Here it is:
weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep reason | sed s/.*reason.// | sort | uniq -c | sort -g -r
311970 Close - RESP
219406 Close - AGE OUT
69236 Traffic Denied
56179 Close - TCP FIN
3621 Close - ICMP Unreach
2968 Close - TCP RST
191 Creation
34 Close - ALG
24 Close - OTHER
Display Filter with Regex
Here is another example of how to “improve” a logfile output with sed in order to get a better view of it. The following output is from tcpdump, sniffing a network for ICMPv6 DAD messages.
16:41:24.392554 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
16:43:33.904282 00:26:08:b2:ad:78 > 33:33:ff:b2:ad:78, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffb2:ad78: ICMP6, neighbor solicitation, who has fe80::226:8ff:feb2:ad78, length 24
16:53:55.789861 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
16:54:08.964875 a0:0b:ba:b6:d8:2e > 33:33:ff:b6:d8:2e, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffb6:d82e: ICMP6, neighbor solicitation, who has fe80::a20b:baff:feb6:d82e, length 24
16:55:01.020645 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
I only want to see the timestamps along with the MAC and IPv6 addresses. That is, I want to throw away all other words and symbols from this output. This can be done with sed s/regexp/replacement/ called with a regex and an empty replacement. In my example, I want to remove everything between the > sign and the “has” keyword. The regex for this is >.*has. which means: a literal > (the whole sed expression is quoted so that the shell does not treat the > as a redirection), followed by anything “.*” until “has”, followed by a single character “.” (the space after “has”). And with a second run I want to remove everything from the comma to the end of the line:
weberjoh@jw-nb09:~$ cat test | sed 's/>.*has.//' | sed 's/,.*//'
16:41:24.392554 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
16:43:33.904282 00:26:08:b2:ad:78 fe80::226:8ff:feb2:ad78
16:53:55.789861 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
16:54:08.964875 a0:0b:ba:b6:d8:2e fe80::a20b:baff:feb6:d82e
16:55:01.020645 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
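By the way, the two sed calls can also be combined into a single one with two -e expressions on the same test file, which is just a matter of taste:
cat test | sed -e 's/>.*has.//' -e 's/,.*//'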
That’s it.