While parsing logfiles on a Linux machine, several commands are useful for getting the appropriate results, e.g., when searching for specific events in firewall logs.
In this post, I list a few standard parsing commands such as grep, sort, uniq, or wc, and present a few examples of these small tools. However, building large command pipes is always a matter of trial and error.
Of course, the two most important functions are cat, for displaying a complete textfile on the screen (stdout), and the pipe |, which is used after every call to forward the output to the next tool. For live viewing of log files, tail -f is used. Note that not all of the following tools can be used with this type of live viewing, e.g., the sort commands. However, at least “grep” and “cut” can be used.
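For example, to watch matching entries of a growing firewall log live (hypothetical file name), grep can simply be appended to tail -f:
tail -f firewall.log | grep id=219
(When piping further, e.g., into cut, grep’s --line-buffered option helps to avoid delays caused by output buffering.)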
Filter, Replace, Omit, etc.
- grep [-v] <text>: Prints only the lines that contain the specified value. When using it with -v, it prints only the lines that do NOT have the specified value. Example:
cat file | grep 1234
or
cat file | grep -v 5678
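Both variants can of course be chained. A hypothetical example with the field names from the firewall logs used later in this post, showing permitted traffic while hiding a noisy destination:
cat file | grep action=Permit | grep -v dst=8.8.8.8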
- sort [-g] [-k <position>] [-r]: Sorts the input. -g sorts numbers according to their actual numerical value. With -k the start and stop fields of the sort key can be set precisely. -r reverses the order. Example: Sort by fields 23 to 24:
cat file | sort -k 23,24
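A small made-up example that also uses -g and -r, assuming a file with a numeric second column, sorted in descending numerical order:
cat file | sort -g -r -k 2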
- uniq [-f <number>] [-s <number>] [-w <number>] [-c]: Deletes duplicate adjacent entries (so the input should be sorted first). -f skips fields, -s skips chars, -w only compares the first n chars. Example: Delete duplicate lines, skipping the first 5 fields and comparing only the first 10 chars of the rest:
cat file | uniq -f 5 -w 10
-c can be used to print the number of occurrences for each line.
- wc -l: Simple word count. -l counts the lines (the most common use).
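Two quick examples for these: counting how often each line occurs (uniq only compares adjacent lines, hence the sort first), and simply counting all lines:
cat file | sort | uniq -c
cat file | wc -l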
- comm [-1] [-2] [-3]: Compares two files and prints three columns with the entries only present in file 1, only present in file 2, or present in both. These columns can be suppressed with the -1, etc. switches. Example: Print the lines that are unique to file2:
comm -13 file1 file2
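Note that comm expects both files to be sorted. If they are not, bash process substitution can sort them on the fly:
comm -13 <(sort file1) <(sort file2)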
- tr -s ' ': The tool “translate” can be used for many things. One of my default cases is to squeeze repeated spaces in logfiles into single ones with
tr -s ' '
But it can be used for other purposes as well, such as translating uppercase letters to lowercase with tr '[:upper:]' '[:lower:]', e.g., to make differently written IPv6 addresses look alike.
- cut -d ' ' -f <field>: Prints only the field specified with -f. The field separator is set with -d, here a space. Example: Print the 23rd field:
cat file | cut -d ' ' -f 23
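cut also accepts lists and ranges of fields. For example (using the firewall log format shown below), printing the timestamp (fields 1-3) together with the destination (field 23):
cat file | cut -d ' ' -f 1-3,23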
- head -n -<number>: With a leading minus, prints everything except the last n lines. E.g., to cut off the last three lines of the combined output:
cat * | head -n -3
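If instead the first lines should be omitted, e.g., when a file starts with three comment lines, tail with a “+” offset does the job (it starts printing at the given line number):
cat file | tail -n +4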
- sed s/regexp/replacement/: Replaces the part of each line that matches the regexp. E.g., to remove everything up to and including the keyword “hello”:
cat file | sed s/.*hello//
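The replacement does not have to be empty. A small sketch (regex kept deliberately simple) that masks the source IP addresses in the firewall logs shown below:
cat file | sed 's/src=[^ ]*/src=x.x.x.x/'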
A Few Examples
Here are a few examples from my daily business. Let’s grep through some firewall logs. The raw log format looks like this:
Jan 1 23:59:58 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2015-01-01 23:59:55" duration=3 policy_id=206 service=dns proto=17 src zone=Trust dst zone=Untrust action=Permit sent=93 rcvd=132 src=2003:51:6012:123:c24a:ff:fe09:5346 dst=2001:500:1::803f:235 src_port=56854 dst_port=53 src-xlated ip=2003:51:6012:123:c24a:ff:fe09:5346 port=56854 dst-xlated ip=2001:500:1::803f:235 port=53 session_id=3883 reason=Close - RESP
Connections from Host X to Whom?
Let’s say I want to know which destination IPs appeared in a certain policy rule. The relevant policy-id in my log files is “219” (grep id=219). To avoid problems with repeated spaces, I squeeze them into single ones (tr -s ' '). The destination IP address is the 23rd field in the log entries, and this is the only field I want to see. Since the delimiter in the log file is a space, I have to set it accordingly (cut -d ' ' -f 23). Finally, I sort this list (sort) and remove duplicate entries (uniq). Here is the result:
weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep id=219 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq
dst=77.0.74.170
dst=77.0.77.111
dst=79.204.238.115
dst=93.220.253.102
If I want to see the whole log lines (and not only the IP addresses), I can sort by the 23rd field (sort -k 23,24) and run uniq on the 23rd field (= skip the first 22 fields) while only comparing the following 20 chars (uniq -f 22 -w 20). This is the result:
weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep id=219 | tr -s ' ' | sort -k 23,24 | uniq -f 22 -w 20
Jan 1 02:17:11 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 04:56:53" duration=76818 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=15235078 rcvd=130943813 src=192.168.110.12 dst=77.0.74.170 src_port=49913 dst_port=30005 src-xlated ip=10.49.254.5 port=2364 dst-xlated ip=77.0.74.170 port=30005 session_id=4296 reason=Close - TCP RST
Jan 1 05:53:02 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2015-01-01 04:55:25" duration=3457 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=386518 rcvd=1532970 src=192.168.110.12 dst=77.0.77.111 src_port=50279 dst_port=30005 src-xlated ip=10.49.254.5 port=1701 dst-xlated ip=77.0.77.111 port=30005 session_id=7535 reason=Close - TCP RST
Jan 1 04:36:29 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 05:54:15" duration=81734 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=18559326 rcvd=63638696 src=192.168.110.12 dst=79.204.238.115 src_port=49925 dst_port=30005 src-xlated ip=10.49.254.5 port=2721 dst-xlated ip=79.204.238.115 port=30005 session_id=4147 reason=Close - TCP RST
Jan 1 05:53:04 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 05:54:18" duration=86326 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=24870176 rcvd=276776662 src=192.168.110.12 dst=93.220.253.102 src_port=49926 dst_port=30005 src-xlated ip=10.49.254.5 port=1858 dst-xlated ip=93.220.253.102 port=30005 session_id=4483 reason=Close - TCP RST
Count of Connections from Host Y
Another example is the count of connections from host Y, grouped by destination. The starting point is the source IP address (grep src=192.168.113.11). Repeated spaces are squeezed (tr -s ' '). Only the destination IP address is relevant, which is the 23rd field (cut -d ' ' -f 23). The output is sorted (sort) and counted per unique entry (uniq -c). To sort the counters by their numerical value, a second sort (sort -g -r) is used. This is it:
weberjoh@jw-nb10:~$ cat 2015-01-0* | grep src=192.168.113.11 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq -c | sort -g -r
209319 dst=8.8.8.8
2851 dst=88.198.52.243
230 dst=198.20.8.241
209 dst=224.0.0.251
159 dst=198.20.8.246
102 dst=192.168.5.1
50 dst=93.184.221.109
11 dst=172.16.1.5
9 dst=91.189.92.152
5 dst=91.189.95.36
4 dst=141.30.13.10
3 dst=192.168.9.6
2 dst=218.2.0.123
2 dst=103.41.124.53
1 dst=78.223.8.102
1 dst=77.0.138.150
1 dst=61.174.50.229
Summary of Session-End Reasons
Grep every log entry that contains the keyword “reason” (grep reason), then replace everything up to the last field, which is the reason itself. This is done with a regex that is replaced by nothing (sed s/.*reason.//); the trailing “.” matches the “=” sign after “reason”. Finally, similar to the examples above, the output is sorted, unique entries are counted, and the counts are sorted. Here it is:
weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep reason | sed s/.*reason.// | sort | uniq -c | sort -g -r
311970 Close - RESP
219406 Close - AGE OUT
69236 Traffic Denied
56179 Close - TCP FIN
3621 Close - ICMP Unreach
2968 Close - TCP RST
191 Creation
34 Close - ALG
24 Close - OTHER
Display Filter with Regex
Here is another example of how to “improve” a logfile output with sed in order to get a better view of it. The following output is from tcpdump, sniffing a network for ICMPv6 DAD messages.
16:41:24.392554 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
16:43:33.904282 00:26:08:b2:ad:78 > 33:33:ff:b2:ad:78, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffb2:ad78: ICMP6, neighbor solicitation, who has fe80::226:8ff:feb2:ad78, length 24
16:53:55.789861 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
16:54:08.964875 a0:0b:ba:b6:d8:2e > 33:33:ff:b6:d8:2e, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffb6:d82e: ICMP6, neighbor solicitation, who has fe80::a20b:baff:feb6:d82e, length 24
16:55:01.020645 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
I only want to see the timestamps along with the MAC and IPv6 addresses. That is, I want to throw away all other words and symbols from this output. This can be done with sed s/regexp/replacement/ called with a regex and an empty replacement. In my example, I want to remove everything between the > sign and the “has” keyword. The regex for this is >.*has. which means: a literal > (the whole sed expression is quoted so that the shell does not treat the > as a redirection), followed by anything “.*” until “has”, followed by a single character “.” (the space after “has”). And with a second run I want to remove everything from the comma to the end of the line:
weberjoh@jw-nb09:~$ cat test | sed 's/>.*has.//' | sed 's/,.*//'
16:41:24.392554 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
16:43:33.904282 00:26:08:b2:ad:78 fe80::226:8ff:feb2:ad78
16:53:55.789861 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
16:54:08.964875 a0:0b:ba:b6:d8:2e fe80::a20b:baff:feb6:d82e
16:55:01.020645 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
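By the way, the two sed calls can also be combined into a single one with two -e expressions on the same test file, which is just a matter of taste:
cat test | sed -e 's/>.*has.//' -e 's/,.*//'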
That’s it.