Channel: Johannes Weber – Weberblog.net

Logfile Parsing


When parsing logfiles on a Linux machine, several commands are useful for getting the appropriate results, e.g., when searching for specific events in firewall logs.

In this post, I list a few standard parsing commands such as grep, sort, uniq, and wc, and present a few examples of these small tools. However, building large command pipes is always a matter of trial and error. ;)

Of course, the two most important tools are cat, which displays a complete text file on the screen (stdout), and the pipe |, which is placed after each command to forward its output to the next tool. For live viewing of log files, tail -f is used. Note that not all of the following tools work with this kind of live viewing: sort, for example, must read its entire input before it can print anything. However, at least “grep” and “cut” can be used.

Filter, Replace, Omit, etc.

  • grep [-v] <text>: Prints only the lines that contain the specified value. When using it with -v, it prints only the lines that do NOT have the specified value. Example: 
    cat file | grep 1234
      or
    cat file | grep -v 5678
     .
  • sort [-g] [-k <position>] [-r]: Sorts the input. -g compares numbers by their actual numerical value. With -k the start and stop fields of the sort key can be set precisely. -r reverses the order. Example: Sort only by the 23rd field:
    cat file | sort -k 23,23
     .
  • uniq [-f <number>] [-s <number>] [-w <number>] [-c]: Removes repeated lines. Only adjacent duplicates are detected, so the input should be sorted first. -f skips the first n fields, -s skips the first n chars, -w compares only n chars. Example: Delete all lines that have the same value in the 5th field (= skip the first 4 fields) while only comparing the first 10 chars:
    cat file | uniq -f 4 -w 10
     . -c can be used to print the number of occurrences for each line.
  • wc [-l]: Counts lines, words, and bytes. -l counts only the lines (mostly used).
  • comm [-1] [-2] [-3]: Compares two sorted files and prints three columns: lines only present in file 1, lines only present in file 2, and lines present in both. Each column can be suppressed with the corresponding switch -1, -2, or -3. Example: Print the lines that are unique to file2:
    comm -13 file1 file2
     .
  • tr -s ' ': The tool “translate” can be used for many things. One of my default cases is to squeeze repeated spaces in logfiles into a single space with
    tr -s ' '
     . But it can be used for other use cases, such as converting uppercase letters to lowercase:
    tr '[:upper:]' '[:lower:]'
    , e.g., to make IPv6 addresses look alike.
  • cut -d ' ' -f <field>: Prints only the field specified with -f. Since the default delimiter is tab, the field separator must be set to space with -d. Example: Print the 23rd field:
    cat file | cut -d ' ' -f 23
     .
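  • tail -n +<number>: Starts printing at line n, i.e., omits the first n-1 lines. (Note that head -n -<number> does the opposite: it omits the last n lines.) E.g., when each file starts with three comment lines that should be omitted (with GNU tail, -q suppresses the file name headers):
    tail -q -n +4 *
     .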
  • sed 's/regexp/replacement/': Replaces the part of each line that matches the regex. The expression should be quoted so that the shell does not interpret its special characters. E.g., everything before the keyword “hello” (and the keyword itself) should be removed:
    cat file | sed 's/.*hello//'
     .
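To make a few of these tools concrete, here is a minimal sketch on tiny inputs. The file contents and variable names are invented for illustration, and GNU coreutils are assumed (head -n with a negative number is a GNU extension).

```shell
#!/bin/sh
# Illustrative only: sample data and variable names are invented for this sketch.
tmpdir=$(mktemp -d)

# A file with three comment lines followed by two data lines.
printf '%s\n' '# c1' '# c2' '# c3' 'data1' 'data2' > "$tmpdir/log"

# tail -n +4 starts printing at line 4, i.e., it skips the first three lines.
skip_first=$(tail -n +4 "$tmpdir/log")

# head -n -3 (GNU) prints everything except the LAST three lines.
skip_last=$(head -n -3 "$tmpdir/log")

# comm expects sorted input; -13 leaves only the lines unique to the second file.
printf '%s\n' 'alpha' 'beta' > "$tmpdir/file1"
printf '%s\n' 'beta' 'gamma' > "$tmpdir/file2"
only_in_2=$(comm -13 "$tmpdir/file1" "$tmpdir/file2")

# tr with quoted character classes lowercases an IPv6 address.
lower=$(printf '%s\n' '2001:DB8::ABCD' | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$skip_first" "$skip_last" "$only_in_2" "$lower"
rm -rf "$tmpdir"
```

Quoting the character classes for tr keeps the shell from expanding them as filename patterns.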

A few Examples

Here are a few examples out of my daily business. Let’s grep through some firewall logs. The raw log format looks like the following:

Jan  1 23:59:58 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01  [Root]system-notification-00257(traffic): start_time="2015-01-01 23:59:55" duration=3 policy_id=206 service=dns proto=17 src zone=Trust dst zone=Untrust action=Permit sent=93 rcvd=132 src=2003:51:6012:123:c24a:ff:fe09:5346 dst=2001:500:1::803f:235 src_port=56854 dst_port=53 src-xlated ip=2003:51:6012:123:c24a:ff:fe09:5346 port=56854 dst-xlated ip=2001:500:1::803f:235 port=53 session_id=3883 reason=Close - RESP

 

Connections from Host X to Whom?

Let’s say I want to know which destination IPs appeared in a certain policy rule. The relevant policy ID in my log files is “219” (grep id=219; a more specific pattern such as 'policy_id=219 ' with a trailing space would additionally rule out matches on other fields, e.g., session_id=219). To avoid problems with double spaces, I squeeze them into single spaces (tr -s ' '). The destination IP address is the 23rd field in the log entries, and I only want to see that field. Since the delimiter in the log file is space, I have to set it to ' ' (cut -d ' ' -f 23). Finally, I sort this list (sort) and remove duplicate entries (uniq). Here is the result:

weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep id=219 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq
dst=77.0.74.170
dst=77.0.77.111
dst=79.204.238.115
dst=93.220.253.102
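
The same pipeline can be tried without real firewall logs. The following is only a sketch: make_line is a hypothetical helper that prints a line in the layout of the sample log entry, and all IP addresses are invented (documentation ranges). It also shows why a stricter grep pattern can matter.

```shell
#!/bin/sh
# Sketch with invented data: IPs are from documentation ranges, and
# make_line is a hypothetical helper that mimics the sample log layout.
LC_ALL=C; export LC_ALL

make_line() {  # $1 = policy ID, $2 = session ID, $3 = destination IP
  printf 'Jan  1 00:00:00 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01  [Root]system-notification-00257(traffic): start_time="2015-01-01 00:00:00" duration=1 policy_id=%s service=dns proto=17 src zone=Trust dst zone=Untrust action=Permit sent=1 rcvd=1 src=192.0.2.10 dst=%s src_port=1024 dst_port=53 session_id=%s reason=Close - RESP\n' "$1" "$3" "$2"
}

log=$(mktemp)
{
  make_line 219  1   198.51.100.7
  make_line 219  2   198.51.100.7
  make_line 219  3   203.0.113.9
  make_line 206  219 203.0.113.50   # session_id=219, but another policy
  make_line 2190 4   203.0.113.60   # policy_id merely starts with 219
} > "$log"

# The loose pattern also hits session_id=219 and policy_id=2190.
loose=$(grep -c 'id=219' "$log")
# The trailing space pins the match to exactly policy_id=219.
strict=$(grep -c 'policy_id=219 ' "$log")

# sort -u is shorthand for sort | uniq.
dsts=$(grep 'policy_id=219 ' "$log" | tr -s ' ' | cut -d ' ' -f 23 | sort -u)

echo "loose=$loose strict=$strict"
echo "$dsts"
rm -f "$log"
```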

If I want to have the whole log entry lines (and not only the IP addresses), I can sort by the 23rd field (sort -k 23,24; strictly speaking, this key spans fields 23 and 24, and -k 23,23 would restrict it to the 23rd field alone) and apply uniq to the 23rd field (= skip the first 22 fields) while only comparing the following 20 chars (uniq -f 22 -w 20). This is the result:

weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep id=219 | tr -s ' ' | sort -k 23,24 | uniq -f 22 -w 20
Jan 1 02:17:11 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 04:56:53" duration=76818 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=15235078 rcvd=130943813 src=192.168.110.12 dst=77.0.74.170 src_port=49913 dst_port=30005 src-xlated ip=10.49.254.5 port=2364 dst-xlated ip=77.0.74.170 port=30005 session_id=4296 reason=Close - TCP RST
Jan 1 05:53:02 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2015-01-01 04:55:25" duration=3457 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=386518 rcvd=1532970 src=192.168.110.12 dst=77.0.77.111 src_port=50279 dst_port=30005 src-xlated ip=10.49.254.5 port=1701 dst-xlated ip=77.0.77.111 port=30005 session_id=7535 reason=Close - TCP RST
Jan 1 04:36:29 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 05:54:15" duration=81734 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=18559326 rcvd=63638696 src=192.168.110.12 dst=79.204.238.115 src_port=49925 dst_port=30005 src-xlated ip=10.49.254.5 port=2721 dst-xlated ip=79.204.238.115 port=30005 session_id=4147 reason=Close - TCP RST
Jan 1 05:53:04 172.16.1.1 fd-wv-fw01: NetScreen device_id=fd-wv-fw01 [Root]system-notification-00257(traffic): start_time="2014-12-31 05:54:18" duration=86326 policy_id=219 service=tcp/port:30005 proto=6 src zone=DMZ dst zone=Untrust2 action=Permit sent=24870176 rcvd=276776662 src=192.168.110.12 dst=93.220.253.102 src_port=49926 dst_port=30005 src-xlated ip=10.49.254.5 port=1858 dst-xlated ip=93.220.253.102 port=30005 session_id=4483 reason=Close - TCP RST
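
The interplay of sort -k and uniq -f/-w is easier to follow on short lines. A minimal sketch, assuming GNU uniq (-w is a GNU option) and using invented lines in which the destination is the 3rd field instead of the 23rd:

```shell
#!/bin/sh
# Sketch with invented lines; the destination is field 3 here.
LC_ALL=C; export LC_ALL

input=$(mktemp)
printf '%s\n' \
  'c 3 dst=10.0.0.1 z' \
  'b 2 dst=10.0.0.2 y' \
  'a 1 dst=10.0.0.1 x' > "$input"

# sort -k 3,3 brings equal destinations next to each other; uniq -f 2 -w 13
# then skips two fields and compares only the next 13 chars, so the
# differing trailing field (x, y, z) does not matter.
first_per_dst=$(sort -k 3,3 "$input" | uniq -f 2 -w 13)

echo "$first_per_dst"
rm -f "$input"
```

Only the first line of each group of equal destinations survives; which one that is depends on the sort order of the complete lines.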

 

Count of Connections from Host Y

Another example is the count of connections from host Y, broken down by destination. The starting point is the source IP address (grep src=192.168.113.11). Double spaces should be squeezed (tr -s ' '). Only the destination IP address is relevant, which is the 23rd field (cut -d ' ' -f 23). The output is sorted (sort) and counted per unique entry (uniq -c). To have the counters sorted by their numerical value in descending order, another sort (sort -g -r) is used. This is it:

weberjoh@jw-nb10:~$ cat 2015-01-0* | grep src=192.168.113.11 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq -c | sort -g -r
 209319 dst=8.8.8.8
   2851 dst=88.198.52.243
    230 dst=198.20.8.241
    209 dst=224.0.0.251
    159 dst=198.20.8.246
    102 dst=192.168.5.1
     50 dst=93.184.221.109
     11 dst=172.16.1.5
      9 dst=91.189.92.152
      5 dst=91.189.95.36
      4 dst=141.30.13.10
      3 dst=192.168.9.6
      2 dst=218.2.0.123
      2 dst=103.41.124.53
      1 dst=78.223.8.102
      1 dst=77.0.138.150
      1 dst=61.174.50.229
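
Here is a small sketch of the counting pipeline on invented, heavily shortened log lines (only a src and a dst field, so the destination is field 2). It additionally uses grep -F with a trailing space, an assumption on my side, so that a neighbouring address such as 192.0.2.110 is not counted for host 192.0.2.11.

```shell
#!/bin/sh
# Sketch with invented, shortened log lines (documentation-range IPs).
LC_ALL=C; export LC_ALL

log=$(mktemp)
printf '%s\n' \
  'src=192.0.2.11 dst=198.51.100.2' \
  'src=192.0.2.11 dst=198.51.100.1' \
  'src=192.0.2.110 dst=198.51.100.3' \
  'src=192.0.2.11 dst=198.51.100.1' \
  'src=192.0.2.11 dst=198.51.100.1' > "$log"

# -F treats the pattern as a fixed string (dots are literal); the trailing
# space prevents a match on 192.0.2.110. The final sed strips the padding
# that uniq -c puts in front of each counter.
counts=$(grep -F 'src=192.0.2.11 ' "$log" | cut -d ' ' -f 2 |
  sort | uniq -c | sort -g -r | sed 's/^ *//')

echo "$counts"
rm -f "$log"
```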

 

Summary of Session-End Reasons

Grep every log entry that has the keyword “reason” in it (grep reason), then remove everything in the line up to and including “reason=”, so that only the last field, the reason entry, remains. This is done via a regex that is replaced by nothing (sed s/.*reason.//). Finally, similar to the examples above, the output is sorted, the unique entries are counted, and the counts are sorted. Here it is:

weberjoh@jw-nb10:~$ cat 2015-01-01.fd-wv-fw01.log | grep reason | sed s/.*reason.// | sort | uniq -c | sort -g -r
 311970 Close - RESP
 219406 Close - AGE OUT
  69236 Traffic Denied
  56179 Close - TCP FIN
   3621 Close - ICMP Unreach
   2968 Close - TCP RST
    191 Creation
     34 Close - ALG
     24 Close - OTHER
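
As a variation, grep and sed can be merged into a single sed call: with -n and the p flag, sed prints only those lines on which the substitution actually took place. The following sketch uses invented, shortened log lines.

```shell
#!/bin/sh
# Sketch with invented, shortened log lines.
LC_ALL=C; export LC_ALL

log=$(mktemp)
printf '%s\n' \
  'session_id=1 reason=Close - RESP' \
  'session_id=2 reason=Close - AGE OUT' \
  'session_id=3 reason=Close - RESP' \
  'some line without the keyword' \
  'session_id=4 reason=Traffic Denied' \
  'session_id=5 reason=Close - AGE OUT' \
  'session_id=6 reason=Close - RESP' > "$log"

# sed -n ... p replaces "grep reason | sed ...": lines without "reason="
# are not printed at all. The last sed strips the padding of uniq -c.
reasons=$(sed -n 's/.*reason=//p' "$log" |
  sort | uniq -c | sort -g -r | sed 's/^ *//')

echo "$reasons"
rm -f "$log"
```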

 

Display Filter with Regex

Here is another example of how to “improve” a logfile output with sed in order to get a better view of it. The following output is from tcpdump sniffing on a network for ICMPv6 DAD messages.

16:41:24.392554 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
16:43:33.904282 00:26:08:b2:ad:78 > 33:33:ff:b2:ad:78, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffb2:ad78: ICMP6, neighbor solicitation, who has fe80::226:8ff:feb2:ad78, length 24
16:53:55.789861 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
16:54:08.964875 a0:0b:ba:b6:d8:2e > 33:33:ff:b6:d8:2e, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffb6:d82e: ICMP6, neighbor solicitation, who has fe80::a20b:baff:feb6:d82e, length 24
16:55:01.020645 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24

I only want to see the timestamps along with the MAC and IPv6 addresses. That is, I want to throw away all other words and symbols from this output. This can be done with sed s/regexp/replacement/, called with a regex and an empty replacement. In my example, I want to remove everything from the > sign up to and including the “has” keyword. The regex for this is \>.*has. which means: beginning with > (escaped with a backslash so that the shell does not treat it as a redirection), followed by anything “.*” until “has”, followed by a single character “.”. With a second run, I remove everything from the first comma to the end of the line:

weberjoh@jw-nb09:~$ cat test | sed s/\>.*has.// | sed s/,.*//
16:41:24.392554 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
16:43:33.904282 00:26:08:b2:ad:78 fe80::226:8ff:feb2:ad78
16:53:55.789861 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
16:54:08.964875 a0:0b:ba:b6:d8:2e fe80::a20b:baff:feb6:d82e
16:55:01.020645 90:27:e4:35:38:a8 fe80::1441:9488:9187:cbe9
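
Both substitutions can also be given to a single sed call via -e, and quoting the expressions makes escaping the > sign unnecessary. A minimal sketch that uses the first two tcpdump lines from above as input (the temporary file and the variable names are just for this example):

```shell
#!/bin/sh
# Input: the first two tcpdump lines shown above, stored in a temp file.
dump=$(mktemp)
cat > "$dump" <<'EOF'
16:41:24.392554 90:27:e4:35:38:a8 > 33:33:ff:87:cb:e9, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff87:cbe9: ICMP6, neighbor solicitation, who has fe80::1441:9488:9187:cbe9, length 24
16:43:33.904282 00:26:08:b2:ad:78 > 33:33:ff:b2:ad:78, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffb2:ad78: ICMP6, neighbor solicitation, who has fe80::226:8ff:feb2:ad78, length 24
EOF

# First expression: drop everything from the first ">" through "has ".
# Second expression: drop everything from the first remaining comma onwards.
cleaned=$(sed -e 's/>.*has.//' -e 's/,.*//' "$dump")

echo "$cleaned"
rm -f "$dump"
```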

That’s it. ;)

