MrUll's

June 14, 2004

Awk for access log analysis, e.g. sorting

Here are some awk helpers to ease access log analysis, particularly in cases where a logfile holds an exception and its timestamp and you need to determine which requests led to it.
Usual tasks are:
  • finding out which IPs were active in a specific time window,
  • finding out which IP address caused the exception,
  • reordering access log lines by IP and timestamp to look at the sequence of requests.

    This first awk script selects all requests around a specific point in time:

    cat localhost_access_log.2004-05-27.txt | awk '/\[27\/May\/2004:05:(05|06)/{ print $0 }' > t.log


    The regexp /\[27\/May\/2004:05:(05|06)/ selects all requests made in minutes five and six of hour 05.
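
    If the window crosses an hour boundary, the alternation can cover both hours. A small sketch with the same logfile, selecting 04:55 through 05:05:

    cat localhost_access_log.2004-05-27.txt | awk '/\[27\/May\/2004:(04:5[5-9]|05:0[0-5])/{ print $0 }' > t.log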

    Note that awk offers associative arrays. Oracle PL/SQL offers this too, with declarations like
    "type t_arr is table of varchar2(20) index by varchar2(54)".
    This makes it possible to use IP addresses as index values. The next example implements "select IP, count(*) from accesslogfile group by IP":

    cat localhost_access_log.2004-05-27.txt | nawk '{ if ($1 in arr) {arr[$1]++} else {arr[$1]=1} } END { for (x in arr) print x " " arr[x] " requests" }'

    For counting within a specific time window, just combine this with a regexp as in the first script.
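
    A sketch combining both, counting requests per IP in the five-/six-minute window from above (awk treats unset array entries as zero, so the if/else can even be dropped):

    cat localhost_access_log.2004-05-27.txt | nawk '/\[27\/May\/2004:05:(05|06)/{ arr[$1]++ } END { for (x in arr) print x " " arr[x] " requests" }'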

    The next script uses a multidimensional array to reorder access log entries (lines) by IP and timestamp (the file is already chronological, so the lines of each IP stay in timestamp order):

    cat localhost_access_log.2004-05-27.txt | nawk '{ if ($1 in arr) {arr[$1]++} else {arr[$1]=1} requests[$1, arr[$1]]=$0 } END { for (ip in arr) { for(x=1;x<=arr[ip];x++) {print requests[ip, x]}}}'


    Note that this keeps every line in memory, so RAM usage gets excessive for large access logs.
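
    A lower-memory alternative, assuming a sort that supports the -s flag for a stable sort (GNU sort does): a stable sort on the first field groups the lines by IP while keeping the chronological order within each IP:

    sort -s -k1,1 localhost_access_log.2004-05-27.txt > by_ip.log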

    If there's a running Oracle instance at hand, it's also possible to create a table with external storage (our logfile) and do the ordering and grouping in SQL.
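
    A minimal sketch of that; the directory path and the names log_dir and accesslog_ext are made up for illustration, and it assumes the log contains no tab characters, so each whole line lands in one column:

    -- directory object pointing at the log directory (path is hypothetical)
    create directory log_dir as '/var/log/tomcat';

    create table accesslog_ext (
      line varchar2(4000)
    )
    organization external (
      type oracle_loader
      default directory log_dir
      access parameters (
        records delimited by newline
        fields terminated by 0x'09'
      )
      location ('localhost_access_log.2004-05-27.txt')
    )
    reject limit unlimited;

    -- the "group by" from above, now in SQL:
    select substr(line, 1, instr(line, ' ') - 1) as ip, count(*)
    from accesslog_ext
    group by substr(line, 1, instr(line, ' ') - 1);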

