[Hampshire] Analog web log analysis program

Author: Adam Trickett
Date:  
To: Hampshire LUG Discussion List
Subject: [Hampshire] Analog web log analysis program

gpg: Signature made Sun Jan 6 13:45:57 2008 GMT
Hi,

I don't know if anyone still uses Analog[0] to analyse their web logs.
Yesterday I noticed that my dns-cache file was getting very large, 22MiB in
fact, and my logs were taking far too long to process.

A bit of digging and I found out what is going on, and how to make it faster
and stay faster.

1) By default Apache and most sensible web servers only record the IP address
and don't do a DNS look-up when writing the log file, so analog has to do it
for you. It's best to keep a text file as a DNS look-up cache.

2) Many IPs can't be resolved so are stored in the cache file as an asterisk.
Failed look-ups are incredibly slow for analog to do, which is where the time
gets wasted.

3) By default analog re-checks failed DNS queries every 2 weeks
(DNSBADHOURS=336). However 99% of the time a dead DNS query will always be
dead, so analog is needlessly re-checking dead IP addresses. Good entries are
also re-checked, but much less frequently.

4) Analog never deletes an entry in its DNS cache file, so over time it tends
to grow as it has multiple entries for the same (usually unresolvable) IP
address.
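
To see how much of a cache file is dead weight, something like the following
rough Python sketch may help. It assumes each cache line has three
whitespace-separated fields (a timestamp, the IP address, and the host name,
with "*" for a failed look-up); check that against your own file first. The
function name summarise_cache is just made up for this example.

```python
from collections import Counter


def summarise_cache(lines):
    """Summarise a DNS cache given as an iterable of text lines.

    Assumed line layout (verify against your own file):
        <timestamp> <ip-address> <hostname or *>
    """
    per_ip = Counter()   # how many entries each IP address has
    failed = 0           # entries whose look-up failed ("*")
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue     # skip blank or unexpected lines
        _stamp, ip, host = fields
        per_ip[ip] += 1
        if host == "*":
            failed += 1
    entries = sum(per_ip.values())
    return {
        "entries": entries,
        "distinct_ips": len(per_ip),
        "redundant": entries - len(per_ip),
        "failed_entries": failed,
    }
```

Calling it as summarise_cache(open("dns-cache.txt")) (the file name is only a
placeholder) gives a quick idea of how many lines are repeats or failures.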

To keep the dns cache file trim I suggest that you periodically clean it.

1) Remove duplicated entries. There are scripts on the analog site to do
this[1], or you can write something quickly in any scripting language you
fancy.

2) Increase the window for re-checking dead DNS entries (DNSBADHOURS), e.g.
8760 is a year (365 x 24).

3) Try setting DNSTIMEOUT (in seconds) to something small; this probably
won't work on a non-POSIX-like system.

4) Copy the DNS cache file from disk to a RAM disk (e.g. /dev/shm) before
running analog, and then back to disk once the logs have been processed.
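
For step 1, if you'd rather roll your own than use the helper scripts, here
is a minimal Python sketch. It is not one of the scripts from the analog site,
and it assumes the same three-field line layout (numeric timestamp, IP
address, host name or "*"), so check your own file before trusting it. The
names dedupe_cache and dedupe_file are invented for this example. It keeps
only the newest entry for each IP address.

```python
def dedupe_cache(lines):
    """Return cache lines with only the newest entry per IP address.

    Assumed line layout (verify against your own file):
        <numeric timestamp> <ip-address> <hostname or *>
    Lines that do not match this layout are dropped.
    """
    newest = {}  # ip -> (timestamp, host)
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        stamp, ip, host = int(fields[0]), fields[1], fields[2]
        # Keep the entry with the latest timestamp; on a tie the later
        # line in the file wins.
        if ip not in newest or stamp >= newest[ip][0]:
            newest[ip] = (stamp, host)
    # Write oldest first, so the output is stable from run to run.
    ordered = sorted(newest.items(), key=lambda item: (item[1][0], item[0]))
    return [f"{stamp} {ip} {host}" for ip, (stamp, host) in ordered]


def dedupe_file(src, dst):
    """Read the cache at src and write the de-duplicated copy to dst."""
    with open(src, encoding="ascii", errors="replace") as handle:
        cleaned = dedupe_cache(handle)
    with open(dst, "w", encoding="ascii", errors="replace") as handle:
        handle.write("\n".join(cleaned) + "\n")
```

Write to a new file rather than over the original, and only swap it in once
you have checked that analog is happy with the result.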

My cache file is now only 2.8MiB, and analog now runs in seconds rather than
many, many minutes.

HTH

[0] http://www.analog.cx/
[1] http://www.analog.cx/helpers/

--
Adam Trickett
Overton, HANTS, UK

Good advice is always certain to be ignored,
but that's no reason not to give it
    -- Agatha Christie