[Hampshire] a strange web server anomaly?

Top Page

Reply to this message
Author: Victor Churchill
Date:  
To: Hampshire LUG Discussion List
Subject: [Hampshire] a strange web server anomaly?
This has me really puzzled.

I have an Apache setup to write a custom_log with the %D field to tell
me the server response time for each request. And I am seeing a very
strange pattern.

It seems that for one particular client IP address, when the page they
are being served includes a particular image, the server frequently
times out delivering the image. The image is fine served to other
clients; the client IP gets fine response on (some) other pages. It's
not particularly big.

A small example of the results - format is filename:line-number:IP
Address,http status, size, etc. The final field is the time in
microseconds):

1. grepping the image:

cd /var/log/httpd/
grep -n -e K0002.jpg cus* /dev/null
custom_log:1996:194.73.99.134 200 58124 [10/Mar/2009:12:12:34 +0000]
"GET /images/meters/K0002.jpg HTTP/1.1" 90075828
custom_log.1:1946:91.208.3.130 304 - [02/Mar/2009:15:24:53 +0000] "GET
/images/meters/K0002.jpg HTTP/1.1" 241
custom_log.1:5324:194.176.105.49 200 58124 [05/Mar/2009:15:21:56
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 47326
custom_log.1:6146:217.33.26.36 200 58124 [06/Mar/2009:15:48:42 +0000]
"GET /images/meters/K0002.jpg HTTP/1.1" 67631
custom_log.10:1372:195.173.32.142 200 58124 [29/Dec/2008:09:56:11
+0000] "GET /images/meters/K0002.jpg HTTP/1.0" 605584
custom_log.10:1441:195.173.32.179 200 58124 [29/Dec/2008:10:40:48
+0000] "GET /images/meters/K0002.jpg HTTP/1.0" 113752
custom_log.10:1718:195.173.32.142 304 - [30/Dec/2008:12:03:41 +0000]
"GET /images/meters/K0002.jpg HTTP/1.0" 113
custom_log.10:1766:194.176.105.49 200 58124 [30/Dec/2008:12:26:06
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 42906
custom_log.10:1783:195.173.32.179 304 - [30/Dec/2008:12:36:06 +0000]
"GET /images/meters/K0002.jpg HTTP/1.0" 246
custom_log.11:400:195.217.251.188 200 58124 [22/Dec/2008:12:05:45
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 246301
custom_log.12:5312:195.93.21.73 200 - [20/Dec/2008:12:49:01 +0000]
"HEAD /images/meters/K0002.jpg HTTP/1.1" 153
custom_log.12:5332:195.93.21.73 200 58124 [20/Dec/2008:12:49:01 +0000]
"GET /images/meters/K0002.jpg HTTP/1.1" 234387
custom_log.13:1834:194.73.99.134 200 58124 [10/Dec/2008:09:14:10
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 91120
custom_log.13:1839:194.73.99.134 200 58124 [10/Dec/2008:09:13:26
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 90132330
custom_log.13:1842:194.73.99.134 200 58124 [10/Dec/2008:09:13:41
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 90035283

386 lines for K0002.jpg
155 >= 90 sec
241 from 194.73.99.134

2. grepping the IP address
cd /var/log/httpd/
grep -n -e 194.73.99.134 cus* /dev/null
custom_log:1987:194.73.99.134 302 - [10/Mar/2009:12:12:26 +0000] "POST
/bstream/myMeter/intro_bs.pl HTTP/1.1" 273073
custom_log:1988:194.73.99.134 200 18341 [10/Mar/2009:12:12:26 +0000]
"GET /bstream/myMeter/master.pl?CGISESSID=f38f18b093477c93f6543fa178ac8f5c
HTTP/1.1" 7369256
custom_log:1989:194.73.99.134 200 1596 [10/Mar/2009:12:12:34 +0000]
"GET /local/QGStyleSheet.css HTTP/1.1" 260
custom_log:1990:194.73.99.134 200 3282 [10/Mar/2009:12:12:34 +0000]
"GET /images/presn/mmw4u.gif HTTP/1.1" 180
custom_log:1991:194.73.99.134 200 15663 [10/Mar/2009:12:12:34 +0000]
"GET /images/presn/SW_Welcome_226x59.png HTTP/1.1" 13087
custom_log:1992:194.73.99.134 200 76 [10/Mar/2009:12:12:34 +0000] "GET
/images/presn/SW_grey_720x2.gif HTTP/1.1" 125
custom_log:1993:194.73.99.134 200 580 [10/Mar/2009:12:12:34 +0000]
"GET /images/presn/TopNav_Highlight_150x40.gif HTTP/1.1" 134
custom_log:1996:194.73.99.134 200 58124 [10/Mar/2009:12:12:34 +0000]
"GET /images/meters/K0002.jpg HTTP/1.1" 90075828
custom_log.13:1825:194.73.99.134 302 - [10/Dec/2008:09:13:14 +0000]
"POST /bstream/myMeter/intro_bs.pl HTTP/1.1" 444074
custom_log.13:1826:194.73.99.134 200 18254 [10/Dec/2008:09:13:14
+0000] "GET /bstream/myMeter/master.pl?CGISESSID=12b34f1cf1c7ed53b48ef1d356de7909
HTTP/1.1" 11256875

1605 lines from 194.73.99.134
241 occurrences K0002
184 occurrences >=90 sec

3. grepping for >90s response times

cd /var/log/httpd/
grep -n -e 9[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$ cus* /dev/null
custom_log:1317:195.173.32.179 200 36401 [09/Mar/2009:16:02:17 +0000]
"POST /bstream/myMeter/master.pl?page=Charts&CGISESSID=45373afd8fe16befa1c3deb524ee91b4
HTTP/1.0" 91407681
custom_log:1996:194.73.99.134 200 58124 [10/Mar/2009:12:12:34 +0000]
"GET /images/meters/K0002.jpg HTTP/1.1" 90075828
custom_log.11:1261:195.93.21.73 200 6410706 [23/Dec/2008:14:14:14
+0000] "POST /bstream/myMeter/master.pl?page=Charts&CGISESSID=f6a4ae7f6cc2c43199e06ea96ef7981a
HTTP/1.1" 94657173
custom_log.12:4372:195.10.45.219 200 5322469 [17/Dec/2008:11:15:53
+0000] "POST /2.3.1/myMeter/master.pl?page=Charts&CGISESSID=96abb78b64b6722ff1eb1a106d5cfa29
HTTP/1.1" 91916041
custom_log.13:1929:194.73.99.134 200 58124 [10/Dec/2008:09:29:19
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 90079633
custom_log.13:1935:194.73.99.134 200 58124 [10/Dec/2008:09:31:36
+0000] "GET /images/meters/K0002.jpg HTTP/1.1" 90044363
custom_log.13:2693:195.173.32.142 200 36148 [11/Dec/2008:11:32:22
+0000] "POST /bstream/myMeter/master.pl?page=Charts&CGISESSID=fa3e7de8d87fda69212a2e30ec66b315
HTTP/1.0" 91650095

287 lines with response >= 90s
139 occurrences of K0002.jpg
151 occurrences for 194.73.99.134

So :
- one image file accounts for almost half the server timeouts;
- more than half the timeouts come serving this one address;
- the image times out 40% of the time it is served;
- 2/3 of the time this image is served it is going to this one user;
- this user experiences timeouts on 10% of hir page hits;
- s/he calls up the meter page with this image 15% of hir page hits.

But correspondingly, the user gets OK responses on other pages, the
image is OK going to other userrs, we get timeouts for other users
doing other stuff.

This has me quite bewildered. This user may be operating behind some
corporate firewall router, I don't know. I set up a trial where I
hammered the server overnight pulling this one image repeatedly from a
couple of addresses, and had responses consistently around 0.1 second.

What on earth would cause this set of results?