Posted to infrastructure-issues@apache.org by "Sebb (JIRA)" <ji...@apache.org> on 2016/04/12 12:58:25 UTC
[jira] [Created] (INFRA-11661) Spurious changes to mail archive files
Sebb created INFRA-11661:
----------------------------
Summary: Spurious changes to mail archive files
Key: INFRA-11661
URL: https://issues.apache.org/jira/browse/INFRA-11661
Project: Infrastructure
Issue Type: Bug
Components: Mail Archives
Reporter: Sebb
Further to INFRA-11634, I have been looking at ways to reduce the load on the mail-archive host.
The script already uses a conditional GET (Last-Modified), so it does not need to process mailboxes unless they have changed.
However, I noticed that it was still fetching empty mailbox files, i.e. the Last-Modified date was changing even though there was no new content.
For an empty file this does not matter, but the same thing is happening for the other files. I did some checks, and it looks like all the current files are touched once an hour, regardless of whether they have changed.
The ETags change as well. For example:
$ curl -s -I http://mail-archives.apache.org/mod_mbox/jmeter-issues/201604.mbox
HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 09:29:59 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 08:32:02 GMT
ETag: "3a15c-530457ed2c521"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox
HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 09:44:44 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 09:32:06 GMT
ETag: "3a15c-5304655a96122"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox
HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 10:45:00 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 10:33:16 GMT
ETag: "3a15c-530473062e354"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox
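The three ETags above can be decoded to see what is actually changing. This is only a sketch, and it assumes the server uses Apache httpd's default 2.4 ETag layout (hex file size, a hyphen, then hex modification time in microseconds); the function name decode_etag is made up for illustration. Under that assumption the size field 3a15c decodes to 237916, the same as Content-Length in all three responses, so only the mtime part differs.

```python
from datetime import datetime, timezone

def decode_etag(etag):
    """Split an assumed 'size-mtime' ETag into (size_bytes, mtime_utc).

    Assumption: Apache's default 'FileETag MTime Size' layout, where both
    fields are hexadecimal and the mtime is in microseconds since the epoch.
    """
    size_hex, mtime_hex = etag.strip('"').split("-")
    size = int(size_hex, 16)
    mtime = datetime.fromtimestamp(int(mtime_hex, 16) / 1_000_000, tz=timezone.utc)
    return size, mtime

# ETag values copied from the three responses above.
etags = ['"3a15c-530457ed2c521"', '"3a15c-5304655a96122"', '"3a15c-530473062e354"']
decoded = [decode_etag(e) for e in etags]
for size, mtime in decoded:
    print(size, mtime.isoformat())
```

If the layout assumption holds, the ETag carries no more information than Last-Modified plus the length, which is why it changes whenever the file is touched.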
It's perhaps not surprising that there is excess traffic to the box if it reports changes when none have happened.
AFAIK it's not possible to base a conditional get on the length.
It should be possible for scripts to check if the length has changed and skip reading the file if not. However such checks will have to be done on an ad-hoc basis for each script.
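A minimal sketch of such a per-script check, assuming the script keeps a record of the last length it saw for each URL. The names remote_length, needs_fetch and known_lengths are hypothetical and not taken from the existing script. It also assumes that a mailbox which gains new content always changes length, which seems reasonable for an append-only mbox file but is not guaranteed in general.

```python
import urllib.request

def remote_length(url):
    """Return the Content-Length reported for a HEAD request, or None if absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None

def needs_fetch(url, known_lengths, head=remote_length):
    """Return True if the file should be downloaded again.

    known_lengths maps URL -> last length seen; it is updated in place.
    A missing Content-Length is treated as "changed" to stay on the safe side.
    """
    length = head(url)
    if length is not None and known_lengths.get(url) == length:
        return False
    known_lengths[url] = length
    return True
```

A script would call needs_fetch(url, known_lengths) before downloading each mailbox and persist known_lengths between runs. Note that this still costs one HEAD request per file per run.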
It would obviously be better if the server did not report changes where none exist.