Posted to infrastructure-issues@apache.org by "Sebb (JIRA)" <ji...@apache.org> on 2016/04/12 12:58:25 UTC

[jira] [Created] (INFRA-11661) Spurious changes to mail archive files

Sebb created INFRA-11661:
----------------------------

             Summary: Spurious changes to mail archive files
                 Key: INFRA-11661
                 URL: https://issues.apache.org/jira/browse/INFRA-11661
             Project: Infrastructure
          Issue Type: Bug
          Components: Mail Archives
            Reporter: Sebb


Further to INFRA-11634, I have been looking at ways to reduce the load on the mail-archive host.

The script already uses a conditional GET (Last-Modified), so it does not need to process mailboxes unless they have changed.
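For reference, a conditional GET of this kind can be sketched as follows. This is a minimal illustration, not the actual script: the function names and the injectable `opener` parameter are hypothetical, and only the Python standard library is assumed. The server answers 304 Not Modified when the validators still match, which `urllib` surfaces as an `HTTPError`.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None, etag=None):
    """Build a GET request carrying the validators saved from the previous fetch."""
    request = urllib.request.Request(url)
    if last_modified:
        # Value of the Last-Modified header from the previous response
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        # Value of the ETag header from the previous response
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, last_modified=None, etag=None,
                     opener=urllib.request.urlopen):
    """Return (body, headers) if the server sends new content, else None."""
    request = build_conditional_request(url, last_modified, etag)
    try:
        with opener(request) as response:
            return response.read(), response.headers
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: the cached copy is still current
            return None
        raise
```

A check like this only saves work if the server keeps Last-Modified (and the ETag) stable while the file is unchanged.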

However, I noticed that it was still fetching empty mailbox files, i.e. the Last-Modified date was changing even though there was no new content. For an empty file this does not matter, but it is also happening for the other files. I did some checks, and it looks like all the current files are touched once an hour, regardless of whether they have changed or not.

The ETags change as well, even though the Content-Length stays the same. For example:

$ curl -s -I http://mail-archives.apache.org/mod_mbox/jmeter-issues/201604.mbox

HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 09:29:59 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 08:32:02 GMT
ETag: "3a15c-530457ed2c521"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox

HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 09:44:44 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 09:32:06 GMT
ETag: "3a15c-5304655a96122"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox

HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 10:45:00 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 10:33:16 GMT
ETag: "3a15c-530473062e354"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox

It's perhaps not surprising that there is excess traffic to the box if it reports changes when none have happened.
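The ETags above look like the Apache httpd 2.4 default form (hex file size, then hex mtime in microseconds). Assuming that is the format in use here, a short sketch can decode them to show that only the mtime component moves. The function name `decode_etag` is hypothetical.

```python
from datetime import datetime, timezone


def decode_etag(etag):
    """Decode '"<hex size>-<hex mtime in microseconds>"' into (size, mtime).

    Assumes the Apache httpd 2.4 default ETag layout (size and mtime only).
    """
    size_hex, mtime_hex = etag.strip('"').split("-")
    size = int(size_hex, 16)
    seconds = int(mtime_hex, 16) // 1_000_000  # drop the sub-second part
    return size, datetime.fromtimestamp(seconds, tz=timezone.utc)


# The three ETag values from the responses above
ETAGS = ['"3a15c-530457ed2c521"', '"3a15c-5304655a96122"', '"3a15c-530473062e354"']

for etag in ETAGS:
    size, mtime = decode_etag(etag)
    print(size, mtime.strftime("%Y-%m-%d %H:%M:%S"), "UTC")
```

Under that assumption the size component is 237916 in all three cases, matching Content-Length, and the decoded times match the three Last-Modified values. That is consistent with the files being touched rather than rewritten.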

AFAIK it's not possible to base a conditional GET on the length.
It should be possible for scripts to check whether the length has changed and skip reading the file if not. However, such checks would have to be done on an ad-hoc basis for each script.
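A per-script length check might look roughly like the sketch below. It assumes the script stores the length it saw on its previous run; the names `head_headers`, `content_length` and `should_download` are hypothetical. It issues a HEAD request and compares Content-Length before deciding whether to download the body.

```python
import urllib.request


def head_headers(url):
    """Issue a HEAD request and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers


def content_length(headers):
    """Return Content-Length as an int, or None if missing or malformed."""
    try:
        return int(headers.get("Content-Length"))
    except (TypeError, ValueError):
        return None


def should_download(headers, cached_length):
    """Decide whether to fetch the body, given the length seen last time."""
    length = content_length(headers)
    # With no usable length on either side, download to be safe
    if length is None or cached_length is None:
        return True
    return length != cached_length
```

For example, `should_download(head_headers(url), 237916)` would skip the mailbox shown above. An unchanged length does not strictly prove unchanged content, so this is only a heuristic; it seems reasonable for files that only grow by appending.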

It would obviously be better if the server did not report changes where none exist.
