Posted to infrastructure-issues@apache.org by "Chris Lambertus (JIRA)" <ji...@apache.org> on 2016/04/13 05:58:25 UTC

[jira] [Resolved] (INFRA-11661) Spurious changes to mail archive files

     [ https://issues.apache.org/jira/browse/INFRA-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lambertus resolved INFRA-11661.
-------------------------------------
    Resolution: Won't Fix

I'm not exactly sure what you want to do here. The files are, and have been, populated the same way for a long time. We're going to be moving to ponymail for archives relatively soon, so it's probably not worth spending a lot of time on this. We've sorted out the issue with load on eos, so as far as root@ is concerned, we can retain the status quo for this script. I do appreciate you looking into it, though. We will, however, need to track this as a dependency for when the mbox archives are no longer readily available. That's not in scope for this ticket, so I'm closing it as Won't Fix.


> Spurious changes to mail archive files
> --------------------------------------
>
>                 Key: INFRA-11661
>                 URL: https://issues.apache.org/jira/browse/INFRA-11661
>             Project: Infrastructure
>          Issue Type: Bug
>          Components: Mail Archives
>            Reporter: Sebb
>
> Further to INFRA-11634, I have been looking at ways to reduce the load on the mail-archive host.
> The script already uses a conditional GET (Last-Modified), so it does not need to process mailboxes unless they have changed.
> However, I noticed that it was still fetching empty mailbox files,
> i.e. the Last-Modified date was changing even though there was no new content. For an empty file this does not matter, but it is also happening for the other files. I did some checks, and it looks like all the current files are touched once an hour, regardless of whether they have changed.
> The ETags also change. For example:
> $ curl -s -I http://mail-archives.apache.org/mod_mbox/jmeter-issues/201604.mbox
> HTTP/1.1 200 OK
> Date: Tue, 12 Apr 2016 09:29:59 GMT
> Server: Apache/2.4.12 (Unix)
> Last-Modified: Tue, 12 Apr 2016 08:32:02 GMT
> ETag: "3a15c-530457ed2c521"
> Accept-Ranges: bytes
> Content-Length: 237916
> Content-Type: application/mbox
>
> HTTP/1.1 200 OK
> Date: Tue, 12 Apr 2016 09:44:44 GMT
> Server: Apache/2.4.12 (Unix)
> Last-Modified: Tue, 12 Apr 2016 09:32:06 GMT
> ETag: "3a15c-5304655a96122"
> Accept-Ranges: bytes
> Content-Length: 237916
> Content-Type: application/mbox
>
> HTTP/1.1 200 OK
> Date: Tue, 12 Apr 2016 10:45:00 GMT
> Server: Apache/2.4.12 (Unix)
> Last-Modified: Tue, 12 Apr 2016 10:33:16 GMT
> ETag: "3a15c-530473062e354"
> Accept-Ranges: bytes
> Content-Length: 237916
> Content-Type: application/mbox
> It's perhaps not surprising that there is excess traffic to the box if it reports changes when none have happened.
> AFAIK it's not possible to base a conditional GET on the length.
> It should be possible for scripts to check whether the length has changed and skip reading the file if not. However, such checks will have to be done on an ad-hoc basis for each script.
> It would obviously be better if the server did not report changes where none exist.
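
The client-side length check suggested above could look roughly like the sketch below. It is only an illustration, not the script discussed in this ticket: the function names parse_headers and should_refetch are hypothetical, and it assumes that an mbox file with an unchanged Content-Length has unchanged content, which is plausible for an append-only archive but not guaranteed. The sample headers are the first two responses quoted above.

```python
# Sketch only: decide whether to re-download a mailbox by comparing the
# Content-Length of two HEAD responses instead of Last-Modified or ETag.
# parse_headers and should_refetch are hypothetical names for illustration.

FIRST = """HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 09:29:59 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 08:32:02 GMT
ETag: "3a15c-530457ed2c521"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox"""

SECOND = """HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 09:44:44 GMT
Server: Apache/2.4.12 (Unix)
Last-Modified: Tue, 12 Apr 2016 09:32:06 GMT
ETag: "3a15c-5304655a96122"
Accept-Ranges: bytes
Content-Length: 237916
Content-Type: application/mbox"""


def parse_headers(raw):
    """Turn the text of a `curl -s -I` response into a lowercase header dict."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def should_refetch(previous, current):
    """Return True unless both responses report the same Content-Length.

    A missing length on either side is treated as "changed" to stay safe.
    """
    prev_len = previous.get("content-length")
    cur_len = current.get("content-length")
    if prev_len is None or cur_len is None:
        return True
    return prev_len != cur_len


if __name__ == "__main__":
    first = parse_headers(FIRST)
    second = parse_headers(SECOND)
    print("Last-Modified changed:", first["last-modified"] != second["last-modified"])
    print("ETag changed:", first["etag"] != second["etag"])
    print("Refetch needed by length check:", should_refetch(first, second))
```

A script would have to store the previous length per mailbox and issue a HEAD request before each fetch, which is why this remains a per-script workaround rather than a fix on the server side.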



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)