Posted to sysadmins@spamassassin.apache.org by Jens Schleusener <Je...@t-online.de> on 2018/10/10 14:26:33 UTC

Re: Some interesting (?) observations on a mirror server (sa-update.fossies.org)

On Sat, 22 Sep 2018, Dave Jones wrote:

> On 9/20/18 2:50 PM, Fossies Administrator wrote:
>> Hi,
>> 
>> some weeks ago I happened to look at the web server access log of the 
>> SpamAssassin rules update mirror sa-update.fossies.org and was surprised 
>> to find that at noon (midday) the log file was already much larger than 
>> the roughly expected half of a complete daily log.
>> 
>> Out of curiosity I plotted the number of GET requests for update files 
>> (tarballs) per hour and saw an interesting pattern with a large peak 
>> between 6 and 7 a.m. (GMT+2). The main reason is probably the publication 
>> time (mostly between 5 and 6 a.m. GMT+2) plus the delay until the users' 
>> sa-update scripts run. The shape of the curves, with some curious (?) 
>> minima, is a little surprising to me, but it is constant and 
>> reproducible.
>> 
>> A simple example text plot for a single day is attached (more accurate 
>> plots are available under the URL given below).
>> 
>> More interesting and "irritating" was that during the main update period 
>> I often found many (at least 100-1000) entries with the HTTP status 404 
>> ("Not Found"). That motivated me to write a simple script to analyze the 
>> cause by monitoring the update status and update times of newly 
>> published rules update files.
>> 
>> First I checked the local web log files, assuming that a 404 response for 
>> an update file means that an external client already knew about a new 
>> file that the local mirror sa-update.fossies.org did not yet have, i.e. 
>> had not yet fetched via rsync.
>> 
>> Additionally I checked the local DNS server (of the server provider) and 
>> the DNS servers I found to be responsible for the domain spamassassin.org
>>
>>   ns2.pccc.com.
>>   ns2.ena.com.
>>   c.auth-ns.sonic.net.
>>   b.auth-ns.sonic.net.
>>   a.auth-ns.sonic.net.
>> 
>> via the command
>>
>>   dig @<server> 3.3.3.updates.spamassassin.org txt +short
>> 
>> You can find the plots and an extract of the script output at
>>
>>   https://fossies.org/~schleusener/sa-update.mirror_analysis/
>>    User: sa
>>    PW: update
>> 
>> The main reason for the 404 errors seems to be that the mirroring script 
>> runs as a cron job on sa-update.fossies.org only every 10 minutes.
>> 
>> It would probably be better to query the authoritative nameservers (the 
>> local nameserver, because of the TTL, answers with a freshness delay of up 
>> to one hour) and to start an rsync job only if the response shows that a 
>> new file is available.
>> 
>> If all mirror servers updated at least every 10 minutes, another idea 
>> would be to set/change the DNS TXT entry only 10 minutes after the release 
>> (availability) of a new update file.
>> 
>> Additionally I found that the synchronization of the above DNS servers 
>> seems to be delayed by some minutes. The "best" DNS server seems to be 
>> "ns2.ena.com", since it is always the first to provide the new version.
>> 
>> Maybe this behaviour is a little bit related to the current thread with the 
>> subject "repeated sa-update problems" on the users list.
>> 
>> Regards
>> 
>> Jens
>> 
>
> Very interesting and useful information.  Thank you Jens.
>
> I have put a 20 minute sleep in the script before the DNS updates happen to 
> give the mirrors time to update before sa-update starts looking for the new 
> ruleset.
>
> I run ns2.ena.com and it's updating quickly because it's receiving the DNS 
> NOTIFY from the hidden master and performing a zone transfer immediately. 
> Now this will happen after a 20 minute delay.  All other DNS servers must be 
> ignoring the NOTIFY and updating at the normal REFRESH interval in the SOA 
> record, which is 7200 seconds, so on average they will be about 1 hour 
> behind the hidden master.
>
>
> [djones@djones5 trunk]$ svn diff
> Index: build/mkupdates/mkupdate-with-scores
> ===================================================================
> --- build/mkupdates/mkupdate-with-scores	(revision 1841667)
> +++ build/mkupdates/mkupdate-with-scores	(working copy)
> @@ -282,6 +282,8 @@
>   if [ $AUTOUPDATESDISABLED -eq 1 -a $REVERT_REVISION -eq 0 ]; then
>     echo "DNS updating disabled (auto update publishing disabled), skipping DNS reload"
>   else
> +    # Wait 20 minutes for the mirrors to update via rsync
> +    sleep 1200
>     # Newer versions >= 3.4.1 of SpamAssassin are CNAME'd to 3.3.3
>     /usr/local/bin/updateDNS.sh 3.3.3.updates TXT $REVISION
>     RC=$?
> [djones@djones5 trunk]$ svn commit -m "Added DNS update delay to give time 
> for the mirrors to update via rsync before sa-update will start looking for 
> the new rule sets."
> Sending        build/mkupdates/mkupdate-with-scores
> Transmitting file data .done
> Committing transaction...
> Committed revision 1841668.
>
>
> Dave

After more than two weeks of observation I can confirm that your 
measure works: since September 23, not a single 404 error for an 
update file has been logged on the mirror server sa-update.fossies.org.
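
For anyone who wants to repeat this kind of check on their own mirror, 
here is a minimal sketch. It is not the script used on 
sa-update.fossies.org. It assumes the web server writes the common 
"combined" access log format and that update files end in ".tar.gz". The 
names LOG_LINE and count_404_per_hour are made up for this example.

```python
import re
from collections import Counter

# Assumed log format (Apache/nginx "combined"), for example:
# 1.2.3.4 - - [23/Sep/2018:06:12:01 +0200] "GET /1841668.tar.gz HTTP/1.1" 404 209 "-" "-"
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_404_per_hour(lines, suffix=".tar.gz"):
    """Count GET requests for update files answered with 404.

    Returns a Counter keyed by (day, hour). The suffix used to recognize
    update files is an assumption; adjust it to the mirror's layout.
    """
    hits = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m["status"] == "404" and m["path"].endswith(suffix):
            hits[(m["day"], int(m["hour"]))] += 1
    return hits
```

Feeding it the lines of one daily log and printing the counter sorted by 
hour gives a rough text version of the per-hour 404 plot.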

Jens
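
P.S. The idea from my first mail (ask the authoritative nameservers and 
start rsync only when a new revision is announced) could look roughly like 
the sketch below. It is only an illustration, not what the mirror runs. It 
assumes that "dig ... txt +short" prints a single quoted revision number 
and that dig is installed. The function names and the way the local 
revision is obtained are hypothetical.

```python
import subprocess

# Name servers and query name as listed earlier in this thread.
NAMESERVERS = [
    "ns2.pccc.com",
    "ns2.ena.com",
    "a.auth-ns.sonic.net",
    "b.auth-ns.sonic.net",
    "c.auth-ns.sonic.net",
]
QUERY = "3.3.3.updates.spamassassin.org"


def parse_txt(output):
    """Turn dig +short TXT output such as '"1841668"' into an int.

    Returns None if the output is empty or not a plain number
    (assumption: the TXT record holds only the revision).
    """
    text = output.strip().strip('"')
    return int(text) if text.isdigit() else None


def query_revision(server, timeout=10):
    """Ask one name server for the published revision; None on failure."""
    try:
        result = subprocess.run(
            ["dig", f"@{server}", QUERY, "txt", "+short"],
            capture_output=True, text=True, timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_txt(result.stdout)


def needs_sync(local_revision, published):
    """True if any server announces a revision newer than the local one."""
    known = [rev for rev in published if rev is not None]
    return bool(known) and max(known) > local_revision
```

A cron job could call needs_sync(local, [query_revision(s) for s in 
NAMESERVERS]) every minute or so and start the rsync run only when it 
returns True. With the 20 minute DNS delay now in place this matters less, 
but it would avoid needless rsync runs.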