Posted to dev@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2011/01/26 23:39:22 UTC

Update Mirror Issues

Just came up on the users list. Escalating. ;)  The facts:

  1.3.3.updates.spamassassin.org descriptive text "1052462"
  2.3.3.updates.spamassassin.org descriptive text "1052462"

Rule update tarball available on mirrors. 4 weeks old revision from
trunk.

  0.3.3.updates.spamassassin.org descriptive text "1061118"

Tarball NOT available. 6 days old revision from tags, not trunk.
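
For anyone wanting to repeat the lookups above: the record names look like the version number reversed and prepended to the channel name. A minimal sketch under that assumption follows. The helper name update_record_name is made up for illustration, and the loop only prints the host commands rather than running them.

```shell
# Sketch only: build the TXT record name for a SpamAssassin version by
# reversing its dotted components, as the three names above suggest.
# update_record_name is a hypothetical helper, not part of sa-update.
update_record_name() {
    version=$1
    channel=${2:-updates.spamassassin.org}
    reversed=$(printf '%s\n' "$version" |
        awk -F. '{ for (i = NF; i > 1; i--) printf "%s.", $i; print $1 }')
    printf '%s.%s\n' "$reversed" "$channel"
}

# Print the lookups; drop the "echo" to actually run them.
for v in 3.3.0 3.3.1 3.3.2; do
    echo host -t TXT "$(update_record_name "$v")"
done
```

If the TXT values differ between point releases, as they do above, the releases are being pointed at different rule revisions.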


Moreover, the dostech mirror currently is unresponsive, serving neither.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Update Mirror Issues

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 1/31/2011 11:40 PM, Warren Togami Jr. wrote:
> On 01/26/2011 06:48 PM, Daryl C. W. O'Shea wrote:
>> Not quite sure why 3.3.0 would be different from 3.3.1+2 would be
>> different, but the reason we haven't had any stable branch rules
>> published in a while is that we haven't had enough *recent* spam
>> submitted. Last nights cron job says:
>>
>> HAM: 188008 (150000 required)
>> SPAM: 51330 (150000 required)
>> Insufficient spam corpus to generate scores; aborting.
>>
>> Ever since my main corpus server hard drive failed in October my corpus
>> hasn't had spam being added to it so my spam corpus finally aged out a
>> few weeks ago. I guess I need to get this fixed, but we also need more
>> spam from others.
>>
>
> Joao Gouveia will soon be requesting an account to join the nightly 
> masscheck.  He has a significant quantity of spam, and hopefully much 
> of it is European language so it should add to our diversity.
>
> Warren
Joao runs anubis bl as well.  A very welcome addition!

Re: Mass-check Corpora (once was: Re: Update Mirror Issues)

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/2/2011 1:01 AM, Justin Mason wrote:
> 2011/2/2 Warren Togami Jr.<wt...@gmail.com>:
>> On 2/1/2011 1:02 PM, Karsten Bräckelmann wrote:
>>>
>>> Yikes indeed.
>>>
>>> Maybe Joao should answer these himself...
>>>
>>> Given the numbers, is that purely trap driven? Is there a legion human
>>> users manually verifying the spam?
>>>
>>> What exactly does "filter duplicates" mean? If that includes "identical"
>>> payload sent to different users, these dupes should not be eliminated I
>>> believe, since it will bias results. A random sample already will
>>> eliminate most duplicates, while preserving distribution.
>>
>> Good point. +1
>
> +1.
>
> My approach btw when dealing with traps is to (a) upload those using a
> distinct filename if possible (e.g. "ham-jm-traps.log" or similar),
> and (b) sample randomly to get the volume down to something comparable
> to the other corpora.  Trap spam tends to contain  bounce blowback and
> other "noise" that we don't necessarily want in large numbers in our
> corpora.

Good point about bounce blowback (or backscatter as some people call 
it).  I forgot about that because my traps automatically filter that out 
from the corpus.

Warren

Re: Mass-check Corpora (once was: Re: Update Mirror Issues)

Posted by Justin Mason <jm...@jmason.org>.
2011/2/2 Warren Togami Jr. <wt...@gmail.com>:
> On 2/1/2011 1:02 PM, Karsten Bräckelmann wrote:
>>
>> Yikes indeed.
>>
>> Maybe Joao should answer these himself...
>>
>> Given the numbers, is that purely trap driven? Is there a legion human
>> users manually verifying the spam?
>>
>> What exactly does "filter duplicates" mean? If that includes "identical"
>> payload sent to different users, these dupes should not be eliminated I
>> believe, since it will bias results. A random sample already will
>> eliminate most duplicates, while preserving distribution.
>
> Good point. +1

+1.

My approach btw when dealing with traps is to (a) upload those using a
distinct filename if possible (e.g. "ham-jm-traps.log" or similar),
and (b) sample randomly to get the volume down to something comparable
to the other corpora.  Trap spam tends to contain  bounce blowback and
other "noise" that we don't necessarily want in large numbers in our
corpora.

Re: Mass-check Corpora (once was: Re: Update Mirror Issues)

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/1/2011 1:02 PM, Karsten Bräckelmann wrote:
> Yikes indeed.
>
> Maybe Joao should answer these himself...
>
> Given the numbers, is that purely trap driven? Is there a legion human
> users manually verifying the spam?
>
> What exactly does "filter duplicates" mean? If that includes "identical"
> payload sent to different users, these dupes should not be eliminated I
> believe, since it will bias results. A random sample already will
> eliminate most duplicates, while preserving distribution.

Good point. +1

Warren

Mass-check Corpora (once was: Re: Update Mirror Issues)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
> > > > SPAM: 51330 (150000 required)
> > >
> > > Joao Gouveia will soon be requesting an account to join the nightly
> > > masscheck. He has a significant quantity of spam, and hopefully much
> > > of it is European language so it should add to our diversity.
> >
> > I wonder how scoring will be affected if his corpus is >50k messages?
> > :)
> 
> Yikes.  He has over 1 million per day spam.  He's figuring out a way to 
> filter it to eliminate duplicates and do a random sample of ~20k * 7 
> days.  But still, that's going to skew us too much.

Yikes indeed.

Maybe Joao should answer these himself...

Given the numbers, is that purely trap driven? Is there a legion of human
users manually verifying the spam?

What exactly does "filter duplicates" mean? If that includes "identical"
payload sent to different users, these dupes should not be eliminated I
believe, since it will bias results. A random sample already will
eliminate most duplicates, while preserving distribution.
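
To make the sampling point concrete, here is a rough sketch of drawing a fixed-size random sample without deduplicating first. The helper name sample_messages and the toy corpus are assumptions for illustration only; shuf is from GNU coreutils.

```shell
# Sketch: draw a fixed-size random sample from a list of message paths
# WITHOUT deduplicating first. sample_messages is a hypothetical helper.
sample_messages() {
    # $1 = file with one message path per line, $2 = number to keep
    shuf -n "$2" "$1"
}

# Toy corpus: campaign A was received 90 times, campaign B 10 times.
list=$(mktemp)
i=1
while [ "$i" -le 90 ]; do echo "campaign-A/$i"; i=$((i + 1)); done > "$list"
i=1
while [ "$i" -le 10 ]; do echo "campaign-B/$i"; i=$((i + 1)); done >> "$list"

# Count how many sampled messages come from each campaign.
sample_messages "$list" 20 | cut -d/ -f1 | sort | uniq -c
rm -f "$list"
```

The sample should keep roughly the 9:1 ratio between the two campaigns, whereas deduplicating by payload first would flatten it to 1:1.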

Is there also ham?


Regarding skewing of results due to a single source with overwhelming
numbers: I recall days when mass-checks (though not for scoring)
basically consisted of one huge corpus and a bunch of additional,
*much* smaller corpora. It did indeed have an impact on quite a few
rules, which hardly matched the dominant corpus at all, though they
matched the others quite nicely. :/


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Update Mirror Issues

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/1/2011 5:44 AM, John Hardin wrote:
> On Mon, 31 Jan 2011, Warren Togami Jr. wrote:
>
>> On 01/26/2011 06:48 PM, Daryl C. W. O'Shea wrote:
>>>
>>> SPAM: 51330 (150000 required)
>>
>> Joao Gouveia will soon be requesting an account to join the nightly
>> masscheck. He has a significant quantity of spam, and hopefully much
>> of it is European language so it should add to our diversity.
>
> I wonder how scoring will be affected if his corpus is >50k messages?
>
> :)
>

Yikes.  He has over 1 million spam messages per day.  He's figuring out a
way to filter it to eliminate duplicates and do a random sample of
~20k * 7 days.  But still, that's going to skew us too much.
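
A back-of-the-envelope check of how much that would skew things. It assumes everyone else's recent spam stays at the 51330 from the cron output and that the new sample really is 20000 per day for 7 days. Both numbers are taken from this thread, and combining them this way is only an illustration.

```shell
# Back-of-the-envelope only. Assumes the other contributors' recent spam
# stays at 51330 and the new sample is 20000 x 7 messages.
new=$((20000 * 7))
existing=51330
total=$((new + existing))
echo "new source: $new of $total spam ($((new * 100 / total))%)"
```

Under those assumptions a single source would supply nearly three quarters of the spam corpus.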

Warren

Re: Update Mirror Issues

Posted by John Hardin <jh...@impsec.org>.
On Mon, 31 Jan 2011, Warren Togami Jr. wrote:

> On 01/26/2011 06:48 PM, Daryl C. W. O'Shea wrote:
>>
>>  SPAM: 51330 (150000 required)
>
> Joao Gouveia will soon be requesting an account to join the nightly 
> masscheck.  He has a significant quantity of spam, and hopefully much of it 
> is European language so it should add to our diversity.

I wonder how scoring will be affected if his corpus is >50k messages?

:)

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   W-w-w-w-w-where did he learn to n-n-negotiate like that?
-----------------------------------------------------------------------
  Today: the 8th anniversary of the loss of STS-107 Columbia

Re: Update Mirror Issues

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 01/26/2011 06:48 PM, Daryl C. W. O'Shea wrote:
> Not quite sure why 3.3.0 would be different from 3.3.1+2 would be
> different, but the reason we haven't had any stable branch rules
> published in a while is that we haven't had enough *recent* spam
> submitted. Last nights cron job says:
>
> HAM: 188008 (150000 required)
> SPAM: 51330 (150000 required)
> Insufficient spam corpus to generate scores; aborting.
>
> Ever since my main corpus server hard drive failed in October my corpus
> hasn't had spam being added to it so my spam corpus finally aged out a
> few weeks ago. I guess I need to get this fixed, but we also need more
> spam from others.
>

Joao Gouveia will soon be requesting an account to join the nightly 
masscheck.  He has a significant quantity of spam, and hopefully much of 
it is European language so it should add to our diversity.

Warren

sa-updates not happening (Re: Update Mirror Issues)

Posted by Da...@chaosreigns.com.
On 01/27, Daryl C. W. O'Shea wrote:
> >>Not quite sure why 3.3.0 would be different from 3.3.1+2 would be
> >>different, but the reason we haven't had any stable branch rules
> >>published in a while is that we haven't had enough *recent* spam
> >>submitted. Last nights cron job says:
> >>
> >>HAM: 188008 (150000 required)
> >>SPAM: 51330 (150000 required)
> >>Insufficient spam corpus to generate scores; aborting.
> >
> >How recent is recent?
> 
> Ham: 72 months
> Spam: 2 months

From yesterday I'm getting:
Ham:   181322
Spam:  165436

Both are over 150000, so an sa-update should happen, right?

What I did:

rsync --exclude '*~' -vaz "darxus@rsync.spamassassin.org::corpus" /home/darxus/sa/corp
./log-grep-recent -m 72 `grep -l '^# SVN revision: 1083147$' ~/sa/corp/ham-net-*.log` > ~/sa/ham-full.log
./log-grep-recent -m 2 `grep -l '^# SVN revision: 1083147$' ~/sa/corp/spam-net-*.log` > ~/sa/spam-full.log
wc -l ~/sa/*.log
   181322 /home/darxus/sa/ham-full.log
   165436 /home/darxus/sa/spam-full.log

-- 
"Every normal man must be tempted at times to spit upon his hands,
hoist the black flag, and begin slitting throats."
 - Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com

Re: How close to update threshold?

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Sorry for the delay, I've been fighting the flu again.

On 18/02/2011 4:16 AM, Warren Togami Jr. wrote:
> How close are we to the threshold now?

As of today:

HAM: 261321 (150000 required)
SPAM: 145500 (150000 required)
Insufficient spam corpus to generate scores; aborting.

Daryl


How close to update threshold?

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 1/27/2011 12:40 PM, Daryl C. W. O'Shea wrote:
>>> Not quite sure why 3.3.0 would be different from 3.3.1+2 would be
>>> different, but the reason we haven't had any stable branch rules
>>> published in a while is that we haven't had enough *recent* spam
>>> submitted. Last nights cron job says:
>>>
>>> HAM: 188008 (150000 required)
>>> SPAM: 51330 (150000 required)
>>> Insufficient spam corpus to generate scores; aborting.
>>
>> How recent is recent?
>
> Ham: 72 months
> Spam: 2 months
>
> Daryl
>

How close are we to the threshold now?

Joao, have you had a chance to make a daily random sample for masscheck? 
  Did they create an rsync upload account for you yet?

Warren

Re: Update Mirror Issues

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 01/27/2011 12:40 PM, Daryl C. W. O'Shea wrote:
>>>
>>> HAM: 188008 (150000 required)
>>> SPAM: 51330 (150000 required)
>>> Insufficient spam corpus to generate scores; aborting.
>>
>> How recent is recent?
>
> Ham: 72 months
> Spam: 2 months
>
> Daryl
>

Would it be too harmful to expand it to 3-4 months temporarily? 
It probably won't bring us above the threshold by itself, but perhaps some 
updates (especially ones that remove rules) are better than no updates?

I had been randomly discarding ~33% of my spam trap due to concern that 
I would overwhelm the statistics with trap.  I guess I'll turn off the 
discard for a while until others bring their recent spam numbers up.
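
For reference, a random discard like that can be sketched in one line of awk. This is only an illustration of the idea, not the actual trap setup. The helper name discard_fraction is made up, and the 0.33 is the rough figure mentioned above.

```shell
# Sketch: drop each input line with probability $1 (0.33 here) and pass the
# rest through. discard_fraction is a hypothetical helper.
discard_fraction() {
    awk -v p="$1" 'BEGIN { srand() } rand() >= p'
}

# Example: thin out a list of 1000 trap messages by about a third and
# print how many survive (the count varies from run to run).
seq 1 1000 | discard_fraction 0.33 | wc -l
```

Turning the discard off corresponds to a fraction of 0, which passes everything through.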

Warren

Re: Update Mirror Issues

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 27/01/2011 6:14 AM, Warren Togami Jr. wrote:
> On 1/26/2011 6:48 PM, Daryl C. W. O'Shea wrote:
>> On 26/01/2011 10:12 PM, Kevin A. McGrail wrote:
>>> On 1/26/2011 5:39 PM, Karsten Bräckelmann wrote:
>>>> Just came up on the users list. Escalating. ;) The facts:
>>>>
>>>> 1.3.3.updates.spamassassin.org descriptive text "1052462"
>>>> 2.3.3.updates.spamassassin.org descriptive text "1052462"
>>>>
>>>> Rule update tarball available on mirrors. 4 weeks old revision from
>>>> trunk.
>>>>
>>>> 0.3.3.updates.spamassassin.org descriptive text "1061118"
>>>>
>>>> Tarball NOT available. 6 days old revision from tags, not trunk.
>>
>> Not quite sure why 3.3.0 would be different from 3.3.1+2 would be
>> different, but the reason we haven't had any stable branch rules
>> published in a while is that we haven't had enough *recent* spam
>> submitted. Last nights cron job says:
>>
>> HAM: 188008 (150000 required)
>> SPAM: 51330 (150000 required)
>> Insufficient spam corpus to generate scores; aborting.
>
> How recent is recent?

Ham: 72 months
Spam: 2 months

Daryl


Re: Update Mirror Issues

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 1/26/2011 6:48 PM, Daryl C. W. O'Shea wrote:
> On 26/01/2011 10:12 PM, Kevin A. McGrail wrote:
>> On 1/26/2011 5:39 PM, Karsten Bräckelmann wrote:
>>> Just came up on the users list. Escalating. ;) The facts:
>>>
>>> 1.3.3.updates.spamassassin.org descriptive text "1052462"
>>> 2.3.3.updates.spamassassin.org descriptive text "1052462"
>>>
>>> Rule update tarball available on mirrors. 4 weeks old revision from
>>> trunk.
>>>
>>> 0.3.3.updates.spamassassin.org descriptive text "1061118"
>>>
>>> Tarball NOT available. 6 days old revision from tags, not trunk.
>
> Not quite sure why 3.3.0 would be different from 3.3.1+2 would be
> different, but the reason we haven't had any stable branch rules
> published in a while is that we haven't had enough *recent* spam
> submitted. Last nights cron job says:
>
> HAM: 188008 (150000 required)
> SPAM: 51330 (150000 required)
> Insufficient spam corpus to generate scores; aborting.

How recent is recent?

Warren

Re: Update Mirror Issues

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> OK, found the cause.  Somebody broke it by manually changing the 3.3.0 
> update version in DNS to point at 1061118 which is a 3.4.0 version 
> (and not necessarily compatible with 3.3.0!).  This is seriously bad.
>
> I won't point fingers, but here's the last log for the zone at that time:
>
>> wtogami   pts/5        cpe-76-93-222-12 Thu Jan 20 04:22 - 04:40  
>> (00:18)
>> wtogami   sshd         cpe-76-93-222-12 Thu Jan 20 04:22 - 04:40  
>> (00:18)
>> wtogami   pts/8        cpe-76-93-222-12 Thu Jan 20 04:08 - 04:40  
>> (00:32)
>> wtogami   sshd         cpe-76-93-222-12 Thu Jan 20 04:08 - 04:11  
>> (00:03)
>
I did give Warren permission to try his best to poke things with sticks 
and try not to break things.  We are working hard to fix things that 
aren't well known/documented.

In short, I consider this an expected ramification, as we know rule updates 
are problematic and emergency rule updates more so.

Expect wiki updates from me.

Regards,
KAM

Re: Update Mirror Issues

Posted by Justin Mason <jm...@jmason.org>.
ah, my apologies -- I'd forgotten about that.  As the bug notes, we
never completed instructions to reliably push an update, so I'm not
really sure what the correct approach is....

--j.

On Thu, Jan 27, 2011 at 12:02, Warren Togami Jr. <wt...@gmail.com> wrote:
> On 01/19/2011 11:58 AM, Justin Mason wrote:
>>          ssh spamassassin.zones.apache.org
>>          cd /home/updatesd/svn/spamassassin/build/mkupdates
>>         [svn up appropriately]
>>          sudo -u updatesd ./update-rules-3.3 3.3
>>
>>
>> see "build/README" for full details.
>
> These were JM's instructions.
>
> I looked closer at the server and figured out more of what happened.
>
> * This apparently was last used when 3.3 was trunk.
> * /home/updatesd/svn/spamassassin/ continues to follow trunk, and the script
> gets the number from the svn checkout.
>
> Warren
>

Re: Update Mirror Issues

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 27/01/2011 7:02 AM, Warren Togami Jr. wrote:
> On 01/19/2011 11:58 AM, Justin Mason wrote:
>  > ssh spamassassin.zones.apache.org
>  > cd /home/updatesd/svn/spamassassin/build/mkupdates
>  > [svn up appropriately]
>  > sudo -u updatesd ./update-rules-3.3 3.3
>  >
>  >
>  > see "build/README" for full details.
>
> These were JM's instructions.
>
> I looked closer at the server and figured out more of what happened.
>
> * This apparently was last used when 3.3 was trunk.
> * /home/updatesd/svn/spamassassin/ continues to follow trunk, and the
> script gets the number from the svn checkout.

Yeah, don't use that for the stable branches.

Daryl


Re: Update Mirror Issues

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 01/19/2011 11:58 AM, Justin Mason wrote:
 >          ssh spamassassin.zones.apache.org
 >          cd /home/updatesd/svn/spamassassin/build/mkupdates
 >         [svn up appropriately]
 >          sudo -u updatesd ./update-rules-3.3 3.3
 >
 >
 > see "build/README" for full details.

These were JM's instructions.

I looked closer at the server and figured out more of what happened.

* This apparently was last used when 3.3 was trunk.
* /home/updatesd/svn/spamassassin/ continues to follow trunk, and the 
script gets the number from the svn checkout.
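
A pre-flight check along these lines might catch that case before the script runs. This is a sketch only: is_trunk_url is a hypothetical helper, the repository URLs are illustrative, and nothing like it exists in build/mkupdates as far as this thread shows.

```shell
# Sketch of a pre-flight check. is_trunk_url is a hypothetical helper and
# the repository URLs below are only illustrative.
is_trunk_url() {
    case $1 in
        */trunk|*/trunk/*) return 0 ;;
        *) return 1 ;;
    esac
}

# In a real checkout the URL could be read with:
#   url=$(svn info | sed -n 's/^URL: //p')
for url in \
    https://svn.apache.org/repos/asf/spamassassin/trunk \
    https://svn.apache.org/repos/asf/spamassassin/branches/3.3
do
    if is_trunk_url "$url"; then
        echo "refusing: $url follows trunk"
    else
        echo "ok: $url"
    fi
done
```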

Warren

Re: Update Mirror Issues

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Corrected manual procedure... the at job needs to be run as updatesd:

On 27/01/2011 12:29 AM, Daryl C. W. O'Shea wrote:
> Going forward... we, probably me, need to get an automated way to push
> some sort of emergency rule update.
>
> The current manually steps would be:
>
> - un-tar an existing STABLE version rule update
> - make the changes (using a patch or manually)
> - test that those rules work with all .x versions that you're going to
> publish the update for (that is 3.3.0, 3.3.1, 3.3.2, etc...)
> - tar up, sign and hash the update
> - copy the three update files to the update tarball directory on the zone
> - make the files all 544 and owned by updatesd:dns
> - update the DNS record for each .x version
> - wait 16 or more minutes (the mirrors rsync every 15) and reload the
> DNS zone
> - alternatively for the last step you could immediately do this:

You'll need to be updatesd, so first

$ sudo su - updatesd

then

$ echo /export/home/updatesd/svn/spamassassin/build/mkupdates/tick_zone_serial | at -q n now + 16min
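
The local part of the manual steps (tar up, hash, set permissions) could be sketched roughly as below. Everything here is an assumption for illustration: the helper name package_update, the <rev>.tar.gz and .sha1 file names, and the made-up revision number. Signing, the chown to updatesd:dns and the DNS change appear only as comments because they need the real key and the zone.

```shell
# Sketch of the local packaging steps only. package_update and the file
# names (<rev>.tar.gz, <rev>.tar.gz.sha1) are assumptions.
package_update() {
    src=$1   # directory holding the edited rule files
    rev=$2   # new update revision number
    out=$3   # directory to write the update files to
    tar -czf "$out/$rev.tar.gz" -C "$src" . || return 1
    ( cd "$out" && sha1sum "$rev.tar.gz" > "$rev.tar.gz.sha1" ) || return 1
    # Signing needs the real key, so it is only indicated here:
    #   gpg --armor --detach-sign "$out/$rev.tar.gz"    (creates the .asc file)
    chmod 544 "$out/$rev.tar.gz" "$out/$rev.tar.gz.sha1"
    # On the zone: chown updatesd:dns on all three files, then update DNS.
}

# Toy run in temporary directories with a made-up revision number.
src=$(mktemp -d)
out=$(mktemp -d)
echo "# placeholder rule file" > "$src/example.cf"
package_update "$src" 9999999 "$out" && ls -l "$out"
rm -rf "$src" "$out"
```

Testing the rules against each .x version still has to happen before any of this, as the list says.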


Re: Update Mirror Issues

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 1/26/2011 7:29 PM, Daryl C. W. O'Shea wrote:
>
> OK, found the cause. Somebody broke it by manually changing the 3.3.0
> update version in DNS to point at 1061118 which is a 3.4.0 version (and
> not necessarily compatible with 3.3.0!). This is seriously bad.
>
> I won't point fingers, but here's the last log for the zone at that time:
>
>> wtogami pts/5 cpe-76-93-222-12 Thu Jan 20 04:22 - 04:40 (00:18)
>> wtogami sshd cpe-76-93-222-12 Thu Jan 20 04:22 - 04:40 (00:18)
>> wtogami pts/8 cpe-76-93-222-12 Thu Jan 20 04:08 - 04:40 (00:32)
>> wtogami sshd cpe-76-93-222-12 Thu Jan 20 04:08 - 04:11 (00:03)

Sorry about that, yes that was my mistake.  I followed directions that 
JM gave me.  It appeared to complete without errors, but it failed to 
update the sa-update channel.  I asked JM for guidance but did not hear 
back.

Warren

Re: Update Mirror Issues

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 26/01/2011 11:48 PM, Daryl C. W. O'Shea wrote:
> On 26/01/2011 10:12 PM, Kevin A. McGrail wrote:
>> On 1/26/2011 5:39 PM, Karsten Bräckelmann wrote:
>>> Just came up on the users list. Escalating. ;) The facts:
>>>
>>> 1.3.3.updates.spamassassin.org descriptive text "1052462"
>>> 2.3.3.updates.spamassassin.org descriptive text "1052462"
>>>
>>> Rule update tarball available on mirrors. 4 weeks old revision from
>>> trunk.
>>>
>>> 0.3.3.updates.spamassassin.org descriptive text "1061118"
>>>
>>> Tarball NOT available. 6 days old revision from tags, not trunk.
>
> Not quite sure why 3.3.0 would be different from 3.3.1+2 would be
> different

OK, found the cause.  Somebody broke it by manually changing the 3.3.0 
update version in DNS to point at 1061118 which is a 3.4.0 version (and 
not necessarily compatible with 3.3.0!).  This is seriously bad.

I won't point fingers, but here's the last log for the zone at that time:

> wtogami   pts/5        cpe-76-93-222-12 Thu Jan 20 04:22 - 04:40  (00:18)
> wtogami   sshd         cpe-76-93-222-12 Thu Jan 20 04:22 - 04:40  (00:18)
> wtogami   pts/8        cpe-76-93-222-12 Thu Jan 20 04:08 - 04:40  (00:32)
> wtogami   sshd         cpe-76-93-222-12 Thu Jan 20 04:08 - 04:11  (00:03)

Anyway... it just became visible as 3.4.0 (trunk) version rules are only 
retained for a week before being automatically deleted.

I imagine this was done to push an update to some broken rule as 
referenced in bug 6533.  Of course this would have only "fixed" 3.3.0, 
if it didn't break it in some other way.

So I've taken the following corrective action:

I've copied update 1052462 (which, I checked, passed 3.3.0 validation) 
to a new update 1061119 to supersede the bogus 1061118 update.

I've updated the DNS record for 3.3.0 to the 1061119 update version. 
DNS should reload in about 10 minutes now... once the mirrors have time 
to sync.


Going forward... we, probably me, need to get an automated way to push 
some sort of emergency rule update.

The current manual steps would be:

- un-tar an existing STABLE version rule update
- make the changes (using a patch or manually)
- test that those rules work with all .x versions that you're going to 
publish the update for (that is 3.3.0, 3.3.1, 3.3.2, etc...)
- tar up, sign and hash the update
- copy the three update files to the update tarball directory on the zone
- make the files all 544 and owned by updatesd:dns
- update the DNS record for each .x version
- wait 16 or more minutes (the mirrors rsync every 15) and reload the 
DNS zone

- alternatively for the last step you could immediately do this:

echo /export/home/updatesd/svn/spamassassin/build/mkupdates/tick_zone_serial | at -q n now + 16min


Daryl






The way you need to do it now is use an existin


Re: Update Mirror Issues

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 26/01/2011 10:12 PM, Kevin A. McGrail wrote:
> On 1/26/2011 5:39 PM, Karsten Bräckelmann wrote:
>> Just came up on the users list. Escalating. ;) The facts:
>>
>> 1.3.3.updates.spamassassin.org descriptive text "1052462"
>> 2.3.3.updates.spamassassin.org descriptive text "1052462"
>>
>> Rule update tarball available on mirrors. 4 weeks old revision from
>> trunk.
>>
>> 0.3.3.updates.spamassassin.org descriptive text "1061118"
>>
>> Tarball NOT available. 6 days old revision from tags, not trunk.

Not quite sure why 3.3.0 would be different from 3.3.1+2, but the 
reason we haven't had any stable branch rules published in a while is 
that we haven't had enough *recent* spam submitted.  Last night's cron 
job says:

HAM: 188008 (150000 required)
SPAM: 51330 (150000 required)
Insufficient spam corpus to generate scores; aborting.
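
For anyone unfamiliar with that gate, the logic amounts to something like the sketch below. This is not the actual cron job: check_corpus is a hypothetical helper, and the "Insufficient ham corpus" message is invented by analogy with the spam one above.

```shell
# Sketch of the kind of gate the cron output implies; check_corpus and the
# ham message are hypothetical, only the spam message appears in the thread.
check_corpus() {
    ham=$1
    spam=$2
    required=${3:-150000}
    echo "HAM: $ham ($required required)"
    echo "SPAM: $spam ($required required)"
    if [ "$ham" -lt "$required" ]; then
        echo "Insufficient ham corpus to generate scores; aborting."
        return 1
    fi
    if [ "$spam" -lt "$required" ]; then
        echo "Insufficient spam corpus to generate scores; aborting."
        return 1
    fi
    return 0
}

check_corpus 188008 51330 || echo "(no update would be published)"
```

With last night's numbers the ham check passes and the spam check fails, so no scores are generated.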

Ever since my main corpus server hard drive failed in October my corpus 
hasn't had spam being added to it so my spam corpus finally aged out a 
few weeks ago.  I guess I need to get this fixed, but we also need more 
spam from others.

>> Moreover, the dostech mirror currently is unresponsive, serving neither.

Bah, I'm running into issues Infra had back when they were using VMware 
Server 1.0.x.  The VM pretty much halted around 12:32 PM today.  I'll be 
moving this VM to a new HP DL380 G7 running ESXi 4.1 in the next week or 
so.  That should put an end to issues with this VM.

> I think we knew this. See
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6533

Interesting.  I seem to recall wanting "tflags allowpublish" at one 
point in the past... it seems no matter what, and I'm at fault too, 
people almost never use "tflags nopublish".

Regarding scoring, yeah, the score generator should probably use any 
score in the sandbox as a maximum.  It's just a little hard to access 
the score, if I recall.

> Same issue?

Nah, I think the fact that we're not pushing updates due to no spam is 
the root cause.

Daryl

Re: Update Mirror Issues

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 1/26/2011 5:39 PM, Karsten Bräckelmann wrote:
> Just came up on the users list. Escalating. ;)  The facts:
>
>    1.3.3.updates.spamassassin.org descriptive text "1052462"
>    2.3.3.updates.spamassassin.org descriptive text "1052462"
>
> Rule update tarball available on mirrors. 4 weeks old revision from
> trunk.
>
>    0.3.3.updates.spamassassin.org descriptive text "1061118"
>
> Tarball NOT available. 6 days old revision from tags, not trunk.
>
>
> Moreover, the dostech mirror currently is unresponsive, serving neither.
>
>
I think we knew this.  See 
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6533

Same issue?