Posted to users@spamassassin.apache.org by Simon Byrnand <si...@igrin.co.nz> on 2005/05/15 23:36:42 UTC

Bayes problems and German Spam

Hi All,

After going from 2.64 to 3.0.3 I thought Bayes was working much better - 
previously certain classes of spam were being consistently reported as ham, 
scoring BAYES_00 no matter what I did, or how much manual training I did. 
(Autolearning enabled)

After upgrading to 3.0.3 and clearing the Bayes database everything seemed 
fine for a week or so, now it's back to its old habits :(

Particularly frustrating is the complete inability of sa-learn to correct 
the thinking of Bayes - all the recent flood of German spams are scoring 
BAYES_00, and DESPITE the fact that I have manually learnt well over two 
dozen of these as spam (which includes all the variations of them I've seen 
so far) new copies of identical spams STILL score BAYES_00. WHY ?

If the autolearn system can't be overridden with some manual learning, it
makes it more or less useless :(

A few other spams that were previously getting BAYES_99 are now down to 
BAYES_00 for no apparent reason. It's highly unlikely that they were 
autolearnt as ham, as they hit several other tests too. It seems that Bayes 
is still exploitable... :(

Any suggestions ?

Regards,
Simon
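
[Editor's sketch] One thing worth ruling out when manual training seems to
have no effect is that sa-learn and the scanner are not working on the same
Bayes database (by default sa-learn uses the database of the user who runs
it). The following is only a minimal sketch, assuming a stock sa-learn on the
PATH; SAMPLE is a hypothetical path to one saved copy of a misclassified
spam, and the run helper is an illustration that only prints the commands
unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# SAMPLE is a hypothetical path to one saved copy of a misclassified spam.
SAMPLE="${SAMPLE:-./german-spam.eml}"
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

check_training() {
    run sa-learn --dump magic      # note the nspam and nham counters
    run sa-learn --spam "$SAMPLE"  # learn the sample as spam
    run sa-learn --dump magic      # compare the counters with the first dump
}

check_training
```

Run it as the same user the scanner runs as. If the counters move here but
the scanner's Bayes verdict for the same message does not, the two may be
reading different databases.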


Re: Bayes problems and German Spam

Posted by Duncan Hill <sa...@nacnud.force9.co.uk>.
On Monday 16 May 2005 12:15, Ronan McGlue typed:

> I too have all net tests enabled and have started from a fresh, clean
> database on Friday, and already I'm seeing the German spams hit BAYES_00...
> I don't want to switch autolearning off because, well, I find it incredibly
> useful. I have spam/ham thresholds at 10/0 respectively and all appears
> well aside from the German bunch of spams...

The prolocation rulesets based on subject seem to be working quite well here.  
They're slowly dragging Bayes up - currently at _44 after ~50 spams that 
scored nice and high.  At least one server had worked out on its own that the 
spams should be _99.

Re: SARE / RDJ failing on both servers?

Posted by Joshua Tinnin <kr...@spymac.com>.
On Mon 16 May 05 05:22, MIKE YRABEDRA <sa...@323inc.com> wrote:
> Anyone else seeing this today?
>
> Connecting to www.rulesemporium.com[67.67.32.207]:80... failed:
> Connection refused.
> Connecting to www.rulesemporium.com[209.218.125.117]:80... failed:
> Connection refused.

Please do not hijack threads. It screws up threading in email clients 
which thread on the References header: http://home.pacbell.net/jtinnin/misc3/sccap2.jpg

If you want to ask a new question, please open a new email and start a 
new thread. Please do not reply to another post to the list and erase 
the text, as it leaves the reference in the headers. I'm not an admin 
of this list by any means, but it really would help.

- jt

SARE / RDJ failing on both servers?

Posted by MIKE YRABEDRA <sa...@323inc.com>.
Anyone else seeing this today?

Connecting to www.rulesemporium.com[67.67.32.207]:80... failed: Connection
refused.
Connecting to www.rulesemporium.com[209.218.125.117]:80... failed:
Connection refused.




Re: Bayes problems and German Spam

Posted by Ronan McGlue <r....@qub.ac.uk>.
Simon Byrnand wrote:
> At 09:53 16/05/2005, Jo wrote:
> 
>> Simon Byrnand wrote:
>>
>>> Hi All,
>>>
>>> After going from 2.64 to 3.0.3 I thought Bayes was working much 
>>> better - previously certain classes of spam were being consistently 
>>> reported as ham, scoring BAYES_00 no matter what I did, or how much 
>>> manual training I did. (Autolearning enabled)
>>>
>>> After upgrading to 3.0.3 and clearing the Bayes database everything 
>>> seemed fine for a week or so, now it's back to its old habits :(
>>>
>>> Particularly frustrating is the complete inability of sa-learn to 
>>> correct the thinking of Bayes - all the recent flood of German spams 
>>> are scoring BAYES_00, and DESPITE the fact that I have manually 
>>> learnt well over two dozen of these as spam (which includes all the 
>>> variations of them I've seen so far) new copies of identical spams 
>>> STILL score BAYES_00. WHY ?
>>>
>>> If the autolearn system can't be overridden with some manual 
>>> learning, it makes it more or less useless :(
>>>
>>> A few other spams that were previously getting BAYES_99 are now down 
>>> to BAYES_00 for no apparent reason. It's highly unlikely that they 
>>> were autolearnt as ham, as they hit several other tests too. It seems 
>>> that Bayes is still exploitable... :(
>>>
>>> Any suggestions ?
>>>
>>> Regards,
>>> Simon
>>
>>
>> Clear your bayes database and start all over again. Switch off 
>> auto-learning and rely purely on manual learning in a feedback loop. 
>> Grab a mail box of known ham and another folder of known spam. 
>> Preferably use a thousand of each.
> 
> 
> Hmm, not very practical when the system has several thousand 
> users/mailboxes. There is no way I would be able to keep current with 
> manual learning just based on my own personal mailbox...(and I can 
> hardly go poking around in other people's mailboxes to gather ham/spam to 
> learn)
> 
>> If you ever switch on autolearning again, set the threshold at -0.2 
>> for ham and 10 or 15 for spam.
> 
> 
> Are there even any negative scores in 3.0.3 ? I thought negative scores 
> were pretty much eliminated in recent versions, so with -0.2 it would 
> never learn any ham.
> 
>> Enable network tests; Razor2, Pyzor and DCC work wonders on the site I 
>> administer.
> 
> 
> Already have all network tests enabled, always have done.
> 
> Regards,
> Simon
> 
I too have all net tests enabled and have started from a fresh, clean
database on Friday, and already I'm seeing the German spams hit BAYES_00...
I don't want to switch autolearning off because, well, I find it incredibly
useful. I have spam/ham thresholds at 10/0 respectively and all appears
well aside from the German bunch of spams...

Don't know what else I can do...


*clutches at straws*
Is there a way to tie in a positive net test, say multi.surbl.org, to
sway Bayes? Generally, if SURBL reports spam you can guarantee
that all the other rules are surplus to requirements... IMHO

ronan

-- 
========

Regards

Ronan McGlue
Info. Services
QUB
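
[Editor's sketch] For readers who want to see what "spam/ham thresholds at
10/0" looks like in configuration, here is a hedged example using the
standard SpamAssassin options bayes_auto_learn,
bayes_auto_learn_threshold_spam and bayes_auto_learn_threshold_nonspam. The
output path CONF_OUT is hypothetical; on a real system the three lines would
go into the site configuration (for example local.cf).

```shell
#!/bin/sh
# CONF_OUT is a hypothetical scratch path used only for this illustration.
CONF_OUT="${CONF_OUT:-${TMPDIR:-/tmp}/autolearn-thresholds.cf}"

# Write a config fragment matching the thresholds mentioned above
# (autolearn as spam at 10 points or more, as ham at 0 points or less).
cat > "$CONF_OUT" <<'EOF'
bayes_auto_learn 1
bayes_auto_learn_threshold_spam    10.0
bayes_auto_learn_threshold_nonspam 0.0
EOF

echo "wrote $CONF_OUT"
```

The stock defaults are 12.0 for spam and 0.1 for ham, so 10/0 makes spam
autolearning slightly more eager and ham autolearning slightly stricter.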

Re: Bayes problems and German Spam

Posted by Simon Byrnand <si...@igrin.co.nz>.
At 09:53 16/05/2005, Jo wrote:

>Simon Byrnand wrote:
>
>>Hi All,
>>
>>After going from 2.64 to 3.0.3 I thought Bayes was working much better - 
>>previously certain classes of spam were being consistently reported as 
>>ham, scoring BAYES_00 no matter what I did, or how much manual training I 
>>did. (Autolearning enabled)
>>
>>After upgrading to 3.0.3 and clearing the Bayes database everything 
>>seemed fine for a week or so, now it's back to its old habits :(
>>
>>Particularly frustrating is the complete inability of sa-learn to correct 
>>the thinking of Bayes - all the recent flood of German spams are scoring 
>>BAYES_00, and DESPITE the fact that I have manually learnt well over two 
>>dozen of these as spam (which includes all the variations of them I've 
>>seen so far) new copies of identical spams STILL score BAYES_00. WHY ?
>>
>>If the autolearn system can't be overridden with some manual learning, it 
>>makes it more or less useless :(
>>
>>A few other spams that were previously getting BAYES_99 are now down to 
>>BAYES_00 for no apparent reason. It's highly unlikely that they were 
>>autolearnt as ham, as they hit several other tests too. It seems that 
>>Bayes is still exploitable... :(
>>
>>Any suggestions ?
>>
>>Regards,
>>Simon
>
>Clear your bayes database and start all over again. Switch off 
>auto-learning and rely purely on manual learning in a feedback loop. Grab 
>a mail box of known ham and another folder of known spam. Preferably use a 
>thousand of each.

Hmm, not very practical when the system has several thousand 
users/mailboxes. There is no way I would be able to keep current with 
manual learning just based on my own personal mailbox...(and I can hardly 
go poking around in other people's mailboxes to gather ham/spam to learn)

> If you ever switch on autolearning again, set the threshold at -0.2 for 
> ham and 10 or 15 for spam.

Are there even any negative scores in 3.0.3 ? I thought negative scores 
were pretty much eliminated in recent versions, so with -0.2 it would never 
learn any ham.

>Enable network tests; Razor2, Pyzor and DCC work wonders on the site I 
>administer.

Already have all network tests enabled, always have done.

Regards,
Simon


Re: Bayes problems and German Spam

Posted by Jo <ml...@winfix.IT>.
Simon Byrnand wrote:

> Hi All,
>
> After going from 2.64 to 3.0.3 I thought Bayes was working much better 
> - previously certain classes of spam were being consistently reported 
> as ham, scoring BAYES_00 no matter what I did, or how much manual 
> training I did. (Autolearning enabled)
>
> After upgrading to 3.0.3 and clearing the Bayes database everything 
> seemed fine for a week or so, now it's back to its old habits :(
>
> Particularly frustrating is the complete inability of sa-learn to 
> correct the thinking of Bayes - all the recent flood of German spams 
> are scoring BAYES_00, and DESPITE the fact that I have manually learnt 
> well over two dozen of these as spam (which includes all the 
> variations of them I've seen so far) new copies of identical spams 
> STILL score BAYES_00. WHY ?
>
> If the autolearn system can't be overridden with some manual learning, 
> it makes it more or less useless :(
>
> A few other spams that were previously getting BAYES_99 are now down 
> to BAYES_00 for no apparent reason. It's highly unlikely that they 
> were autolearnt as ham, as they hit several other tests too. It seems 
> that Bayes is still exploitable... :(
>
> Any suggestions ?
>
> Regards,
> Simon

Clear your bayes database and start all over again. Switch off 
auto-learning and rely purely on manual learning in a feedback loop. 
Grab a mail box of known ham and another folder of known spam. 
Preferably use a thousand of each. If you ever switch on autolearning 
again, set the threshold at -0.2 for ham and 10 or 15 for spam.
Enable network tests; Razor2, Pyzor and DCC work wonders on the site I 
administer.

Good luck,

Jo
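
[Editor's sketch] The clear-and-retrain procedure described above could look
roughly like this with the stock sa-learn tool. This is only a sketch under
stated assumptions: HAM_MBOX and SPAM_MBOX are hypothetical paths to mbox
files of hand-sorted mail, and the run helper is an illustration that only
prints the commands unless DRY_RUN=0 is set, since clearing the database is
destructive.

```shell
#!/bin/sh
# HAM_MBOX and SPAM_MBOX are hypothetical paths to hand-sorted mbox files.
HAM_MBOX="${HAM_MBOX:-./ham.mbox}"
SPAM_MBOX="${SPAM_MBOX:-./spam.mbox}"
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

retrain() {
    run sa-learn --clear                     # wipe the existing Bayes database
    run sa-learn --ham --mbox "$HAM_MBOX"    # learn the known ham
    run sa-learn --spam --mbox "$SPAM_MBOX"  # learn the known spam
    run sa-learn --dump magic                # show the resulting nspam/nham
}

retrain
```

Note that by default SpamAssassin does not apply Bayes scores until at least
200 ham and 200 spam messages have been learned, so a freshly cleared
database stays silent until both mailboxes have been fed in.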