Posted to users@spamassassin.apache.org by Brian Godette <bg...@idcomm.com> on 2006/06/23 20:56:19 UTC

New bayes busting method.

This is the first time I've seen this technique used.

The spammer is taking a message from a ham corpus and including its entire 
plain text inside an HTML comment (<!-- -->).
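
To make the trick concrete, here is a minimal Python sketch that separates the text a reader would see from the text hidden in HTML comments. The sample body and the names CommentSplitter and split_comments are invented for illustration; nothing here reflects how SpamAssassin itself tokenizes mail, and whether a given filter's tokenizer ever sees comment text depends on that filter.

```python
from html.parser import HTMLParser


class CommentSplitter(HTMLParser):
    """Collect visible text and HTML-comment text separately."""

    def __init__(self):
        super().__init__()
        self.visible = []  # text a mail client would render
        self.hidden = []   # text inside <!-- ... --> comments

    def handle_data(self, data):
        self.visible.append(data)

    def handle_comment(self, data):
        self.hidden.append(data)


def split_comments(html):
    """Return (visible_text, hidden_text) with whitespace normalized."""
    parser = CommentSplitter()
    parser.feed(html)
    parser.close()
    visible = " ".join(" ".join(parser.visible).split())
    hidden = " ".join(" ".join(parser.hidden).split())
    return visible, hidden


# Hypothetical message body; the wording is made up for this example.
SAMPLE = (
    "<html><body>"
    "<!-- Your rewards statement: you earned 250 miles this month. -->"
    "<p>Cheap pills, click here</p>"
    "</body></html>"
)

visible, hidden = split_comments(SAMPLE)
print("visible:", visible)
print("hidden: ", hidden)
```

A classifier that looked at the raw source would pick up the ham-like words in the comment, while one that looked only at the rendered text would not.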

Re: New bayes busting method.

Posted by Michael Monnerie <mi...@it-management.at>.
On Saturday, 24 June 2006 00:06 Brian Godette wrote:
> Which basically means you've never trained or autolearned on airmiles
> rewards ham, which we happen to see a fair number of

That could be, as I sit here in Vienna, Austria, Europe, and my main 
language is German. Lots of things seem to be different here ;-)

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660/4156531                          .network.your.ideas.
// PGP Key:        "curl -s http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net                 Key-ID: 0x55CBA4EE

Re: New bayes busting method.

Posted by Brian Godette <bg...@idcomm.com>.
On Friday 23 June 2006 15:28, Michael Monnerie wrote:
> Are you sure about that? It would have to be a message that was ham,
> have (nearly) the same content, autolearn must be on and the message
> must have been learned. That's a lot of "if...and.." statements. I use
> sitewide bayes (hand trained), and got BAYES_99. Yes, I have autolearn
> on, but I do a lot of hand crafted training, and modified the default
> values for learn_as_(spam|ham).
>
> Best regards, zmi

Which basically means you've never trained or autolearned on airmiles rewards 
ham, which we happen to see a fair number of. And again, it still got flagged 
as spam here; it just didn't get the BAYES_99 it would have had without the 
*real ham* plain text included in an HTML comment.

Re: New bayes busting method.

Posted by Michael Monnerie <mi...@it-management.at>.
On Friday, 23 June 2006 21:58 Brian Godette wrote:
> Also note that a large amount of your score was from
> DCC, Razor, and URIBLs that didn't hit at the initial receipt of this
> message.

Yes, another reason to use greylisting *g* If I counted correctly, it 
should still, but only just, have been marked as spam. And it hit 
BAYES_99, which you could score at 4.9 if you want. I don't need that, as 
almost no spam gets through here.
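
The counting above can be checked with a short Python sketch that parses the X-Spam-Status value quoted in this thread and totals the rules that do not depend on network checks. Which rules count as network-dependent is an assumption here: the sketch excludes anything prefixed DCC_, RAZOR2_, or URIBL_, and also DIGEST_MULTIPLE, on the assumption that it only fires when several digest services hit. The helper name parse_status is made up for this example.

```python
import re

# X-Spam-Status value as quoted in this thread, joined onto one line.
STATUS = (
    "Yes, hits=16.9 required=5.0 tests=BAYES_99=3.5,DCC_CHECK=2.17,"
    "DIGEST_MULTIPLE=0.765,FORGED_RCVD_HELO=0.135,HTML_90_100=0.113,"
    "HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RAZOR2_CF_RANGE_51_100=0.5,"
    "RAZOR2_CF_RANGE_E8_51_100=1.5,RAZOR2_CHECK=0.5,SARE_UNI=0.591,"
    "SPF_NEUTRAL=1.069,URIBL_BLACK=3,URIBL_OB_SURBL=3.008 autolearn=spam "
    "bayes=1.0000"
)

# Assumption: rules with these prefixes rely on DCC, Razor, or URIBL lookups.
NETWORK_PREFIXES = ("DCC_", "DIGEST_", "RAZOR2_", "URIBL_")


def parse_status(status):
    """Return (required_score, {rule_name: score}) from a status string."""
    required = float(re.search(r"required=([0-9.]+)", status).group(1))
    tests = re.search(r"tests=(\S+)", status).group(1)
    scores = {}
    for item in tests.split(","):
        name, _, value = item.partition("=")
        scores[name] = float(value)
    return required, scores


required, scores = parse_status(STATUS)

# Sum of every rule that hit, then only the rules that need no network lookup.
total = round(sum(scores.values()), 3)
local = round(
    sum(v for k, v in scores.items() if not k.startswith(NETWORK_PREFIXES)), 3
)
# What would remain if the Bayes rule had not fired at all.
local_without_bayes = round(local - scores["BAYES_99"], 3)

print("all rules:          ", total)
print("local rules only:   ", local, "(required:", required, ")")
print("local minus BAYES_99:", local_without_bayes)
```

Under these assumptions the local-only total comes to about 5.41 against a threshold of 5.0, so the message would only just have been marked. Without the 3.5 points from BAYES_99 it drops to about 1.91, which is why a lowered Bayes score matters at initial receipt.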

> This is only really an issue for people who use site-wide bayes as
> per-user bayes has a lower chance of having seen true ham similar to
> the encapsulated ham.

Are you sure about that? It would have to be a message that was ham, 
have (nearly) the same content, autolearn must be on and the message 
must have been learned. That's a lot of "if...and.." statements. I use 
sitewide bayes (hand trained), and got BAYES_99. Yes, I have autolearn 
on, but I do a lot of hand crafted training, and modified the default 
values for learn_as_(spam|ham).

Best regards, zmi

Re: New bayes busting method.

Posted by Brian Godette <bg...@idcomm.com>.
On Friday 23 June 2006 13:24, Michael Monnerie wrote:
> On Friday, 23 June 2006 20:56 Brian Godette wrote:
> > Spammer is using a ham corpus message and including the entire plain
> > text inside an HTML comment (<!-- -->).
>
> Seems to be "no problem" for SA:
> X-Spam-Status: Yes, hits=16.9 required=5.0
> tests=BAYES_99=3.5,DCC_CHECK=2.17,
> DIGEST_MULTIPLE=0.765,FORGED_RCVD_HELO=0.135,HTML_90_100=0.113,
> HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RAZOR2_CF_RANGE_51_100=0.5,
> RAZOR2_CF_RANGE_E8_51_100=1.5,RAZOR2_CHECK=0.5,SARE_UNI=0.591,
> SPF_NEUTRAL=1.069,URIBL_BLACK=3,URIBL_OB_SURBL=3.008 autolearn=spam
> bayes=1.0000
>
> Best regards, zmi

Uh, no. It still got marked as spam here, for other reasons. However, the 
spammer is trying to lower the bayes score by including a ham corpus message 
inside an HTML comment. Also note that a large amount of your score was from 
DCC, Razor, and URIBLs that didn't hit at the initial receipt of this 
message.

This is only really an issue for people who use site-wide bayes as per-user 
bayes has a lower chance of having seen true ham similar to the encapsulated 
ham.

Re: New bayes busting method.

Posted by Michael Monnerie <mi...@it-management.at>.
On Friday, 23 June 2006 20:56 Brian Godette wrote:
> Spammer is using a ham corpus message and including the entire plain
> text inside an HTML comment (<!-- -->).

Seems to be "no problem" for SA:
X-Spam-Status: Yes, hits=16.9 required=5.0 tests=BAYES_99=3.5,DCC_CHECK=2.17,
        DIGEST_MULTIPLE=0.765,FORGED_RCVD_HELO=0.135,HTML_90_100=0.113,
        HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RAZOR2_CF_RANGE_51_100=0.5,
        RAZOR2_CF_RANGE_E8_51_100=1.5,RAZOR2_CHECK=0.5,SARE_UNI=0.591,
        SPF_NEUTRAL=1.069,URIBL_BLACK=3,URIBL_OB_SURBL=3.008 autolearn=spam
        bayes=1.0000

Best regards, zmi