Posted to users@spamassassin.apache.org by Simon Loewenthal <si...@klunky.co.uk> on 2013/07/24 13:00:59 UTC

Test email hitting BAYES_00

 

Hi, 

 Yesterday, this did not hit BAYES at all, and now this hits BAYES_00,
and I did not use autolearn. I did a sa-learn --forget for good measure
and this changed nothing (*see below). I am a little flummoxed. Do any
of you have any ideas? 

Little email and result of spamc can be found here
http://pastebin.com/5N0xhWms [1] 

Thanks, Simon. 

(*)

# sa-learn --forget --username=spammyd aaa
Forgot tokens from 0 message(s) (1 message(s) examined)

 

Links:
------
[1] http://pastebin.com/5N0xhWms

Re: Test email hitting BAYES_00

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Wed, 2013-07-24 at 15:15 +0200, Simon Loewenthal wrote:
> I rewrote this (not GTUBE anymore) and had the same bayes score
> http://pastebin.com/ATqch32Y

Simon, it seems you have a false understanding of Bayes and how it
works. Quoting parts of the mail body from that paste:

> You should send this from outside of your car.

So you "rewrote" the body, or rather modified it slightly, replacing a
few words. Note that it *did* change the result, now hitting BAYES_20
rather than 00 before.

First things first: Your (original) test message is *based* on GTUBE. It
isn't GTUBE at all, though -- which led to quite some confusion in this
thread.

Starting off with GTUBE (the mail), you stripped what GTUBE actually is:
The weird 68 byte string. Then you took that message, which started off
as the test message to verify mail gets passed through SA, and continued
using it as test message -- modifying it to match your own rules.

Nothing wrong about that, though I'd suggest removing from the mail body
the (descriptive, textual, and actually meant as instructions) claim that
it constitutes GTUBE.

Send it from outside of your car. Send it from outside of your network.

This summarizes the "rewriting" you just did, in hopes for SA to
magically stop hitting low Bayes rules.

You're barking up the wrong tree -- you should outright ignore the Bayes
score, when you actually are testing your own rules. Granted, short-
circuiting on BAYES_00 got in the way, resulting in your own rules not
being tested at all and thus not matching. [1]

Solution: Disable short-circuiting, or prevent your test mail from
sporting a really low Bayes score. Getting at that next.
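
To make the short-circuiting effect concrete, here is a minimal sketch in Python. It is a toy model, not SpamAssassin's actual code: the rule list, the predicates, the name MY_LOCAL_RULE and its score of 3.0 are all hypothetical, and only the -1.9 score for BAYES_00 is taken from the spamc report in this thread. The short-circuited rule is placed first, on the assumption that such rules are evaluated before the rest.

```python
# Toy model of short-circuit evaluation; NOT SpamAssassin's real code.

def run_rules(rules, shortcircuit_on, message):
    """Evaluate (name, predicate, score) rules in order.

    Evaluation stops as soon as a matching rule is listed in
    shortcircuit_on, so any later rule is never even tested.
    """
    hits = []
    for name, predicate, score in rules:
        if predicate(message):
            hits.append((name, score))
            if name in shortcircuit_on:
                break
    return hits


# Hypothetical rules: a stand-in for BAYES_00 and a made-up local rule.
rules = [
    ("BAYES_00", lambda msg: "network" in msg, -1.9),
    ("MY_LOCAL_RULE", lambda msg: "testword" in msg, 3.0),
]

message = "send it from outside of your network testword"

with_sc = run_rules(rules, {"BAYES_00"}, message)
without_sc = run_rules(rules, set(), message)

print("short-circuit on BAYES_00:", with_sc)
print("no short-circuit:         ", without_sc)
```

With short-circuiting on the BAYES_00 stand-in, the local rule never gets a chance to match, which is the symptom described above. With it disabled, both rules are evaluated.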

Bayes, more precisely the Bayesian probability, is the likelihood of the
mail being spam, as opposed to solicited, wanted, hammy, you name it. The
lower the number in the BAYES_nn rules, the more hammy. Also, the SA Bayes
implementation considers tokens. Words, to keep it easy. It does not
recognize and consider sentences, nor multi-word tokens.
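
As a rough illustration of how a probability maps to a BAYES_nn rule name, here is a small Python sketch. Only the "0 to 1%" range for BAYES_00 is confirmed by the spamc report quoted in this thread. The other bucket boundaries are given as recalled from the stock rule descriptions and should be treated as assumptions, as should the handling of exact boundary values.

```python
# Upper bounds per bucket. Only BAYES_00 (0 to 1%) is confirmed by the
# report in this thread; the remaining boundaries are assumptions.
BAYES_BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
]


def bayes_rule(probability):
    """Return the BAYES_nn rule name for a spam probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, name in BAYES_BUCKETS:
        if probability < upper:
            return name
    return "BAYES_99"


# 0.0008 is the Bayes score shown in the spamc report in this thread.
print(bayes_rule(0.0008))
print(bayes_rule(0.5))
```

Note that the number in the rule name is a label for the bucket, not its lower bound.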

Example. See that quote above, and how you modified it. Let's assume
that'd be the complete mail text and all Bayes gets to see. Keeping in
mind the BAYES_nn change from 00 (at most 1% chance spam) to 20 (between
5% and 20% chance spam), we see an impact of that change.

Namely, the word "network" in mail has a much higher probability of the
mail being ham -- compared to the word "car". You're either not into
cars, or just don't like Lisp...
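
The effect of swapping a single token can be sketched in Python. This is a plain naive Bayes combination over log-odds, used here only for illustration. SpamAssassin's own way of combining token probabilities is more involved, and every per-token probability below is invented.

```python
import math


def combine(token_probs):
    """Combine per-token spam probabilities by summing their log-odds."""
    log_odds = sum(math.log(p / (1.0 - p)) for p in token_probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical per-token spam probabilities, invented for this example.
# A real Bayes database derives them from the mail it was trained on.
token_spam_prob = {
    "send": 0.40, "it": 0.45, "from": 0.45,
    "outside": 0.40, "of": 0.45, "your": 0.50,
    "network": 0.02,  # seen almost exclusively in ham
    "car": 0.40,      # seen in ham and spam alike
}


def score(text):
    """Split text into lowercase word tokens and combine their probabilities."""
    tokens = text.lower().replace(".", "").split()
    return combine([token_spam_prob[t] for t in tokens])


p_network = score("Send it from outside of your network.")
p_car = score("Send it from outside of your car.")

print("network variant: %.4f" % p_network)
print("car variant:     %.4f" % p_car)
```

With these made-up numbers the "network" sentence ends up well under 1% and the "car" sentence above 10%, although only one word differs. Each token is judged on its own, with no notion of the sentence around it.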

How does that help prevent BAYES_00 hits? Remove hammy tokens from
your test message. Even better, remove all the text. That is, the GTUBE
instructions you kept. You don't want it being evaluated, your focus is
on your rules for testing -- so why keep it around?

In closing, there is absolutely no problem with your test mail hitting
BAYES_00, unless these words and tokens are what all your spam looks
like...

[1] Taking a guess, that's probably why you had the impression Bayes
    might not have been hit at all before. Which is really unlikely.
    There's always a BAYES_nn rule indicating the Bayesian probability
    on a scale ranging from ham to spam.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: Test email hitting BAYES_00

Posted by Simon Loewenthal <si...@klunky.co.uk>.

On 2013-07-24 15:59, RW wrote: 

> On Wed, 24 Jul 2013 15:15:01 +0200
> Simon Loewenthal wrote:
> 
>> I rewrote this (not GTUBE anymore) and had the same bayes score
>> http://pastebin.com/ATqch32Y [1]
> 
> It's not particularly surprising it hits BAYES_00, aside from the
> obfuscated words it's not very spammy. 
> 
> What you originally said was:
> 
>> Yesterday, this did not hit BAYES at all
> 
> Did you mean that literally? Or did you mean that it previously got a
> neutral result (BAYES_50)?

Literally. I don't recall seeing bayes yesterday, but sadly I've closed
the terminal and have no logging since then. Hopefully this is my
imaginative memory at work and this scored neutrally. I'm glad you asked
this question because you made me think a little more clearly instead of
guessing. 

-- 

Links:
------
[1] http://pastebin.com/ATqch32Y

Re: Test email hitting BAYES_00

Posted by RW <rw...@googlemail.com>.

On Wed, 24 Jul 2013 15:15:01 +0200
Simon Loewenthal wrote:

> I rewrote this (not GTUBE anymore) and had the same bayes score
> http://pastebin.com/ATqch32Y [3]

It's not particularly surprising it hits BAYES_00, aside from the
obfuscated words it's not very spammy. 

What you originally said was:

> Yesterday, this did not hit BAYES at all

Did you mean that literally? Or did you mean that it previously got a
neutral result (BAYES_50)?

Re: Test email hitting BAYES_00

Posted by Simon Loewenthal <si...@klunky.co.uk>.

 

On 2013-07-24 14:41, RW wrote: 

> On Wed, 24 Jul 2013 14:04:36 +0200
> JK4 wrote:
> 
>> On 2013-07-24 13:31, RW wrote:
>>
>>> This isn't a GTUBE email, it's an email with lots of innocuous text
>>> and the obfuscated name of a drug claiming to be a GTUBE email.
>>>
>>> http://spamassassin.apache.org/gtube/ [1] [2]
>>>
>>> If it wasn't previously getting any BAYES result then presumably it
>>> was short-circuiting on something. Perhaps the previous mail was a
>>> real GTUBE mail short-circuiting on GTUBE - although I'm not sure
>>> why anyone would want to do that.
>>
>> This is a GTUBE test email I'm using to test if rules I wrote fired. I
>> just don't know why this started hitting bayes zero all of a sudden.
>
> As I already said, it's *not* a GTUBE test email. Take a look at the
> definition in [2] and then take a look at the email you posted in
> [1].
>
> Even if it were there's no reason Bayes should recognise a GTUBE mail as
> spam unless it's been trained to recognise them. SpamAssassin will
> recognise a GTUBE email, but there's no reason why each of its
> individual components should be aware of GTUBE.
>
>> Links:
>> ------
>> [1] http://pastebin.com/5N0xhWms [2]
>> [2] http://spamassassin.apache.org/gtube/ [1]

I rewrote this (not GTUBE anymore) and had the same bayes score
http://pastebin.com/ATqch32Y [3]
 

Links:
------
[1] http://spamassassin.apache.org/gtube/
[2] http://pastebin.com/5N0xhWms
[3] http://pastebin.com/ATqch32Y

Re: Test email hitting BAYES_00

Posted by RW <rw...@googlemail.com>.

On Wed, 24 Jul 2013 14:04:36 +0200
JK4 wrote:

>  
> 
> On 2013-07-24 13:31, RW wrote: 

> > This isn't a GTUBE email, it's an email with lots of innocuous text
> > and the obfuscated name of a drug claiming to be a GTUBE email. 
> > 
> > http://spamassassin.apache.org/gtube/ [2]
> > 
> > If it wasn't previously getting any BAYES result then presumably it
> > was short-circuiting on something. Perhaps the previous mail was a
> > real GTUBE mail short-circuiting on GTUBE - although I'm not sure
> > why anyone would want to do that.
> 
> This is a GTUBE test email I'm using to test if rules I wrote fired. I
> just don't know why this started hitting bayes zero all of a sudden.

As I already said, it's *not* a GTUBE test email. Take a look at the
definition in [2] and then take a look at the email you posted in
[1].

Even if it were there's no reason Bayes should recognise a GTUBE mail as
spam unless it's been trained to recognise them. SpamAssassin will
recognise a GTUBE email, but there's no reason why each of its
individual components should be aware of GTUBE.  

> Links:
> ------
> [1] http://pastebin.com/5N0xhWms
> [2] http://spamassassin.apache.org/gtube/

Re: Test email hitting BAYES_00

Posted by Benny Pedersen <me...@junc.eu>.

JK4 skrev den 2013-07-24 14:04:

> This is a GTUBE test email I'm using to test if rules I wrote fired.
> I just don't know why this started hitting bayes zero all of a 
> sudden.
> This shortcircuits because the server is configured to do so, and I
> could turn this off.

so what has been learned?

Re: Test email hitting BAYES_00

Posted by JK4 <ju...@klunky.co.uk>.

 

On 2013-07-24 13:31, RW wrote: 

> On Wed, 24 Jul 2013 13:00:59 +0200
> Simon Loewenthal wrote:
> 
>> Hi,
>>
>> Yesterday, this did not hit BAYES at all, and now this hits BAYES_00,
>> and I did not use autolearn. I did a sa-learn --forget for good measure
>> and this changed nothing (*see below). I am a little flummoxed. Do any
>> of you have any ideas?
>>
>> Little email and result of spamc can be found here
>> http://pastebin.com/5N0xhWms [1]
> 
> This isn't a GTUBE email, it's an email with lots of innocuous text and
> the obfuscated name of a drug claiming to be a GTUBE email. 
> 
> http://spamassassin.apache.org/gtube/ [2]
> 
> If it wasn't previously getting any BAYES result then presumably it was
> short-circuiting on something. Perhaps the previous mail was a real
> GTUBE mail short-circuiting on GTUBE - although I'm not sure why
> anyone would want to do that.

This is a GTUBE test email I'm using to test if rules I wrote fired. I
just don't know why this started hitting bayes zero all of a sudden.
This shortcircuits because the server is configured to do so, and I
could turn this off. 
 

Links:
------
[1] http://pastebin.com/5N0xhWms
[2] http://spamassassin.apache.org/gtube/

Re: Test email hitting BAYES_00

Posted by RW <rw...@googlemail.com>.

On Wed, 24 Jul 2013 13:00:59 +0200
Simon Loewenthal wrote:

>  
> 
> Hi, 
> 
>  Yesterday, this did not hit BAYES at all, and now this hits BAYES_00,
> and I did not use autolearn. I did a sa-learn --forget for good
> measure and this changed nothing (*see below). I am a little
> flummoxed. Do any of you have any ideas? 
> 
> Little email and result of spamc can be found here
> http://pastebin.com/5N0xhWms [1] 
> 

This isn't a GTUBE email, it's an email with lots of innocuous text and
the obfuscated name of a drug claiming to be a GTUBE email.  

http://spamassassin.apache.org/gtube/

If it wasn't previously getting any BAYES result then presumably it was
short-circuiting on something. Perhaps the previous mail was a real
GTUBE mail short-circuiting on GTUBE - although I'm not sure why
anyone would want to do that.

Re: Test email hitting BAYES_00

Posted by Benny Pedersen <me...@junc.eu>.

JK4 skrev den 2013-07-24 14:40:

> #shortcircuit BAYES_00 ham

or change it to "on", which does not add the ham score here

> I ran my message through spamc (see pastebin below), but this still
> won't explain why this hits bayes 00 :(

the fix is to not add -100 on shortcircuit; short-circuiting just saves
cycles, it should not make the scores better or worse

Re: Test email hitting BAYES_00

Posted by JK4 <ju...@klunky.co.uk>.

 

On 2013-07-24 14:19, Benny Pedersen wrote: 

> Simon Loewenthal skrev den 2013-07-24 13:00:
> 
>> Little email and result of spamc can be found here
>> http://pastebin.com/5N0xhWms [1]
> 
> -100 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule
> [score: 0.0008]
> -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> 
> do you have shortcircuit on bayes_00 ?
> 
> solution set score on shortcircuit to score -0.001
> 
> playing fair to autolearn

Commented out the shortcircuit 

#shortcircuit BAYES_00 ham

I ran my message through spamc (see pastebin below), but this still
won't explain why this hits bayes 00 :( 

http://pastebin.com/bNrEsVru [2] 
 

Links:
------
[1] http://pastebin.com/5N0xhWms
[2] http://pastebin.com/bNrEsVru

Re: Test email hitting BAYES_00

Posted by Benny Pedersen <me...@junc.eu>.

Simon Loewenthal skrev den 2013-07-24 13:00:

> Little email and result of spamc can be found here
> http://pastebin.com/5N0xhWms [1]

-100 SHORTCIRCUIT           Not all rules were run, due to a shortcircuited rule
                             [score: 0.0008]
-1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%

do you have shortcircuit on bayes_00?

solution: set the score on shortcircuit to -0.001

that plays fair with autolearn

Re: Test email hitting BAYES_00

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 24.07.13 13:00, Simon Loewenthal wrote:
> Yesterday, this did not hit BAYES at all, and now this hits BAYES_00,
>and I did not use autolearn. I did a sa-learn --forget for good measure
>and this changed nothing (*see below). I am a little flummoxed. Do any
>of you have any ideas?

># sa-learn --forget --username=spammyd aaa
>Forgot tokens from 0 message(s) (1 message(s) examined)

You can forget an e-mail only if you trained on it (THE one e-mail).
There's usually no reason to forget e-mails, unless of course it's your own
faked test e-mail. Usually it's much better to train each specific mail as
spam or ham.
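
Why forgetting an untrained message does nothing can be shown with a toy model in Python. This is only a sketch under assumptions: the class ToyBayesStore, its hashing of the whole text into a message id, and its token counting are all hypothetical and are not how sa-learn stores its data.

```python
import hashlib


class ToyBayesStore:
    """Hypothetical model of a Bayes store that remembers learned messages."""

    def __init__(self):
        self.seen = {}    # message id -> "spam" or "ham"
        self.counts = {}  # token -> [spam_count, ham_count]

    @staticmethod
    def _msg_id(text):
        # Assumption: identify a message by a hash of its full text.
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    def learn(self, text, is_spam):
        """Record the message and count its tokens. False if already learned."""
        msg_id = self._msg_id(text)
        if msg_id in self.seen:
            return False
        self.seen[msg_id] = "spam" if is_spam else "ham"
        for token in set(text.lower().split()):
            pair = self.counts.setdefault(token, [0, 0])
            pair[0 if is_spam else 1] += 1
        return True

    def forget(self, text):
        """Undo a previous learn. False if the message was never learned."""
        label = self.seen.pop(self._msg_id(text), None)
        if label is None:
            return False  # nothing was learned, so nothing can be forgotten
        idx = 0 if label == "spam" else 1
        for token in set(text.lower().split()):
            self.counts[token][idx] -= 1
            if self.counts[token] == [0, 0]:
                del self.counts[token]
        return True


store = ToyBayesStore()

# Forgetting a message that was never trained changes nothing.
never_trained = store.forget("test message from outside of your network")

# Forgetting works only for the exact message that was learned.
store.learn("some message that was trained as spam", is_spam=True)
trained = store.forget("some message that was trained as spam")

print(never_trained, trained)
```

In this model the first forget corresponds to the "Forgot tokens from 0 message(s)" output above: the message was examined, but the store had no record of ever learning it.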
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
- Have you got anything without Spam in it?
- Well, there's Spam egg sausage and Spam, that's not got much Spam in it.