Posted to users@spamassassin.apache.org by Robert Fitzpatrick <li...@webtent.net> on 2008/01/30 01:51:03 UTC

Bayes and celebrity spam

I have some users getting slammed with this spam. Before I start trying
to figure out how to intercept, can someone test this message and tell
me if you're getting a score above 5.0?

http://esmtp.webtent.net/test.txt

I'm getting 4.4 on this particular one, but less on others. My bayes still
insists on knocking it down even after learning 10-20 similar messages.
I believe our bayes is well trained, with 94K spam versus 85K ham learned,
with auto-learning above 35 for spam and below -3 for nonspam. Everything
else is manually trained, mostly by me...

mx1# su vscan -c 'spamassassin -t < test.msg'
<snip>
Content analysis details:   (4.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 MISSING_MID            Missing Message-Id: header
 0.0 MISSING_DATE           Missing Date: header
 2.5 MISSING_HB_SEP         Missing blank line between message header and body
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
 1.3 MISSING_HEADERS        Missing To: header
 1.5 SARE_ADULT1            BODY: Contains adult material
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 1.8 MISSING_SUBJECT        Missing Subject: header

I am running SA 3.2.3 via amavisd-maia with most SARE rules, chickenpox
and other miscellaneous rules...

mx1# cat /usr/local/etc/mail/spamassassin/sare-sa-update-channels.txt
70_sare_evilnum0.cf.sare.sa-update.dostech.net
70_sare_adult.cf.sare.sa-update.dostech.net
99_sare_fraud_post25x.cf.sare.sa-update.dostech.net
72_sare_bml_post25x.cf.sare.sa-update.dostech.net
70_sare_spoof.cf.sare.sa-update.dostech.net
70_sare_bayes_poison_nxm.cf.sare.sa-update.dostech.net
70_sare_oem.cf.sare.sa-update.dostech.net
70_sare_random.cf.sare.sa-update.dostech.net
70_sare_header0.cf.sare.sa-update.dostech.net
70_sare_html0.cf.sare.sa-update.dostech.net
70_sare_specific.cf.sare.sa-update.dostech.net
70_sare_obfu0.cf.sare.sa-update.dostech.net
72_sare_redirect_post3.0.0.cf.sare.sa-update.dostech.net
70_sare_genlsubj0.cf.sare.sa-update.dostech.net
70_sare_unsub.cf.sare.sa-update.dostech.net
70_sare_uri0.cf.sare.sa-update.dostech.net
70_sare_whitelist.cf.sare.sa-update.dostech.net
70_sare_whitelist_spf.cf.sare.sa-update.dostech.net
70_sare_stocks.cf.sare.sa-update.dostech.net
updates.spamassassin.org

-- 
Robert


Re: Bayes and celebrity spam

Posted by Loren Wilton <lw...@earthlink.net>.
> He is obviously a target, but some of this is very obvious, no? With
> subject like 'Jennifer Garner showing tits and booty in the shower
> fbeqxunqpwpjauxekoyx' and body containing...
>
> www(dot)prnceleb(dot)com now," Malfoy went on. of metal, and
> tnlffifuubqrnvrrtneekyntauypuqlecgwjaihf
>
> Is this some new variant we're having to deal with?

Well, I think the answer is pretty obvious -- there aren't a lot of rules 
hitting on it, so it isn't something that has been happening a lot before or 
there would be rules for it!

The general format is a fairly common thing that has been around for years, 
but obviously this has been tweaked to miss on badwords rules and the like, 
and that (dot) trick is probably intended to avoid SURBL and the like.

The long string of junk would have hit on rules in the past, but that sort 
of thing has become uncommon, so those kinds of rules have decayed out.
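
As a rough illustration of what a rule for this variant would have to match, here is a minimal Python sketch of the two patterns described above: a hostname written with "(dot)" in place of ".", and a long run of letters with no spaces. This is not SpamAssassin rule syntax. The regexes, the 20-letter threshold, and the function names are assumptions chosen for illustration only.

```python
import re

# Hostname with "(dot)" standing in for ".", e.g. www(dot)example(dot)com.
# The exact separator spelling is an assumption; spammers vary it.
OBFUSCATED_HOST = re.compile(
    r"\b[a-z0-9-]+(?:\s*\(dot\)\s*[a-z0-9-]+)+\b", re.IGNORECASE
)

# A run of 20 or more lowercase letters; the threshold is arbitrary.
JUNK_RUN = re.compile(r"\b[a-z]{20,}\b")


def deobfuscate_hosts(text):
    """Return the hostnames hidden behind "(dot)" separators."""
    hosts = []
    for match in OBFUSCATED_HOST.finditer(text):
        host = re.sub(r"\s*\(dot\)\s*", ".", match.group(0), flags=re.IGNORECASE)
        hosts.append(host)
    return hosts


def junk_runs(text):
    """Return long letter-only tokens that look like random padding."""
    return JUNK_RUN.findall(text)
```

Once the hostname is recovered it could in principle be checked against a URI blocklist, which is exactly what the "(dot)" spelling is trying to prevent.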

        Loren



Re: Bayes and celebrity spam

Posted by Robert Fitzpatrick <li...@webtent.net>.
On Tue, 2008-01-29 at 22:16 -0500, Mark Johnson wrote:
> I put extreme scores against emails from TW as we don't do business with 
> anyone from there.  If it wasn't for that, this would have made it 
> through my system as well.  I am really surprised bayes scored a 0 as it 
> did for the original poster.  I do serious bayes training on a regular 
> basis.  I see a lot of others are getting bayes scores of 80.
> 
> Content analysis details:   (5.6 points, 5.0 required)
> 
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  0.9 SUBJ_HAS_SPACES        Subject contains lots of white space
>  0.2 SUBJECT_NOVOWEL        Subject: has long non-vowel letter sequence
>  7.0 RELAYCOUNTRY_TW        Relayed through TW
>  0.2 SUBJ_HAS_UNIQ_ID       Subject contains a unique ID
> -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
>                             [score: 0.0000]
>  0.0 HTML_MESSAGE           BODY: HTML included in message
> 

Well, it looks like I'll need to start learning how to write some rules
to kick these. I have one person who is flooded with these kinds of
messages, a bunch of Yahoo and celeb porn. He sends them over asking
whether this isn't obviously spam that should be blocked. I've been
browsing my caches of user mail and can't find anyone else getting
slammed with these messages like this guy. I'm sure there are some, but
even among people within his own domain who receive the same volume of
mail, I can't find one.

He is obviously a target, but some of this is very obvious, no? With
subject like 'Jennifer Garner showing tits and booty in the shower
fbeqxunqpwpjauxekoyx' and body containing...

www(dot)prnceleb(dot)com now," Malfoy went on. of metal, and
tnlffifuubqrnvrrtneekyntauypuqlecgwjaihf

Is this some new variant we're having to deal with?
-- 
Robert


Re: Bayes and celebrity spam

Posted by Mark Johnson <sa...@astroshapes.com>.
Theo Van Dinter wrote:
> On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
>> I have some users getting slammed with this spam. Before I start trying
>> to figure out how to intercept, can someone test this message and tell
>> me if you're getting a score above 5.0?
>>
>> http://esmtp.webtent.net/test.txt
>>
>>  2.5 MISSING_HB_SEP         Missing blank line between message header and body
> 
> This appears to be a badly pasted email.  For example, the topmost Received
> header (and then a lot of the rest of the headers) is malformed.
> 
> Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon.  If you
> see it hitting somewhere, you're more likely to have a misconfiguration in
> your setup than a valid hit.
> 

I put extreme scores against emails from TW as we don't do business with 
anyone from there.  If it wasn't for that, this would have made it 
through my system as well.  I am really surprised bayes scored a 0 as it 
did for the original poster.  I do serious bayes training on a regular 
basis.  I see a lot of others are getting bayes scores of 80.

Content analysis details:   (5.6 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.9 SUBJ_HAS_SPACES        Subject contains lots of white space
 0.2 SUBJECT_NOVOWEL        Subject: has long non-vowel letter sequence
 7.0 RELAYCOUNTRY_TW        Relayed through TW
 0.2 SUBJ_HAS_UNIQ_ID       Subject contains a unique ID
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 0.0 HTML_MESSAGE           BODY: HTML included in message

--
Mark Johnson
http://www.astroshapes.com/information-technology/blog/


Re: Bayes and celebrity spam

Posted by Duane Hill <d....@yournetplus.com>.
On Tue, 29 Jan 2008 20:22:59 -0500
Theo Van Dinter <fe...@apache.org> wrote:

> On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
> > I have some users getting slammed with this spam. Before I start
> > trying to figure out how to intercept, can someone test this
> > message and tell me if you're getting a score above 5.0?
> > 
> > http://esmtp.webtent.net/test.txt
> > 
> >  2.5 MISSING_HB_SEP         Missing blank line between message header and body
> 
> This appears to be a badly pasted email.  For example, the topmost
> Received header (and then a lot of the rest of the headers) is
> malformed.
> 
> Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon.
> If you see it hitting somewhere, you're more likely to have a
> misconfiguration in your setup than a valid hit.

That explains why my test results were so high.

------
  _|_
 (_| |

Re: Bayes and celebrity spam

Posted by Loren Wilton <lw...@earthlink.net>.
> Can I get some tests now on my properly formatted file by anyone to see
> if my scoring should be blocking this message? Sorry for the previously
> posted poorly formatted files...and thanks for the help!
>
> http://esmtp.webtent.net/test2.txt

Well, my results probably aren't hugely representative, but:

Content analysis details:   (9.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 RELAY_IS_203           RELAY_IS_203
 0.0 LW_YAHOO_FROM          LW_YAHOO_FROM
 0.0 DK_POLICY_TESTING      Domain Keys: policy says domain is testing DK
 0.0 DK_SIGNED              Domain Keys: message has a signature
-0.0 DK_VERIFIED            Domain Keys: signature passes verification
 2.0 BAYES_80               BODY: Bayesian spam probability is 80 to 95%
                            [score: 0.8980]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.5 JD_YAHOO_REDIRECT      FULL: Yahoo Redirect Ads
 0.5 NOT_SENDER_MSGID       Sender host doesn't match message-id host
 2.0 NOT_FROM_SENDER        Not from putative sender
 2.0 NOT_TO_ME              Mail is not addressed to me

Most of those rules you aren't going to have, and most of them are 
triggering because your received headers don't match what I would have seen 
for the same mail.

So other than the bayes and 203 relay rules you probably wouldn't hit much. 
Since we score points for redirections (and in this case, search links) the 
message picked up 1.5 points there, but the standard rules don't have that.
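
To make the comparison concrete, here is a small Python sketch that parses the rows of a "Content analysis details" table like the one above and adds up the scores. The regex and the `parse_report` name are my own assumptions about the report layout, not anything SpamAssassin provides.

```python
import re

# One report row: score, rule name, description. Continuation lines such as
# "[score: 0.8980]" do not start with a number, so they are skipped.
ROW = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\s")


def parse_report(report):
    """Return {rule_name: score} for each row of a pasted report table."""
    scores = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            scores[match.group(2)] = float(match.group(1))
    return scores


REPORT = """\
 1.0 RELAY_IS_203           RELAY_IS_203
 0.0 LW_YAHOO_FROM          LW_YAHOO_FROM
 0.0 DK_POLICY_TESTING      Domain Keys: policy says domain is testing DK
 0.0 DK_SIGNED              Domain Keys: message has a signature
-0.0 DK_VERIFIED            Domain Keys: signature passes verification
 2.0 BAYES_80               BODY: Bayesian spam probability is 80 to 95%
                            [score: 0.8980]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.5 JD_YAHOO_REDIRECT      FULL: Yahoo Redirect Ads
 0.5 NOT_SENDER_MSGID       Sender host doesn't match message-id host
 2.0 NOT_FROM_SENDER        Not from putative sender
 2.0 NOT_TO_ME              Mail is not addressed to me
"""

scores = parse_report(REPORT)
total = round(sum(scores.values()), 1)
# Only the two rules named above as likely to hit on another install.
portable = scores["BAYES_80"] + scores["RELAY_IS_203"]
```

Here `total` comes to 9.0, matching the report, while `portable` is only 3.0, which is under the 5.0 threshold.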

        Loren



Re: Bayes and celebrity spam

Posted by Robert Fitzpatrick <li...@webtent.net>.
On Tue, 2008-01-29 at 18:05 -0800, Loren Wilton wrote:
> There is still something wrong with the message you pasted, and possibly 
> with how you are running it into SA to test:
> 
> Received: from n6c.bullet.mail.tp2.yahoo.com (n6c.bullet.mail.tp2.yahoo.com 
> [203.188.202.136])
>  \x09by esmtp.ky.webtent.net (WebTent ESMTP Postfix Internet Mail Gateway) 
> with SMTP id 2348137B72A
> 
> Notice that that second line starts with " \x09by".  This is a text string 
> that won't be recognized as a tab followed by "by", which was apparently 
> what was in the original message before something helpfully changed the tab 
> character to a hex representation.
> 
> Pull those \x09's out of the message, replacing them with tabs or spaces, 
> and things should at least recognize the received headers correctly.
> 
> > 0.0 MISSING_MID            Missing Message-Id: header
> > 0.0 MISSING_DATE           Missing Date: header
> > 2.5 MISSING_HB_SEP         Missing blank line between message header and body
> > 1.3 MISSING_HEADERS        Missing To: header
> > 1.8 MISSING_SUBJECT        Missing Subject: header
> > 1.4 EMPTY_MESSAGE          Message appears to have no textual parts and no
> 
> But it still looks like you ran something close to a blank file through SA.
> Make sure that the first line of the file you send to SA isn't blank, or 
> there is a prepended space on every line or some such.
> 
>         Loren
> 

Yes, I removed what seemed to be one space added to the start of each line
after dumping from the db field and translated the \x09 into a single
space, and now the score matches what I have in Maia...

Can I get some tests now on my properly formatted file by anyone to see
if my scoring should be blocking this message? Sorry for the previously
posted poorly formatted files...and thanks for the help!

http://esmtp.webtent.net/test2.txt

-- 
Robert


Re: Bayes and celebrity spam

Posted by Loren Wilton <lw...@earthlink.net>.
There is still something wrong with the message you pasted, and possibly 
with how you are running it into SA to test:

Received: from n6c.bullet.mail.tp2.yahoo.com (n6c.bullet.mail.tp2.yahoo.com 
[203.188.202.136])
 \x09by esmtp.ky.webtent.net (WebTent ESMTP Postfix Internet Mail Gateway) 
with SMTP id 2348137B72A

Notice that that second line starts with " \x09by".  This is a text string 
that won't be recognized as a tab followed by "by", which was apparently 
what was in the original message before something helpfully changed the tab 
character to a hex representation.

Pull those \x09's out of the message, replacing them with tabs or spaces, 
and things should at least recognize the received headers correctly.

> 0.0 MISSING_MID            Missing Message-Id: header
> 0.0 MISSING_DATE           Missing Date: header
> 2.5 MISSING_HB_SEP         Missing blank line between message header and body
> 1.3 MISSING_HEADERS        Missing To: header
> 1.8 MISSING_SUBJECT        Missing Subject: header
> 1.4 EMPTY_MESSAGE          Message appears to have no textual parts and no

But it still looks like you ran something close to a blank file through SA.
Make sure that the first line of the file you send to SA isn't blank, or 
there is a prepended space on every line or some such.
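
Both repairs can be scripted. The following Python sketch assumes the damage is exactly what is described above: literal four-character `\x09` sequences where tabs used to be, possibly one space prepended to every line, and possibly blank lines before the first header. The `repair_dump` name is hypothetical, and the function only handles these cases.

```python
def repair_dump(raw):
    """Undo the dump damage described above and return the repaired text.

    1. Literal "\\x09" sequences become real tab characters.
    2. Leading blank lines are dropped so the first line is a header.
    3. If every non-empty line starts with a space, one space is removed
       from each line (real header field lines never start with a space).
    """
    lines = raw.replace("\\x09", "\t").splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    non_empty = [line for line in lines if line]
    if non_empty and all(line.startswith(" ") for line in non_empty):
        lines = [line[1:] for line in lines]
    return "\n".join(lines) + "\n"
```

The repaired text can then be written to a file and fed to `spamassassin -t` as before.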

        Loren



Re: Bayes and celebrity spam

Posted by Chris <cp...@embarqmail.com>.
On Tuesday 29 January 2008 7:51 pm, Robert Fitzpatrick wrote:

> http://esmtp.webtent.net/test2.txt
>
> I have gone through my debug, but can't seem to spot any problems. How
> can one send debug output to file? And what do you think I should be
> looking for given the results of my test?

FWIW, on my home stand alone system your message scored thus:

Content analysis details:   (16.7 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 RELAY_UK               Relayed through Brittan
 0.0 DK_POLICY_TESTING      Domain Keys: policy says domain is testing DK
 0.0 DK_SIGNED              Domain Keys: message has a signature
-0.0 DK_VERIFIED            Domain Keys: signature passes verification
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5101]
 2.2 DCC_CHECK              listed in DCC (http://rhyolite.com/anti-spam/dcc/)
                            [cpollock 104; Body=1 Fuz1=1 Fuz2=many]
  10 CLAMAV                 Clam AntiVirus detected a virus
 2.5 L_UNVERIFIED_YAHOO     L_UNVERIFIED_YAHOO
 1.0 SAGREY                 Adds 1.0 to spam from first-time senders


-- 
Chris
KeyID 0xE372A7DA98E6705C

Re: Bayes and celebrity spam

Posted by Robert Fitzpatrick <li...@webtent.net>.
On Tue, 2008-01-29 at 20:22 -0500, Theo Van Dinter wrote:
> On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
> > I have some users getting slammed with this spam. Before I start trying
> > to figure out how to intercept, can someone test this message and tell
> > me if you're getting a score above 5.0?
> > 
> > http://esmtp.webtent.net/test.txt
> > 
> >  2.5 MISSING_HB_SEP         Missing blank line between message header and body
> 
> This appears to be a badly pasted email.  For example, the topmost Received
> header (and then a lot of the rest of the headers) is malformed.
> 
> Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon.  If you
> see it hitting somewhere, you're more likely to have a misconfiguration in
> your setup than a valid hit.
> 

Thanks for the tips. I pasted from the Maia Mailguard web GUI by clicking
View Raw. In case you're not familiar with it, Maia is an amavisd-2.2
spin-off. This time I exported the contents from the pgsql db for another
suspect message, and I'm seeing something very wrong. While Maia shows a
negative score on this next test with Bayes factored in...

 0.001 HTML_MESSAGE HTML included in message
-2.599 BAYES_00 Bayesian spam probability is 0 to 1%

It scores well over the kill level, with zero Bayes points, when run from
the command line...

Content analysis details:   (9.2 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 MISSING_MID            Missing Message-Id: header
 0.0 MISSING_DATE           Missing Date: header
 2.5 MISSING_HB_SEP         Missing blank line between message header and body
 1.3 MISSING_HEADERS        Missing To: header
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.4988]
 2.2 TVD_SPACE_RATIO        BODY: TVD_SPACE_RATIO
 1.8 MISSING_SUBJECT        Missing Subject: header
 1.4 EMPTY_MESSAGE          Message appears to have no textual parts and no
                            Subject: text

http://esmtp.webtent.net/test2.txt

I have gone through my debug, but can't seem to spot any problems. How
can one send debug output to file? And what do you think I should be
looking for given the results of my test?
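
On the debug-to-file question: `spamassassin -D` writes its debug output to stderr, so redirecting stderr captures it (in a shell, something like `spamassassin -D -t < test2.txt 2> debug.log`, where the local file names are assumptions). The same thing as a Python sketch follows; `capture_debug` and `lines_mentioning` are hypothetical helper names, and the idea of filtering the log for "bayes" is only a suggestion for where to start looking.

```python
import subprocess


def capture_debug(msg_path, log_path, cmd=("spamassassin", "-D", "-t")):
    """Run cmd with the message on stdin.

    stderr (where the debug output goes) is written to log_path;
    stdout (the rewritten message and report) is returned as text.
    """
    with open(msg_path, "rb") as msg, open(log_path, "wb") as log:
        result = subprocess.run(cmd, stdin=msg, stdout=subprocess.PIPE, stderr=log)
    return result.stdout.decode("utf-8", errors="replace")


def lines_mentioning(log_path, needle):
    """Return the log lines that contain needle, e.g. "bayes"."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return [line.rstrip("\n") for line in log if needle in line]
```

It is probably worth running this as the same user the scanner runs as (the earlier test used `su vscan -c ...`), so that the command-line run and the Maia run are at least reading the same configuration.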

-- 
Robert


Re: Bayes and celebrity spam

Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
> I have some users getting slammed with this spam. Before I start trying
> to figure out how to intercept, can someone test this message and tell
> me if you're getting a score above 5.0?
> 
> http://esmtp.webtent.net/test.txt
> 
>  2.5 MISSING_HB_SEP         Missing blank line between message header and body

This appears to be a badly pasted email.  For example, the topmost Received
header (and then a lot of the rest of the headers) is malformed.

Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon.  If you
see it hitting somewhere, you're more likely to have a misconfiguration in
your setup than a valid hit.
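
A quick sanity check on the test file itself can catch this kind of paste damage before SA ever sees it. The Python sketch below is only an approximation of header parsing (SpamAssassin's own parser is more involved), and `header_problems` is a hypothetical name. It flags a blank first line, a header block that runs into a line that is neither a field nor a folded continuation, literal `\x09` text, and a missing blank separator. Note that a message consisting of headers only would also be reported as missing the separator.

```python
import re

# A header field line: a name made of printable ASCII (no space, no colon),
# followed by a colon.
FIELD = re.compile(r"^[!-9;-~]+:")


def header_problems(raw):
    """Return a list of human-readable problems found in the header block."""
    lines = raw.splitlines()
    if not lines or not lines[0].strip():
        return ["first line is blank"]
    problems = []
    for number, line in enumerate(lines, start=1):
        if line == "":
            return problems  # blank separator found; stop before the body
        if "\\x09" in line:
            problems.append(f"line {number} contains a literal \\x09")
        is_field = FIELD.match(line) is not None
        is_continuation = number > 1 and line[0] in " \t"
        if not (is_field or is_continuation):
            problems.append(f"line {number} is neither a header nor a continuation")
            return problems
    problems.append("no blank line between header and body")
    return problems
```

An empty list means the header block at least looks well formed to this check.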

-- 
Randomly Selected Tagline:
"It's not you Bernie.  I guess I'm just not used to being chased around
 a mall at night by killer robots." - Linda from the movie "Chopping Mall"

Re: Bayes and celebrity spam

Posted by Duane Hill <d....@yournetplus.com>.
On Tue, 29 Jan 2008 19:51:03 -0500
Robert Fitzpatrick <li...@webtent.net> wrote:

> I have some users getting slammed with this spam. Before I start
> trying to figure out how to intercept, can someone test this message
> and tell me if you're getting a score above 5.0?
> 
> http://esmtp.webtent.net/test.txt
> 
> I'm getting 4.4 on this particular one, but others less. My bayes
> still insists on knocking it down even after learning 10-20 similar
> messages. I believe our bayes is trained well with 94K spam versus
> 85K ham learned with auto learning above 35 for spam and -3 for
> nonspam. All other is manually trained mostly by me...
> 
> mx1# su vscan -c 'spamassassin -t < test.msg'
> <snip>
> Content analysis details:   (4.4 points, 5.0 required)
> 
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  0.0 MISSING_MID            Missing Message-Id: header
>  0.0 MISSING_DATE           Missing Date: header
>  2.5 MISSING_HB_SEP         Missing blank line between message header and body
>  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
>  1.3 MISSING_HEADERS        Missing To: header
>  1.5 SARE_ADULT1            BODY: Contains adult material
> -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
>                             [score: 0.0000]
>  1.8 MISSING_SUBJECT        Missing Subject: header

X-Spam-Level: xxxxxxx
X-Spam-Status: Reqd:5.0 Hits:7.1 Learn:disabled
Tests:MISSING_DATE=0.001,
MISSING_HB_SEP=2.5,MISSING_HEADERS=1.581,MISSING_MID=0.001,
MISSING_SUBJECT=1.285,SARE_ADULT1=1.47,UNPARSEABLE_RELAY=0.25
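
For anyone comparing the two results, the Tests: list in a status header like this can be summed mechanically. The Python sketch below assumes the `NAME=score` layout shown above (the header format here is site-specific), and `parse_tests` is a hypothetical name.

```python
import re


def parse_tests(status):
    """Return {rule: score} from the Tests: portion of a status header."""
    _, _, tests = status.partition("Tests:")
    pairs = re.findall(r"([A-Z][A-Z0-9_]*)=(-?\d+(?:\.\d+)?)", tests)
    return {name: float(score) for name, score in pairs}


STATUS = """\
Reqd:5.0 Hits:7.1 Learn:disabled
Tests:MISSING_DATE=0.001,
MISSING_HB_SEP=2.5,MISSING_HEADERS=1.581,MISSING_MID=0.001,
MISSING_SUBJECT=1.285,SARE_ADULT1=1.47,UNPARSEABLE_RELAY=0.25
"""

scores = parse_tests(STATUS)
total = sum(scores.values())
has_bayes = any(name.startswith("BAYES_") for name in scores)
```

The scores add up to 7.088, which rounds to the 7.1 shown in Hits. No BAYES_* rule appears in this list, and the -2.6 from BAYES_00 in the quoted report accounts for most of the gap between this result and the 4.4 quoted above.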


[snip]

-------
  _|_
 (_| |