Posted to users@spamassassin.apache.org by Robert Fitzpatrick <li...@webtent.net> on 2008/01/30 01:51:03 UTC
Bayes and celebrity spam
I have some users getting slammed with this spam. Before I start trying
to figure out how to intercept, can someone test this message and tell
me if you're getting a score above 5.0?
http://esmtp.webtent.net/test.txt
I'm getting 4.4 on this particular one, but others less. My bayes still
insists on knocking it down even after learning 10-20 similar messages.
I believe our bayes is trained well, with 94K spam versus 85K ham learned,
with auto-learning above 35 for spam and -3 for nonspam. Everything else is
manually trained, mostly by me...
mx1# su vscan -c 'spamassassin -t < test.msg'
<snip>
Content analysis details: (4.4 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
0.0 MISSING_MID Missing Message-Id: header
0.0 MISSING_DATE Missing Date: header
2.5 MISSING_HB_SEP Missing blank line between message header and body
0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay lines
1.3 MISSING_HEADERS Missing To: header
1.5 SARE_ADULT1 BODY: Contains adult material
-2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
[score: 0.0000]
1.8 MISSING_SUBJECT Missing Subject: header
I am running SA 3.2.3 via amavisd-maia with most SARE rules, chickenpox
and other miscellaneous rules...
mx1# cat /usr/local/etc/mail/spamassassin/sare-sa-update-channels.txt
70_sare_evilnum0.cf.sare.sa-update.dostech.net
70_sare_adult.cf.sare.sa-update.dostech.net
99_sare_fraud_post25x.cf.sare.sa-update.dostech.net
72_sare_bml_post25x.cf.sare.sa-update.dostech.net
70_sare_spoof.cf.sare.sa-update.dostech.net
70_sare_bayes_poison_nxm.cf.sare.sa-update.dostech.net
70_sare_oem.cf.sare.sa-update.dostech.net
70_sare_random.cf.sare.sa-update.dostech.net
70_sare_header0.cf.sare.sa-update.dostech.net
70_sare_html0.cf.sare.sa-update.dostech.net
70_sare_specific.cf.sare.sa-update.dostech.net
70_sare_obfu0.cf.sare.sa-update.dostech.net
72_sare_redirect_post3.0.0.cf.sare.sa-update.dostech.net
70_sare_genlsubj0.cf.sare.sa-update.dostech.net
70_sare_unsub.cf.sare.sa-update.dostech.net
70_sare_uri0.cf.sare.sa-update.dostech.net
70_sare_whitelist.cf.sare.sa-update.dostech.net
70_sare_whitelist_spf.cf.sare.sa-update.dostech.net
70_sare_stocks.cf.sare.sa-update.dostech.net
updates.spamassassin.org
--
Robert
Re: Bayes and celebrity spam
Posted by Loren Wilton <lw...@earthlink.net>.
> He is obviously a target, but some of this is very obvious, no? With
> subject like 'Jennifer Garner showing tits and booty in the shower
> fbeqxunqpwpjauxekoyx' and body containing...
>
> www(dot)prnceleb(dot)com now," Malfoy went on. of metal, and
> tnlffifuubqrnvrrtneekyntauypuqlecgwjaihf
>
> Is this some new variant we're having to deal with?
Well, I think the answer is pretty obvious -- there aren't a lot of rules
hitting on it, so it isn't something that has been happening a lot before or
there would be rules for it!
The general format is a fairly common thing that has been around for years,
but obviously this has been tweaked to miss on badwords rules and the like,
and that (dot) trick is probably intended to avoid SURBL and the like.
The long string of junk would have hit on rules in the past, but that sort
of thing has become uncommon, so those kinds of rules have decayed out.
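
To make the two tricks concrete, here is a minimal Python sketch of what a check for them could look like. It is not a SpamAssassin rule and not anything from the SARE sets; the helper names and the 20-letter threshold are assumptions chosen only because they fit the sample quoted above.

```python
import re

# Hypothetical patterns for illustration only; these are not SpamAssassin rules.
DOT_TRICK = re.compile(r"\s*\(\s*dot\s*\)\s*", re.IGNORECASE)
JUNK_TOKEN = re.compile(r"\b[a-z]{20,}\b")  # assumed threshold: 20+ lowercase letters


def deobfuscate_dots(text: str) -> str:
    """Turn 'www(dot)example(dot)com' back into 'www.example.com'."""
    return DOT_TRICK.sub(".", text)


def junk_tokens(text: str) -> list[str]:
    """Return unbroken runs of 20 or more lowercase letters."""
    return JUNK_TOKEN.findall(text)


sample = (
    'www(dot)prnceleb(dot)com now," Malfoy went on. of metal, and '
    "tnlffifuubqrnvrrtneekyntauypuqlecgwjaihf"
)
print(deobfuscate_dots(sample))  # the domain is readable again for a URI lookup
print(junk_tokens(sample))       # the trailing random string is flagged
```

A real rule would need testing against ham first, since long lowercase runs can also show up in encoded data or file names.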
Loren
Re: Bayes and celebrity spam
Posted by Robert Fitzpatrick <li...@webtent.net>.
On Tue, 2008-01-29 at 22:16 -0500, Mark Johnson wrote:
> I put extreme scores against emails from TW as we don't do business with
> anyone from there. If it wasn't for that, this would have made it
> through my system as well. I am really surprised bayes scored a 0 as it
> did for the original poster. I do serious bayes training on a regular
> basis. I see a lot of others are getting bayes scores of 80.
>
> Content analysis details: (5.6 points, 5.0 required)
>
> pts rule name description
> ---- ---------------------- --------------------------------------------------
> 0.9 SUBJ_HAS_SPACES Subject contains lots of white space
> 0.2 SUBJECT_NOVOWEL Subject: has long non-vowel letter sequence
> 7.0 RELAYCOUNTRY_TW Relayed through TW
> 0.2 SUBJ_HAS_UNIQ_ID Subject contains a unique ID
> -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
> [score: 0.0000]
> 0.0 HTML_MESSAGE BODY: HTML included in message
>
Well, it looks like I'll need to start learning how to write some rules
to kick these. I have one person who is flooded with these kinds of
messages, a bunch of Yahoo and celeb porn. He sends them over, asking
whether this isn't obvious spam to block. I've been browsing my caches of
user mail and can't find anyone else getting slammed with these messages
like this guy. I'm sure there are some, but even among people within his
own domain who receive the same level of mail, I can't find one.
He is obviously a target, but some of this is very obvious, no? With
subject like 'Jennifer Garner showing tits and booty in the shower
fbeqxunqpwpjauxekoyx' and body containing...
www(dot)prnceleb(dot)com now," Malfoy went on. of metal, and
tnlffifuubqrnvrrtneekyntauypuqlecgwjaihf
Is this some new variant we're having to deal with?
--
Robert
Re: Bayes and celebrity spam
Posted by Mark Johnson <sa...@astroshapes.com>.
Theo Van Dinter wrote:
> On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
>> I have some users getting slammed with this spam. Before I start trying
>> to figure out how to intercept, can someone test this message and tell
>> me if you're getting a score above 5.0?
>>
>> http://esmtp.webtent.net/test.txt
>>
>> 2.5 MISSING_HB_SEP Missing blank line between message header and body
>
> This appears to be a badly pasted email. For example, the topmost Received
> header (and then a lot of the rest of the headers) is malformed.
>
> Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon. If you
> see it hitting somewhere, you're more likely to have a misconfiguration in
> your setup than a valid hit.
>
I put extreme scores against emails from TW as we don't do business with
anyone from there. If it wasn't for that, this would have made it
through my system as well. I am really surprised bayes scored a 0 as it
did for the original poster. I do serious bayes training on a regular
basis. I see a lot of others are getting bayes scores of 80.
Content analysis details: (5.6 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
0.9 SUBJ_HAS_SPACES Subject contains lots of white space
0.2 SUBJECT_NOVOWEL Subject: has long non-vowel letter sequence
7.0 RELAYCOUNTRY_TW Relayed through TW
0.2 SUBJ_HAS_UNIQ_ID Subject contains a unique ID
-2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
[score: 0.0000]
0.0 HTML_MESSAGE BODY: HTML included in message
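
As a rough check of how much the TW rule is carrying here, the per-rule scores printed above can be summed with and without it. This is only a sketch over the rounded values shown in the report, so the total comes out at 5.7 rather than the reported 5.6, presumably because the report rounds each score for display.

```python
# Per-rule scores exactly as printed in the report (already rounded to one decimal).
report = {
    "SUBJ_HAS_SPACES": 0.9,
    "SUBJECT_NOVOWEL": 0.2,
    "RELAYCOUNTRY_TW": 7.0,
    "SUBJ_HAS_UNIQ_ID": 0.2,
    "BAYES_00": -2.6,
    "HTML_MESSAGE": 0.0,
}
REQUIRED = 5.0  # the "5.0 required" threshold from the report


def total(scores, skip=()):
    """Sum the scores, leaving out any rule names listed in skip."""
    return round(sum(v for k, v in scores.items() if k not in skip), 1)


with_tw = total(report)
without_tw = total(report, skip={"RELAYCOUNTRY_TW"})
print(with_tw, with_tw >= REQUIRED)        # tagged as spam
print(without_tw, without_tw >= REQUIRED)  # negative score, delivered as ham
```

Without the country rule the message ends up below zero, because BAYES_00 outweighs the three small subject rules combined.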
--
Mark Johnson
http://www.astroshapes.com/information-technology/blog/
Re: Bayes and celebrity spam
Posted by Duane Hill <d....@yournetplus.com>.
On Tue, 29 Jan 2008 20:22:59 -0500
Theo Van Dinter <fe...@apache.org> wrote:
> On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
> > I have some users getting slammed with this spam. Before I start
> > trying to figure out how to intercept, can someone test this
> > message and tell me if you're getting a score above 5.0?
> >
> > http://esmtp.webtent.net/test.txt
> >
> > 2.5 MISSING_HB_SEP Missing blank line between message header and body
>
> This appears to be a badly pasted email. For example, the topmost
> Received header (and then a lot of the rest of the headers) is
> malformed.
>
> Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon.
> If you see it hitting somewhere, you're more likely to have a
> misconfiguration in your setup than a valid hit.
That explains why my test results were so high.
Re: Bayes and celebrity spam
Posted by Loren Wilton <lw...@earthlink.net>.
> Can I get some tests now on my properly formatted file by anyone to see
> if my scoring should be blocking this message? Sorry for the previously
> posted poorly formatted files...and thanks for the help!
>
> http://esmtp.webtent.net/test2.txt
Well, my results probably aren't hugely representative, but:
Content analysis details: (9.0 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
1.0 RELAY_IS_203 RELAY_IS_203
0.0 LW_YAHOO_FROM LW_YAHOO_FROM
0.0 DK_POLICY_TESTING Domain Keys: policy says domain is testing DK
0.0 DK_SIGNED Domain Keys: message has a signature
-0.0 DK_VERIFIED Domain Keys: signature passes verification
2.0 BAYES_80 BODY: Bayesian spam probability is 80 to 95%
[score: 0.8980]
0.0 HTML_MESSAGE BODY: HTML included in message
1.5 JD_YAHOO_REDIRECT FULL: Yahoo Redirect Ads
0.5 NOT_SENDER_MSGID Sender host doesn't match message-id host
2.0 NOT_FROM_SENDER Not from putative sender
2.0 NOT_TO_ME Mail is not addressed to me
Most of those rules you aren't going to have, and most of them are
triggering because your received headers don't match what I would have seen
for the same mail.
So other than the bayes and 203 relay rules you probably wouldn't hit much.
Since we score points for redirections (and in this case, search links) the
message picked up 1.5 points there, but the standard rules don't have that.
Loren
Re: Bayes and celebrity spam
Posted by Robert Fitzpatrick <li...@webtent.net>.
On Tue, 2008-01-29 at 18:05 -0800, Loren Wilton wrote:
> There is still something wrong with the message you pasted, and possibly
> with how you are running it into SA to test:
>
> Received: from n6c.bullet.mail.tp2.yahoo.com (n6c.bullet.mail.tp2.yahoo.com
> [203.188.202.136])
> \x09by esmtp.ky.webtent.net (WebTent ESMTP Postfix Internet Mail Gateway)
> with SMTP id 2348137B72A
>
> Notice that that second line starts with " \x09by". This is a text string
> that won't be recognized as a tab followed by "by", which was apparently
> what was in the original message before something helpfully changed the tab
> character to a hex representation.
>
> Pull those \x09's out of the message, replacing them with tabs or spaces,
> and things should at least recognize the received headers correctly.
>
> > 0.0 MISSING_MID Missing Message-Id: header
> > 0.0 MISSING_DATE Missing Date: header
> > 2.5 MISSING_HB_SEP Missing blank line between message header and
> > body
> > 1.3 MISSING_HEADERS Missing To: header
> > 1.8 MISSING_SUBJECT Missing Subject: header
> > 1.4 EMPTY_MESSAGE Message appears to have no textual parts and no
>
> But it still looks like you ran something close to a blank file through SA.
> Make sure that the first line of the file you send to SA isn't blank, or
> there is a prepended space on every line or some such.
>
> Loren
>
Yes, I removed what seemed to be one space added to start of each line
after dumping from the db field and translated the \x09 into a single
space and now the score is matching what I have in Maia...
Can I get some tests now on my properly formatted file by anyone to see
if my scoring should be blocking this message? Sorry for the previously
posted poorly formatted files...and thanks for the help!
http://esmtp.webtent.net/test2.txt
--
Robert
Re: Bayes and celebrity spam
Posted by Loren Wilton <lw...@earthlink.net>.
There is still something wrong with the message you pasted, and possibly
with how you are running it into SA to test:
Received: from n6c.bullet.mail.tp2.yahoo.com (n6c.bullet.mail.tp2.yahoo.com
[203.188.202.136])
\x09by esmtp.ky.webtent.net (WebTent ESMTP Postfix Internet Mail Gateway)
with SMTP id 2348137B72A
Notice that that second line starts with " \x09by". This is a text string
that won't be recognized as a tab followed by "by", which was apparently
what was in the original message before something helpfully changed the tab
character to a hex representation.
Pull those \x09's out of the message, replacing them with tabs or spaces,
and things should at least recognize the received headers correctly.
> 0.0 MISSING_MID Missing Message-Id: header
> 0.0 MISSING_DATE Missing Date: header
> 2.5 MISSING_HB_SEP Missing blank line between message header and
> body
> 1.3 MISSING_HEADERS Missing To: header
> 1.8 MISSING_SUBJECT Missing Subject: header
> 1.4 EMPTY_MESSAGE Message appears to have no textual parts and no
But it still looks like you ran something close to a blank file through SA.
Make sure that the first line of the file you send to SA isn't blank, or
there is a prepended space on every line or some such.
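
A minimal Python sketch of that cleanup, assuming the dump has exactly the three problems described above (leading blank lines, one extra space at the start of every line, and literal \x09 text where tabs used to be). The function name is made up for illustration, and the dump format of any particular database export is an assumption.

```python
def repair_dump(raw: str) -> str:
    """Undo the damage described above so SA can parse the headers again."""
    lines = raw.splitlines()

    # The first line handed to SA must not be blank.
    while lines and not lines[0].strip():
        lines.pop(0)

    # Strip one leading space, but only if every non-empty line carries it.
    # In an undamaged message the header lines start in column 0, so this is skipped.
    if lines and all(line.startswith(" ") for line in lines if line):
        lines = [line[1:] if line else line for line in lines]

    # Turn the literal four characters \x09 back into a real tab.
    return "\n".join(line.replace("\\x09", "\t") for line in lines) + "\n"


damaged = (
    "\n"
    " Received: from a.example (a.example [192.0.2.1])\n"
    " \\x09by mx.example with SMTP id ABC\n"
    " Subject: test\n"
    " \n"
    " body text\n"
)
print(repair_dump(damaged))
```

The hosts and the id in the sample are placeholders. The repaired text can then be saved and fed to spamassassin -t as before.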
Loren
Re: Bayes and celebrity spam
Posted by Chris <cp...@embarqmail.com>.
On Tuesday 29 January 2008 7:51 pm, Robert Fitzpatrick wrote:
> http://esmtp.webtent.net/test2.txt
>
> I have gone through my debug, but can't seem to spot any problems. How
> can one send debug output to file? And what do you think I should be
> looking for given the results of my test?
FWIW, on my home stand alone system your message scored thus:
Content analysis details: (16.7 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
0.0 RELAY_UK Relayed through Brittan
0.0 DK_POLICY_TESTING Domain Keys: policy says domain is testing DK
0.0 DK_SIGNED Domain Keys: message has a signature
-0.0 DK_VERIFIED Domain Keys: signature passes verification
0.0 HTML_MESSAGE BODY: HTML included in message
1.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
[score: 0.5101]
2.2 DCC_CHECK listed in DCC (http://rhyolite.com/anti-spam/dcc/)
[cpollock 104; Body=1 Fuz1=1 Fuz2=many]
10 CLAMAV Clam AntiVirus detected a virus
2.5 L_UNVERIFIED_YAHOO L_UNVERIFIED_YAHOO
1.0 SAGREY Adds 1.0 to spam from first-time senders
--
Chris
KeyID 0xE372A7DA98E6705C
Re: Bayes and celebrity spam
Posted by Robert Fitzpatrick <li...@webtent.net>.
On Tue, 2008-01-29 at 20:22 -0500, Theo Van Dinter wrote:
> On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
> > I have some users getting slammed with this spam. Before I start trying
> > to figure out how to intercept, can someone test this message and tell
> > me if you're getting a score above 5.0?
> >
> > http://esmtp.webtent.net/test.txt
> >
> > 2.5 MISSING_HB_SEP Missing blank line between message header and body
>
> This appears to be a badly pasted email. For example, the topmost Received
> header (and then a lot of the rest of the headers) is malformed.
>
> Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon. If you
> see it hitting somewhere, you're more likely to have a misconfiguration in
> your setup than a valid hit.
>
Thanks for the tips, I pasted from Maia Mailguard web GUI by clicking
View Raw. Not sure if you're familiar, Maia is an amavisd-2.2 spin off.
I exported the contents from the pgsql db this time with another suspect
and am seeing something very wrong. While Maia shows a negative score on
this next test with Bayes factored in...
0.001 HTML_MESSAGE HTML included in message
-2.599 BAYES_00 Bayesian spam probability is 0 to 1%
It scores way over kill with zero Bayes points when running from the
command line...
Content analysis details: (9.2 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
0.0 MISSING_MID Missing Message-Id: header
0.0 MISSING_DATE Missing Date: header
2.5 MISSING_HB_SEP Missing blank line between message header and body
1.3 MISSING_HEADERS Missing To: header
0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
[score: 0.4988]
2.2 TVD_SPACE_RATIO BODY: TVD_SPACE_RATIO
1.8 MISSING_SUBJECT Missing Subject: header
1.4 EMPTY_MESSAGE Message appears to have no textual parts and no
Subject: text
http://esmtp.webtent.net/test2.txt
I have gone through my debug, but can't seem to spot any problems. How
can one send debug output to file? And what do you think I should be
looking for given the results of my test?
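
On the debug-to-file question: spamassassin -D prints its debug messages on stderr, so redirecting stderr is enough, for example spamassassin -D -t < test.msg 2> debug.log in a shell. The same thing as a small Python wrapper, offered only as a sketch; the function name and file names are assumptions.

```python
import subprocess


def run_with_debug_log(message_path, log_path, cmd=("spamassassin", "-D", "-t")):
    """Feed the message to cmd on stdin, save stderr to log_path, return stdout.

    With the default cmd, stdout is the usual -t report and the log file
    receives the -D debug lines.
    """
    with open(message_path, "rb") as msg, open(log_path, "wb") as log:
        result = subprocess.run(
            list(cmd),
            stdin=msg,
            stdout=subprocess.PIPE,
            stderr=log,
            check=False,  # still return the report if the command exits non-zero
        )
    return result.stdout.decode(errors="replace")
```

Calling run_with_debug_log("test.msg", "debug.log") as the same user amavisd runs under should give the report back and leave the debug lines in debug.log, where the Bayes-related messages can be searched.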
--
Robert
Re: Bayes and celebrity spam
Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Jan 29, 2008 at 07:51:03PM -0500, Robert Fitzpatrick wrote:
> I have some users getting slammed with this spam. Before I start trying
> to figure out how to intercept, can someone test this message and tell
> me if you're getting a score above 5.0?
>
> http://esmtp.webtent.net/test.txt
>
> 2.5 MISSING_HB_SEP Missing blank line between message header and body
This appears to be a badly pasted email. For example, the topmost Received
header (and then a lot of the rest of the headers) is malformed.
Hitting MISSING_HB_SEP w/ real mails is possible, but very uncommon. If you
see it hitting somewhere, you're more likely to have a misconfiguration in
your setup than a valid hit.
--
Randomly Selected Tagline:
"It's not you Bernie. I guess I'm just not used to being chased around
a mall at night by killer robots." - Linda from the movie "Chopping Mall"
Re: Bayes and celebrity spam
Posted by Duane Hill <d....@yournetplus.com>.
On Tue, 29 Jan 2008 19:51:03 -0500
Robert Fitzpatrick <li...@webtent.net> wrote:
> I have some users getting slammed with this spam. Before I start
> trying to figure out how to intercept, can someone test this message
> and tell me if you're getting a score above 5.0?
>
> http://esmtp.webtent.net/test.txt
>
> I'm getting 4.4 on this particular one, but others less. My bayes
> still insists on knocking it down even after learning 10-20 similar
> messages. I believe our bayes is trained well with 94K spam versus
> 85K ham learned with auto learning above 35 for spam and -3 for
> nonspam. All other is manually trained mostly by me...
>
> mx1# su vscan -c 'spamassassin -t < test.msg'
> <snip>
> Content analysis details: (4.4 points, 5.0 required)
>
> pts rule name description
> ---- ---------------------- --------------------------------------------------
> 0.0 MISSING_MID Missing Message-Id: header
> 0.0 MISSING_DATE Missing Date: header
> 2.5 MISSING_HB_SEP Missing blank line between message header and body
> 0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay lines
> 1.3 MISSING_HEADERS Missing To: header
> 1.5 SARE_ADULT1 BODY: Contains adult material
> -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
> [score: 0.0000]
> 1.8 MISSING_SUBJECT Missing Subject: header
X-Spam-Level: xxxxxxx
X-Spam-Status: Reqd:5.0 Hits:7.1 Learn:disabled
Tests:MISSING_DATE=0.001,
MISSING_HB_SEP=2.5,MISSING_HEADERS=1.581,MISSING_MID=0.001,
MISSING_SUBJECT=1.285,SARE_ADULT1=1.47,UNPARSEABLE_RELAY=0.25
[snip]
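
For anyone comparing these numbers with the report quoted above, here is a short Python sketch that pulls the per-test scores out of that X-Spam-Status value and adds them up. The header layout (Reqd:, Hits:, Tests:) is whatever this particular setup emits, so the regular expressions are assumptions fitted to this one example.

```python
import re

# The X-Spam-Status value from above, with the folded lines joined.
header = (
    "Reqd:5.0 Hits:7.1 Learn:disabled Tests:MISSING_DATE=0.001, "
    "MISSING_HB_SEP=2.5,MISSING_HEADERS=1.581,MISSING_MID=0.001, "
    "MISSING_SUBJECT=1.285,SARE_ADULT1=1.47,UNPARSEABLE_RELAY=0.25"
)


def parse_tests(value):
    """Map each NAME=score pair in the header value to a float."""
    pairs = re.findall(r"([A-Z0-9_]+)=(-?\d+(?:\.\d+)?)", value)
    return {name: float(score) for name, score in pairs}


tests = parse_tests(header)
hits = float(re.search(r"Hits:(-?\d+(?:\.\d+)?)", header).group(1))
total = sum(tests.values())

print(round(total, 3), hits)                     # sum of the tests vs. reported Hits
print(any(n.startswith("BAYES") for n in tests))  # whether any Bayes rule fired
```

The seven test scores add up to the reported Hits value once rounded, and no BAYES rule appears in the list, which would account for much of the gap from the 4.4 in the quoted report.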