Posted to users@spamassassin.apache.org by Chris 'Xenon' Hanson <xe...@alphapixel.com> on 2007/10/16 07:53:09 UTC

SpamAssassin not hitting well on obvious spam

OS is an old Debian 3.1/Sarge
SpamAssassin version 3.2.3
   running on Perl version 5.8.4
MTA is qmail, from qmailrocks
qmail-scanner 2.01 tweaked with q-s-2.01st-20070204.patch
Also using ClamAV

qmail, qmail-scanner, ClamAV and SpamAssassin are all running, mail is being filtered, and 
all is good. SA is running as a daemon and Q-S is calling SA in fast mode. I have SA set to 
flag anything over 4.0 as spam. Q-S is set to silently delete anything 5.5 and above on 
sight.

I'm using sa-update to pull rulesets weekly. My set list is:
updates.spamassassin.org
72_sare_redirect_post3.0.0.cf.sare.sa-update.dostech.net
70_sare_evilnum0.cf.sare.sa-update.dostech.net
70_sare_html0.cf.sare.sa-update.dostech.net
70_sare_header0.cf.sare.sa-update.dostech.net
70_sare_specific.cf.sare.sa-update.dostech.net
70_sare_adult.cf.sare.sa-update.dostech.net
99_sare_fraud_post25x.cf.sare.sa-update.dostech.net
70_sare_spoof.cf.sare.sa-update.dostech.net
70_sare_random.cf.sare.sa-update.dostech.net
70_sare_oem.cf.sare.sa-update.dostech.net
70_sare_genlsubj0.cf.sare.sa-update.dostech.net
70_sare_obfu.cf.sare.sa-update.dostech.net
70_sare_stocks.cf.sare.sa-update.dostech.net

   I only wanted really solid rules that don't misfire. sa-update seems to run fine, and I 
have files in /var/lib/spamassassin that seem to indicate I have all those rules. 
spamassassin --lint reports no problems. I _do_ get a lot less spam than before.

And yet, sometimes the spam that makes it through is startlingly obvious. Lots of 
expletives about male anatomy and the like, in plaintext mails. I turned on the 
X-Spam-Report header to see how things were going. A typical flagged "anatomical 
enlargement" spam might show:

X-Spam-Status: Yes, hits=4.4 required=4.0
X-Spam-Level: ++++
X-Spam-Report: SA TESTS
   0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
   0.1 HTML_40_50             BODY: Message is 40% to 50% HTML
   0.0 HTML_MESSAGE           BODY: HTML included in message
   1.5 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
                              [cf: 100]
   0.1 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
   0.1 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                              [201.240.244.254 listed in dnsbl.sorbs.net]
   1.8 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
               [Blocked - see <http://www.spamcop.net/bl.shtml?201.240.244.254>]
   0.6 URIBL_SBL              Contains an URL listed in the SBL blocklist
                              [URIs: ecamn.com]

   Another spam (variant of the exact same body text) that didn't get flagged shows:

X-Spam-Status: No, hits=2.5 required=4.0
X-Spam-Level: ++
X-Spam-Report: SA TESTS
   0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
   0.1 HTML_40_50             BODY: Message is 40% to 50% HTML
   0.0 HTML_MESSAGE           BODY: HTML included in message
   1.5 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
                              [cf: 100]
   0.1 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
   0.6 URIBL_SBL              Contains an URL listed in the SBL blocklist
                              [URIs: ecamn.com]
   0.1 MIME_BOUND_NEXTPART    Spam tool pattern in MIME boundary

Neither one is picking up on any of the content of the message body; they're just firing 
on the headers and transmission info.
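
For comparison, here is a small shell sketch that diffs the rule names in the two reports above. The names are copied from the X-Spam-Report headers; the variable and function names are invented for the illustration.

```shell
# Rule names copied from the two X-Spam-Report headers above.
flagged="FORGED_RCVD_HELO HTML_40_50 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK RCVD_IN_SORBS_DUL RCVD_IN_BL_SPAMCOP_NET URIBL_SBL"
missed="FORGED_RCVD_HELO HTML_40_50 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK URIBL_SBL MIME_BOUND_NEXTPART"

# Print every name in list $1 that does not appear in list $2.
only_in() {
    for rule in $1; do
        case " $2 " in
            *" $rule "*) ;;                 # present in both lists, skip
            *) printf '%s\n' "$rule" ;;     # unique to the first list
        esac
    done
}

only_flagged=$(only_in "$flagged" "$missed")
only_missed=$(only_in "$missed" "$flagged")

echo "Only in the flagged message:"; echo "$only_flagged"
echo "Only in the missed message:";  echo "$only_missed"
```

The two rules unique to the flagged message are both DNS blocklist hits on the sending relay, scored 0.1 and 1.8 in that report. That accounts for nearly all of the gap between 4.4 and 2.5.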

   I could post the body text here, but I don't want THIS message to trip spam filters. In 
any case, I think I have something misconfigured, because it seems like these spams ought 
to be caught. Am I not using the proper rulesets for this sort of thing, or do I have 
something hosed up?

   Are the rulesets here:
http://www.koders.com/noncode/fidBB2367C919EFE21595CF39216741049B8CF03958.aspx
http://www.koders.com/noncode/fid2FDA2298EF0A572237595868731E4FA234A59A55.aspx
   production rulesets? If so, how would one "subscribe" to them? They seemed to have some 
good ideas in them.

   Thanks in advance for any advice.

-- 
Chris 'Xenon' Hanson, omo sanza lettere                  Xenon AlphaPixel.com
PixelSense Landsat processing now available! http://www.alphapixel.com/demos/
"There is no Truth. There is only Perception. To Perceive is to Exist." - Xen

Re: SpamAssassin not hitting well on obvious spam

Posted by Matt Kettler <mk...@verizon.net>.
Chris 'Xenon' Hanson wrote:
>
>   I believe SA uses Bayes out of the box, but what I don't get is how
> will Bayes know it's spam (to train on, versus ham) if there isn't
> already a rule that flags it as spam somehow? I guess the RBL rules
> will help.
sa-learn --spam messagefile.txt




Re: SpamAssassin not hitting well on obvious spam

Posted by Loren Wilton <lw...@earthlink.net>.
>   I believe SA uses Bayes out of the box, but what I don't get is how will 
> Bayes know it's spam (to train on, versus ham)

You tell it.

Bayes won't kick in on a new installation until you have manually fed it AT 
LEAST 200 hams and 200 spams.  You do this by deciding yourself if a 
message is ham or spam and training appropriately.

Once it has at least the minimal training amount it will start classifying. 
And if you have auto-learning on, it will start learning from what it 
classifies.  Of course, there is no guarantee that it will learn 
*correctly*.  Again, it is up to you to monitor it (at least occasionally) 
and if necessary re-learn a message as the correct type.

If you get messages that are BAYES_50 or near that, it means Bayes doesn't 
have a clue about the message.  You should give it one, especially if it is 
spam, by again training it appropriately.

Bayes will work quite well on the type of spam you are getting, *once you 
train it*.

        Loren
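
A minimal shell sketch of the manual training described above. The two directory paths are hypothetical (hand-sorted mail, one message per file). The --spam, --ham and --dump magic options are standard sa-learn options. Run it as whichever user spamd scans as, since the Bayes database is per-user by default.

```shell
# Hypothetical locations of hand-sorted mail, one message per file.
SPAM_DIR="${SPAM_DIR:-$HOME/sorted/spam}"
HAM_DIR="${HAM_DIR:-$HOME/sorted/ham}"
MIN_EACH=200    # minimum of each kind before Bayes starts scoring

# Count regular files in a directory; prints 0 if it does not exist.
count_msgs() {
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

nspam=$(count_msgs "$SPAM_DIR")
nham=$(count_msgs "$HAM_DIR")
echo "spam samples: $nspam, ham samples: $nham (want >= $MIN_EACH each)"

# Only train when sa-learn is installed and both directories exist.
if command -v sa-learn >/dev/null 2>&1 && [ -d "$SPAM_DIR" ] && [ -d "$HAM_DIR" ]; then
    sa-learn --spam "$SPAM_DIR"
    sa-learn --ham "$HAM_DIR"
    # The nspam and nham lines in this dump show what Bayes has learned so far.
    sa-learn --dump magic
fi
```

A message that was learned as the wrong type can be corrected by running sa-learn on it again with the other option.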



Re: SpamAssassin not hitting well on obvious spam

Posted by Chris 'Xenon' Hanson <xe...@alphapixel.com>.
Henrik Krohns wrote:
> On Tue, Oct 16, 2007 at 12:18:06AM -0600, Chris 'Xenon' Hanson wrote:
>>   That's just a source code search engine. It's showing files it found in 
>> SVN on the SpamAssassin site, here:
>> http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought.cf?view=log
>>   Unfortunately, I couldn't find a direct URL to view the file other than 
>> koders.com.
> http://taint.org/2007/08/15/004348a.html

   Thank you. That's very valuable. Is anyone else on this list employing these rules?


Re: SpamAssassin not hitting well on obvious spam

Posted by Henrik Krohns <he...@hege.li>.
On Tue, Oct 16, 2007 at 12:18:06AM -0600, Chris 'Xenon' Hanson wrote:
>
>   That's just a source code search engine. It's showing files it found in 
> SVN on the SpamAssassin site, here:
> http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought.cf?view=log
>
>   Unfortunately, I couldn't find a direct URL to view the file other than 
> koders.com.

http://taint.org/2007/08/15/004348a.html


Re: SpamAssassin not hitting well on obvious spam

Posted by Chris 'Xenon' Hanson <xe...@alphapixel.com>.
Theo Van Dinter wrote:
> Having words like "fucking", "viagra", "huge" or "penis" in a mail does
> not necessarily mean that the message is spam.

   Well, no, but together they are a red flag, along with MegaDik.

> Bayes does a great job with this kind of thing though -- if those words mean
> "spam" for you, then Bayes will learn that and act accordingly.
> If you're not using Bayes for some reason, you could write your own
> single-word/phrase rules that simulate the action.

   I believe SA uses Bayes out of the box, but what I don't get is how will Bayes know 
it's spam (to train on, versus ham) if there isn't already a rule that flags it as spam 
somehow? I guess the RBL rules will help.

> Generally speaking, those types of rules either have a low hit-rate or a not
> acceptable high FP rate, which is why they don't normally exist in
> the standard ruleset.

   Ok.

>>   Are the rulesets here:
>> http://www.koders.com/noncode/fidBB2367C919EFE21595CF39216741049B8CF03958.aspx
>> http://www.koders.com/noncode/fid2FDA2298EF0A572237595868731E4FA234A59A55.aspx
>>   production rulesets? If so, how would one "subscribe" to them. They 
>>   seemed to have some good ideas in them.
> You'd really have to ask the people who wrote them.  (I've never heard of that
> site, fwiw.)

   That's just a source code search engine. It's showing files it found in SVN on the 
SpamAssassin site, here:
http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought.cf?view=log

   Unfortunately, I couldn't find a direct URL to view the file other than koders.com.


Re: SpamAssassin not hitting well on obvious spam

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Oct 15, 2007 at 11:53:09PM -0600, Chris 'Xenon' Hanson wrote:
> And yet, sometimes the spam that makes it through is startlingly obvious. 
> Lots of expletives about male anatomy and the like, in plaintext mails. I 
> turned on the X-Spam-Report header to see how things were going. A typical 
> flagged "anatomical enlargement" spam might show:

Having words like "fucking", "viagra", "huge" or "penis" in a mail does
not necessarily mean that the message is spam.

Bayes does a great job with this kind of thing though -- if those words mean
"spam" for you, then Bayes will learn that and act accordingly.

If you're not using Bayes for some reason, you could write your own
single-word/phrase rules that simulate the action.

Generally speaking, those types of rules either have a low hit-rate or an
unacceptably high FP rate, which is why they don't normally exist in
the standard ruleset.
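
As a rough illustration of such local rules, the shell sketch below writes a small rules file to a scratch location. The rule names, the score and the file path are made up for the example; the two words are taken from the list above. The meta rule fires only when both words appear in the same body, which is one way to trade some hit-rate for fewer false positives.

```shell
# Write a hypothetical local rules file to a scratch location.
RULES_FILE="${TMPDIR:-/tmp}/local_phrases.cf"

cat > "$RULES_FILE" <<'EOF'
# Sub-rules (names starting with __) carry no score of their own.
body     __LOCAL_W_VIAGRA   /\bviagra\b/i
body     __LOCAL_W_HUGE     /\bhuge\b/i

# Fire only when both words occur in the same message body.
meta     LOCAL_PILL_COMBO   (__LOCAL_W_VIAGRA && __LOCAL_W_HUGE)
describe LOCAL_PILL_COMBO   Body mentions both locally chosen spam words
score    LOCAL_PILL_COMBO   1.0
EOF

echo "wrote $RULES_FILE"
# To try it, copy the file next to your local.cf and re-run:
#   spamassassin --lint
```

Start with a low score and watch what the rule hits before raising it.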

>   Are the rulesets here:
> http://www.koders.com/noncode/fidBB2367C919EFE21595CF39216741049B8CF03958.aspx
> http://www.koders.com/noncode/fid2FDA2298EF0A572237595868731E4FA234A59A55.aspx
>   production rulesets? If so, how would one "subscribe" to them. They 
>   seemed to have some good ideas in them.

You'd really have to ask the people who wrote them.  (I've never heard of that
site, fwiw.)

Ideally, people who come up with ideas/rules would submit them to the
SA project for general testing and (possible) inclusion in the standard
ruleset.  But that doesn't usually happen, unfortunately. :(

-- 
Randomly Selected Tagline:
"Cut the [network] line to your bathroom ... life will be good again."
                                                 - Hal Stern

Re: SpamAssassin not hitting well on obvious spam

Posted by Chris 'Xenon' Hanson <xe...@alphapixel.com>.
Jeff Chan wrote:
> Turn on SURBL tests.  ecamn.com is blacklisted on SURBL.

   Ok. According to
http://wiki.apache.org/spamassassin/SURBL
http://www.surbl.org/faq.html#nettest

   SA 3.x has SURBL by default and it should be enabled if I'm not starting spamd with 
the -L/--local option. My /etc/default/spamassassin doesn't show the local option, so I 
think I should have SURBL on already. Any suggestions for where to look to determine why 
it might not be firing?
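
A shell sketch for checking this. The saved-message path is hypothetical, and has_local_flag is a helper invented here. It only looks for a standalone -L or --local word on the OPTIONS line, so it will not spot -L bundled with other short options. The second step runs the standalone spamassassin scanner rather than spamd, so it shows what the rules do with network tests available, not necessarily what the daemon does.

```shell
DEFAULTS="/etc/default/spamassassin"
MSG="${MSG:-$HOME/missed-spam.txt}"    # hypothetical saved copy of a missed spam

# Succeeds if the OPTIONS= line in file $1 contains -L or --local as a word.
has_local_flag() {
    grep '^OPTIONS=' "$1" 2>/dev/null \
        | tr '"=' '  ' \
        | tr -s ' ' '\n' \
        | grep -Eqx -- '-L|--local'
}

if has_local_flag "$DEFAULTS"; then
    echo "spamd is started in local-only mode; network tests are off"
else
    echo "no -L/--local found in $DEFAULTS"
fi

# Re-scan the message with debug output and keep only the SURBL lines.
if command -v spamassassin >/dev/null 2>&1 && [ -f "$MSG" ]; then
    spamassassin -D -t < "$MSG" 2>&1 | grep -i surbl \
        || echo "no SURBL lines in debug output"
fi
```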

   In a broader sense, are there any available local rulesets that are going to key off of 
the phrasing in the message body for these types of spams?

> Jeff C.


Re: SpamAssassin not hitting well on obvious spam

Posted by Jeff Chan <je...@surbl.org>.
Quoting Chris 'Xenon' Hanson <xe...@alphapixel.com>:
[...]
> X-Spam-Status: Yes, hits=4.4 required=4.0
> X-Spam-Level: ++++
> X-Spam-Report: SA TESTS
>    0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
>    0.1 HTML_40_50             BODY: Message is 40% to 50% HTML
>    0.0 HTML_MESSAGE           BODY: HTML included in message
>    1.5 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
>                               [cf: 100]
>    0.1 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
>    0.1 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
>                               [201.240.244.254 listed in dnsbl.sorbs.net]
>    1.8 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
>                [Blocked - see <http://www.spamcop.net/bl.shtml?201.240.244.254>]
>    0.6 URIBL_SBL              Contains an URL listed in the SBL blocklist
>                               [URIs: ecamn.com]


Turn on SURBL tests.  ecamn.com is blacklisted on SURBL.

Jeff C.