Posted to users@spamassassin.apache.org by Chris 'Xenon' Hanson <xe...@alphapixel.com> on 2007/10/16 07:53:09 UTC
SpamAssassin not hitting well on obvious spam
OS is an old Debian 3.1/Sarge
SpamAssassin version 3.2.3
running on Perl version 5.8.4
MTA is qmail, from qmailrocks
qmail-scanner 2.01 tweaked with q-s-2.01st-20070204.patch
Also using ClamAV
qmail, qmail-scanner, ClamAV and SpamAssassin are all running and mail is being filtered, so
the basics work. SA is running as a daemon and Q-S is calling SA in fast mode. I have SA set
to flag anything over 4.0 as spam. Q-S is set to silently delete anything scoring 5.5 or
above on sight.
I'm using sa-update to pull rulesets weekly. My set list is:
updates.spamassassin.org
72_sare_redirect_post3.0.0.cf.sare.sa-update.dostech.net
70_sare_evilnum0.cf.sare.sa-update.dostech.net
70_sare_html0.cf.sare.sa-update.dostech.net
70_sare_header0.cf.sare.sa-update.dostech.net
70_sare_specific.cf.sare.sa-update.dostech.net
70_sare_adult.cf.sare.sa-update.dostech.net
99_sare_fraud_post25x.cf.sare.sa-update.dostech.net
70_sare_spoof.cf.sare.sa-update.dostech.net
70_sare_random.cf.sare.sa-update.dostech.net
70_sare_oem.cf.sare.sa-update.dostech.net
70_sare_genlsubj0.cf.sare.sa-update.dostech.net
70_sare_obfu.cf.sare.sa-update.dostech.net
70_sare_stocks.cf.sare.sa-update.dostech.net
I only wanted really solid rules that don't misfire. sa-update seems to run fine, and I
have files in /var/lib/spamassassin that seem to indicate I have all those rules.
spamassassin --lint reports no problems. I _do_ get a lot less spam than before.
And yet, sometimes the spam that makes it through is startlingly obvious. Lots of
expletives about male anatomy and the like, in plaintext mails. I turned on the
X-Spam-Report header to see how things were going. A typical flagged "anatomical
enlargement" spam might show:
X-Spam-Status: Yes, hits=4.4 required=4.0
X-Spam-Level: ++++
X-Spam-Report: SA TESTS
0.1 FORGED_RCVD_HELO Received: contains a forged HELO
0.1 HTML_40_50 BODY: Message is 40% to 50% HTML
0.0 HTML_MESSAGE BODY: HTML included in message
1.5 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
[cf: 100]
0.1 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
0.1 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
[201.240.244.254 listed in dnsbl.sorbs.net]
1.8 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
[Blocked - see <http://www.spamcop.net/bl.shtml?201.240.244.254>]
0.6 URIBL_SBL Contains an URL listed in the SBL blocklist
[URIs: ecamn.com]
Another spam (variant of the exact same body text) that didn't get flagged shows:
X-Spam-Status: No, hits=2.5 required=4.0
X-Spam-Level: ++
X-Spam-Report: SA TESTS
0.1 FORGED_RCVD_HELO Received: contains a forged HELO
0.1 HTML_40_50 BODY: Message is 40% to 50% HTML
0.0 HTML_MESSAGE BODY: HTML included in message
1.5 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
[cf: 100]
0.1 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
0.6 URIBL_SBL Contains an URL listed in the SBL blocklist
[URIs: ecamn.com]
0.1 MIME_BOUND_NEXTPART Spam tool pattern in MIME boundary
Neither one is picking up on any of the content of the message body; they're just firing
on the headers and transmission info.
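A quick way to see what separates the two results is to diff the rule lists. The sketch below is only illustrative: the rule names and scores are copied from the two X-Spam-Report headers above, while the helper name parse_report is made up for this example.

```python
import re
from decimal import Decimal

# "score RULE_NAME" pairs copied from the two X-Spam-Report headers above.
FLAGGED = """\
0.1 FORGED_RCVD_HELO
0.1 HTML_40_50
0.0 HTML_MESSAGE
1.5 RAZOR2_CF_RANGE_51_100
0.1 RAZOR2_CHECK
0.1 RCVD_IN_SORBS_DUL
1.8 RCVD_IN_BL_SPAMCOP_NET
0.6 URIBL_SBL
"""

MISSED = """\
0.1 FORGED_RCVD_HELO
0.1 HTML_40_50
0.0 HTML_MESSAGE
1.5 RAZOR2_CF_RANGE_51_100
0.1 RAZOR2_CHECK
0.6 URIBL_SBL
0.1 MIME_BOUND_NEXTPART
"""

# A report line starts with a score, followed by the rule name.
RULE_LINE = re.compile(r"^\s*(-?\d+\.\d+)\s+([A-Z0-9_]+)")


def parse_report(text):
    """Return {rule_name: score} for every 'score NAME' line (hypothetical helper)."""
    rules = {}
    for line in text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            rules[match.group(2)] = Decimal(match.group(1))
    return rules


flagged = parse_report(FLAGGED)
missed = parse_report(MISSED)

# Rules that fired on only one of the two messages.
only_flagged = {name: score for name, score in flagged.items() if name not in missed}
only_missed = {name: score for name, score in missed.items() if name not in flagged}

print("flagged total:", sum(flagged.values()))
print("missed total: ", sum(missed.values()))
print("only on flagged message:", only_flagged)
print("only on missed message: ", only_missed)
```

The only rules unique to the flagged message are the two DNSBL hits on the sending IP (RCVD_IN_SORBS_DUL and RCVD_IN_BL_SPAMCOP_NET), so the difference between the two messages comes from the relay, not the body. The listed scores add up to 4.3, not the 4.4 shown in X-Spam-Status, presumably because each rule's score is rounded for display.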
I could post the body text here, but I don't want THIS message to trip spam filters. In
any case, I think I have something misconfigured, because it seems like these spams ought
to be caught. Am I not using the proper rulesets for this sort of thing, or do I have
something hosed up?
Are the rulesets here:
http://www.koders.com/noncode/fidBB2367C919EFE21595CF39216741049B8CF03958.aspx
http://www.koders.com/noncode/fid2FDA2298EF0A572237595868731E4FA234A59A55.aspx
production rulesets? If so, how would one "subscribe" to them? They seemed to have some
good ideas in them.
Thanks in advance for any advice.
--
Chris 'Xenon' Hanson, omo sanza lettere Xenon AlphaPixel.com
PixelSense Landsat processing now available! http://www.alphapixel.com/demos/
"There is no Truth. There is only Perception. To Perceive is to Exist." - Xen
Re: SpamAssassin not hitting well on obvious spam
Posted by Matt Kettler <mk...@verizon.net>.
Chris 'Xenon' Hanson wrote:
>
> I believe SA uses Bayes out of the box, but what I don't get is how
> will Bayes know it's spam (to train on, versus ham) if there isn't
> already a rule that flags it as spam somehow? I guess the RBL rules
> will help.
sa-learn --spam messagefile.txt
Re: SpamAssassin not hitting well on obvious spam
Posted by Loren Wilton <lw...@earthlink.net>.
> I believe SA uses Bayes out of the box, but what I don't get is how will
> Bayes know it's spam (to train on, versus ham)
You tell it.
Bayes won't kick in on a new installation until you have manually fed it AT
LEAST 200 hams and 200 spams. You do this by deciding yourself whether a
message is ham or spam and training appropriately.
Once it has at least the minimal training amount it will start classifying.
And if you have auto-learning on, it will start learning from what it
classifies. Of course, there is no guarantee that it will learn
*correctly*. Again, it is up to you to monitor it (at least occasionally)
and if necessary re-learn a message as the correct type.
If you get messages that hit BAYES_50 or near that, it means Bayes doesn't
have a clue about the message. You should give it one, especially if it is
spam, by again training it appropriately.
Bayes will work quite well on the type of spam you are getting, *once you
train it*.
Loren
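Training is done with sa-learn --spam <file> and sa-learn --ham <file>, and sa-learn --dump magic reports how many messages of each type have been learned so far. As a rough sketch of checking progress against the 200/200 minimum, the snippet below parses that dump. The sample text and its numbers are invented for illustration, the column layout is assumed from typical output, and the helper names are hypothetical.

```python
MIN_TRAINED = 200  # minimum hams and spams mentioned above

# Invented sample of `sa-learn --dump magic` output (numbers are made up).
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
0.000          0      61234          0  non-token data: ntokens
"""


def trained_counts(dump_text):
    """Return (nspam, nham) parsed from dump text (hypothetical helper)."""
    counts = {"nspam": 0, "nham": 0}
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumed layout: prob, count, value, atime, "non-token", "data:", name
        if (len(fields) >= 7
                and fields[4:6] == ["non-token", "data:"]
                and fields[6] in counts):
            counts[fields[6]] = int(fields[2])
    return counts["nspam"], counts["nham"]


def bayes_ready(nspam, nham, minimum=MIN_TRAINED):
    """Bayes only starts classifying once both counts reach the minimum."""
    return nspam >= minimum and nham >= minimum


nspam, nham = trained_counts(SAMPLE_DUMP)
print(f"spam learned: {nspam}, ham learned: {nham}")
if not bayes_ready(nspam, nham):
    print(f"still need {max(0, MIN_TRAINED - nspam)} spam "
          f"and {max(0, MIN_TRAINED - nham)} ham before Bayes kicks in")
```

In real use the text would come from running sa-learn --dump magic as the same user whose Bayes database spamd uses; otherwise the counts describe a different database.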
Re: SpamAssassin not hitting well on obvious spam
Posted by Chris 'Xenon' Hanson <xe...@alphapixel.com>.
Henrik Krohns wrote:
> On Tue, Oct 16, 2007 at 12:18:06AM -0600, Chris 'Xenon' Hanson wrote:
>> That's just a source code search engine. It's showing files it found in
>> SVN on the SpamAssassin site, here:
>> http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought.cf?view=log
>> Unfortunately, I couldn't find a direct URL to view the file other than
>> koders.com.
> http://taint.org/2007/08/15/004348a.html
Thank you. That's very valuable. Is anyone else on this list employing these rules?
Re: SpamAssassin not hitting well on obvious spam
Posted by Henrik Krohns <he...@hege.li>.
On Tue, Oct 16, 2007 at 12:18:06AM -0600, Chris 'Xenon' Hanson wrote:
>
> That's just a source code search engine. It's showing files it found in
> SVN on the SpamAssassin site, here:
> http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought.cf?view=log
>
> Unfortunately, I couldn't find a direct URL to view the file other than
> koders.com.
http://taint.org/2007/08/15/004348a.html
Re: SpamAssassin not hitting well on obvious spam
Posted by Chris 'Xenon' Hanson <xe...@alphapixel.com>.
Theo Van Dinter wrote:
> Having words like "fucking", "viagra", "huge" or "penis" in a mail does
> not necessarily mean that the message is spam.
Well, no, but together they are a red flag, along with MegaDik.
> Bayes does a great job with this kind of thing though -- if those words mean
> "spam" for you, then Bayes will learn that and act accordingly.
> If you're not using Bayes for some reason, you could write your own
> single-word/phrase rules that simulate the action.
I believe SA uses Bayes out of the box, but what I don't get is how will Bayes know
it's spam (to train on, versus ham) if there isn't already a rule that flags it as spam
somehow? I guess the RBL rules will help.
> Generally speaking, those types of rules either have a low hit-rate or a not
> acceptable high FP rate, which is why they don't normally exist in
> the standard ruleset.
Ok.
>> Are the rulesets here:
>> http://www.koders.com/noncode/fidBB2367C919EFE21595CF39216741049B8CF03958.aspx
>> http://www.koders.com/noncode/fid2FDA2298EF0A572237595868731E4FA234A59A55.aspx
>> production rulesets? If so, how would one "subscribe" to them. They
>> seemed to have some good ideas in them.
> You'd really have to ask the people who wrote them. (I've never heard of that
> site, fwiw.)
That's just a source code search engine. It's showing files it found in SVN on the
SpamAssassin site, here:
http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought.cf?view=log
Unfortunately, I couldn't find a direct URL to view the file other than koders.com.
Re: SpamAssassin not hitting well on obvious spam
Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Oct 15, 2007 at 11:53:09PM -0600, Chris 'Xenon' Hanson wrote:
> And yet, sometimes the spam that makes it through is startlingly obvious.
> Lots of expletives about male anatomy and the like, in plaintext mails. I
> turned on the X-Spam-Report header to see how things were going. A typical
> flagged "anatomical enlargement" spam might show:
Having words like "fucking", "viagra", "huge" or "penis" in a mail does
not necessarily mean that the message is spam.
Bayes does a great job with this kind of thing though -- if those words mean
"spam" for you, then Bayes will learn that and act accordingly.
If you're not using Bayes for some reason, you could write your own
single-word/phrase rules that simulate the action.
Generally speaking, those types of rules have either a low hit rate or an
unacceptably high FP rate, which is why they don't normally exist in
the standard ruleset.
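For anyone who does want to try local phrase rules, here is a minimal sketch that renders body/describe/score lines for local.cf and previews them against sample text. The rule names, phrases and scores are hypothetical placeholders, not tested rules, and the preview uses Python's re module, which only approximates the Perl regexes SpamAssassin actually runs.

```python
import re

# Hypothetical local rules: (name, regex, score, description).
# Low scores on purpose, given the false-positive concern above.
LOCAL_RULES = [
    ("LOCAL_PHRASE_ENLARGE", r"\benlarge your\b", 0.5, "Body mentions 'enlarge your'"),
    ("LOCAL_PHRASE_PILLS", r"\bcheap pills\b", 0.5, "Body mentions 'cheap pills'"),
]


def make_body_rule(name, pattern, score, description):
    """Render one case-insensitive body rule as local.cf text.

    The pattern must not contain an unescaped '/', since '/' delimits the regex.
    """
    return "\n".join([
        f"body     {name}  /{pattern}/i",
        f"describe {name}  {description}",
        f"score    {name}  {score:.1f}",
    ])


def preview_hits(text, rules=LOCAL_RULES):
    """Return the names of rules whose pattern matches the text."""
    return [name for name, pattern, _score, _desc in rules
            if re.search(pattern, text, re.IGNORECASE)]


if __name__ == "__main__":
    print("\n\n".join(make_body_rule(*rule) for rule in LOCAL_RULES))
    print(preview_hits("Enlarge your inbox with cheap PILLS today"))
```

After pasting the generated lines into local.cf, spamassassin --lint should confirm that they parse. Testing the rules against a corpus of real ham before raising the scores is the only way to find out how often they misfire.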
> Are the rulesets here:
> http://www.koders.com/noncode/fidBB2367C919EFE21595CF39216741049B8CF03958.aspx
> http://www.koders.com/noncode/fid2FDA2298EF0A572237595868731E4FA234A59A55.aspx
> production rulesets? If so, how would one "subscribe" to them. They
> seemed to have some good ideas in them.
You'd really have to ask the people who wrote them. (I've never heard of that
site, fwiw.)
Ideally, people who come up with ideas/rules would submit them to the
SA project for general testing and (possible) inclusion in the standard
ruleset. But that doesn't usually happen, unfortunately. :(
--
Randomly Selected Tagline:
"Cut the [network] line to your bathroom ... life will be good again."
- Hal Stern
Re: SpamAssassin not hitting well on obvious spam
Posted by Chris 'Xenon' Hanson <xe...@alphapixel.com>.
Jeff Chan wrote:
> Turn on SURBL tests. ecamn.com is blacklisted on SURBL.
Ok. According to
http://wiki.apache.org/spamassassin/SURBL
http://www.surbl.org/faq.html#nettest
SA 3.x has SURBL support by default, and it should be enabled as long as I'm not starting
spamd with the -L/--local option. My /etc/default/spamassassin doesn't show the local
option, so I think I should have SURBL on already. Any suggestions for where to look to
determine why it might not be firing?
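One check that can be done outside SpamAssassin is to query the list directly from the mail server. The sketch below assumes the combined zone multi.surbl.org and the usual DNSBL convention that a listed name resolves to an address while an unlisted one does not. The function names are made up for this example, and a failed lookup (for example, a resolver that the list refuses to answer) is indistinguishable from "not listed" here.

```python
import socket

SURBL_ZONE = "multi.surbl.org"  # assumed combined SURBL zone


def surbl_query_name(domain, zone=SURBL_ZONE):
    """Build the DNS name to look up: <domain>.<zone>."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone=SURBL_ZONE):
    """Return True if the zone answers with an A record for the domain.

    Caveat: DNS failures also end up as False, so a False result from
    this function is not proof that the domain is unlisted.
    """
    try:
        socket.gethostbyname(surbl_query_name(domain, zone))
        return True
    except socket.gaierror:
        return False


# Example (needs working DNS from the mail server itself):
#   print(surbl_query_name("ecamn.com"))
#   print(is_listed("ecamn.com"))
```

If the lookup works from the server but the rule still never fires, running a saved copy of the message through spamassassin -D shows in the debug output whether the network URI tests were attempted at all.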
In a broader sense, are there any available local rulesets that are going to key off of
the phrasing in the message body for these types of spams?
> Jeff C.
Re: SpamAssassin not hitting well on obvious spam
Posted by Jeff Chan <je...@surbl.org>.
Quoting Chris 'Xenon' Hanson <xe...@alphapixel.com>:
[...]
> X-Spam-Status: Yes, hits=4.4 required=4.0
> X-Spam-Level: ++++
> X-Spam-Report: SA TESTS
> 0.1 FORGED_RCVD_HELO Received: contains a forged HELO
> 0.1 HTML_40_50 BODY: Message is 40% to 50% HTML
> 0.0 HTML_MESSAGE BODY: HTML included in message
> 1.5 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
> [cf: 100]
> 0.1 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
> 0.1 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP
> address
> [201.240.244.254 listed in dnsbl.sorbs.net]
> 1.8 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
> [Blocked - see
> <http://www.spamcop.net/bl.shtml?201.240.244.254>]
> 0.6 URIBL_SBL Contains an URL listed in the SBL blocklist
> [URIs: ecamn.com]
Turn on SURBL tests. ecamn.com is blacklisted on SURBL.
Jeff C.