Posted to users@spamassassin.apache.org by "Shaun T. Erickson" <st...@gmail.com> on 2006/10/29 07:38:34 UTC

rules_du_jour question

I've just downloaded this and set it up. I see there are MANY rulesets
I can choose from, but I have no idea if they are all 'safe' (not even
sure what I mean by that). Is there a subset of all these rulesets,
that "everybody" uses, or does everyone use all of them? How do you
decide which to use and which not to use?
-- 
        -ste

Re: rules_du_jour question

Posted by Leander Koornneef <l....@ic-s.nl>.
	
On 29-Oct-2006, at 17:55, Shaun T. Erickson wrote:

> On 10/29/06, Leander Koornneef <l....@ic-s.nl> wrote:
>>
>> In my experience, using the default sa-update channel, the openprotect
>> channel, auto-whitelisting, proper bayes training(!), pyzor, razor,
>> dcc, SPF and DNS blacklists will get you a spam detection rate >99%.
>
> I'm doing all that, now, I think. The auto-whitelisting seems to be
> happening on its own (it does say 'auto' after all, lol), as I see
> the auto-whitelist file in amavis' .spamassassin directory growing.
> Likewise, I see the bayes_* files growing, as well. At some point,
> when it has seen enough stuff, it will just kick in on its own, yes?
> I have a feeling that that will not be for quite some time though, as
> virtually all the spam never makes it onto my system, thanks to the
> postfix rules I have in place. Amavis/Clamav/Spamassassin have an easy
> job here. ;) I will have to train it on spam that it misses though. I
> think I saw a way to have the amavis account pull down and train on
> the contents of my 'missed_spam' imap folder, via fetchmail ...

You should train SA not only on false positives and false negatives,
but also on regular streams of ham and spam. The default autolearn
threshold, for instance, only trains bayes on spam that scores above
12, so feeding sa-learn, as spam, the mails that score in the range
required_score < score < bayes_auto_learn_threshold_spam
will also improve the overall quality of your bayesian scoring
(somebody please correct me if I'm wrong). For me this is easy, as my
mbox files are on the same server as SA, so I can just point sa-learn
at my ham and spam boxes. Otherwise, you may indeed need to use
something like fetchmail to pull the mailboxes from your pop/imap server.
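
A minimal sketch of that kind of bulk training, assuming mbox-format
folders at the hypothetical paths ~/mail/ham and ~/mail/spam. The --ham,
--spam and --mbox switches are standard sa-learn options; the paths, the
run helper and the DRY_RUN variable are made up for illustration. By
default the script only prints what it would run:

```shell
#!/bin/sh
# Hypothetical mbox locations; point these at your own ham and spam folders.
HAM_MBOX="${HAM_MBOX:-$HOME/mail/ham}"
SPAM_MBOX="${SPAM_MBOX:-$HOME/mail/spam}"

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Train bayes on known-good mail first, then on known spam.
run sa-learn --ham --mbox "$HAM_MBOX"
run sa-learn --spam --mbox "$SPAM_MBOX"
```

Run it with DRY_RUN=0, as the user that owns the bayes database (the
amavis account in this thread), to actually train.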

Leander



Re: rules_du_jour question

Posted by "Shaun T. Erickson" <st...@gmail.com>.
On 10/29/06, Leander Koornneef <l....@ic-s.nl> wrote:
>
> In my experience, using the default sa-update channel, the openprotect
> channel, auto-whitelisting, proper bayes training(!), pyzor, razor,
> dcc, SPF and DNS blacklists will get you a spam detection rate >99%.

I'm doing all that, now, I think. The auto-whitelisting seems to be
happening on its own (it does say 'auto' after all, lol), as I see
the auto-whitelist file in amavis' .spamassassin directory growing.
Likewise, I see the bayes_* files growing, as well. At some point,
when it has seen enough stuff, it will just kick in on its own, yes?
I have a feeling that that will not be for quite some time though, as
virtually all the spam never makes it onto my system, thanks to the
postfix rules I have in place. Amavis/Clamav/Spamassassin have an easy
job here. ;) I will have to train it on spam that it misses though. I
think I saw a way to have the amavis account pull down and train on
the contents of my 'missed_spam' imap folder, via fetchmail ...

> Also, I generally use X-Spam-Level = 3 as the cutoff value in my email
> client to filter spam out of my Inbox. I rarely have any false positives.

I currently have mine set at 5, but I might go lower after I see how
it works for a while.
-- 
        -ste

Re: rules_du_jour question

Posted by Leander Koornneef <l....@ic-s.nl>.
On 29-Oct-2006, at 16:33, Shaun T. Erickson wrote:

> On 10/29/06, Leander Koornneef <l....@ic-s.nl> wrote:
>>
>> If you are using spamassassin 3.1, you can use sa-update to get the SARE
>> rulesets from the channel provided by http://saupdates.openprotect.com/.
>> This negates the necessity to run rulesdujour alongside sa-update. This
>> channel consists only of "safe" rules.
>
> Ok. I've set that up and run it and now I have the standard set of
> rules and the safe sare rules under "/var/lib/spamassassin/3.001007".
>
> Two questions:
>
> Do many people use the non-sare rulesets that I see are available via
> rules_du_jour (e.g., TRIPWIRE ANTIDRUG RANDOMVAL BOGUSVIRUS
> ZMI_GERMAN)? Are those something I'd still likely want to get via
> rules_du_jour?

In my experience, using the default sa-update channel, the openprotect
channel, auto-whitelisting, proper bayes training(!), pyzor, razor, dcc,
SPF and DNS blacklists will get you a spam detection rate >99%.
Also, I generally use X-Spam-Level = 3 as the cutoff value in my email
client to filter spam out of my Inbox. I rarely have any false positives.
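
For what it's worth, the stock X-Spam-Level header carries one asterisk
per whole point of score, so a cutoff of 3 amounts to matching three or
more stars. A rough sketch of such a filter in shell, where the function
name spam_level_at_least is made up for illustration and the message is
assumed to arrive on stdin:

```shell
#!/bin/sh
# Succeeds (exit 0) when the message on stdin has an X-Spam-Level header
# with at least $1 stars; the cutoff defaults to 3.
spam_level_at_least() {
    min="${1:-3}"
    # Look at the header block only: sed stops at the first empty line.
    sed '/^$/q' | grep -Eq "^X-Spam-Level: \*{${min},}"
}
```

Something like "spam_level_at_least 3 < message.eml && echo spam" would
then flag a saved message (message.eml being any file you have at hand).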


> rules_du_jour restarts amavisd-new after it runs, but sa-update
> doesn't. Do most people run it out of cron and simply append an
> (without the quotes, of course) " && /etc/init.d/amavis reload" to the
> command line? Or is there another, more preferred method?

sa-update indeed does not reload amavisd, because not everyone using
sa-update also runs amavis, so you should arrange this yourself. Also,
if you are using amavis and spamassassin < 3.1.5, you should read the
last section on this page:
http://wiki.apache.org/spamassassin/RuleUpdates
I use the script from that wiki page to run sa-update and reload amavisd,
and it works fine.
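
A minimal sketch of such a wrapper (not the wiki's script), assuming
sa-update's documented exit codes: 0 when new rules were installed, 1
when there was nothing new, and anything else treated here as an error.
SA_UPDATE, RELOAD_CMD and update_and_reload are names made up for this
example; the init script path is the one mentioned in the question:

```shell
#!/bin/sh
# Both commands can be overridden from the environment, e.g. for testing.
SA_UPDATE="${SA_UPDATE:-sa-update}"
RELOAD_CMD="${RELOAD_CMD:-/etc/init.d/amavis reload}"

# Run sa-update with any extra arguments and reload the scanner only
# when new rules were actually installed.
update_and_reload() {
    status=0
    $SA_UPDATE "$@" || status=$?
    case "$status" in
        0) echo "rules updated, reloading"
           $RELOAD_CMD ;;
        1) echo "no new rules" ;;
        *) echo "sa-update failed with exit code $status" >&2
           return "$status" ;;
    esac
}
```

Calling update_and_reload from a cron script then avoids reloading
amavisd on days when nothing changed.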

Leander


Re: rules_du_jour question

Posted by Bill Randle <bi...@neocat.org>.
On Mon, 2006-10-30 at 01:41 +0100, Benny Pedersen wrote:
> On Sun, October 29, 2006 16:33, Shaun T. Erickson wrote:
> 
> > rules_du_jour restarts amavisd-new after it runs, but sa-update
> > doesn't. Do most people run it out of cron and simply append an
> > (without the quotes, of course) " && /etc/init.d/amavis reload" to the
> > command line? Or is there another, more preferred method?
> 
> i solved this by doing sa-update before running rules_du_jour, and let
> rules_du_jour restart amavisd-new :-)

I used to do the same thing, but now I let sa-update update the SARE
rules, too. It's a bit more initial setup work if you use any of the
level 1 or greater SARE rules, because you have to list a channel for
each ruleset. Still, you can add them to a file and specify that file on
the command line.

So, something like this:
  # cat /etc/mail/spamassassin/sare-sa-update-channels.txt
  saupdates.openprotect.com
  70_sare_genlsubj1.cf.sare.sa-update.dostech.net
  70_sare_header1.cf.sare.sa-update.dostech.net
  70_sare_html1.cf.sare.sa-update.dostech.net
  70_sare_uri1.cf.sare.sa-update.dostech.net
  72_sare_bml_post25x.cf.sare.sa-update.dostech.net
  72_sare_redirect_post3.0.0.cf.sare.sa-update.dostech.net

Then call sa-update from cron like so:
  /usr/bin/sa-update --gpgkey <key-for-openprotect> --gpgkey <key-for-dostech> --channel updates.spamassassin.org --channelfile /etc/mail/spamassassin/sare-sa-update-channels.txt; /usr/sbin/amavisd reload
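
Since a crontab entry has to fit on a single line, one option is to move
the long command into a small wrapper script and call that from cron. A
sketch, with the two key IDs left as placeholders just as above
(KEY_FOR_OPENPROTECT and KEY_FOR_DOSTECH are not real values) and a
made-up function name; it prints the command so it can be inspected
before being run for real:

```shell
#!/bin/sh
# Channel list file from the message above.
CHANNEL_FILE="${CHANNEL_FILE:-/etc/mail/spamassassin/sare-sa-update-channels.txt}"
# Placeholders only: substitute the GPG key IDs published by each channel.
OPENPROTECT_KEY="${OPENPROTECT_KEY:-KEY_FOR_OPENPROTECT}"
DOSTECH_KEY="${DOSTECH_KEY:-KEY_FOR_DOSTECH}"

# Print the full sa-update invocation on one line.
sa_update_cmd() {
    echo /usr/bin/sa-update \
        --gpgkey "$OPENPROTECT_KEY" \
        --gpgkey "$DOSTECH_KEY" \
        --channel updates.spamassassin.org \
        --channelfile "$CHANNEL_FILE"
}

sa_update_cmd
```

Once the output looks right, dropping the echo makes the function run
sa-update instead of printing it; the amavisd reload can follow as in
the one-liner.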

	-Bill



Re: rules_du_jour question

Posted by Benny Pedersen <me...@junc.org>.
On Sun, October 29, 2006 16:33, Shaun T. Erickson wrote:

> rules_du_jour restarts amavisd-new after it runs, but sa-update
> doesn't. Do most people run it out of cron and simply append an
> (without the quotes, of course) " && /etc/init.d/amavis reload" to the
> command line? Or is there another, more preferred method?

i solved this by doing sa-update before running rules_du_jour, and let
rules_du_jour restart amavisd-new :-)

-- 
"This message was sent using 100% recycled spam mails."


Re: rules_du_jour question

Posted by Nigel Frankcom <ni...@blue-canoe.net>.
On Sun, 29 Oct 2006 11:33:54 -0400, "Shaun T. Erickson"
<st...@gmail.com> wrote:

>On 10/29/06, Leander Koornneef <l....@ic-s.nl> wrote:
>>
>> If you are using spamassassin 3.1, you can use sa-update to get the SARE
>> rulesets from the channel provided by http://saupdates.openprotect.com/.
>> This negates the necessity to run rulesdujour alongside sa-update. This
>> channel consists only of "safe" rules.
>
>Ok. I've set that up and run it and now I have the standard set of
>rules and the safe sare rules under "/var/lib/spamassassin/3.001007".
>
>Two questions:
>
>Do many people use the non-sare rulesets that I see are available via
>rules_du_jour (e.g., TRIPWIRE ANTIDRUG RANDOMVAL BOGUSVIRUS
>ZMI_GERMAN)? Are those something I'd still likely want to get via
>rules_du_jour?
>
>rules_du_jour restarts amavisd-new after it runs, but sa-update
>doesn't. Do most people run it out of cron and simply append an
>(without the quotes, of course) " && /etc/init.d/amavis reload" to the
>command line? Or is there another, more preferred method?
>
>Sorry for sounding clueless. I've never tried using rulesets other than
>what came with SA before. :)


I use tripwire and bogusvirus, along with a few others (iirc ANTIDRUG
is incorporated into the base rules as of 3.1.7).

At last count we were killing 99.98% of spam, 37 FN out of 170k+ tagged
iirc. I don't use amavis though.

My current RDJ is...

TRUSTED_RULESETS="TRIPWIRE SARE_ADULT SARE_FRAUD SARE_BML SARE_OBFU
SARE_URI0 SARE_WHITELIST_SPF SARE_WHITELIST_RCVD SARE_STOCKS
SARE_SPAMCOP_TOP200"
SA_DIR="/etc/mail/spamassassin"
MAIL_ADDRESS="me@mydomain"
SA_RESTART="killall -HUP spamassassin"

I'm testing SPAMCOP_TOP200 at the mo so I've not much idea how
effective it is.

HTH

Nigel

Re: rules_du_jour question

Posted by "Shaun T. Erickson" <st...@gmail.com>.
On 10/29/06, Leander Koornneef <l....@ic-s.nl> wrote:
>
> If you are using spamassassin 3.1, you can use sa-update to get the SARE
> rulesets from the channel provided by http://saupdates.openprotect.com/.
> This negates the necessity to run rulesdujour alongside sa-update. This
> channel consists only of "safe" rules.

Ok. I've set that up and run it and now I have the standard set of
rules and the safe sare rules under "/var/lib/spamassassin/3.001007".

Two questions:

Do many people use the non-sare rulesets that I see are available via
rules_du_jour (e.g., TRIPWIRE ANTIDRUG RANDOMVAL BOGUSVIRUS
ZMI_GERMAN)? Are those something I'd still likely want to get via
rules_du_jour?

rules_du_jour restarts amavisd-new after it runs, but sa-update
doesn't. Do most people run it out of cron and simply append an
(without the quotes, of course) " && /etc/init.d/amavis reload" to the
command line? Or is there another, more preferred method?

Sorry for sounding clueless. I've never tried using rulesets other than
what came with SA before. :)
-- 
        -ste

Re: rules_du_jour question

Posted by Leander Koornneef <l....@ic-s.nl>.
On 29-Oct-2006, at 7:38, Shaun T. Erickson wrote:

> I've just downloaded this and set it up. I see there are MANY rulesets
> I can choose from, but I have no idea if they are all 'safe' (not even
> sure what I mean by that). Is there a subset of all these rulesets,
> that "everybody" uses, or does everyone use all of them? How do you
> decide which to use and which not to use?

If you are using spamassassin 3.1, you can use sa-update to get the SARE
rulesets from the channel provided by http://saupdates.openprotect.com/.
This negates the necessity to run rulesdujour alongside sa-update. This
channel consists only of "safe" rules.

Leander

Re: rules_du_jour

Posted by Leander Koornneef <l....@ic-s.nl>.
Those kinds of spam are hitting all kinds of rules here, including
rulesets from SARE:

X-Spam-Status: Yes, hits=14.1 tagged_above=-999.0 required=3.0
    tests=BAYES_99, EXTRA_MPART_TYPE, HTML_10_20, HTML_MESSAGE,
    MY_CID_AND_ARIAL2, MY_CID_AND_CLOSING, MY_CID_AND_STYLE,
    MY_CID_ARIAL2_CLOSING, MY_CID_ARIAL_STYLE, SARE_GIF_ATTACH,
    TVD_FW_GRAPHIC_ID1

I suspect you haven't done much tweaking on your SA setup?

Leander

On 30-Oct-2006, at 21:45, User for SpamAssassin Mail List wrote:

>
> Has anyone come up with a rule that will combat the spam that I have been
> seeing lately?
>
> That is a spam that rambles about much of nothing then has an image or a
> link at the bottom.
>
> I see more and more of these and it seems like the spammers have figured
> out a way to get this past SA.
>
> I include one such message at the end of this post.
>
> Thanks,
>
> Ken
>
>
>
> Example of this spam:
>
> [IMAGE]
> Jeg er udvalgt som blogger, dvs. There is little doubt that asynchronous
> solutions require us to think in new ways as we have to deal with
> concurrency, out-of-sequence issues, correlation and other. Ingen
> interesse mere. But it makes me feel better that Ted Neward seems to beat
> me in that category, though. In my eyes this is really the best indicator
> of success for a pattern language. We don't have to go further than the
> local coffee shop. But it makes me feel better that Ted Neward seems to
> beat me in that category, though. While the conference logistics can be
> quirky at times the content is top notch. Even if you choose the "right"
> specification, it still is likely to evolve over time. Jeg er udvalgt som
> blogger, dvs. However, when building distributed applications, that
> asymmetry really has no place. After "loosely coupled", "stateless" must
> be a close runner-up as the ultimate nirvana in buzzword-compliant
> architectures. While Java is not necessarily the greatest language to
> "host" a DSL we can go a lot further than developers generally believe or
> care for. Ideally, the debate would involve alcoholic beverages and the
> other person would pick up the check. This time, though, Ken Arnold stole
> a little bit of my show by publishing an excellent article in ACM Queue
> magazine called "Programmers are People, too". During the proverbial
> hallway discussions we started talking about boxes and lines, but in a
> profound way. Read on to learn more about the implementation and our
> experiences with intra-JVM EDA. Hearing this tag line for the third or
> fourth time got me wondering, "what really is the difference between
> coding and configuring? For one thing, a fair number of my intellectual
> drinking buddies tend to congregate around the large software company in
> the Pacific Northwest. First, because I was going to meet the exalted one
> in person.
>


rules_du_jour

Posted by User for SpamAssassin Mail List <sp...@pcez.com>.
Has anyone come up with a rule that will combat the spam that I have been
seeing lately?

That is a spam that rambles about much of nothing then has an image or a
link at the bottom.

I see more and more of these and it seems like the spammers have figured
out a way to get this past SA.

I include one such message at the end of this post.

Thanks,

Ken



Example of this spam:

[IMAGE]
Jeg er udvalgt som blogger, dvs. There is little doubt that asynchronous
solutions require us to think in new ways as we have to deal with
concurrency, out-of-sequence issues, correlation and other. Ingen
interesse mere. But it makes me feel better that Ted Neward seems to beat
me in that category, though. In my eyes this is really the best indicator
of success for a pattern language. We don't have to go further than the
local coffee shop. But it makes me feel better that Ted Neward seems to
beat me in that category, though. While the conference logistics can be
quirky at times the content is top notch. Even if you choose the "right"
specification, it still is likely to evolve over time. Jeg er udvalgt som
blogger, dvs. However, when building distributed applications, that
asymmetry really has no place. After "loosely coupled", "stateless" must
be a close runner-up as the ultimate nirvana in buzzword-compliant
architectures. While Java is not necessarily the greatest language to
"host" a DSL we can go a lot further than developers generally believe or
care for. Ideally, the debate would involve alcoholic beverages and the
other person would pick up the check. This time, though, Ken Arnold stole
a little bit of my show by publishing an excellent article in ACM Queue
magazine called "Programmers are People, too". During the proverbial
hallway discussions we started talking about boxes and lines, but in a
profound way. Read on to learn more about the implementation and our
experiences with intra-JVM EDA. Hearing this tag line for the third or
fourth time got me wondering, "what really is the difference between
coding and configuring? For one thing, a fair number of my intellectual
drinking buddies tend to congregate around the large software company in
the Pacific Northwest. First, because I was going to meet the exalted one
in person.




Re: rules_du_jour question

Posted by Loren Wilton <lw...@earthlink.net>.
Many of them are SARE rulesets.  Look at www.rulesemporium.com/rules to see 
the descriptions of the various rulesets.  No, not all of them are "safe". 
Many of them are deliberately graded by your willingness to live with 
possible FPs.  In general they range from xxx0.cf for "dead safe" to xxx4 or 
so for "possibly fairly risky".

        Loren