Posted to users@spamassassin.apache.org by Harry Putnam <re...@newsguy.com> on 2011/04/30 05:00:32 UTC

How to get a fresh start in messy old setup

Setup: Single-user Linux desktop on a home family LAN
  Running: Gentoo Linux (kernel-2.6.33)
           sendmail-8.14 
           procmail-3.22           
           spamassassin-3.3.1
           Mail and News reader: emacs/gnus (emacs is version 24) 

-------        ---------       ---=---       ---------      -------

I'm hoping to get some input on whether the plan I've laid out
below is likely to do what I want, and leave me with a much cleaner,
more easily managed mail setup.

Details:
My mail setup has evolved into its current messy state over
15 years or so.

For several years early on, I experimented with procmail in an attempt
to fight spam and organize my mail. I have continued to use procmail
all this time.  Consequently I have dozens of old procmail recipes,
many still lingering in .procmailrc.

I'm currently using a combination of my own procmail filters and
spamassassin, called from .procmailrc via spamc.  So some of my own
filters run before SA gets called, and more procmail rules run after
the SA call to actually sort mail into specific folders.

Some minor customizations of spamassassin, by way of local.cf, are
also involved.

It's quite a mess currently, and it would take a fair bit of effort to
get it really cleaned up and working efficiently.

-------        ---------       ---=---       ---------      -------- 

I thought I might try to go at it with the approach outlined below,
and would like to hear any comments people might have about it.

I suspect that SA has gotten good enough that I could use SA alone to
filter out spam for my smallish needs.

Here is how I think it might work:

While continuing to use my messy setup as it is for all `real' mail
handling, I would grab a copy of all incoming mail (with a procmail
rule) and run it through a separate experimental (parallel) setup
consisting of spamassassin and nothing else.

No home-grown recipes or the like, just straight spamassassin.  The
only sorting would be into a SPAM or HAM folder.
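For illustration, here is a minimal Python sketch of that experimental delivery step. It assumes the copied message has already been piped through spamassassin, which by default adds an `X-Spam-Flag: YES` header to messages it scores as spam. The mbox paths are hypothetical placeholders; in practice procmail itself would normally do this last step, and the copy can be taken with a recipe carrying procmail's `c` (carbon copy) flag.

```python
import email
import mailbox
from email.message import Message

# Hypothetical mbox files for the experimental setup.
SPAM_MBOX = "experimental-SPAM"
HAM_MBOX = "experimental-HAM"


def classify(msg: Message) -> str:
    """Return 'spam' or 'ham' based on the X-Spam-Flag header SA adds."""
    flag = msg.get("X-Spam-Flag", "NO")
    return "spam" if flag.strip().upper() == "YES" else "ham"


def deliver(raw: bytes, spam_path: str = SPAM_MBOX, ham_path: str = HAM_MBOX) -> str:
    """Parse an SA-processed message and append it to the SPAM or HAM mbox."""
    msg = email.message_from_bytes(raw)
    verdict = classify(msg)
    box = mailbox.mbox(spam_path if verdict == "spam" else ham_path)
    box.lock()  # avoid clashing with other writers of the same mbox
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return verdict
```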

Turn on Bayes learning in this parallel experimental system, while
keeping it turned off in the other `real' setup.

Aside:
,----
| I'm not sure how to manage having bayes learning off on the `real'
| setup and on in the parallel experimental setup, but I expect there
| will be some way to do it. Maybe it can be done at the call to SA
| instead of in `local.cf' (Any advice on that would be especially
| welcome).
`----
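One possible way to handle the aside, sketched in Python: keep `use_bayes 0` in the user_prefs file that spamd reads for the `real' spamc setup, and give the experimental copy its own prefs file with Bayes and auto-learning turned on, passed to the spamassassin command through its -p (--prefspath) option. use_bayes, bayes_auto_learn and -p are real SpamAssassin settings and options; the directory and file names are hypothetical, and this assumes spamd is configured to read per-user preferences (user prefs override local.cf).

```python
from pathlib import Path
from typing import Dict, List

# Contents for each setup. The `real' text belongs in the user_prefs
# that spamd reads; the experimental file is passed explicitly with -p.
REAL_PREFS_TEXT = "use_bayes 0\n"
EXPERIMENT_PREFS_TEXT = "use_bayes 1\nbayes_auto_learn 1\n"


def write_prefs(directory: Path) -> Dict[str, Path]:
    """Write one (hypothetically named) prefs file per setup; return their paths."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = {
        "real": directory / "real_user_prefs",
        "experiment": directory / "experiment_user_prefs",
    }
    paths["real"].write_text(REAL_PREFS_TEXT)
    paths["experiment"].write_text(EXPERIMENT_PREFS_TEXT)
    return paths


def sa_command(prefs: Path) -> List[str]:
    """Command line that runs spamassassin with a specific prefs file."""
    return ["spamassassin", "-p", str(prefs)]
```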

Then keep close track of how well SA filters out the spam, and follow
a systematic method for making sure SA is being taught to get it
right.  (I don't currently know how that is done, but I will have to
understand it before starting the extra experimental setup.)
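A common approach to systematic training is to train on mistakes: move the messages SA got wrong into two correction folders and feed them to sa-learn. The Python sketch below only builds the sa-learn command lines (--spam, --ham, --mbox and -p are real sa-learn options) and runs them only when dry_run is False; the folder names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Hypothetical mbox folders that misfiled mail is moved into by hand.
MISSED_SPAM = Path("Mail/missed-spam")
FALSE_ALARMS = Path("Mail/false-alarms")


def sa_learn_command(folder: Path, is_spam: bool,
                     prefs: Optional[Path] = None) -> List[str]:
    """Build an sa-learn command line for one mbox-format folder."""
    cmd = ["sa-learn", "--spam" if is_spam else "--ham", "--mbox"]
    if prefs is not None:
        # Train with the experimental setup's preferences file.
        cmd += ["-p", str(prefs)]
    cmd.append(str(folder))
    return cmd


def train(spam_folder: Path = MISSED_SPAM, ham_folder: Path = FALSE_ALARMS,
          prefs: Optional[Path] = None, dry_run: bool = True) -> List[List[str]]:
    """Return the training commands; execute them only if dry_run is False."""
    cmds = [sa_learn_command(spam_folder, True, prefs),
            sa_learn_command(ham_folder, False, prefs)]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

Note that SpamAssassin does not start using Bayes scores until the database has learned at least 200 spam and 200 ham messages (the bayes_min_spam_num and bayes_min_ham_num defaults), so an initial bulk training run on existing folders is usually needed before training on mistakes pays off.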

If all this sounds confusing, think of it as two mail setups running in
parallel:

  1) Filtering and sorting for real use.

  2) Using a copy of all incoming mail: filtering with SA only, and
     hoping to train Bayes to the point that SA and Bayes are all I
     need to filter out spam.
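To keep track of how the experimental setup compares, one option is to pair each experimental verdict with the label the same message ended up with in the real setup (after any manual correction) and tally the errors. This hypothetical Python helper is not part of SpamAssassin; a false positive here means ham that SA called spam.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def score(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Tally (sa_verdict, true_label) pairs, each value being 'spam' or 'ham'."""
    counts = Counter()
    for verdict, truth in pairs:
        counts["total"] += 1
        if verdict == truth:
            counts["correct"] += 1
        elif verdict == "spam":
            counts["false_positive"] += 1  # ham that SA flagged as spam
        else:
            counts["false_negative"] += 1  # spam that SA let through
    result: Dict[str, float] = {
        k: counts[k] for k in ("total", "correct", "false_positive", "false_negative")
    }
    result["accuracy"] = counts["correct"] / counts["total"] if counts["total"] else 0.0
    return result
```

False positives usually matter most here, since they are the ones that would hide real mail once the experimental system takes over.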

It would involve passing something like 40% of my mail through SA
twice, but my volume of incoming mail is low enough that even doubling
SA's workload would not make much difference.  And the work won't
actually be doubled, since in the `real' system quite a bit of mail is
handled before SA is called.

So, the incoming copy, snagged first thing, would be run through SA to
check for spam, and then be sorted to either a SPAM or HAM mailbox.

This parallel setup wouldn't risk harming anything since the mail in
those two folders would also be present in the other `real' mail
setup.

After enough training, I expect to find that I can just switch to
using the experimental system, adding only my mail-sorting rules (the
ones not related to spam), run after SA, in a new, vastly trimmed-down
~/.procmailrc.

The goal would be to end up with a system that relies on SA/bayes alone for
filtering of spam, and then procmail for sorting to specific folders
after the initial despamming has happened.
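The intended end state, SA first and then plain sorting, could look roughly like this Python sketch. The X-Spam-Flag check reflects SpamAssassin's default header; the sorting rules, folder names and List-Id value are hypothetical examples standing in for the trimmed-down ~/.procmailrc.

```python
from email.message import Message
from typing import List, Tuple

# Hypothetical sorting rules: (header name, substring to look for, folder).
RULES: List[Tuple[str, str, str]] = [
    ("List-Id", "users.spamassassin.apache.org", "lists/spamassassin"),
    ("From", "@family.example", "family"),
]


def route(msg: Message, rules: List[Tuple[str, str, str]] = RULES,
          default: str = "inbox") -> str:
    """Pick a folder: despam on SA's verdict first, then apply sorting rules."""
    # Step 1: spam filtering relies on SpamAssassin (and Bayes) alone.
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return "SPAM"
    # Step 2: ordinary sorting; the first matching rule wins.
    for header, needle, folder in rules:
        if needle.lower() in msg.get(header, "").lower():
            return folder
    return default
```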

At that point the experimental system would become the `real' system,
and I could scrap my old setup.


Re: How to get a fresh start in messy old setup

Posted by ji...@jidanni.org.
All I know is I first run mail through procmail to filter out the big
items before they get to spamassassin.
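The ordering described in this reply (cheap local rules first, spamassassin only for whatever is left) can be sketched in Python as below. The actual rules are on the linked page and are not reproduced here; known_junk_sender() and its domain are hypothetical placeholders, and run_sa stands in for whatever actually pipes the message to spamassassin.

```python
from email.message import Message
from typing import Callable, List, Optional

# A pre-filter returns a folder name when it can decide, or None to pass.
PreFilter = Callable[[Message], Optional[str]]


def known_junk_sender(msg: Message) -> Optional[str]:
    """Hypothetical cheap rule: file mail from one known junk domain as SPAM."""
    return "SPAM" if "@junk.example" in msg.get("From", "").lower() else None


PREFILTERS: List[PreFilter] = [known_junk_sender]


def handle(msg: Message, run_sa: Callable[[Message], str],
           prefilters: List[PreFilter] = PREFILTERS) -> str:
    """Apply cheap pre-filters first; only undecided mail reaches SpamAssassin."""
    for prefilter in prefilters:
        folder = prefilter(msg)
        if folder is not None:
            return folder  # decided locally, so SA is never invoked
    return run_sa(msg)
```

Messages decided by the pre-filters never cost an SA run, which is the main attraction of this ordering on a busy or slow machine.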

Never enabled Bayes and don't intend to.

Hmm, I seem to have described it in http://jidanni.org/comp/spam/spamdealer.html