Posted to users@spamassassin.apache.org by newby 23 <c....@espero.org.uk> on 2007/09/12 22:34:53 UTC

spamassassin management by file deletion

I use a domain managed by HOSTROUTE, which has installed spamassassin as a
mail filter.  My filespace is limited to 10MB, of which some 7.7MB are
currently devoted to spamassassin.  Thus, I need to prune this quickly to
maintain service.

As I do not maintain the system, I cannot manage spamassassin in the usual
ways.  Instead, I think that I am limited to deleting files and altering the
user_prefs file.  

The following files are present in my .spamassassin directory:

auto-whitelist, bayes_journal, bayes_seen, bayes_toks, user_prefs

As I have been unable to find documentation covering a situation like this,
I would very much appreciate any insights that you could offer.

Thank you,

Colin
-- 
View this message in context: http://www.nabble.com/spamassassin-management-by-file-deletion-tf4431882.html#a12643646
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: spamassassin management by file deletion

Posted by newby 23 <c....@espero.org.uk>.

Theo Van Dinter-2 wrote:
> 
> On Thu, Sep 13, 2007 at 10:24:08AM -0400, Kris Deugau wrote:
>> >How do I disable AWL?
>> 
>> Not sure;  check the docs for the version of SA you're using.  It *has* 
>> changed more than once in the last year or so IIRC.
> 
> It's still the same "use_auto_whitelist 0", though it's recommended to just
> not load the plugin if possible (which it isn't in this case).
> 

I've set "use_auto_whitelist 0" in user_prefs.  Can I now delete my
auto-whitelist?

I'll hold off on changing the bayes_expiry_max_db_size.  This being the
case, is there anything I can do to prune the bayes_toks file?

Colin
-- 
View this message in context: http://www.nabble.com/spamassassin-management-by-file-deletion-tf4431882.html#a12663007


Re: spamassassin management by file deletion

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Sep 13, 2007 at 10:24:08AM -0400, Kris Deugau wrote:
> >How do I disable AWL?
> 
> Not sure;  check the docs for the version of SA you're using.  It *has* 
> changed more than once in the last year or so IIRC.

It's still the same "use_auto_whitelist 0", though it's recommended to just
not load the plugin if possible (which it isn't in this case).
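With FTP-only access, this change amounts to adding one line to user_prefs. As a minimal sketch, assuming you download user_prefs, edit it locally and upload it again, the Python helper set_pref() below (a hypothetical name, not part of SpamAssassin) sets a directive such as use_auto_whitelist to 0. It replaces an existing uncommented line for that directive, or appends one if there is none.

```python
# Minimal sketch, assuming user_prefs has been downloaded over FTP and will
# be uploaded again afterwards. set_pref() is a hypothetical helper written
# for this example; it is not part of SpamAssassin.


def set_pref(prefs_text: str, name: str, value: str) -> str:
    """Return prefs_text with directive `name` set to `value`.

    The first line whose first word is the directive name is replaced;
    if there is none, the directive is appended at the end.
    """
    lines = prefs_text.splitlines()
    new_line = f"{name} {value}"
    for i, line in enumerate(lines):
        words = line.split()
        # Comment lines start with "#", so their first word never equals name.
        if words and words[0] == name:
            lines[i] = new_line
            break
    else:
        # No existing line for this directive: add one at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    prefs = "# my settings\nrequired_score 5\n"
    print(set_pref(prefs, "use_auto_whitelist", "0"), end="")
```

Replacing rather than blindly appending keeps the file free of conflicting duplicate lines if the directive was already present with another value.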

> >My user_prefs file does not currently have a bayes_expiry_max_db_size
> >option in it.  Do I simply add one, setting a smaller value?
> 
> Yep.

I would recommend against this.  150k is a good running value; making it
lower will potentially limit the effectiveness of using Bayes.  It's also
worth noting that the expire system has a forced minimum value of 100k here.
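Putting those two numbers together: the default is 150,000 tokens, and the expiry system will not aim below 100,000 however low bayes_expiry_max_db_size is set. The following is a minimal Python model of that floor; effective_max_tokens() and the constant names are illustrative assumptions based on this thread, not SpamAssassin's own code.

```python
# Illustrative model of the floor described in this thread. The numbers
# come from the messages above; effective_max_tokens() is a hypothetical
# helper, not SpamAssassin's own code.
from typing import Optional

DEFAULT_MAX_DB_SIZE = 150_000  # default bayes_expiry_max_db_size
EXPIRY_FLOOR = 100_000         # minimum the expiry system enforces, per Theo


def effective_max_tokens(configured: Optional[int] = None) -> int:
    """Token count the expiry system would aim for, under this model."""
    if configured is None:
        # No bayes_expiry_max_db_size line in user_prefs: the default applies.
        return DEFAULT_MAX_DB_SIZE
    # Values below the floor are effectively raised to it.
    return max(configured, EXPIRY_FLOOR)


if __name__ == "__main__":
    for setting in (None, 50_000, 120_000):
        print(setting, "->", effective_max_tokens(setting))
```

In other words, under this model, setting anything below 100,000 buys nothing over setting exactly 100,000.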

> >There's nothing that I can do with bayes_toks?
> 
> To actually trim the file, you may have to discard the Bayes database 
> you've got and train a new one with the new bayes_expiry_max_db_size 
> value.  BerkeleyDB doesn't shrink the file when a record is deleted;  you 
> usually have to copy the live data to a new file and move it over top of 
> the old one.

FWIW, that's exactly what the bayes expiry system does.

-- 
Randomly Selected Tagline:
"Phenomenal Cosmic Powers, Itty Little Living Space."   - Aladdin

Re: spamassassin management by file deletion

Posted by Kris Deugau <kd...@vianet.ca>.
newby 23 wrote:
> How do I disable AWL?

Not sure;  check the docs for the version of SA you're using.  It *has* 
changed more than once in the last year or so IIRC.

> My user_prefs file does not currently have a bayes_expiry_max_db_size
> option in it.  Do I simply add one, setting a smaller value?

Yep.

>  If so, how do
> I get a sense of what a good value is?

Trial and error.  :/  FWIW, I have a system with a global Bayes DB set 
for 1,500,000 tokens, running ~45MB on disk, and my account on my personal 
system runs ~5.5MB with the default 150,000 tokens.  It looks like it's pretty 
much hardcoded to keep at least 100,000 tokens, so you might get down as 
far as ~3MB with that setting.
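The arithmetic behind that guess can be sketched by scaling the two data points above linearly. This assumes bayes_toks grows roughly in proportion to its token count, which is only an approximation; OBSERVATIONS and size_range_mb() are names made up for this example.

```python
# Rough estimate only. Assumes bayes_toks grows roughly linearly with the
# token count; the two data points are the sizes reported above.

OBSERVATIONS = [(150_000, 5.5), (1_500_000, 45.0)]  # (tokens, size in MB)


def size_range_mb(tokens: int) -> tuple:
    """Return a (low, high) size estimate in MB for `tokens` tokens."""
    # MB per token implied by each observation.
    mb_per_token = [size / count for count, size in OBSERVATIONS]
    return min(mb_per_token) * tokens, max(mb_per_token) * tokens


if __name__ == "__main__":
    for tokens in (100_000, 150_000):
        low, high = size_range_mb(tokens)
        print(f"{tokens:,} tokens: ~{low:.1f}MB to ~{high:.1f}MB")
```

At 100,000 tokens this gives roughly 3.0MB to 3.7MB, in line with the ~3MB guess; at the default 150,000 it gives 4.5MB to 5.5MB, which is about where a 5.3MB bayes_toks sits.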

> There's nothing that I can do with bayes_toks?

To actually trim the file, you may have to discard the Bayes database 
you've got and train a new one with the new bayes_expiry_max_db_size 
value.  BerkeleyDB doesn't shrink the file when a record is deleted;  you 
usually have to copy the live data to a new file and move it over top of 
the old one.
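The copy-and-replace idea can be illustrated with Python's standard dbm.dumb module, used here purely as a stand-in for BerkeleyDB because it also leaves the space of deleted records inside its data file. This is a sketch under that assumption: compact() is a hypothetical helper, it does not understand SpamAssassin's Bayes format, and it should not be pointed at a real bayes_toks.

```python
# Illustration only: dbm.dumb from Python's standard library stands in for
# BerkeleyDB here, because it also keeps deleted records' space in the file.
# compact() is a hypothetical helper; don't point it at a real bayes_toks.
import dbm.dumb
import os
import tempfile


def compact(path: str) -> None:
    """Copy the live records of the dbm.dumb database at `path` into a new
    database, then move the new files over the old ones."""
    tmp = path + ".compact"
    with dbm.dumb.open(path, "r") as old, dbm.dumb.open(tmp, "n") as new:
        for key in old.keys():
            new[key] = old[key]
    # dbm.dumb keeps values in <name>.dat and its index in <name>.dir.
    for suffix in (".dat", ".dir"):
        os.replace(tmp + suffix, path + suffix)
    # Remove leftover backup index files; they aren't needed to reopen it.
    for leftover in (path + ".bak", tmp + ".bak"):
        if os.path.exists(leftover):
            os.remove(leftover)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "toks")
        with dbm.dumb.open(db_path, "c") as db:
            for i in range(500):
                db[f"token{i}"] = b"x" * 1000
        print("after filling: ", os.path.getsize(db_path + ".dat"))
        with dbm.dumb.open(db_path, "w") as db:
            for i in range(450):  # delete 90% of the records
                del db[f"token{i}"]
        print("after deleting:", os.path.getsize(db_path + ".dat"))
        compact(db_path)
        print("after compact: ", os.path.getsize(db_path + ".dat"))
```

Running it shows the data file staying the same size after the deletions and only shrinking once the live records are copied into a fresh file.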

-kgd

Re: spamassassin management by file deletion

Posted by newby 23 <c....@espero.org.uk>.
> Hmm.  Do you have shell access?  It's not necessary, but it'll make 
> things easier if you do.

No, I don't have shell access.  I can access the file space by FTP, though.

> How big is each of those files?

auto-whitelist = 0.7MB
bayes_journal = 70kB
bayes_seen = 0.3MB
bayes_toks = 5.3MB
user_prefs = 1.5kB

> You'll probably want to disable the AWL and delete auto-whitelist;

How do I disable AWL?

> You'll probably also want to fiddle with the Bayes directive that 
> controls how large the Bayes data files get;  while it works on number 
> of tokens rather than disk size, it can give a rough estimate of disk 
> use.  The default bayes_expiry_max_db_size of 150,000 tokens may be too 
> large, but it looks like you can't make it much smaller.

My user_prefs file does not currently have a bayes_expiry_max_db_size
option in it.  Do I simply add one, setting a smaller value?  If so, how do
I get a sense of what a good value is?

> Over the longer term, you can delete bayes_journal and bayes_seen; 
> those are not critical to proper operation of the Bayes subsystem. 
> However, if you remove bayes_seen, you'll end up re-learning messages 
> over and over again if you regularly re-learn a folder that you don't empty.

Got it.  They don't seem to be a big problem right now.

There's nothing that I can do with bayes_toks?

Thank you for the above, Kris: it helps a good deal.

Best,

Colin
-- 
View this message in context: http://www.nabble.com/spamassassin-management-by-file-deletion-tf4431882.html#a12652462


Re: spamassassin management by file deletion

Posted by Kris Deugau <kd...@vianet.ca>.
newby 23 wrote:
> I use a domain managed by HOSTROUTE, which has installed spamassassin as a
> mail filter.  My filespace is limited to 10MB,

O_o  That sounds awfully low, even for cheap-to-free hosting.

According to http://www.hostroute.co.uk/hostingplans.html, the smallest 
plan is 20M;  you might want to contact them and see why you apparently 
only have 10M.

> of which some 7.7MB are
> currently devoted to spamassassin.  Thus, I need to prune this quickly to
> maintain service.
> 
> As I do not maintain the system, I cannot manage spamassassin in the usual
> ways.  Instead, I think that I am limited to deleting files and altering the
> user_prefs file.  

Hmm.  Do you have shell access?  It's not necessary, but it'll make 
things easier if you do.

> The following files are present in my .spamassassin directory:
> 
> auto-whitelist, bayes_journal, bayes_seen, bayes_toks, user_prefs

How big is each of those files?

You'll probably want to disable the AWL and delete auto-whitelist;  it 
tends to grow without bound, and while *I've* never had functional 
trouble from it, quite a few others on this list have reported problems 
of one kind or another aside from the disk usage.  (A long time ago I 
wrote a script to clean out old entries and trim the file size; google 
for trim_whitelist.  Note that you pretty much REQUIRE shell access to 
use it.)
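Without shell access, the deletion itself can still be done over FTP. Here is a minimal sketch using Python's standard ftplib, assuming that .spamassassin is reachable relative to the FTP login directory; delete_remote_file() and run() are hypothetical helpers, and the host name and credentials are placeholders you would supply.

```python
# Minimal sketch using Python's ftplib. Assumptions: .spamassassin is
# reachable relative to the FTP login directory, and the host name and
# credentials passed to run() are your own.
from ftplib import FTP, error_perm


def delete_remote_file(ftp, directory: str, name: str) -> bool:
    """Delete `name` inside `directory` if the server lists it.

    Returns True if a file was deleted, False if it wasn't there.
    """
    ftp.cwd(directory)
    try:
        listing = ftp.nlst()
    except error_perm:
        # Some servers answer NLST on an empty directory with a 550 error.
        listing = []
    # Some servers return bare names, others paths; compare the last part.
    if name in (entry.rsplit("/", 1)[-1] for entry in listing):
        ftp.delete(name)
        return True
    return False


def run(host: str, user: str, password: str) -> None:
    """Log in, delete auto-whitelist, and report what happened."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        removed = delete_remote_file(ftp, ".spamassassin", "auto-whitelist")
        print("deleted auto-whitelist" if removed else "auto-whitelist not found")
```

For example, call run("ftp.example.org", "username", "password") with those placeholders replaced by your own details. Do this only after adding "use_auto_whitelist 0" to user_prefs; otherwise SpamAssassin will most likely just create the file again.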

You'll probably also want to fiddle with the Bayes directive that 
controls how large the Bayes data files get;  while it works on number 
of tokens rather than disk size, it can give a rough estimate of disk 
use.  The default bayes_expiry_max_db_size of 150,000 tokens may be too 
large, but it looks like you can't make it much smaller.

Running "man Mail::SpamAssassin::Conf" from a shell on your webhost 
should give you details on configuration directives, but I'm pretty sure 
the same listing is available on the SA site somewhere under the Docs link.

Over the longer term, you can delete bayes_journal and bayes_seen; 
those are not critical to proper operation of the Bayes subsystem. 
However, if you remove bayes_seen, you'll end up re-learning messages 
over and over again if you regularly re-learn a folder that you don't empty.
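To make the bayes_seen point concrete, here is a toy Python model. It is an assumption for illustration only, not how SpamAssassin stores messages or tokens: a record of the message IDs already learned lets repeated runs over the same folder skip those messages, while without that record every run counts them again. ToyBayes and relearn_folder() are hypothetical names.

```python
# Toy model, not SpamAssassin's implementation: it only illustrates the job
# bayes_seen does. ToyBayes and relearn_folder() are hypothetical names.
from collections import Counter


class ToyBayes:
    def __init__(self, keep_seen: bool = True):
        self.keep_seen = keep_seen
        self.seen = {}            # message ID -> "spam" or "ham"
        self.learned = Counter()  # messages counted per class

    def learn(self, msg_id: str, label: str) -> bool:
        """Learn one message; return False if it was skipped."""
        if self.keep_seen:
            previous = self.seen.get(msg_id)
            if previous == label:
                return False                 # already learned as this class
            if previous is not None:
                self.learned[previous] -= 1  # forget the old classification
            self.seen[msg_id] = label
        self.learned[label] += 1
        return True


def relearn_folder(bayes: ToyBayes, folder, label: str, runs: int) -> int:
    """Re-learn the same folder `runs` times; return the count for `label`."""
    for _ in range(runs):
        for msg_id in folder:
            bayes.learn(msg_id, label)
    return bayes.learned[label]


if __name__ == "__main__":
    spam_folder = ["<a@example.com>", "<b@example.com>", "<c@example.com>"]
    print(relearn_folder(ToyBayes(keep_seen=True), spam_folder, "spam", 4))
    print(relearn_folder(ToyBayes(keep_seen=False), spam_folder, "spam", 4))
```

With the seen record, four passes over a three-message folder still count three messages; without it, the same messages are counted twelve times.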

-kgd