Posted to users@spamassassin.apache.org by Linda Walsh <sa...@tlinx.org> on 2009/04/01 21:27:22 UTC

Re: user-db size, excess growth...limits ignored

Matt Kettler wrote:
> Linda Walsh wrote:
>> Matt Kettler wrote:
>>>> I see 3 DB's in my user directory (.spamassassin).
>>>>    auto-whitelist (~80MB),   bayes_seen (~40MB),   bayes_toks (~20MB)

>>> Expiry will only affect bayes_toks. Currently neither auto-whitelist nor
>>> bayes_seen has any expiry mechanism at all.
>> ---
>> So they just grow without limit?
> Yep. Not ideal, and there are bugs open on both.

>>  How often does the whitelist get sync'd to disk?
> In the case of the whitelist, it's per-message.
-----
	*ouch* -- you mean each message writes out an 80MB white-list file?
That's a lot of I/O per message, no wonder spamd seems to be slowing down...


>>     Having changed the user_prefs files back to the default
>> setting (i.e. deleted my previous addition) 2 days ago, and the system was
>> rebooted 1 day 14 hours ago, so I'm certain spamd has been restarted.
> Hmm, can you set bayes_expiry_max_db_size in a user_prefs file? That
> seems like an option that might be privileged and only honored at the
> site-wide level. An absurdly large value can bog the whole server down
> when processing mail, so an end user could DoS your machine if allowed
> to set this.
----
	I *thought* I could set it -- certainly, the only place I
*increased* the token limit beyond the *default* was in user_prefs. That
seems to have worked in bumping the limit up to 500K tokens, but now
lowering it is being ignored.  Perhaps an old version allowed this
option in user_prefs, which raised the limit to 500K, and a newer
version disallows it, so I can't lower it again (though I'd have thought
the global 150K limit would then have been re-applied).



> That said, 3.1.7 is vulnerable to CVE-2007-0451 and CVE-2007-2873.
> 
> You should seriously consider upgrading for the first one.

-----
	While I was supporting multiple local users at one point, I'm now the
only local user, so a local user causing a local denial of service isn't
my topmost concern.  That doesn't mean I shouldn't upgrade for other reasons.


I'm still *greatly* concerned about an 80MB file being written to disk
potentially on every incoming email message.  That seems a high
overhead, or are there mitigating factors that reduce that amount
in 99% of cases?

Tnx,
Linda

Re: user-db size, excess growth...limits ignored

Posted by Matt Kettler <mk...@verizon.net>.
RW wrote:
> On Wed, 01 Apr 2009 12:27:22 -0700
> Linda Walsh <sa...@tlinx.org> wrote:
>
>   
>> Matt Kettler wrote:
>>     
>
>   
>>>>  How often does the whitelist get sync'd to disk?
>>>>         
>>> In the case of the whitelist, it's per-message.
>>>       
>> -----
>> 	*ouch* -- you mean each message writes out an 80MB white-list
>> file? That's a lot of I/O per message, no wonder spamd seems to be
>> slowing down...
>>     
>
> I think it's fairly safe to assume that the Berkeley DB libraries were
> not written by people who dropped out in the second week of
> C-programming 101, and never learned any more sophisticated way of
> accessing a database file than reading it in and then writing it out. 
>
> http://en.wikipedia.org/wiki/Berkeley_DB
>
> http://en.wikipedia.org/wiki/Mmap
>
>   
True, I did not mean to imply the entire file is written per message. I
meant that *a* write occurs on a per-message basis.





Re: user-db size, excess growth...limits ignored

Posted by RW <rw...@googlemail.com>.
On Wed, 01 Apr 2009 12:27:22 -0700
Linda Walsh <sa...@tlinx.org> wrote:

> Matt Kettler wrote:

> >>  How often does the whitelist get sync'd to disk?
> > In the case of the whitelist, it's per-message.
> -----
> 	*ouch* -- you mean each message writes out an 80MB white-list
> file? That's a lot of I/O per message, no wonder spamd seems to be
> slowing down...

I think it's fairly safe to assume that the Berkeley DB libraries were
not written by people who dropped out in the second week of
C-programming 101, and never learned any more sophisticated way of
accessing a database file than reading it in and then writing it out. 

http://en.wikipedia.org/wiki/Berkeley_DB

http://en.wikipedia.org/wiki/Mmap

Re: user-db size, excess growth...limits ignored

Posted by LuKreme <kr...@kreme.com>.
On 2-Apr-2009, at 14:10, Linda Walsh wrote:
> LuKreme wrote:
>> On 1-Apr-2009, at 13:27, Linda Walsh wrote:
>>> *ouch* -- you mean each message writes out an 80MB white-list  
>>> file? That's a lot of I/O per message, no wonder spamd seems to be
>>> slowing down...
>> Nooooo.... these are DB files.  Data is added to them, this does  
>> not necessitate rewriting the entire file.
> ---
>
> Yeah -- then this refers back to the bug about there being no way to prune
> that file -- it just slowly grows and needs to be read in when spamd
> starts(?)

Erm... You are familiar with how DB files work?  Data is looked up in
them; the entire database is not read into memory.  The entire point
of a DB file is to have a structure that is relatively easy to do
lookups against.

The size of the database is largely irrelevant and I bet you would be  
hard pressed to see much difference running with an 8MB, 80MB or 800MB  
database file.
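
As a rough illustration of the point above, here is a minimal sketch
using Python's standard dbm module as a stand-in for Berkeley DB. It is
not SpamAssassin's code: the file name, the key format, and the
bump_count helper are all made up for the example. It only shows that
one record in an on-disk key/value file can be looked up and updated by
key without the program loading the whole file.

```python
import dbm
import os
import tempfile


def bump_count(db_path, key):
    """Increment one counter stored under `key` in an on-disk key/value file.

    Only the record for `key` is fetched and stored; the rest of the
    database is never loaded by this code.
    """
    with dbm.open(db_path, "c") as db:          # "c": create the file if missing
        current = int(db.get(key, b"0"))        # keyed lookup of a single record
        db[key] = str(current + 1).encode()     # keyed update of a single record
        return current + 1


# Hypothetical key: "sender address|network", invented for this sketch.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "awl_demo")
bump_count(path, b"alice@example.com|10.1")
total = bump_count(path, b"alice@example.com|10.1")
print(total)
```

The per-message cost in this sketch is one keyed read and one keyed
write, which is the behaviour described in this thread for the
whitelist.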

-- 
Hudd: 'I've just done this radio show where I never met any of the
	other actors and I didn't understand what any of it was about'
Moore: 'Ah, yes I expect that's the thing I'm in.'


Re: user-db size, excess growth...limits ignored

Posted by Jonas Eckerman <jo...@frukt.org>.
Linda Walsh wrote:

> Yeah -- then this refers back to the bug about there being no way to  prune
> that file -- it just slowly grows and needs to be read in when spamd 
> starts(?)

No.

The AWL is stored in a database, and spamd does not read the whole 
database into memory. It just looks up and updates the address pairs as 
needed.

The same principle is true for the bayes database.

> So the only real harm is the increased read-initialization and the run-time
> AWL length?

I don't know what you mean by "run-time AWL length", but I don't think
the time to open a Berkeley DB grows much because the file grows.

What will become slower as the file grows are the database updates and,
to a lesser degree, the lookups.

If the AWL or bayes database grows enough for this to actually do harm, 
I'd suggest moving to a SQL database (where expiration of old address 
pairs is pretty easy to implement).
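
To make the expiration idea concrete, here is a small sketch using
Python's built-in sqlite3 module. The table name, the columns, and the
expire_old_pairs helper are assumptions invented for this example; they
are not SpamAssassin's actual SQL schema. The point is only that once
each address pair carries a last-seen timestamp, pruning old entries is
a single DELETE.

```python
import sqlite3


def expire_old_pairs(conn, cutoff):
    """Delete address pairs not seen since `cutoff`; return how many were removed."""
    cur = conn.execute("DELETE FROM awl_demo WHERE last_seen < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Hypothetical schema: one row per (email, ip) pair with a last-seen timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE awl_demo ("
    " email TEXT, ip TEXT, count INTEGER, totscore REAL, last_seen INTEGER,"
    " PRIMARY KEY (email, ip))"
)
conn.executemany(
    "INSERT INTO awl_demo VALUES (?, ?, ?, ?, ?)",
    [
        ("old1@example.com", "10.1", 3, 4.5, 100),
        ("old2@example.com", "10.2", 1, 0.2, 200),
        ("new@example.com", "10.3", 7, 1.0, 300),
    ],
)

removed = expire_old_pairs(conn, cutoff=250)
remaining = conn.execute("SELECT COUNT(*) FROM awl_demo").fetchone()[0]
print(removed, remaining)
```

In a real setup the cutoff would presumably be something like "now
minus N months", and the DELETE would be run periodically from cron.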


Regards
/Jonas

Re: user-db size, excess growth...limits ignored

Posted by Linda Walsh <sa...@tlinx.org>.
LuKreme wrote:
> On 1-Apr-2009, at 13:27, Linda Walsh wrote:
>> *ouch* -- you mean each message writes out an 80MB white-list file? 
>> That's a lot of I/O per message, no wonder spamd seems to be slowing
>> down...
> 
> Nooooo.... these are DB files.  Data is added to them, this does not 
> necessitate rewriting the entire file.
---

Yeah -- then this refers back to the bug about there being no way to prune
that file -- it just slowly grows and needs to be read in when spamd starts(?)
and spamd needs to keep that info around as the basis for its AWL scoring, no?
So the only real harm is the increased read-initialization and the run-time
AWL length?


Re: user-db size, excess growth...limits ignored

Posted by LuKreme <kr...@kreme.com>.
On 1-Apr-2009, at 13:27, Linda Walsh wrote:
> *ouch* -- you mean each message writes out an 80MB white-list file?  
> That's a lot of I/O per message, no wonder spamd seems to be slowing
> down...

Nooooo.... these are DB files.  Data is added to them, this does not  
necessitate rewriting the entire file.


-- 
I have a love child who sends me hate mail