You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Jason Frisvold <xe...@gmail.com> on 2006/03/21 15:54:19 UTC

SQL Bayes

Greetings,

I'm looking for some fine tuning help.  It seems that we are currently
I/O limited due to the massive load spamassassin puts on the bayes
database.  The database is currently about 3.5 Gig, including indices.

I have a few questions regarding the operation of Bayes.

1) How effective is it really?  Will users likely notice a huge change
if bayes was disabled?

2) Purging.  I know bayes purges itself regularly.  However, if a user
is removed from the mail server, and bayes is never run for that user
again, will those tokens automatically be purged after a time?  Or
will they be stuck there forever?  In the case of the latter, how do I
manually purge those?

3) Any thoughts on speeding up the database calls?  Is faster hard
drives/clustering my only option?

Thanks!

--
Jason 'XenoPhage' Frisvold
XenoPhage0@gmail.com

Re: SQL Bayes

Posted by "Daniel J. Cody" <dc...@uwm.edu>.
Jason Frisvold wrote:
> 1) How effective is it really?  Will users likely notice a huge change
> if bayes was disabled?

We use a site wide bayes DB, and it's very effective for us. As an 
example, last week our bayes DB got corrupted and we lost that aspect of 
scoring for about 3 hours. Our help desk got over 25 calls about higher 
levels of spam during that time, so yes, they notice.

> 3) Any thoughts on speeding up the database calls?  Is faster hard
> drives/clustering my only option?

How many calls/lookups are you seeing now? We run our DB on a moderatly 
fast Dell with fast disk with lots of ram and it handles ~300 
queries/second very well.

Shout if you have any other questions..

Dan

Re: SQL Bayes

Posted by Jason Frisvold <xe...@gmail.com>.
On 3/21/06, Duane Hill <d....@yournetplus.com> wrote:
> Dell  PowerEdge 2550, Dual P4 2.4 ghz, 2 gig ram, hardware raid5, 4x74
> gig  10,000  rpm scsi III drives from Seagate. Hopefully I can get the
> ram upgraded to 4 gig eventually.

I'm not using anything quite as powerful, but innodb has made a
massive difference.. thanks for the tip..  :)

> "This message is made of 100% recycled electrons."

Heh..  :P

--
Jason 'XenoPhage' Frisvold
XenoPhage0@gmail.com

Re: SQL Bayes

Posted by Duane Hill <d....@yournetplus.com>.
On Tuesday, March 21, 2006 at 8:51:09 PM, xenophage0@gmail.com confabulated:

> On 3/21/06, Duane Hill <d....@yournetplus.com> wrote:
>> I  am  using a per-user configuration here and my DB size currently is
>> sitting at just over 10 gig. All of the tables within the MySQL schema
>> are  using  the  InnoDB storage instead of the MyISAM. I am also using
>> the  SA  plugin  for  caching  with  MySQL  that  can  be  found here:
>> http://wiki.apache.org/spamassassin/DBIPlugin?highlight=%28DBI%29
>> Everything  has  been  running  without any issues so far for the past
>> couple  months. My bayes_vars table has 14,102 rows which would be the
>> same equivalent as 14,102 e-mail accounts.

> What kind of hardware are you running MySQL on?  I'll have to give
> innodb a try..  :)  Thanks for the tip...

Dell  PowerEdge 2550, Dual P4 2.4 ghz, 2 gig ram, hardware raid5, 4x74
gig  10,000  rpm scsi III drives from Seagate. Hopefully I can get the
ram upgraded to 4 gig eventually.

-- 
"This message is made of 100% recycled electrons."












Re: SQL Bayes - MyISAM locks a problem?

Posted by Jason Frisvold <xe...@gmail.com>.
On 3/21/06, Andrew Donkin <ar...@waikato.ac.nz> wrote:
> Jason, if you haven't moved to innodb already, try "SHOW PROCESSLIST"
> in mysql.  Do you have many threads locked on "SELECT FROM
> bayes_token" and "INSERT INTO bayes_token"?

Yep, that's it completely.

> I had about 100 threads locked, so I am changing to InnoDB for its
> fine-grained locking.  About three days ago I issued "ALTER TABLE
> bayes_token ENGINE innodb".  I'll let you know when it finishes.

*grin*  I'm debating whether to try that or to wipe bayes compeltely
and start from scratch..  *sigh*  Or, I may dump the data and then
re-import it into inno.. not sure what will be faster..

> Andrew Donkin                  Waikato University, Hamilton,  New Zealand

--
Jason 'XenoPhage' Frisvold
XenoPhage0@gmail.com

Re: SQL Bayes - MyISAM locks a problem?

Posted by Andrew Donkin <ar...@waikato.ac.nz>.
Duane Hill has:

> per-user [...] just over 10 gig [...] InnoDB [...]
> http://wiki.apache.org/spamassassin/DBIPlugin [...] bayes_vars table
> has 14,102 rows

Jason Frisvold:

> I'll have to give innodb a try..  :)  Thanks for the tip...

Jason, if you haven't moved to innodb already, try "SHOW PROCESSLIST"
in mysql.  Do you have many threads locked on "SELECT FROM
bayes_token" and "INSERT INTO bayes_token"?

I had about 100 threads locked, so I am changing to InnoDB for its
fine-grained locking.  About three days ago I issued "ALTER TABLE
bayes_token ENGINE innodb".  I'll let you know when it finishes.

-- 
_________________________________________________________________________
Andrew Donkin                  Waikato University, Hamilton,  New Zealand

Re: SQL Bayes

Posted by Jason Frisvold <xe...@gmail.com>.
On 3/21/06, Duane Hill <d....@yournetplus.com> wrote:
> I  am  using a per-user configuration here and my DB size currently is
> sitting at just over 10 gig. All of the tables within the MySQL schema
> are  using  the  InnoDB storage instead of the MyISAM. I am also using
> the  SA  plugin  for  caching  with  MySQL  that  can  be  found here:
> http://wiki.apache.org/spamassassin/DBIPlugin?highlight=%28DBI%29
> Everything  has  been  running  without any issues so far for the past
> couple  months. My bayes_vars table has 14,102 rows which would be the
> same equivalent as 14,102 e-mail accounts.

What kind of hardware are you running MySQL on?  I'll have to give
innodb a try..  :)  Thanks for the tip...

--
Jason 'XenoPhage' Frisvold
XenoPhage0@gmail.com

Re: SQL Bayes

Posted by Duane Hill <d....@yournetplus.com>.
On Tuesday, March 21, 2006 at 2:54:19 PM, xenophage0@gmail.com confabulated:

> Greetings,

> I'm looking for some fine tuning help.  It seems that we are currently
> I/O limited due to the massive load spamassassin puts on the bayes
> database.  The database is currently about 3.5 Gig, including indices.

I  am  using a per-user configuration here and my DB size currently is
sitting at just over 10 gig. All of the tables within the MySQL schema
are  using  the  InnoDB storage instead of the MyISAM. I am also using
the  SA  plugin  for  caching  with  MySQL  that  can  be  found here:
http://wiki.apache.org/spamassassin/DBIPlugin?highlight=%28DBI%29
Everything  has  been  running  without any issues so far for the past
couple  months. My bayes_vars table has 14,102 rows which would be the
same equivalent as 14,102 e-mail accounts.

> I have a few questions regarding the operation of Bayes.

> 1) How effective is it really?  Will users likely notice a huge change
> if bayes was disabled?

I,  like  another  response,  will start to get customers e-mailing me
asking why they started receiving more Spam than normal.

> 2) Purging.  I know bayes purges itself regularly.  However, if a user
> is removed from the mail server, and bayes is never run for that user
> again, will those tokens automatically be purged after a time?  Or
> will they be stuck there forever?  In the case of the latter, how do I
> manually purge those?

I  usually  have manually removed the table rows myself using a simple
SQL statement. This will get implemented into our system soon for when
an  e-mail  is  removed  from the server, the corresponding table rows
will get removed from the bayes filtering. Here is an example of how I
removed an account from the tables:

This  will remove all tokens from the bayes_token table for a specific
account (username) in the bayes_vars table:

    delete from bayes_token where id = (select id from bayes_vars where
    username = 'account@example.com');

Using the same syntax you can then do the same for the other tables
that need to have items removed:

    delete from bayes_seen where id = (select id from bayes_vars where
    username = 'account@example.com');

    delete from bayes_expire where id = (select id from bayes_vars
    where username = 'account@example.com');

    delete from awl where username = (select username from bayes_vars
    where username = 'account@example.com');

Then, finally removing the account from bayes_vars:

    delete from bayes_vars where username = 'account@example.com';

> 3) Any thoughts on speeding up the database calls?  Is faster hard
> drives/clustering my only option?

Check out http://wiki.apache.org/spamassassin/DBIPlugin?highlight=%28DBI%29
if you haven't as of yet.

> Thanks!

> --
> Jason 'XenoPhage' Frisvold
> XenoPhage0@gmail.com



-- 
"This message is made of 100% recycled electrons."