Posted to users@spamassassin.apache.org by Jason Ede <J....@birchenallhowden.co.uk> on 2011/07/25 16:24:34 UTC

slow bayes queries using innodb

We've 2 reasonably powerful mail servers handling incoming email and sharing the load. We've moved to a single bayes database (to make training easier); it's stored in MariaDB and all of the bayes tables are InnoDB.

# mysql -V
mysql  Ver 14.16 Distrib 5.2.7-MariaDB, for unknown-linux-gnu (x86_64) using readline 5.1

The bayes settings are...

bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn  DBI:mysql:bayes:localhost
bayes_sql_username sa_user
bayes_sql_password ********
bayes_sql_override_username root
bayes_expiry_max_db_size 1000000

If we have just one server (local) using the database then the query times are fine. If the other machine tries to use the same bayes database then we quickly develop a large backlog on the second server. Looking at the slow-queries log on the database server we see lots of messages similar to those shown below...

# User@Host: sa_user[sa_user] @ gateway.ourdomain.com [10.10.1.4]
# Thread_id: 62925  Schema: bayes  QC_hit: No
# Query_time: 14.454067  Lock_time: 0.000139  Rows_sent: 0  Rows_examined: 0
SET timestamp=1311593054;
INSERT INTO bayes_token
               (id, token, spam_count, ham_count, atime)
               VALUES ('1','ÞûÞÛÈ','0','1','1311592636')
               ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + '0', 0),
                                       ham_count = GREATEST(ham_count + '1', 0),
                                       atime = GREATEST(atime, '1311592636');

sa-learn --force-expire is run regularly, as is table optimisation. The InnoDB settings are below...

innodb_buffer_pool_size = 2G
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method=O_DIRECT
innodb_lock_wait_timeout = 50
innodb_file_per_table

Is there something I've missed with this configuration? Is there anything else I can do? I'd really like a single bayes database, but cannot have mail backing up because of it. I've tried changing innodb_flush_log_at_trx_commit to 1 or 0, but it doesn't seem to make a huge difference.

Jason
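
For readers unfamiliar with the statement in the slow-query log above, the following is a minimal sketch of what an "insert or bump the counters" upsert of that shape does. It is not SpamAssassin's code: it uses Python's bundled sqlite3 module as a stand-in for MySQL (SQLite 3.24 or later is assumed for ON CONFLICT), the two-argument MAX() stands in for GREATEST(), the (id, token) primary key is an assumption inferred from the ON DUPLICATE KEY clause, and make_db and touch_token are hypothetical helper names.

```python
import sqlite3

# Hypothetical stand-in for the bayes_token table. Column names come from the
# logged query; the (id, token) primary key is an assumption.
SCHEMA = """
CREATE TABLE bayes_token (
    id         INTEGER NOT NULL,
    token      BLOB    NOT NULL,
    spam_count INTEGER NOT NULL DEFAULT 0,
    ham_count  INTEGER NOT NULL DEFAULT 0,
    atime      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (id, token)
)
"""

# SQLite spelling of MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
# Unqualified column names refer to the existing row, excluded.* to the
# values that were about to be inserted.
UPSERT = """
INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (id, token) DO UPDATE SET
    spam_count = MAX(spam_count + excluded.spam_count, 0),
    ham_count  = MAX(ham_count + excluded.ham_count, 0),
    atime      = MAX(atime, excluded.atime)
"""


def make_db():
    """Create an in-memory database holding the stand-in table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def touch_token(conn, user_id, token, spam_delta, ham_delta, atime):
    """Insert the token row, or adjust its counters if it already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (user_id, token, spam_delta, ham_delta, atime))
```

Each learned token becomes one such write, so a message with many tokens produces many small writes against the same table, which is consistent with the write-heavy load described in this thread.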

Re: slow bayes queries using innodb

Posted by Benny Pedersen <me...@junc.org>.
On Mon, 25 Jul 2011 13:13:36 -0500, Duane Hill wrote:

> I can provide my my.cnf MySQL config. It is based off a server using
> 12 gig of memory.

I'd like a copy of your my.cnf.

Re: slow bayes queries using innodb

Posted by Duane Hill <du...@duanemail.org>.
Monday, July 25, 2011, 1:26:22 PM, you wrote:

> On 7/25/11 2:24 PM, Duane Hill wrote:
>> Monday, July 25, 2011, 1:16:54 PM, you wrote:
>>
>>> On 7/25/11 2:13 PM, Duane Hill wrote:
>>>> Should this even be an issue if one is using an SQL backend and has it
>>>> configured to handle the extra processing? The OP is using MySQL as
>>>> the backend. I've had MySQL configured for several years and have
>>>> never had issues with the default 'bayes_auto_expire' left at its
>>>> default being turned on using a per-user configuration.
>>>>
>>> google for
>>> bayes_auto_expire 0
>>> you will see everyone telling you to turn it off.
>> Point me in the direction where it demonstrates a deficiency with
>> storage other than the filesystem itself. As I stated before, never
>> have I seen an issue using MySQL as a backend.
>>
> one clue rule

Ok.


Re: slow bayes queries using innodb

Posted by Michael Scheidell <mi...@secnap.com>.
On 7/25/11 2:24 PM, Duane Hill wrote:
> Monday, July 25, 2011, 1:16:54 PM, you wrote:
>
>> On 7/25/11 2:13 PM, Duane Hill wrote:
>>> Should this even be an issue if one is using an SQL backend and has it
>>> configured to handle the extra processing? The OP is using MySQL as
>>> the backend. I've had MySQL configured for several years and have
>>> never had issues with the default 'bayes_auto_expire' left at its
>>> default being turned on using a per-user configuration.
>>>
>> google for
>> bayes_auto_expire 0
>> you will see everyone telling you to turn it off.
> Point me in the direction where it demonstrates a deficiency with
> storage other than the filesystem itself. As I stated before, never
> have I seen an issue using MySQL as a backend.
>
one clue rule


-- 
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
 >*| *SECNAP Network Security Corporation

    * Best Mobile Solutions Product of 2011
    * Best Intrusion Prevention Product
    * Hot Company Finalist 2011
    * Best Email Security Product
    * Certified SNORT Integrator

______________________________________________________________________
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
______________________________________________________________________  

Re: slow bayes queries using innodb

Posted by Duane Hill <du...@duanemail.org>.
Monday, July 25, 2011, 1:16:54 PM, you wrote:

> On 7/25/11 2:13 PM, Duane Hill wrote:
>> Should this even be an issue if one is using an SQL backend and has it
>> configured to handle the extra processing? The OP is using MySQL as
>> the backend. I've had MySQL configured for several years and have
>> never had issues with the default 'bayes_auto_expire' left at its
>> default being turned on using a per-user configuration.
>>
> google for

> bayes_auto_expire 0

> you will see everyone telling you to turn it off.

Point me in the direction where it demonstrates a deficiency with
storage other than the filesystem itself. As I stated before, never
have I seen an issue using MySQL as a backend.

-- 
Best regards,
 Duane                            mailto:duane@duanemail.org


Re: slow bayes queries using innodb

Posted by Benny Pedersen <me...@junc.org>.
On Mon, 25 Jul 2011 14:16:54 -0400, Michael Scheidell wrote:

> you will see everyone telling you to turn it off.

Does that mean we have a bug? Should the option to turn it on be removed?

Re: slow bayes queries using innodb

Posted by Michael Scheidell <mi...@secnap.com>.
On 7/25/11 2:13 PM, Duane Hill wrote:
> Should this even be an issue if one is using an SQL backend and has it
> configured to handle the extra processing? The OP is using MySQL as
> the backend. I've had MySQL configured for several years and have
> never had issues with the default 'bayes_auto_expire' left at its
> default being turned on using a per-user configuration.
>
google for

bayes_auto_expire 0

you will see everyone telling you to turn it off.




Re: slow bayes queries using innodb

Posted by Duane Hill <du...@duanemail.org>.
Hello Michael,

Monday, July 25, 2011, 9:30:11 AM, you wrote:

> On 7/25/11 10:24 AM, Jason Ede wrote:
>>
>> We’ve 2 reasonably powerful mail servers handling incoming email and 
>> sharing the load. We’ve moved to a single bayes database (to make 
>> training easier) and its stored in mariadb and all of the bayes tables 
>> are innodb
>>
>> # mysql -V
>>
>> mysql Ver 14.16 Distrib 5.2.7-MariaDB, for unknown-linux-gnu (x86_64) 
>> using readline 5.1
>>
>> The bayes settings are…
>>
>> bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
>>
>> bayes_sql_dsn DBI:mysql:bayes:localhost
>>
>> bayes_sql_username sa_user
>>
>> bayes_sql_password ********
>>
>> bayes_sql_override_username root
>>
>> bayes_expiry_max_db_size 1000000
>>
> missing this:

> bayes_auto_expire 0

> and only run the 'sa-learn --force-expire' late at night, when no one is
> doing anything.

Should this even be an issue if one is using an SQL backend and has it
configured to handle the extra processing? The OP is using MySQL as
the backend. I've had MySQL configured for several years and have
never had issues with the default 'bayes_auto_expire' left at its
default being turned on using a per-user configuration.

I can provide my my.cnf MySQL config. It is based off a server using
12 gig of memory.

-- 
Best regards,
 Duane                            mailto:duane@duanemail.org


Re: slow bayes queries using innodb

Posted by Mark Martinec <Ma...@ijs.si>.
Jason,

> Yes I have 3.3.2 so I've made that change as it seems to make a lot of
> sense to me.
> 
> However, I think that will just slow down the rate of learns and it will
> still snarl up at busy times?

True.

I don't have a better answer for your original question on a slowdown.
There may be some db tweaks to do. Shrinking your database would
most likely shrink the update times too.


Btw, automatically rebuilding a bayes db from scratch (once it has
grown way too large) is not too bad, provided the autolearning is
given quality data to work on. Letting SpamAssassin check outgoing
mail is one component of this: gives it an opportunity to see and
learn mostly ham. Letting SpamAssassin see all incoming mail (no
graylisting and limited RBL checks at the MTA level) gives it an
opportunity to get a good overview of typical spam. Coupling such
good sources of ham and spam with a quality set of network tests
and rules usually results in building a good initial Bayes database
quickly with no manual intervention. After the initial autolearning
the MTA-level spam controls may be turned back on again.

  Mark

RE: slow bayes queries using innodb

Posted by Jason Ede <J....@birchenallhowden.co.uk>.
Hi,

Yes I have 3.3.2 so I've made that change as it seems to make a lot of sense to me.

However, I think that will just slow down the rate of learns and it will still snarl up at busy times?

Jason


> -----Original Message-----
> From: Mark Martinec [mailto:Mark.Martinec+sa@ijs.si]
> Sent: 28 July 2011 10:47
> To: users@spamassassin.apache.org
> Subject: Re: slow bayes queries using innodb
> 
> On Thursday July 28 2011 11:30:33 Jason Ede wrote:
> > Even with auto_expiry off the system is still running really slow for
> > a remote server... The local server bayes updates go through really
> > quickly, but the remote servers queries are still backing up...
> >
> > i.e. from slow query log just now...
> >
> > # User@Host: sa_user[sa_user] @ gateway.XXX.XXX [XXX.XXX.XXX.XXX]
> > # Thread_id: 3363  Schema: bayes  QC_hit: No
> > # Query_time: 24.366423  Lock_time: 0.000056  Rows_sent: 0  Rows_examined: 0
> > SET timestamp=1311845047;
> > INSERT INTO bayes_token
> >                (id, token, spam_count, ham_count, atime)
> >                VALUES ('1','ú','0','1','1311844754')
> >                ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + '0', 0),
> >                                        ham_count = GREATEST(ham_count + '1', 0),
> >                                        atime = GREATEST(atime, '1311844754');
> 
> > The connection between the 2 servers is not heavily loaded and the
> > bayes traffic seems to be small queries so I don't think that would be
> > the cause of any problems...
> >
> > I'm sure there must be something simple I'm missing with this.
> 
> One thing that can radically reduce the write rate to a Bayes DB and tame
> down its growth (with possibly also beneficial effect to the classifier quality)
> is to set:
> 
>   bayes_auto_learn_on_error 1
> 
> in your local.cf.  The feature is available since SA 3.3.2.
> 
> 
> $ man Mail::SpamAssassin::Plugin::AutoLearnThreshold
>   [...]
> bayes_auto_learn_on_error (0 | 1)        (default: 0)
> 
>   With "bayes_auto_learn_on_error" off, autolearning will be
>   performed even if bayes classifier already agrees with the new
>   classification (i.e.  yielded BAYES_00 for what we are now trying
>   to teach it as ham, or yielded BAYES_99 for spam). This is a
>   traditional setting, the default was chosen to retain backwards
>   compatibility.
> 
>   With "bayes_auto_learn_on_error" turned on, autolearning will be
>   performed only when a bayes classifier had a different opinion from
>   what the autolearner is now trying to teach it (i.e. it made an
>   error in judgement). This strategy may or may not produce better
>   future classifications, but usually works very well, while also
>   preventing unnecessary overlearning and slows down database growth.
> 
> 
> 
> It may even be beneficial to start a new bayes database from scratch with
> this feature turned on.
> 
>   Mark

Re: slow bayes queries using innodb

Posted by Mark Martinec <Ma...@ijs.si>.
On Thursday July 28 2011 11:30:33 Jason Ede wrote:
> Even with auto_expiry off the system is still running really slow for a
> remote server... The local server bayes updates go through really quickly,
> but the remote servers queries are still backing up...
> 
> i.e. from slow query log just now...
> 
> # User@Host: sa_user[sa_user] @ gateway.XXX.XXX [XXX.XXX.XXX.XXX]
> # Thread_id: 3363  Schema: bayes  QC_hit: No
> # Query_time: 24.366423  Lock_time: 0.000056  Rows_sent: 0  Rows_examined: 0
> SET timestamp=1311845047;
> INSERT INTO bayes_token
>                (id, token, spam_count, ham_count, atime)
>                VALUES ('1','ú','0','1','1311844754')
>                ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + '0', 0),
>                                        ham_count = GREATEST(ham_count + '1', 0),
>                                        atime = GREATEST(atime, '1311844754');

> The connection between the 2 servers is not heavily loaded and the bayes
> traffic seems to be small queries so I don't think that would be the cause
> of any problems...
> 
> I'm sure there must be something simple I'm missing with this.

One thing that can radically reduce the write rate to a Bayes DB
and tame down its growth (with possibly also beneficial
effect to the classifier quality) is to set:

  bayes_auto_learn_on_error 1

in your local.cf.  The feature is available since SA 3.3.2.


$ man Mail::SpamAssassin::Plugin::AutoLearnThreshold
  [...]
bayes_auto_learn_on_error (0 | 1)        (default: 0)

  With "bayes_auto_learn_on_error" off, autolearning will be
  performed even if bayes classifier already agrees with the new
  classification (i.e.  yielded BAYES_00 for what we are now trying
  to teach it as ham, or yielded BAYES_99 for spam). This is a
  traditional setting, the default was chosen to retain backwards
  compatibility.

  With "bayes_auto_learn_on_error" turned on, autolearning will be
  performed only when a bayes classifier had a different opinion from
  what the autolearner is now trying to teach it (i.e. it made an
  error in judgement). This strategy may or may not produce better
  future classifications, but usually works very well, while also
  preventing unnecessary overlearning and slows down database growth.



It may even be beneficial to start a new bayes database from
scratch with this feature turned on.

  Mark
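
The decision described in the man page excerpt above can be sketched as follows. This is only an illustration of the quoted text, not the plugin's actual implementation: the function name and its parameters are hypothetical, and "already agrees" is taken to mean a BAYES_00 hit for ham or a BAYES_99 hit for spam, as the excerpt states.

```python
def should_autolearn(learn_as_spam, rules_hit, learn_on_error):
    """Decide whether an autolearn candidate is actually written to Bayes.

    learn_as_spam  -- True if the autolearner wants to train the message as spam
    rules_hit      -- names of the rules that fired, e.g. {"BAYES_99", "RAZOR2_CHECK"}
    learn_on_error -- value of the bayes_auto_learn_on_error setting
    """
    if not learn_on_error:
        # Traditional behaviour: always learn, even if Bayes already agrees.
        return True
    # Bayes "already agrees" when it gave the extreme verdict for this class.
    agreeing_rule = "BAYES_99" if learn_as_spam else "BAYES_00"
    # Learn only when the classifier did not already agree.
    return agreeing_rule not in rules_hit
```

With the setting on, a message that Bayes already scored BAYES_99 and that is about to be autolearned as spam is skipped, so no token writes are issued for it. That is where the reduction in write rate would come from.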

Re: slow bayes queries using innodb

Posted by Henrik K <he...@hege.li>.
On Thu, Jul 28, 2011 at 09:30:33AM +0000, Jason Ede wrote:
> Even with auto_expiry off the system is still running really slow for a remote server... The local server bayes updates go through really quickly, but the remote servers queries are still backing up...
> 
> i.e. from slow query log just now...
> 
> # User@Host: sa_user[sa_user] @ gateway.XXX.XXX [XXX.XXX.XXX.XXX]
> # Thread_id: 3363  Schema: bayes  QC_hit: No
> # Query_time: 24.366423  Lock_time: 0.000056  Rows_sent: 0  Rows_examined: 0
> SET timestamp=1311845047;
> INSERT INTO bayes_token
>                (id, token, spam_count, ham_count, atime)
>                VALUES ('1','ú','0','1','1311844754')
>                ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + '0', 0),
>                                        ham_count = GREATEST(ham_count + '1', 0),
>                                        atime = GREATEST(atime, '1311844754');

Probably only marginal improvement, but drop atime index if you have it. 
Takes pointless resources updating it, if you run expiry once a day.


RE: slow bayes queries using innodb

Posted by Jason Ede <J....@birchenallhowden.co.uk>.
Even with auto_expiry off the system is still running really slowly for a remote server... The local server's bayes updates go through really quickly, but the remote server's queries are still backing up...

i.e. from slow query log just now...

# User@Host: sa_user[sa_user] @ gateway.XXX.XXX [XXX.XXX.XXX.XXX]
# Thread_id: 3363  Schema: bayes  QC_hit: No
# Query_time: 24.366423  Lock_time: 0.000056  Rows_sent: 0  Rows_examined: 0
SET timestamp=1311845047;
INSERT INTO bayes_token
               (id, token, spam_count, ham_count, atime)
               VALUES ('1','ú','0','1','1311844754')
               ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + '0', 0),
                                       ham_count = GREATEST(ham_count + '1', 0),
                                       atime = GREATEST(atime, '1311844754');
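
One way to look at slow-log entries like the one above in bulk is to pull out the Query_time and Lock_time fields and compare them. The sketch below is a hypothetical helper, not part of any MySQL or SpamAssassin tooling; it only relies on the header layout visible in the entries posted in this thread, and the sample values are copied from those entries.

```python
import re

# Matches the "Query_time: ...  Lock_time: ..." pair in a slow-log header line.
HEADER_RE = re.compile(
    r"Query_time:\s*(?P<query>[\d.]+)\s+Lock_time:\s*(?P<lock>[\d.]+)"
)


def query_and_lock_times(log_text):
    """Return a list of (query_time, lock_time) pairs, in seconds."""
    return [
        (float(m.group("query")), float(m.group("lock")))
        for m in HEADER_RE.finditer(log_text)
    ]


# Header lines copied from the two entries posted in this thread.
SAMPLE = """\
# Query_time: 14.454067  Lock_time: 0.000139  Rows_sent: 0  Rows_examined: 0
# Query_time: 24.366423  Lock_time: 0.000056  Rows_sent: 0  Rows_examined: 0
"""

if __name__ == "__main__":
    for query_time, lock_time in query_and_lock_times(SAMPLE):
        print(f"query {query_time:.3f}s, reported lock wait {lock_time:.6f}s")
```

In both posted entries the reported lock time is a tiny fraction of the query time. Lock_time may not capture every kind of wait, so this is a starting point for further investigation rather than proof of where the time goes.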

mysqltuner.pl reports everything generally ok...

-------- Performance Metrics -------------------------------------------------
[--] Up for: 2h 30m 52s (293K q [32.438 qps], 3K conn, TX: 92M, RX: 130M)
[--] Reads / Writes: 16% / 84% 
[--] Total buffers: 2.6G global + 6.4M per thread (50 max threads) 
[OK] Maximum possible memory usage: 2.9G (50% of installed RAM)
[OK] Slow queries: 0% (295/293K) 
[OK] Highest usage of available connections: 68% (34/50) 
[OK] Key buffer size / total MyISAM indexes: 256.0M/95.0K 
[!!] Key buffer hit rate: 50.0% (6 cached / 3 reads) 
[OK] Query cache efficiency: 65.2% (68K cached / 105K selects) 
[OK] Query cache prunes per day: 0 
[OK] Sorts requiring temporary tables: 7% (12 temp sorts / 156 sorts) 
[OK] Temporary tables created on disk: 0% (0 on disk / 10 total) 
[OK] Thread cache hit rate: 98% (61 created / 3K connections) 
[OK] Table cache hit rate: 75% (65 open / 86 opened) 
[OK] Open file limit used: 2% (28/1K) 
[OK] Table locks acquired immediately: 99% (218K immediate / 218K locks) 
[OK] InnoDB data size / buffer pool: 302.0M/2.0G

The connection between the 2 servers is not heavily loaded and the bayes traffic seems to be small queries so I don't think that would be the cause of any problems...

I'm sure there must be something simple I'm missing with this.

Jason

--
Dr J D Ede
Senior ICT Technician, BirchenallHowden Ltd


> -----Original Message-----
> From: Jason Ede [mailto:J.Ede@birchenallhowden.co.uk]
> Sent: 25 July 2011 19:51
> To: users@spamassassin.apache.org
> Subject: RE: slow bayes queries using innodb
> 
> 
> > -----Original Message-----
> > From: Michael Scheidell [mailto:michael.scheidell@secnap.com]
> > Sent: 25 July 2011 15:44
> > To: users@spamassassin.apache.org
> > Subject: Re: slow bayes queries using innodb
> >
> > On 7/25/11 10:41 AM, Jason Ede wrote:
> > > The force expire is run in middle of the night, but the bayes_auto_expire 0
> > > isn't set. How often does bayes try and do this if this is 1?
> > >
> >
> > just in the middle of when you don't want it to. eg: sorta random
> >
> 
> I've auto_expiry off now and still not flowing as freely as I'd like and backs
> up at busy times with slow queries as before

RE: slow bayes queries using innodb

Posted by Jason Ede <J....@birchenallhowden.co.uk>.
> -----Original Message-----
> From: Michael Scheidell [mailto:michael.scheidell@secnap.com]
> Sent: 25 July 2011 15:44
> To: users@spamassassin.apache.org
> Subject: Re: slow bayes queries using innodb
> 
> On 7/25/11 10:41 AM, Jason Ede wrote:
> > The force expire is run in middle of the night, but the bayes_auto_expire 0
> > isn't set. How often does bayes try and do this if this is 1?
> >
> 
> just in the middle of when you don't want it to. eg: sorta random
> 

I have auto_expiry off now, but mail is still not flowing as freely as I'd like, and it backs up at busy times with slow queries as before.

Re: slow bayes queries using innodb

Posted by Michael Scheidell <mi...@secnap.com>.
On 7/25/11 10:41 AM, Jason Ede wrote:
> The force expire is run in middle of the night, but the bayes_auto_expire 0 isn't set. How often does bayes try and do this if this is 1?
>

just in the middle of when you don't want it to. eg: sorta random



RE: slow bayes queries using innodb

Posted by Jason Ede <J....@birchenallhowden.co.uk>.

> -----Original Message-----
> From: Michael Scheidell [mailto:michael.scheidell@secnap.com]
> Sent: 25 July 2011 15:30
> To: users@spamassassin.apache.org
> Subject: Re: slow bayes queries using innodb
> 
> On 7/25/11 10:24 AM, Jason Ede wrote:
> >
> > We've 2 reasonably powerful mail servers handling incoming email and
> > sharing the load. We've moved to a single bayes database (to make
> > training easier) and its stored in mariadb and all of the bayes tables
> > are innodb
> >
> > # mysql -V
> >
> > mysql Ver 14.16 Distrib 5.2.7-MariaDB, for unknown-linux-gnu (x86_64)
> > using readline 5.1
> >
> > The bayes settings are...
> >
> > bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
> >
> > bayes_sql_dsn DBI:mysql:bayes:localhost
> >
> > bayes_sql_username sa_user
> >
> > bayes_sql_password ********
> >
> > bayes_sql_override_username root
> >
> > bayes_expiry_max_db_size 1000000
> >
> missing this:
> 
> bayes_auto_expire 0
> 
> and only run the 'sa-learn --force-expire' late at night, when no one is doing
> anything.
> 
>

The force expire is run in the middle of the night, but bayes_auto_expire 0 isn't set. How often does bayes try to do this when it is set to 1?

Jason


Re: slow bayes queries using innodb

Posted by Michael Scheidell <mi...@secnap.com>.
On 7/25/11 10:24 AM, Jason Ede wrote:
>
> We’ve 2 reasonably powerful mail servers handling incoming email and 
> sharing the load. We’ve moved to a single bayes database (to make 
> training easier) and its stored in mariadb and all of the bayes tables 
> are innodb
>
> # mysql -V
>
> mysql Ver 14.16 Distrib 5.2.7-MariaDB, for unknown-linux-gnu (x86_64) 
> using readline 5.1
>
> The bayes settings are…
>
> bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
>
> bayes_sql_dsn DBI:mysql:bayes:localhost
>
> bayes_sql_username sa_user
>
> bayes_sql_password ********
>
> bayes_sql_override_username root
>
> bayes_expiry_max_db_size 1000000
>
missing this:

bayes_auto_expire 0

and only run the 'sa-learn --force-expire' late at night, when no one is
doing anything.


