You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Micah Anderson <mi...@riseup.net> on 2007/09/27 02:20:00 UTC

Bayes innodb problems

I was having problems with scalability with my bayes DB, so I read up on
the mailing list and found that it was recommended to switch to the
innodb storage engine because of the row-level locking (versus the
table-level locking that comes with MyISAM). Sounds great. So I
switched, and everything was fine for several days.

Then today the load on the DB server shot up to 11-13 and spam
processing has ground down to really slow. I'm seeing some incredibly
long queries now in my slow-query log, such as:

# Time: 070926 17:10:53
# User@Host: spamass[spamass] @  [10.0.2.4]
# Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
SELECT count(*)
               FROM bayes_token
              WHERE id = '4'
                AND ('1190846660' - atime) > '345600';

This seems really wrong....

Then queries such as the following taking at least 30 seconds:

# Time: 070926 17:13:24
# User@Host: spamass[spamass] @  [10.0.2.4]
# Query_time: 30  Lock_time: 0  Rows_sent: 88  Rows_examined: 88
SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
                     FROM bayes_token
                    WHERE id = '4'
                      AND token IN
(' <binary data removed here> ')

I'm seeing in my spamd logs the following:
Sep 26 17:17:52 spamd2 spamd[5479]: bayes: expire_old_tokens: child processing timeout at /usr/sbin/spamd line 1246. 
Sep 26 17:17:52 spamd2 spamd[1160]: prefork: child states: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 
Sep 26 17:17:52 spamd2 spamd[1160]: prefork: server reached --max-children setting, consider raising it 

I've got my --max-children set to 50, and I'm hitting this because the
DB is not responding fast enough.

Did I hit some sort of tipping point with the tokens in my database, do
I have too many or ... what is going on here? I have to turn off bayes
because its too slow and this is sad because this adds a lot to the
results. This is what I have configured:

bayes_store_module                 Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                      DBI:mysql:bayes:dbw-pn
bayes_sql_username                 spamassassin
bayes_sql_password                 notthepasswd
bayes_sql_override_username        @GLOBAL

# keep the database from getting too big:
bayes_expiry_max_db_size           1000000

# no affect
bayes_learn_to_journal             0

mysql settings related to innodb:

# * InnoDB
innodb_data_file_path = ibdata1:10M:autoextend
#
# Set buffer pool size to 50-80% of your computer's memory
set-variable = innodb_buffer_pool_size=1250M
set-variable = innodb_additional_mem_pool_size=20M
#
# Set the log file size to about 25% of the buffer pool size
set-variable = innodb_log_file_size=313M
set-variable = innodb_log_buffer_size=8M
#
innodb_flush_log_at_trx_commit=1

I'm using spamassassin 3.2.3 and mysql 5.0.45. 

Thanks,
Micah

Re: Bayes innodb problems

Posted by Jari Fredriksson <ja...@iki.fi>.
> On Fri, 28 Sep 2007 at 21:57 +0300, jarif@iki.fi
> confabulated: 
> 
>> 
>> Optimize table does not work with InnoDB.
> 
> Are you sure? An excerpt from the MySQL 5.0 documentation
> found here: 
> 

May be old information what I told. I tried and no errors from optimize table with InnoDB table.

Writing optimize scripts here now;)



Re: Bayes innodb problems

Posted by Duane Hill <d....@yournetplus.com>.
On Fri, 28 Sep 2007 at 21:57 +0300, jarif@iki.fi confabulated:

>> Micah Anderson schrieb am 27.09.2007 02:20:
>>
>>> processing has ground down to really slow. I'm seeing
>>> some incredibly long queries now in my slow-query log,
>>> such as:
>>
>> Try an "optimize table <tabname>" for each of the sa
>> tables. You just filled the database from scratch, so
>> perhaps the counters/statistics do not reflect the actual
>> value distribution yet.
>
> Optimize table does not work with InnoDB.

Are you sure? An excerpt from the MySQL 5.0 documentation found here:

   http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

   "OPTIMIZE TABLE works only for MyISAM, InnoDB, and (as of MySQL 5.0.16)
    ARCHIVE tables. It does not work for tables created using any other
    storage engine."

I've never tried it myself as I've been using MyISAM without issues in a 
per account set up.

------
   _|_
  (_| |

Re: Bayes innodb problems

Posted by Nigel Frankcom <ni...@blue-canoe.com>.
On Sat, 29 Sep 2007 03:24:17 +0200, Alex Woick <al...@wombaz.de> wrote:

>>>> processing has ground down to really slow. I'm seeing
>>>> some incredibly long queries now in my slow-query log,
>>>> such as: 
>>> Try an "optimize table <tabname>" for each of the sa
>>> tables. You just filled the database from scratch, so
>>> perhaps the counters/statistics do not reflect the actual
>>> value distribution yet. 
>> 
>> Optimize table does not work with InnoDB.
>
>Surely it does. According to the Mysql documentation, it defragments the 
>indexes (it probably rebuilds them) and it updates the index statistics.

I use the MySQL Tools to handle compacting and repair etc. For other
jobs I use DBTools. Neither is perfect, but between them they get the
job done.

I do most of my admin from windows  but afaik the MySQL tools are
cross platform.

Despite the above, I was (perhaps mistakenly) under the impression
that the daily admin for bayes handled compaction etc.

Perhaps your problem lies elsewhere in your system? I know form ages
back that lack of a PTR for the SQL server can slow things
significantly if your DB is on a different box to your SA.

HTH

Nigel

Re: Bayes innodb problems

Posted by Alex Woick <al...@wombaz.de>.
>>> processing has ground down to really slow. I'm seeing
>>> some incredibly long queries now in my slow-query log,
>>> such as: 
>> Try an "optimize table <tabname>" for each of the sa
>> tables. You just filled the database from scratch, so
>> perhaps the counters/statistics do not reflect the actual
>> value distribution yet. 
> 
> Optimize table does not work with InnoDB.

Surely it does. According to the Mysql documentation, it defragments the 
indexes (it probably rebuilds them) and it updates the index statistics.

Re: Bayes innodb problems

Posted by Jari Fredriksson <ja...@iki.fi>.
> Micah Anderson schrieb am 27.09.2007 02:20:
> 
>> processing has ground down to really slow. I'm seeing
>> some incredibly long queries now in my slow-query log,
>> such as: 
> 
> Try an "optimize table <tabname>" for each of the sa
> tables. You just filled the database from scratch, so
> perhaps the counters/statistics do not reflect the actual
> value distribution yet. 

Optimize table does not work with InnoDB.



Re: Bayes innodb problems

Posted by Alex Woick <al...@wombaz.de>.
>>> processing has ground down to really slow. I'm seeing some incredibly
>>> long queries now in my slow-query log, such as:

>> Try an "optimize table <tabname>" for each of the sa tables. You just 
>> filled the database from scratch, so perhaps the counters/statistics do not 
>> reflect the actual value distribution yet.

> Actually this bayes DB has been around for a few months, and has been
> built up over time.

I experienced a problem with a custom innodb-stored database, where at 
one point a certain query suddenly took minutes instead of microseconds. 
Using EXPLAIN, I saw that indexes were used not as intended any more. I 
changed the query to include a FORCE INDEX clause, but later detected 
that an OPTIMIZE TABLE remedied the situation also. I saw this on two 
different databases that were built up over time from totally empty to 
many million records.
OPTIMIZE TABLE on innodb tables only defragments the index and updates 
the index statistics.

> This does make me wonder what regular DB maintenance tasks should be
> performed on the bayes DB.  It sounds like some people let the code
> auto-expire, while some run cron jobs to expire data?

I added a TIMESTAMP column to the bayes_seen table, so I can expire this 
table by date. The other tables are maintained internally by SA, so 
don't do anything with them.

> Should I be running an optimize table every so often?

No, I don't think so. In my above mentioned databases, the problem never 
came back, and I didn't run optimize table since then (more than 1 year 
ago).

> mysql> explain SELECT count(*) FROM bayes_token WHERE id = '4' AND
> ('1190846660' - atime) > '345600';
> +----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
> | id | select_type | table       | type | possible_keys            | key
> | key_len | ref   | rows   | Extra                    |
> +----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
> |  1 | SIMPLE      | bayes_token | ref  | PRIMARY,bayes_token_idx2 |
> bayes_token_idx2 | 2       | const | 229946 | Using where; Using index | 
> +----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+

The "2" in the key_len column makes me wonder. That means only 2 bytes 
of the index are used. The index at my system includes id (INTEGER=4 
bytes) and atime (INTEGER=4 bytes). 2 Bytes would be half of the id 
field, which is impossible. The key_len field always shows 4 at my 
system, so it uses the id part (the first half) of the index for lookup. 
Have you changed id to smallint? Then you have altered the table which 
has the same (side-)effect as optimize table. I don't think truncating 
the field to 2 bytes did the speedup, I think the side-effect of 
recreating the index did it.

Re: Bayes innodb problems

Posted by Micah Anderson <mi...@riseup.net>.
* Alex Woick <al...@wombaz.de> [070927 02:14]:
> Micah Anderson schrieb am 27.09.2007 02:20:
>
>> processing has ground down to really slow. I'm seeing some incredibly
>> long queries now in my slow-query log, such as:
>
> Try an "optimize table <tabname>" for each of the sa tables. You just 
> filled the database from scratch, so perhaps the counters/statistics do not 
> reflect the actual value distribution yet.

Actually this bayes DB has been around for a few months, and has been
built up over time. 

This does make me wonder what regular DB maintenance tasks should be
performed on the bayes DB. It sounds like some people let the code
auto-expire, while some run cron jobs to expire data? What are the
benefits of each? Should I be running an optimize table every so often?

>> # Time: 070926 17:10:53
>> # User@Host: spamass[spamass] @  [10.0.2.4]
>> # Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
>> SELECT count(*)
>>                FROM bayes_token
>>               WHERE id = '4'
>>                 AND ('1190846660' - atime) > '345600';
>
> More than 10 minutes for counting 2 mio rows is a bit long. You can try to 
> look what Mysql is doing all the time. Execute a "show full processlist" 
> from a mysql command line while the above query is running and look at the 
> "State" column. If a SA-initiated query is waiting for a lock and actually 
> doing nothing, you should see it there. You also see all the other queries 
> that are currently running at this point and may be hogging the database 
> server.

Since I've adjusted the SQL query to use the index, I haven't seen this
problem, so I can't look at the State column to see what is going on.
This DB server isn't doing anything else, for any other database, so
there was no possibility of other things hogging the resources on the
server. 

> The database design and query design of Spamassassin is ok, even the 
> appearently non-indexable term "('1190846660' - atime) > '345600'", since 
> Mysql would not use the index on an optimized term anyway. Try an EXPLAIN 
> of this statement - Mysql will always use only the first half for lookup (4 
> bytes) of the index, which covers only the id part.

That is if I am optimizing...

mysql> explain SELECT count(*) FROM bayes_token WHERE id = '4' AND
('1190846660' - atime) > '345600';
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys            | key
| key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | bayes_token | ref  | PRIMARY,bayes_token_idx2 |
bayes_token_idx2 | 2       | const | 229946 | Using where; Using index | 
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+

>> innodb_flush_log_at_trx_commit=1
>
> Use value 0 for more performance and a small sacrifice of safety. See the 
> comment in the default *.ini file:

Mine doesn't have a comment... but looking at
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html does lead me
to want to change this since I dont care about transaction-level ACID 
compliance with the bayes database, if I have issues with that DB, I
can always restore the backup from the day before.

Micah

Re: Bayes innodb problems

Posted by Alex Woick <al...@wombaz.de>.
Micah Anderson schrieb am 27.09.2007 02:20:

> processing has ground down to really slow. I'm seeing some incredibly
> long queries now in my slow-query log, such as:

Try an "optimize table <tabname>" for each of the sa tables. You just 
filled the database from scratch, so perhaps the counters/statistics do 
not reflect the actual value distribution yet.


> # Time: 070926 17:10:53
> # User@Host: spamass[spamass] @  [10.0.2.4]
> # Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
> SELECT count(*)
>                FROM bayes_token
>               WHERE id = '4'
>                 AND ('1190846660' - atime) > '345600';

More than 10 minutes for counting 2 mio rows is a bit long. You can try 
to look what Mysql is doing all the time. Execute a "show full 
processlist" from a mysql command line while the above query is running 
and look at the "State" column. If a SA-initiated query is waiting for a 
lock and actually doing nothing, you should see it there. You also see 
all the other queries that are currently running at this point and may 
be hogging the database server.

The database design and query design of Spamassassin is ok, even the 
appearently non-indexable term "('1190846660' - atime) > '345600'", 
since Mysql would not use the index on an optimized term anyway. Try an 
EXPLAIN of this statement - Mysql will always use only the first half 
for lookup (4 bytes) of the index, which covers only the id part.

> innodb_flush_log_at_trx_commit=1

Use value 0 for more performance and a small sacrifice of safety. See 
the comment in the default *.ini file:

# If set to 1, InnoDB will flush (fsync) the transaction logs to the
# disk at each commit, which offers full ACID behavior. If you are
# willing to compromise this safety, and you are running small
# transactions, you may set this to 0 or 2 to reduce disk I/O to the
# logs. Value 0 means that the log is only written to the log file and
# the log file flushed to disk approximately once per second. Value 2
# means the log is written to the log file at each commit, but the log
# file is only flushed to disk approximately once per second.
innodb_flush_log_at_trx_commit=0

Re: Bayes innodb problems

Posted by micah <mi...@riseup.net>.
* Michael Parker <pa...@pobox.com> [070926 21:14]:
> micah wrote:
> > On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
> > 
> >> On Wed, 26 Sep 2007, Micah Anderson wrote:
> >>
> >>> SELECT count(*)
> >>>                FROM bayes_token
> >>>               WHERE id = '4'
> >>>                 AND ('1190846660' - atime) > '345600';
> >> Who the hell wrote *that* query? Is MySQL smart enough to rearrange that
> >> equation to give an indexable comparison?
> > 
> > That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl line 
> > 243. It seems to calculate the expire delta, but in a way that can't use 
> > an index.
> > 
> > Maybe that query should be changed from:
> > 
> > AND (? - atime) > ?" 
> > 
> > to:
> > 
> > AND atime < ? + ?"
> > 
> 
> Can someone please open up a Bugzilla bug for this so it can be tracked?

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5661

Micah

Re: Bayes innodb problems

Posted by Michael Parker <pa...@pobox.com>.
micah wrote:
> On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
> 
>> On Wed, 26 Sep 2007, Micah Anderson wrote:
>>
>>> SELECT count(*)
>>>                FROM bayes_token
>>>               WHERE id = '4'
>>>                 AND ('1190846660' - atime) > '345600';
>> Who the hell wrote *that* query? Is MySQL smart enough to rearrange that
>> equation to give an indexable comparison?
> 
> That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl line 
> 243. It seems to calculate the expire delta, but in a way that can't use 
> an index.
> 
> Maybe that query should be changed from:
> 
> AND (? - atime) > ?" 
> 
> to:
> 
> AND atime < ? + ?"
> 

Can someone please open up a Bugzilla bug for this so it can be tracked?

Thanks
Michael

Re: Bayes innodb problems

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Sep 27, 2007 at 11:01:11AM +0200, Mark Martinec wrote:
> > > >> >> SELECT count(*)
> > > >> >>                FROM bayes_token
> > > >> >>               WHERE id = '4'
> > > >> >>                 AND ('1190846660' - atime) > '345600';
> > > >> > Who the hell wrote *that* query? Is MySQL smart enough to rearrange
> > > >> > that equation to give an indexable comparison?
> 
> It doesn't seem to make any difference in timing, seems the
> expression optimizer is smart enough. But it is ugly indeed.

MySQL EXPLAIN does show a lot less rows processed. So it probably does make
a difference, but only if you don't have everything in cache already..


Re: Bayes innodb problems

Posted by Mark Martinec <Ma...@ijs.si>.
> > >> >> SELECT count(*)
> > >> >>                FROM bayes_token
> > >> >>               WHERE id = '4'
> > >> >>                 AND ('1190846660' - atime) > '345600';
> > >> > Who the hell wrote *that* query? Is MySQL smart enough to rearrange
> > >> > that equation to give an indexable comparison?

It doesn't seem to make any difference in timing, seems the
expression optimizer is smart enough. But it is ugly indeed.

On Thursday September 27 2007 07:22:31 Henrik Krohns wrote:
> > > I think you mean:
> > > AND atime < ? - ?

Actually both parameters are constants during execution of a select,
so the following would do just fine:

Index: lib/Mail/SpamAssassin/BayesStore/SQL.pm
===================================================================
--- lib/Mail/SpamAssassin/BayesStore/SQL.pm	(revision 579950)
+++ lib/Mail/SpamAssassin/BayesStore/SQL.pm	(working copy)
@@ -241,7 +241,7 @@
   my $sql = "SELECT count(*)
                FROM bayes_token
               WHERE id = ?
-                AND (? - atime) > ?";
+                AND atime < ?";
 
   my $sth = $self->{_dbh}->prepare_cached($sql);
     
@@ -251,7 +251,7 @@
   }
 
   for (my $i = 1; $i <= $max_expire_mult; $i<<=1) {
-    my $rc = $sth->execute($self->{_userid}, $newest_atime, $start * $i);
+    my $rc = $sth->execute($self->{_userid}, $newest_atime - $start * $i);
 
     unless ($rc) {
       dbg("bayes: calculate_expire_delta: SQL error: ".$self->{_dbh}->errstr());


Mark

Re: Bayes innodb problems

Posted by micah <mi...@riseup.net>.
On Thu, 27 Sep 2007 05:11:09 +0000, micah wrote:

> On Thu, 27 Sep 2007 07:48:02 +0300, Henrik Krohns wrote:
> 
>> On Thu, Sep 27, 2007 at 03:27:51AM +0000, micah wrote:
>>> On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
>>> 
>>> > On Wed, 26 Sep 2007, Micah Anderson wrote:
>>> > 
>>> >> SELECT count(*)
>>> >>                FROM bayes_token
>>> >>               WHERE id = '4'
>>> >>                 AND ('1190846660' - atime) > '345600';
>>> > 
>>> > Who the hell wrote *that* query? Is MySQL smart enough to rearrange
>>> > that equation to give an indexable comparison?
>>> 
>>> That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl
>>> line 243. It seems to calculate the expire delta, but in a way that
>>> can't use an index.
>>> 
>>> Maybe that query should be changed from:
>>> 
>>> AND (? - atime) > ?"
> 
> returns: 1039123

Sorry, I am wrong... I changed the > in the original query to < when I 
did this, I am wrong.

>> AND atime < ? - ?

This is right, apologies for the confusion.

Micah


Re: Bayes innodb problems

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Sep 27, 2007 at 08:46:36AM -0700, John D. Hardin wrote:
> On Thu, 27 Sep 2007, Henrik Krohns wrote:
> 
> > mysql> SELECT count(*) FROM bayes_token WHERE id = '1' AND (1190870335 - atime) > 345600;
> > +----------+
> > | count(*) |
> > +----------+
> > |  1710591 |
> > +----------+
> > 1 row in set (5.69 sec)
> > 
> > mysql> SELECT count(*) FROM bayes_token WHERE id = '1' AND atime < 1190870335 - 345600;
> > +----------+
> > | count(*) |
> > +----------+
> > |  1710582 |
> > +----------+
> > 1 row in set (5.72 sec)
> 
> >From those performance numbers it looks like there is not an index on 
> the atime column...

There is. But now that I tested, it's useless here. All the data is cached,
so the time is pretty much identical. Expiry run for 2.5M tokens takes a
minute here, so I will drop the index to get rid of the maintenance
overhead.


Re: Bayes innodb problems

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Sep 27, 2007 at 05:28:11AM +0000, micah wrote:
> 
> No, what I did was screw up the operator... 
> 
> what I meant to originally say was it should be changed to this:
> 
> AND atime > ? + ? 

But this could not work either.. ;)


Re: Bayes innodb problems

Posted by micah <mi...@riseup.net>.
On Thu, 27 Sep 2007 08:22:31 +0300, Henrik Krohns wrote:

> On Thu, Sep 27, 2007 at 05:11:09AM +0000, micah wrote:
>> On Thu, 27 Sep 2007 07:48:02 +0300, Henrik Krohns wrote:
>> 
>> > On Thu, Sep 27, 2007 at 03:27:51AM +0000, micah wrote:
>> >> On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
>> >> 
>> >> > On Wed, 26 Sep 2007, Micah Anderson wrote:
>> >> > 
>> >> >> SELECT count(*)
>> >> >>                FROM bayes_token
>> >> >>               WHERE id = '4'
>> >> >>                 AND ('1190846660' - atime) > '345600';
>> >> > 
>> >> > Who the hell wrote *that* query? Is MySQL smart enough to
>> >> > rearrange that equation to give an indexable comparison?
>> >> 
>> >> That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl
>> >> line 243. It seems to calculate the expire delta, but in a way that
>> >> can't use an index.
>> >> 
>> >> Maybe that query should be changed from:
>> >> 
>> >> AND (? - atime) > ?"
>> 
>> returns: 1039123
>> 
>> 
>> >> to:
>> >> 
>> >> AND atime < ? + ?"
>> 
>> returns: 1039231
>>  
>> > I think you mean:
>> > 
>> > AND atime < ? - ?
>> 
>> returns: 0
> 
> If you think about it, minus is the logical way. You must have some
> strange atime values.

No, what I did was screw up the operator... 

what I meant to originally say was it should be changed to this:

AND atime > ? + ? 

but I switched the operator around, and you corrected my math, by keeping 
the operator:

AND atime < ? - ?

which is functionally the same... although I think the first one is 
better because it keeps the operator the same as the original query.

micah


Re: Bayes innodb problems

Posted by "John D. Hardin" <jh...@impsec.org>.
On Thu, 27 Sep 2007, Henrik Krohns wrote:

> mysql> SELECT count(*) FROM bayes_token WHERE id = '1' AND (1190870335 - atime) > 345600;
> +----------+
> | count(*) |
> +----------+
> |  1710591 |
> +----------+
> 1 row in set (5.69 sec)
> 
> mysql> SELECT count(*) FROM bayes_token WHERE id = '1' AND atime < 1190870335 - 345600;
> +----------+
> | count(*) |
> +----------+
> |  1710582 |
> +----------+
> 1 row in set (5.72 sec)

>From those performance numbers it looks like there is not an index on 
the atime column...

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Pelley: Will you pledge not to test a nuclear weapon?
  Ahmadeinejad: CIA! Secret prison in Europe! Abu Ghraib!
                   -- Mahmoud Ahmadeinejad clumsily dodges a question
                                    (60 minutes interview, 9/20/2007)
-----------------------------------------------------------------------
 241 days until the Mars Phoenix lander arrives at Mars


Re: Bayes innodb problems

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Sep 27, 2007 at 05:11:09AM +0000, micah wrote:
> On Thu, 27 Sep 2007 07:48:02 +0300, Henrik Krohns wrote:
> 
> > On Thu, Sep 27, 2007 at 03:27:51AM +0000, micah wrote:
> >> On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
> >> 
> >> > On Wed, 26 Sep 2007, Micah Anderson wrote:
> >> > 
> >> >> SELECT count(*)
> >> >>                FROM bayes_token
> >> >>               WHERE id = '4'
> >> >>                 AND ('1190846660' - atime) > '345600';
> >> > 
> >> > Who the hell wrote *that* query? Is MySQL smart enough to rearrange
> >> > that equation to give an indexable comparison?
> >> 
> >> That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl
> >> line 243. It seems to calculate the expire delta, but in a way that
> >> can't use an index.
> >> 
> >> Maybe that query should be changed from:
> >> 
> >> AND (? - atime) > ?"
> 
> returns: 1039123
> 
> >> 
> >> to:
> >> 
> >> AND atime < ? + ?"
> 
> returns: 1039231
>  
> > I think you mean:
> > 
> > AND atime < ? - ?
> 
> returns: 0

If you think about it, minus is the logical way. You must have some strange atime values.

mysql> SELECT count(*) FROM bayes_token WHERE id = '1' AND (1190870335 - atime) > 345600;
+----------+
| count(*) |
+----------+
|  1710591 |
+----------+
1 row in set (5.69 sec)

mysql> SELECT count(*) FROM bayes_token WHERE id = '1' AND atime < 1190870335 - 345600;
+----------+
| count(*) |
+----------+
|  1710582 |
+----------+
1 row in set (5.72 sec)


Re: Bayes innodb problems

Posted by micah <mi...@riseup.net>.
On Thu, 27 Sep 2007 07:48:02 +0300, Henrik Krohns wrote:

> On Thu, Sep 27, 2007 at 03:27:51AM +0000, micah wrote:
>> On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
>> 
>> > On Wed, 26 Sep 2007, Micah Anderson wrote:
>> > 
>> >> SELECT count(*)
>> >>                FROM bayes_token
>> >>               WHERE id = '4'
>> >>                 AND ('1190846660' - atime) > '345600';
>> > 
>> > Who the hell wrote *that* query? Is MySQL smart enough to rearrange
>> > that equation to give an indexable comparison?
>> 
>> That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl
>> line 243. It seems to calculate the expire delta, but in a way that
>> can't use an index.
>> 
>> Maybe that query should be changed from:
>> 
>> AND (? - atime) > ?"

returns: 1039123

>> 
>> to:
>> 
>> AND atime < ? + ?"

returns: 1039231
 
> I think you mean:
> 
> AND atime < ? - ?

returns: 0
 
> It does make a difference in rows checked, 129266 vs 1280748.

It seems like the one that returns the same result as the original is the 
right one here, not the one that returns zero results.

micah

ps - yes the difference between the first and second result are not the 
same, but thats due to the atime changing, they are much more similar 
than 1039123 is to 0.


Re: Bayes innodb problems

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Sep 27, 2007 at 03:27:51AM +0000, micah wrote:
> On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
> 
> > On Wed, 26 Sep 2007, Micah Anderson wrote:
> > 
> >> SELECT count(*)
> >>                FROM bayes_token
> >>               WHERE id = '4'
> >>                 AND ('1190846660' - atime) > '345600';
> > 
> > Who the hell wrote *that* query? Is MySQL smart enough to rearrange that
> > equation to give an indexable comparison?
> 
> That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl line 
> 243. It seems to calculate the expire delta, but in a way that can't use 
> an index.
> 
> Maybe that query should be changed from:
> 
> AND (? - atime) > ?" 
> 
> to:
> 
> AND atime < ? + ?"

I think you mean:

AND atime < ? - ?

It does make a difference in rows checked, 129266 vs 1280748.


Re: Bayes innodb problems

Posted by micah <mi...@riseup.net>.
On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:

> On Wed, 26 Sep 2007, Micah Anderson wrote:
> 
>> SELECT count(*)
>>                FROM bayes_token
>>               WHERE id = '4'
>>                 AND ('1190846660' - atime) > '345600';
> 
> Who the hell wrote *that* query? Is MySQL smart enough to rearrange that
> equation to give an indexable comparison?

That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl line 
243. It seems to calculate the expire delta, but in a way that can't use 
an index.

Maybe that query should be changed from:

AND (? - atime) > ?" 

to:

AND atime < ? + ?"

does that make sense? The ? gets replaced by variables later (its a 
prepared statement), namely $newest_atime and $start * $i, so this change 
would make it so it can utilize the index, thus examining less rows 
(which could be locked) because mysql can't look up $newest_atime - atime 
from the index because that value is made up, so it uses the "id" portion 
of the index on (id, atime)... so if it doesn't have to calculate the 
value for $newest_atime + $something for every row and only needs to 
check the index then it would be much faster.

Micah

ps - nils from #mysql is really the genius behind this information, not me


Re: Bayes innodb problems

Posted by "John D. Hardin" <jh...@impsec.org>.
On Wed, 26 Sep 2007, Micah Anderson wrote:

> SELECT count(*)
>                FROM bayes_token
>               WHERE id = '4'
>                 AND ('1190846660' - atime) > '345600';

Who the hell wrote *that* query? Is MySQL smart enough to rearrange 
that equation to give an indexable comparison?

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Pelley: Will you pledge not to test a nuclear weapon?
  Ahmadeinejad: CIA! Secret prison in Europe! Abu Ghraib!
                   -- Mahmoud Ahmadeinejad clumsily dodges a question
                                    (60 minutes interview, 9/20/2007)
-----------------------------------------------------------------------
 242 days until the Mars Phoenix lander arrives at Mars