You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Paweł Tęcza <pt...@uw.edu.pl> on 2007/08/28 12:16:22 UTC

InnoDB as storage engine for sa_bayes

Hello Spamassassins! ;)

A few weeks ago I had problems with a capacity of my MySQL 5.0.38
server with sa_bayes database stored in MyISAM when it was handling
a lot of SQL queries from my Spamassassin cluster. The only one
solution was to disable using Bayes.

I wrote about my problems here and I heard many useful advices.
One of them was to convert my sa_bayes database from MyISAM to
InnoDB storage engine.

I didn't have any experiences with InnoDB, so I had to learn it.
Now I know more about it, but I still have a few doubts...

Below you can see details about a copy of my old sa_bayes
database with MyISAM:

# ls -lh sa_bayes/
total 809M
-rw-r----- 1 ptecza ptecza 8,5K 2007-08-16 15:29 awl.frm
-rw-r----- 1 ptecza ptecza 139M 2007-08-16 15:29 awl.MYD
-rw-r----- 1 ptecza ptecza 112M 2007-08-16 15:29 awl.MYI
-rw-r----- 1 ptecza ptecza 8,4K 2007-08-16 15:29 bayes_expire.frm
-rw-r----- 1 ptecza ptecza  207 2007-08-16 15:29 bayes_expire.MYD
-rw-r----- 1 ptecza ptecza 2,0K 2007-08-16 15:29 bayes_expire.MYI
-rw-r----- 1 ptecza ptecza 8,4K 2007-08-16 15:29 bayes_global_vars.frm
-rw-r----- 1 ptecza ptecza   20 2007-08-16 15:29 bayes_global_vars.MYD
-rw-r----- 1 ptecza ptecza 2,0K 2007-08-16 15:29 bayes_global_vars.MYI
-rw-r----- 1 ptecza ptecza 8,5K 2007-08-16 15:29 bayes_seen.frm
-rw-r----- 1 ptecza ptecza 213M 2007-08-16 15:29 bayes_seen.MYD
-rw-r----- 1 ptecza ptecza 278M 2007-08-16 15:29 bayes_seen.MYI
-rw-r----- 1 ptecza ptecza 8,5K 2007-08-16 15:29 bayes_token.frm
-rw-r----- 1 ptecza ptecza  24M 2007-08-16 15:29 bayes_token.MYD
-rw-r----- 1 ptecza ptecza  44M 2007-08-16 15:29 bayes_token.MYI
-rw-r----- 1 ptecza ptecza 8,8K 2007-08-16 15:29 bayes_vars.frm
-rw-r----- 1 ptecza ptecza   52 2007-08-16 15:29 bayes_vars.MYD
-rw-r----- 1 ptecza ptecza 3,0K 2007-08-16 15:29 bayes_vars.MYI
-rw-r----- 1 ptecza ptecza   65 2007-08-16 15:29 db.opt

Here are details about a new sa_bayes database with InnoDB:

ls -lh ib*
-rw-rw---- 1 mysql mysql   10M Aug 28 08:25 ib_logfile0
-rw-rw---- 1 mysql mysql   10M Aug 27 21:42 ib_logfile1
-rw-rw---- 1 mysql mysql 1010M Aug 28 08:25 ibdata1

# ls -lh sa_bayes/
total 882M
-rw-rw---- 1 mysql mysql 8.5K Aug 27 15:28 awl.frm
-rw-rw---- 1 mysql mysql 264M Aug 27 17:09 awl.ibd
-rw-rw---- 1 mysql mysql 8.4K Aug 27 17:08 bayes_expire.frm
-rw-rw---- 1 mysql mysql 112K Aug 28 08:25 bayes_expire.ibd
-rw-rw---- 1 mysql mysql 8.4K Aug 27 17:08 bayes_global_vars.frm
-rw-rw---- 1 mysql mysql  96K Aug 27 17:09 bayes_global_vars.ibd
-rw-rw---- 1 mysql mysql 8.5K Aug 27 17:08 bayes_seen.frm
-rw-rw---- 1 mysql mysql 468M Aug 27 21:11 bayes_seen.ibd
-rw-rw---- 1 mysql mysql 8.5K Aug 27 21:09 bayes_token.frm
-rw-rw---- 1 mysql mysql 148M Aug 27 21:43 bayes_token.ibd
-rw-rw---- 1 mysql mysql 8.8K Aug 27 21:42 bayes_vars.frm
-rw-rw---- 1 mysql mysql 112K Aug 28 08:25 bayes_vars.ibd
-rw-rw---- 1 mysql mysql   65 Aug 27 15:23 db.opt

It has exactly the same content like old database and it was
simply injected from MySQL dump.

As you can see above, the new storage engine consumed 2 times
bigger diskspace then the old. Is it a good behave or I should
feel worried?

Could you please tell me what the size of your sa_bayes
database with InnoDB is? What diskspace should I reserve?

Probably you would like to know my InnoDB settings too:

# grep ^innodb /etc/mysql/my.cnf
innodb_data_file_path=ibdata1:10M:autoextend
innodb_autoextend_increment=10M
innodb_file_per_table
innodb_buffer_pool_size=60M
innodb_additional_mem_pool_size=5M
innodb_log_files_in_group=2
innodb_fast_shutdown=1
innodb_log_file_size=10M
innodb_log_buffer_size=5M
innodb_flush_log_at_trx_commit=1
innodb_lock_wait_timeout=25

I agree that a size of buffers is not too big, but it's only
my testing box, not a production machine.

My best regards,

Pawel

Re: InnoDB as storage engine for sa_bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
"Paweł Tęcza" <pt...@uw.edu.pl> writes:

> Alex Woick <al...@wombaz.de> writes:

>> Perhaps you played around and first imported the data without
>> innodb_file_per_table, which imported into ibdata1. Then you perhaps
>> dropped the tables and defined innodb_file_per table and imported
>> again, so the *.ibd files were created and filled. The ibdata1 may now
>> be empty, but it will never shrink.
>
> I remember that before injecting the MySQL dump I removed all
> ib* files and created initial InnoDB tablespace and logs running
> mysqld from command line:

I've forgotten to add that my sa_bayes database was empty
before injecting of MySQL dump, because firstly I dropped it
and next created.

Kind regards,

Pawel

Re: InnoDB as storage engine for sa_bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
Alex Woick <al...@wombaz.de> writes:

>> -rw-rw---- 1 mysql mysql 1010M Aug 28 08:25 ibdata1
>
>> -rw-rw---- 1 mysql mysql 264M Aug 27 17:09 awl.ibd
>> -rw-rw---- 1 mysql mysql 112K Aug 28 08:25 bayes_expire.ibd
>> -rw-rw---- 1 mysql mysql  96K Aug 27 17:09 bayes_global_vars.ibd
>> -rw-rw---- 1 mysql mysql 468M Aug 27 21:11 bayes_seen.ibd
>> -rw-rw---- 1 mysql mysql 148M Aug 27 21:43 bayes_token.ibd
>> -rw-rw---- 1 mysql mysql 112K Aug 28 08:25 bayes_vars.ibd
>
>> As you can see above, the new storage engine consumed 2 times
>> bigger diskspace then the old. Is it a good behave or I should
>> feel worried?
>
> Nothing to worry. But you have perhaps imported your data twice and have an
> empty ibdata1 file which only occupies space.

Hello Alex,

At first, thanks a lot for your reply and interesting comments! :)

> I quoted the innodb data files. Since you have defined
> innodb_file_per_table, the table data is saved into the *.ibd files in
> the database directory. Without that option all table data would go to
> the ibdata* file(s) in the base data directory. As far as I know, data
> for one table is saved either in ibdata* or in the *.ibd file, but not
> both.

I can quote innodb_data_file_path option to test it, of course,
but I affraid that it's necessary. The MySQL doc [1] says:

"Note: InnoDB always needs the shared tablespace because it puts its
internal data dictionary and undo logs there. The .ibd files are not
sufficient for InnoDB to operate."

> Perhaps you played around and first imported the data without
> innodb_file_per_table, which imported into ibdata1. Then you perhaps
> dropped the tables and defined innodb_file_per table and imported
> again, so the *.ibd files were created and filled. The ibdata1 may now
> be empty, but it will never shrink.

I remember that before injecting the MySQL dump I removed all
ib* files and created initial InnoDB tablespace and logs running
mysqld from command line:

root@mysql:/var/lib/mysql# /usr/sbin/mysqld
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
070827 15:25:01  InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
070827 15:25:01  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 10 MB
InnoDB: Database physically writes the file full: wait...
070827 15:25:01  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
070827 15:25:02  InnoDB: Started; log sequence number 0 0
070827 15:25:02 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.38-Ubuntu_0ubuntu1-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Ubuntu 7.04 distribution

While injecting I could see how the ibdata1 file was growing
from 10MB to 1010MB.

> Try the following: Dump all databases which have innodb tables and
> drop all innodb tables. Stop the server, remove the ibdata1 and *.ibd
> files and restart the server. An empty and small ibdata1 file will be
> recreated. Now import your databases. I bet the ibdata1 file will not
> grow and all data will be imported into *.ibd.

Please look at below. What have I won in the bet? ;)

> It is not neccessary to dump/reload the data for changing the database
> engine of a table. Simply edit the tables with Mysql Query Browser or
> the Mysql Administrator and change the table engine from myisam to
> innodb. Or execute an SQL statement: "ALTER TABLE mytable
> ENGINE=innodb".

Yes, I know it and even I was trying to convert the MyISAM tables
in that way, but it was terrible slowly, so I chose a method
with injecting MySQL dump. Unfortunately it wasn't faster ;)

root@mysql:/var/lib/mysql# date && mysql sa_bayes -u root -p < ~/sa_bayes_innodb.sql && date
Mon Aug 27 15:28:40 CEST 2007
Enter password:
Mon Aug 27 21:42:45 CEST 2007
root@mysql:/var/lib/mysql#

You can see above that it took more then 6 hours for ~560MB
dump file!

My best regards,

Pawel


[1] http://dev.mysql.com/doc/refman/5.0/en/multiple-tablespaces.html

Re: InnoDB as storage engine for sa_bayes

Posted by Alex Woick <al...@wombaz.de>.
 > -rw-rw---- 1 mysql mysql 1010M Aug 28 08:25 ibdata1

 > -rw-rw---- 1 mysql mysql 264M Aug 27 17:09 awl.ibd
 > -rw-rw---- 1 mysql mysql 112K Aug 28 08:25 bayes_expire.ibd
 > -rw-rw---- 1 mysql mysql  96K Aug 27 17:09 bayes_global_vars.ibd
 > -rw-rw---- 1 mysql mysql 468M Aug 27 21:11 bayes_seen.ibd
 > -rw-rw---- 1 mysql mysql 148M Aug 27 21:43 bayes_token.ibd
 > -rw-rw---- 1 mysql mysql 112K Aug 28 08:25 bayes_vars.ibd

 > As you can see above, the new storage engine consumed 2 times
 > bigger diskspace then the old. Is it a good behave or I should
 > feel worried?

Nothing to worry. But you have perhaps imported your data twice and have 
an empty ibdata1 file which only occupies space.

I quoted the innodb data files. Since you have defined 
innodb_file_per_table, the table data is saved into the *.ibd files in 
the database directory. Without that option all table data would go to 
the ibdata* file(s) in the base data directory. As far as I know, data 
for one table is saved either in ibdata* or in the *.ibd file, but not 
both. Perhaps you played around and first imported the data without 
innodb_file_per_table, which imported into ibdata1. Then you perhaps 
dropped the tables and defined innodb_file_per table and imported again, 
so the *.ibd files were created and filled. The ibdata1 may now be 
empty, but it will never shrink.

Try the following: Dump all databases which have innodb tables and drop 
all innodb tables. Stop the server, remove the ibdata1 and *.ibd files 
and restart the server. An empty and small ibdata1 file will be 
recreated. Now import your databases. I bet the ibdata1 file will not 
grow and all data will be imported into *.ibd.

 > It has exactly the same content like old database and it was
 > simply injected from MySQL dump.

It is not neccessary to dump/reload the data for changing the database 
engine of a table. Simply edit the tables with Mysql Query Browser or 
the Mysql Administrator and change the table engine from myisam to 
innodb. Or execute an SQL statement: "ALTER TABLE mytable ENGINE=innodb".