Posted to users@spamassassin.apache.org by Big Wave Dave <bi...@gmail.com> on 2007/01/03 18:11:47 UTC

SA-Learn Recover to SQL is slow.

As mentioned in a previous thread, I'm migrating to SQL based bayes.

I performed the backup from the original bayes DB, which took about a
minute to export.
I then set up the SQL database, and now I'm trying to import into it...
but it has been running for more than 6 hours!?
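For reference, the backup/restore pair in question looks like this (real sa-learn flags; the filename is a placeholder):

```shell
# Dump the existing bayes data to a flat text file
sa-learn --backup > bayes_backup.txt

# Re-import it into whatever store local.cf now points at (SQL here)
sa-learn --restore bayes_backup.txt
```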

When I check MySQL I see:
mysql> SELECT COUNT(*) spam_count FROM bayes_token;
+------------+
| spam_count |
+------------+
|    3944353 |
+------------+
1 row in set (5.36 sec)

It is incrementing, telling me it is doing something.
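Since the debug output below shows more than one userid in play, a per-user breakdown can be more informative than a global count. A sketch against the stock bayes MySQL schema, where the per-user column in bayes_token is named `id`:

```sql
SELECT id, COUNT(*) AS tokens
FROM bayes_token
GROUP BY id;
```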

I am running sa-learn --restore in debug mode, and the last section shows:
[29103] dbg: replacetags: replacing tags
[29103] dbg: replacetags: done replacing tags
[29103] dbg: bayes: using username: root
[29103] dbg: bayes: database connection established
[29103] dbg: bayes: found bayes db version 3
[29103] dbg: bayes: Using userid: 3
[29103] dbg: bayes: not available for scanning, only 1 spam(s) in bayes DB < 200
[29103] dbg: config: score set 1 chosen.
[29103] dbg: bayes: database connection established
[29103] dbg: bayes: found bayes db version 3
[29103] dbg: bayes: Using userid: 3
[29103] dbg: bayes: database connection established
[29103] dbg: bayes: found bayes db version 3
[29103] dbg: bayes: using userid: 4

I don't think it is a MySQL problem, as the same instance serves as the
backend for other applications that have no performance problems.

For reference, this is SpamAssassin 3.1.7 on a 1.2GHz AMD machine with 1GB of RAM.

my.cnf:
[mysqld]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 16M
max_allowed_packet = 1M
table_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
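For a bulk restore into a MyISAM-backed bayes_token table, settings along these lines are sometimes suggested. The values below are hypothetical guesses for a 1GB-RAM box, not tested recommendations:

```ini
[mysqld]
key_buffer = 128M               # more room for bayes_token index blocks
bulk_insert_buffer_size = 32M   # MyISAM bulk-insert tree cache
```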

sql.cf (in /etc/mail/spamassassin):
bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                   DBI:mysql:spam_bayes:localhost
bayes_sql_username              spambayes
bayes_sql_password              ********

auto_whitelist_factory          Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                    DBI:mysql:spam_bayes:localhost
user_awl_sql_username           spambayes
user_awl_sql_password           ********

[root@host spamassassin]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0    2936345          0  non-token data: ntokens
0.000          0 1167206400          0  non-token data: oldest atime
0.000          0 1167764660          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
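The two atime columns are Unix epoch seconds. A quick way to decode the values shown above (plain Python, nothing SpamAssassin-specific):

```python
from datetime import datetime, timezone

# "oldest atime" and "newest atime" from the --dump magic output above
oldest = datetime.fromtimestamp(1167206400, tz=timezone.utc)
newest = datetime.fromtimestamp(1167764660, tz=timezone.utc)

print(oldest.isoformat())      # 2006-12-27T08:00:00+00:00
print(newest.isoformat())      # 2007-01-02T19:04:20+00:00
print((newest - oldest).days)  # 6 -- only ~6 days of atime spread
```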


What am I missing?

I'd be thankful for any input.
Thanks,
Dave

Re: SA-Learn Recover to SQL is slow.

Posted by Big Wave Dave <bi...@gmail.com>.
On 1/3/07, Michael Parker <pa...@pobox.com> wrote:
> Big Wave Dave wrote:
> >
> >
> > What am I missing?
> >
> > I'd be thankful for any input.
>
> You're not missing anything.  The import takes a long time to run.  It's
> doing a lot of updates, which are expensive in SQL.  The good news is
> that you can pretty much use the system while it's doing the import,
> because everything is atomic.
>
> There might be some tuning you could do on your database side that would
> speed things up, but that is a much larger discussion.
>
> Michael
>

I just noticed that the sa-learn --dump magic showed:
0.000          0    2936345          0  non-token data: ntokens

Does this mean I have nearly 3 million tokens?  I thought the default
max allowed before expiry was 150,000?  Could this be my problem?...
or am I not reading this information correctly?

Thanks,
Dave
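For reference, the 150,000 figure mentioned above is the bayes_expiry_max_db_size option. These option names and defaults are from SpamAssassin 3.1's configuration documentation, shown here as a reference fragment:

```
# local.cf -- shipped defaults for bayes expiry
bayes_auto_expire           1        # expire automatically during learning
bayes_expiry_max_db_size    150000   # target token count after expiry
```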

Re: SA-Learn Recover to SQL is slow.

Posted by Big Wave Dave <bi...@gmail.com>.
On 1/4/07, Michael Parker <pa...@pobox.com> wrote:
> Big Wave Dave wrote:
> > On 1/3/07, Gary V <mr...@hotmail.com> wrote:
> >> >It finally finished the restore.
> >> >
> >> >For the sake of information to help future users....
> >> >
> >> >The "backup" file being used to restore into the new SQL database was
> >> >99MB and took 17 hours to import on my AMD 1.2GHz machine with 1GB of
> >> >RAM.
> >> >
> >> >Dave
> >>
> >> Could be your database was not expiring. Probably a good idea to do a
> >> --force-expire prior to a backup. Just curious, if you run --force-expire
> >> now, what does --dump magic look like?
> >>
> >> Gary V
> >>
> >
> > Here are the numbers...
> > [root@host ~]# sa-learn --dump magic
> > 0.000          0          3          0  non-token data: bayes db version
> > 0.000          0        253          0  non-token data: nspam
> > 0.000          0        580          0  non-token data: nham
> > 0.000          0    3637103          0  non-token data: ntokens
> > 0.000          0 1167206400          0  non-token data: oldest atime
> > 0.000          0 1167890964          0  non-token data: newest atime
> > 0.000          0          0          0  non-token data: last journal
> > sync atime
> > 0.000          0 1167891012          0  non-token data: last expiry atime
> > 0.000          0          0          0  non-token data: last expire
> > atime delta
> > 0.000          0          0          0  non-token data: last expire
> > reduction count
> > [root@host ~]# sa-learn --force-expire
> > [root@host ~]# sa-learn --dump magic
> > 0.000          0          3          0  non-token data: bayes db version
> > 0.000          0        253          0  non-token data: nspam
> > 0.000          0        580          0  non-token data: nham
> > 0.000          0    3637103          0  non-token data: ntokens
> > 0.000          0 1167206400          0  non-token data: oldest atime
> > 0.000          0 1167890964          0  non-token data: newest atime
> > 0.000          0          0          0  non-token data: last journal
> > sync atime
> > 0.000          0 1167891646          0  non-token data: last expiry atime
> > 0.000          0          0          0  non-token data: last expire
> > atime delta
> > 0.000          0          0          0  non-token data: last expire
> > reduction count
> > [root@host ~]#
> >
> > It would appear to me as if it hasn't changed the number of tokens at all.
> >
>
> Run with -D; it will probably tell you there wasn't enough difference to
> run the expire.  The SQL import works just as if you were learning the
> tokens, so the atimes are updated accordingly.  Over time the atime
> differences will be enough that you are able to expire.
>
> Someone else mentioned it, but I'll follow up: your auto-expire has
> probably been broken for some time (do you use MailScanner or Amavis or
> something like that?).  Before you back up, you should run sa-learn
> --force-expire to clear things out.  It's obviously too late for that now.
>
> Give it a few days to update the database and you'll be able to start
> expiring out data.  It may take a few weeks for your database to get
> enough diversity in the atimes to get down to the configured 150k token
> level.
>
> Michael
>
I ran with -D and got the following:
[26682] dbg: bayes: couldn't find a good delta atime, need more token
difference, skipping expire
[26682] dbg: bayes: expiry completed

I'm guessing it has to do with the recent learning of "ham" that I
did.  I'll keep an eye on it and try to report back in a couple weeks.

Thanks for the help.
Dave
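The "couldn't find a good delta atime" message comes from SpamAssassin's expiry estimation pass, which looks for an atime cutoff that would remove roughly the right number of tokens. A simplified, hypothetical sketch of that idea follows; this is not SpamAssassin's actual code, and the function name, starting delta, and 1.5x acceptance band are invented for illustration:

```python
def pick_expiry_delta(atimes, max_tokens):
    """Try exponentially growing atime deltas; accept the first one where
    expiring tokens older than (newest - delta) removes roughly the excess
    over max_tokens.  Returns None when no delta works, e.g. when a fresh
    --restore has left almost every token with the same atime."""
    newest = max(atimes)
    goal = len(atimes) - max_tokens      # how many tokens we'd like to drop
    if goal <= 0:
        return None                      # nothing to expire
    delta = 43200                        # start at 12 hours (sketch choice)
    for _ in range(20):
        would_remove = sum(1 for a in atimes if a < newest - delta)
        if goal <= would_remove <= goal * 1.5:
            return delta                 # good-enough cutoff found
        delta *= 2                       # widen the window and retry
    return None                          # "couldn't find a good delta atime"

# Tokens spread hourly expire fine; identical atimes (post-restore) do not.
spread = [1167206400 + i * 3600 for i in range(1000)]
print(pick_expiry_delta(spread, 500))               # 1382400 (16 days)
print(pick_expiry_delta([1167206400] * 1000, 500))  # None
```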

Re: SA-Learn Recover to SQL is slow.

Posted by Michael Parker <pa...@pobox.com>.
Big Wave Dave wrote:
> On 1/3/07, Gary V <mr...@hotmail.com> wrote:
>> >It finally finished the restore.
>> >
>> >For the sake of information to help future users....
>> >
>> >The "backup" file being used to restore into the new SQL database was
>> >99MB and took 17 hours to import on my AMD 1.2GHz machine with 1GB of
>> >RAM.
>> >
>> >Dave
>>
>> Could be your database was not expiring. Probably a good idea to do a
>> --force-expire prior to a backup. Just curious, if you run --force-expire
>> now, what does --dump magic look like?
>>
>> Gary V
>>
> 
> Here are the numbers...
> [root@host ~]# sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        253          0  non-token data: nspam
> 0.000          0        580          0  non-token data: nham
> 0.000          0    3637103          0  non-token data: ntokens
> 0.000          0 1167206400          0  non-token data: oldest atime
> 0.000          0 1167890964          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal
> sync atime
> 0.000          0 1167891012          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire
> atime delta
> 0.000          0          0          0  non-token data: last expire
> reduction count
> [root@host ~]# sa-learn --force-expire
> [root@host ~]# sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        253          0  non-token data: nspam
> 0.000          0        580          0  non-token data: nham
> 0.000          0    3637103          0  non-token data: ntokens
> 0.000          0 1167206400          0  non-token data: oldest atime
> 0.000          0 1167890964          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal
> sync atime
> 0.000          0 1167891646          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire
> atime delta
> 0.000          0          0          0  non-token data: last expire
> reduction count
> [root@host ~]#
> 
> It would appear to me as if it hasn't changed the number of tokens at all.
> 

Run with -D; it will probably tell you there wasn't enough difference to
run the expire.  The SQL import works just as if you were learning the
tokens, so the atimes are updated accordingly.  Over time the atime
differences will be enough that you are able to expire.

Someone else mentioned it, but I'll follow up: your auto-expire has
probably been broken for some time (do you use MailScanner or Amavis or
something like that?).  Before you back up, you should run sa-learn
--force-expire to clear things out.  It's obviously too late for that now.

Give it a few days to update the database and you'll be able to start
expiring out data.  It may take a few weeks for your database to get
enough diversity in the atimes to get down to the configured 150k token
level.

Michael

Re: SA-Learn Recover to SQL is slow.

Posted by Big Wave Dave <bi...@gmail.com>.
On 1/3/07, Gary V <mr...@hotmail.com> wrote:
> >It finally finished the restore.
> >
> >For the sake of information to help future users....
> >
> >The "backup" file being used to restore into the new SQL database was
> >99MB and took 17 hours to import on my AMD 1.2GHz machine with 1GB of
> >RAM.
> >
> >Dave
>
> Could be your database was not expiring. Probably a good idea to do a
> --force-expire prior to a backup. Just curious, if you run --force-expire
> now, what does --dump magic look like?
>
> Gary V
>

Here are the numbers...
[root@host ~]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        253          0  non-token data: nspam
0.000          0        580          0  non-token data: nham
0.000          0    3637103          0  non-token data: ntokens
0.000          0 1167206400          0  non-token data: oldest atime
0.000          0 1167890964          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1167891012          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
[root@host ~]# sa-learn --force-expire
[root@host ~]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        253          0  non-token data: nspam
0.000          0        580          0  non-token data: nham
0.000          0    3637103          0  non-token data: ntokens
0.000          0 1167206400          0  non-token data: oldest atime
0.000          0 1167890964          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1167891646          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
[root@host ~]#

It would appear to me as if it hasn't changed the number of tokens at all.

Dave

Re: SA-Learn Recover to SQL is slow.

Posted by "Jack L. Stone" <ja...@sage-american.com>.
On 3 Jan 2007 at 21:45, Gary V wrote:

> >It finally finished the restore.
> >
> >For the sake of information to help future users....
> >
> >The "backup" file being used to restore into the new SQL database was
> >99MB and took 17 hours to import on my AMD 1.2GHz machine with 1GB of
> >RAM.
> >
> >Dave
> 
> Could be your database was not expiring. Probably a good idea to do a
> --force-expire prior to a backup. Just curious, if you run
> --force-expire now, what does --dump magic look like?
> 
> Gary V
> 

I'm seeing the same sort of thing here. Nothing changed:
root@xxxxxx>> sa-learn --dump magic
0.000  0          3   0  non-token data: bayes db version
0.000  0        253   0  non-token data: nspam
0.000  0       1817   0  non-token data: nham
0.000  0     126548   0  non-token data: ntokens
0.000  0 1161347400   0  non-token data: oldest atime
0.000  0 1167885013   0  non-token data: newest atime
0.000  0       0      0  non-token data: last journal sync atime
0.000  0 1167919691   0  non-token data: last expiry atime
0.000  0    5529600   0  non-token data: last expire atime delta
0.000  0      39556   0  non-token data: last expire reduction count


Regards,
Jack L. Stone
System Admin


Re: SA-Learn Recover to SQL is slow.

Posted by Gary V <mr...@hotmail.com>.
>It finally finished the restore.
>
>For the sake of information to help future users....
>
>The "backup" file being used to restore into the new SQL database was
>99MB and took 17 hours to import on my AMD 1.2GHz machine with 1GB of
>RAM.
>
>Dave

Could be your database was not expiring. Probably a good idea to do a 
--force-expire prior to a backup. Just curious, if you run --force-expire 
now, what does --dump magic look like?

Gary V



Re: SA-Learn Recover to SQL is slow.

Posted by Big Wave Dave <bi...@gmail.com>.
On 1/3/07, Michael Parker <pa...@pobox.com> wrote:
> Big Wave Dave wrote:
> >
> >
> > What am I missing?
> >
> > I'd be thankful for any input.
>
> You're not missing anything.  The import takes a long time to run.  It's
> doing a lot of updates, which are expensive in SQL.  The good news is
> that you can pretty much use the system while it's doing the import,
> because everything is atomic.
>
> There might be some tuning you could do on your database side that would
> speed things up, but that is a much larger discussion.
>
> Michael
>

It finally finished the restore.

For the sake of information to help future users....

The "backup" file being used to restore into the new SQL database was
99MB and took 17 hours to import on my AMD 1.2GHz machine with 1GB of
RAM.

Dave

Re: SA-Learn Recover to SQL is slow.

Posted by Michael Parker <pa...@pobox.com>.
Big Wave Dave wrote:
> 
> 
> What am I missing?
> 
> I'd be thankful for any input.

You're not missing anything.  The import takes a long time to run.  It's
doing a lot of updates, which are expensive in SQL.  The good news is
that you can pretty much use the system while it's doing the import,
because everything is atomic.

There might be some tuning you could do on your database side that would
speed things up, but that is a much larger discussion.

Michael
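Michael's point about per-token updates being the bottleneck can be illustrated in miniature. The sketch below uses SQLite as a stand-in, with a hypothetical table shape loosely modeled on bayes_token (this is not SpamAssassin code), to show the batching pattern that spares a bulk import from per-statement transaction overhead:

```python
import sqlite3

# Hypothetical miniature of the bayes_token table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_token ("
    " token TEXT PRIMARY KEY, spam_count INT, ham_count INT, atime INT)"
)

rows = [(f"tok{i}", 1, 0, 1167206400) for i in range(50_000)]

# One transaction for the whole batch: a restore that instead commits
# per token pays journal/flush overhead 50,000 times over.
with conn:
    conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM bayes_token").fetchone()[0])  # 50000
```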