Posted to users@spamassassin.apache.org by Micah Anderson <mi...@riseup.net> on 2009/06/05 16:08:53 UTC

Re: two databases

Michael Grant <mi...@gmail.com> writes:

> I did not realize one could store the bayes scores in sql.
>
> So I'd store the bayes scores on a third server and let both mxes use
> the same database.

I did this: I put my bayes in mysql and pointed two different spamd
machines at it, but I had severe problems that I could not resolve. I
posted to the list[0] about the problems.

The basic problem was that as soon as I fired up the second server, it
immediately started blocking on the bayes work. Average scantimes went
from 1-2 seconds up to 35+, and the max children got eaten up by blocking
on the bayes work, to the point where it's pointless because too many
processes are blocked. Disabling the bayes_sql stuff on one of the
machines dropped the scantimes back to their expected average of 1-2
seconds (but of course none of the BAYES tests will fire and
autolearning fails).

My mysql server is its own machine; it was local to the first spamd
(local LAN) and remote to the second (over the net). I eliminated any
hostname lookup problems. I obviously couldn't eliminate network latency,
but that shouldn't have caused such a severe result. I'm running with
InnoDB tables, which lock at the row level, so I shouldn't have any
table-locking issues. In any case, I might have had some issues because
my MySQL database needed to be optimized, but I was not able to determine
how, and now I just run one of the spamds without bayes, which is not too
bad because my bayes database seems to be totally worthless at the
moment. :P
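
For a rough sense of why 35-second scans starve the pool, here is a
back-of-the-envelope sketch. It assumes a pool of 5 children (spamd's
default --max-children, not a figure from this thread) and treats the
average scantime as the time each child is tied up per message. The
function name is only illustrative.

```python
def max_throughput(children: int, avg_scantime_s: float) -> float:
    """Upper bound on messages per second when every child is busy."""
    return children / avg_scantime_s


MAX_CHILDREN = 5  # assumption: spamd's default --max-children

# Compare the healthy case (about 1.5 s) with the blocked case (35 s).
for scantime in (1.5, 35.0):
    rate = max_throughput(MAX_CHILDREN, scantime)
    print(f"{scantime:>5.1f} s per scan -> {rate:.2f} msgs/s")
```

Under those assumptions the ceiling drops from a few messages per second
to roughly one message every seven seconds, so incoming mail queues up
behind the blocked children.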

micah

0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113673


Re: two databases

Posted by d....@yournetplus.com.
Quoting Micah Anderson <mi...@riseup.net>:

> any case I might have had some issues because my MySQL database needed
> to be optimized, but I was not able to determine how and now I just run
> one of the spamd's without bayes, which is not too bad because my bayes
> database seems to be totally worthless at the moment. :P

http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

I have a cronjob set up that runs OPTIMIZE TABLE on all the SA
tables every 24 hours to make sure everything is in line.
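
A minimal sketch of what such a job could feed to the mysql client. The
table names are an assumption based on the stock SpamAssassin bayes MySQL
schema, and the script and database names in the usage note are
hypothetical; this is not the poster's actual cronjob.

```python
# Assumption: table names from the stock SpamAssassin bayes MySQL schema;
# adjust the tuple to match your own database.
SA_BAYES_TABLES = (
    "bayes_expire",
    "bayes_global_vars",
    "bayes_seen",
    "bayes_token",
    "bayes_vars",
)


def optimize_statements(tables=SA_BAYES_TABLES):
    """Return one OPTIMIZE TABLE statement per table name."""
    return [f"OPTIMIZE TABLE `{name}`;" for name in tables]


if __name__ == "__main__":
    # Print the SQL so it can be piped into the mysql client.
    print("\n".join(optimize_statements()))
```

From cron this could be run once a day as something like
"python3 sa_optimize.py | mysql sa_bayes", with credentials supplied
however you normally do for the mysql client.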


Re: two databases

Posted by Rick Macdougall <ri...@ummm-beer.com>.
Michael Grant wrote:
> On Fri, Jun 5, 2009 at 16:08, Micah Anderson <mi...@riseup.net> wrote:
>> I did this: I put my bayes in mysql and pointed two different spamd
>> machines at it, but I had severe problems that I could not resolve.
>>
>> The basic problem was that as soon as I fired up the second server, it
>> immediately started blocking on the bayes work. Average scantimes went
>> from 1-2 seconds up to 35+, and the max children got eaten up by blocking
>> on the bayes work, to the point where it's pointless because too many
>> processes are blocked. Disabling the bayes_sql stuff on one of the
>> machines dropped the scantimes back to their expected average of 1-2
>> seconds (but of course none of the BAYES tests will fire and
>> autolearning fails).
>>

I found that the bayes lookup occurred, then the connection was closed, 
then a second connection attempt was made to do bayes learning but it 
attempted to use the same socket.

Because the socket on the remote server hadn't closed yet, the process 
hung until it closed, then proceeded.

What I ended up doing was having two spamd machines use DBI.pm (which I 
found on the spamassassin wiki, and which makes SA use persistent 
connections) and have auto-learning ON on those two machines.

The other two machines run with bayes enabled but with auto-learning OFF.

For me this solved all my problems.

Please note, however, that this occurred to me using 3.0.x, and I've just 
been upgrading ever since without checking to see whether the 
re-connection issue has been solved, since everything *just works* as it 
is currently configured.
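
The learner/scanner split described above can be sketched as local.cf
fragments. This is only an illustration: the DSN, host name, and
credentials are hypothetical, it uses the stock
Mail::SpamAssassin::BayesStore::MySQL module, and it does not cover the
DBI.pm persistent-connection change.

```python
def local_cf(dsn: str, user: str, password: str, auto_learn: bool) -> str:
    """Render local.cf lines for a spamd host sharing one SQL bayes store."""
    lines = [
        "use_bayes 1",
        "bayes_store_module Mail::SpamAssassin::BayesStore::MySQL",
        f"bayes_sql_dsn {dsn}",
        f"bayes_sql_username {user}",
        f"bayes_sql_password {password}",
        # Only the "learner" hosts write tokens back to the database.
        f"bayes_auto_learn {1 if auto_learn else 0}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical connection details, for illustration only.
DSN = "DBI:mysql:sa_bayes:db.example.com"

learner = local_cf(DSN, "sa_user", "secret", auto_learn=True)
scanner = local_cf(DSN, "sa_user", "secret", auto_learn=False)
print(learner)
print(scanner)
```

The two fragments differ only in the bayes_auto_learn line, so every
host still scores against the shared database while only some of them
add to it.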

HTHs,

Rick


Re: two databases

Posted by Micah Anderson <mi...@riseup.net>.
* Michael Grant <mi...@gmail.com> [2009-06-05 10:26-0400]:
> Wow.  I did not get around to setting this up yet.  But on the MySQL
> front, did you try enabling the query cache by adding this to the
> mysql command line?
> 
>     --maximum-query_cache_size=1M

I presume this setting is the same in my.cnf:
query_cache_limit	= 1048576

I don't recall all the things I tried, but it seems worth trying again,
this time with a fresh approach.
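
One thing worth checking when retrying: as far as I know,
query_cache_limit and query_cache_size are separate variables. The
former caps the size of a single cached result, while the latter sets
the total memory given to the cache, and a size of 0 leaves the cache
off. A small sketch for seeing which of them a my.cnf fragment actually
sets; the fragment and the function name are made up for illustration,
and a real my.cnf with !include lines would need more handling than
this.

```python
import configparser

# Hypothetical my.cnf fragment mirroring the setting quoted above.
MY_CNF = """
[mysqld]
query_cache_limit = 1048576
"""

CACHE_KEYS = ("query_cache_size", "query_cache_limit", "query_cache_type")


def query_cache_settings(text: str) -> dict:
    """Return the query-cache options set under [mysqld], or None if unset."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    section = parser["mysqld"] if parser.has_section("mysqld") else {}
    return {key: section.get(key) for key in CACHE_KEYS}


if __name__ == "__main__":
    for key, value in query_cache_settings(MY_CNF).items():
        print(f"{key}: {value if value is not None else '(not set)'}")
```

If query_cache_size comes back unset, the server falls back to its
built-in default, which depends on the MySQL version.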

> Also, a tool I used a lot to help debug this sort of issue was mytop.

I've never had too much luck with mytop, but I have found the
tuning-primer.sh to work well: http://www.day32.com/MySQL/

micah

Re: two databases

Posted by Michael Grant <mi...@gmail.com>.
On Fri, Jun 5, 2009 at 16:08, Micah Anderson <mi...@riseup.net> wrote:
> I did this: I put my bayes in mysql and pointed two different spamd
> machines at it, but I had severe problems that I could not resolve.
>
> The basic problem was that as soon as I fired up the second server, it
> immediately started blocking on the bayes work. Average scantimes went
> from 1-2 seconds up to 35+, and the max children got eaten up by blocking
> on the bayes work, to the point where it's pointless because too many
> processes are blocked. Disabling the bayes_sql stuff on one of the
> machines dropped the scantimes back to their expected average of 1-2
> seconds (but of course none of the BAYES tests will fire and
> autolearning fails).
>

Wow.  I did not get around to setting this up yet.  But on the MySQL
front, did you try enabling the query cache by adding this to the
mysql command line?

    --maximum-query_cache_size=1M

Also, a tool I used a lot to help debug this sort of issue was mytop.

Michael Grant