Posted to users@spamassassin.apache.org by Jonas Akrouh Larsen <jo...@vrt.dk> on 2013/10/22 16:07:41 UTC

Shared bayes SQL between machines

Hi List

I tried a couple of years ago to have my bayes DB stored in SQL replicate as MASTER-MASTER between 2 servers.

It worked fine for starters, but it often broke down because of irregularities, something about duplicate data/keys if I recall correctly.

As I would like 1 single shared bayes DB, what is everyone else doing?

I would prefer the DB to be local, so 1 single shared db server isn't an option for me.

Currently I'm contemplating splitting up reads and writes with a MySQL proxy, but I'm not sure that's the best option.

So does anybody have any tips or advice to offer?



Med venlig hilsen / Best regards

Jonas Akrouh Larsen

TechBiz ApS
Laplandsgade 4, 2. sal
2300 København S

Office: 7020 0979
Direct: 3336 9974
Mobile: 5120 1096
Web: www.techbiz.dk



Re: Shared bayes SQL between machines

Posted by Axb <ax...@gmail.com>.
On 10/22/2013 05:21 PM, Jonas Akrouh Larsen wrote:
> Interesting, my problem is timeouts as well: when the master SQL
> server is down, the other nodes slow down because they are waiting
> for bayes to time out, meaning they build up a mail queue because they
> can't keep up (or well, they become a lot slower anyway).
>
> Hence my need for something HA'ish.
>
> I can see how your setup should work fine, it does seem a bit manual
> though. Isn't redis supposed to have failover features built-in?
> Which would make your Zabbix fix unnecessary? Any particular reason
> you chose to script your own "failover" and not use the redis based
> one?

Redis Cluster is a work in progress.
Haven't played with it yet...

http://redis.io/topics/cluster-spec

RE: Shared bayes SQL between machines

Posted by Jonas Akrouh Larsen <jo...@vrt.dk>.
> On 10/22/2013 04:40 PM, Jonas Akrouh Larsen wrote:
> >> as if Bayes would be mission critical. .-) I have 47 spamd boxes
> >> connecting to one Redis box and a slave as spare.
> >>
> >
> > Did you test what happens when you shut down the master/main Redis box?
> > Does it just automatically fail over, or? I'd love to hear some
> > experiences if you've been using it for a while.
> 
> I don't have failover configured.
> 
> [root@pyzord ~]# uptime
>   17:03:48 up 45 days,  6:53,  1 user,  load average: 1.14, 0.38, 0.20
> 
> If bayes box goes down, SA's bayes checks time out.
> If it's down for more than 8 min, monitoring (Zabbix) is configured to replace
> sa_bayes.cf with one which points to the slave box and reload spamd.
> When master comes back, Zabbix reverses the change.
> 
> sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0   12456296          0  non-token data: nspam
> 0.000          0    3362288          0  non-token data: nham
> 0.000          0          0          0  non-token data: ntokens
> 
> Bayes/SQL is just too slow for my traffic, and file based backends were
> constantly locked.
> 
> I was the one who requested the Redis backend after a long proof of
> concept "alpha" run period.
> 
> Wouldn't go back for anything in the world :)
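
The switch-over rule quoted above (point spamd at the slave once the master has been down for more than 8 minutes, and switch back when the master returns) can be sketched roughly as follows. This is only an illustration, not the actual Zabbix action: the class name `BayesTarget`, the host names, and the `observe` method are made up for the example, and swapping sa_bayes.cf and reloading spamd is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the message above ("down for more than 8 min").
FAILOVER_AFTER_SECONDS = 8 * 60


@dataclass
class BayesTarget:
    """Hypothetical helper tracking which Bayes host spamd should use."""
    master: str
    slave: str
    active: str = ""
    down_since: Optional[float] = None  # when the master was first seen down

    def __post_init__(self) -> None:
        # Start out pointing at the master unless told otherwise.
        if not self.active:
            self.active = self.master

    def observe(self, master_up: bool, now: float) -> bool:
        """Record one health check; return True if the config must change."""
        if master_up:
            self.down_since = None
            wanted = self.master
        else:
            if self.down_since is None:
                self.down_since = now
            outage = now - self.down_since
            # Keep the current host until the outage exceeds the threshold.
            wanted = self.slave if outage > FAILOVER_AFTER_SECONDS else self.active
        changed = wanted != self.active
        self.active = wanted
        return changed


target = BayesTarget(master="redis-a.example", slave="redis-b.example")
```

A `True` return value is the point where the monitoring would swap the config file and reload spamd. Until then, Bayes checks simply time out, which is the slowdown discussed in this thread.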

Interesting, my problem is timeouts as well: when the master SQL server is down, the other nodes slow down because they are waiting for bayes to time out, meaning they build up a mail queue because they can't keep up (or well, they become a lot slower anyway).

Hence my need for something HA'ish.

I can see how your setup should work fine, it does seem a bit manual though. Isn't redis supposed to have failover features built-in? Which would make your Zabbix fix unnecessary? Any particular reason you chose to script your own "failover" and not use the redis based one?

Thanks for sharing your setup btw.





Re: Shared bayes SQL between machines

Posted by Axb <ax...@gmail.com>.
On 10/22/2013 04:07 PM, Jonas Akrouh Larsen wrote:
> Hi List
>
> I tried a couple of years ago to have my bayes DB stored in SQL replicate as MASTER-MASTER between 2 servers.
>
> It worked fine for starters, but it often broke down because of irregularities, something about duplicate data/keys if I recall correctly.
>
> As I would like 1 single shared bayes DB, what is everyone else doing?
>
> I would prefer the DB to be local, so 1 single shared db server isn't an option for me.
>
> Currently I'm contemplating splitting up reads and writes with a MySQL proxy, but I'm not sure that's the best option.
>
> So does anybody have any tips or advice to offer?

advice= none
tips= run SA 3.4 with bayes using Redis backend so all boxes access it.
VERY fast/robust/easy
totally set & forget.


RE: Shared bayes SQL between machines

Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:

> Mmm well I'd like to use it, but it's also the manual learning.
>
> We use a web frontend which lists quarantined messages and both users 
> and myself do manual training from there (it just calls sa-learn)

Manual learning is easy to target on the master database. It's the 
distributed autolearn that's problematic.

>> -----Original Message-----
>> From: John Hardin [mailto:jhardin@impsec.org]
>> Sent: 22. oktober 2013 16:50
>> To: SpamAssassin Users List
>> Subject: RE: Shared bayes SQL between machines
>>
>> On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:
>>
>>>> Use standard master-slave replication. Learn and expire only the
>>>> master database.
>>>>
>>>> I *think* we support separate credentials for learning to a different
>>>> database than you scan from, but I don't follow the Bayes SQL
>>>> interface too closely so I'm not sure.
>>>
>>> If that was supported it would solve my problem, but the only thing
>>> I've seen was somebody who wrote his own postgresql based module for
>>> splitting it up.
>>
>> Do you *have* to use autolearn?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Maxim V: Close air support and friendly fire should be easier to
   tell apart.
-----------------------------------------------------------------------
  509 days since the first successful private support mission to ISS (SpaceX)

RE: Shared bayes SQL between machines

Posted by Jonas Akrouh Larsen <jo...@vrt.dk>.
> -----Original Message-----
> From: Dave Warren [mailto:davew@hireahit.com]
> Sent: 23. oktober 2013 09:51
> To: users@spamassassin.apache.org
> Subject: Re: Shared bayes SQL between machines
> 
> On 2013-10-22 07:52, Jonas Akrouh Larsen wrote:
> > Mmm well I'd like to use it, but it's also the manual learning.
> >
> > We use a web frontend which lists quarantined messages and both users
> > and myself do manual training from there (it just calls sa-learn)
> 
> How do you distribute load among your SpamAssassin servers? Is it relatively
> balanced, or do different groups of users hit different servers?
> 
> If it's balanced, consider just autolearning on the master and forget about
> autolearning on the rest. Sure, you'll get a little less data overall, but you'll
> still get a representative flow of both spam and ham, and you can still
> perform manual training against the master database from other servers.
> 
It's pretty balanced, so actually it's a pretty good idea.

If I combine John's advice with specifying the master in sa-learn's config, and then only autolearn on one node, everything should work, I think.

With one master taking all the writes and the nodes using their local read-only copy of the DB for improved performance, it should work.

And it's a lot simpler than all sorts of failover mechanisms, master-master replication and whatnot.
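
As a rough sketch of that split (not a tested setup), the per-node Bayes settings could be generated like this. The function `bayes_config`, the role names, and the DSN host names are made up for illustration. The option names (`bayes_store_module`, `bayes_sql_dsn`, `bayes_auto_learn`, `bayes_auto_expire`) are standard SpamAssassin settings, but the values are assumptions.

```python
from typing import List


def bayes_config(role: str, master_dsn: str, local_dsn: str) -> List[str]:
    """Return Bayes config lines for a node; role is "scan" or "learn"."""
    if role not in ("scan", "learn"):
        raise ValueError(f"unknown role: {role!r}")
    learns = role == "learn"
    return [
        "bayes_store_module Mail::SpamAssassin::BayesStore::MySQL",
        # Learning goes to the master; scanning reads the local replica.
        f"bayes_sql_dsn {master_dsn if learns else local_dsn}",
        # Only the learning node autolearns and expires tokens.
        f"bayes_auto_learn {1 if learns else 0}",
        f"bayes_auto_expire {1 if learns else 0}",
    ]


# Hypothetical DSNs: a remote master and a local replica.
MASTER = "DBI:mysql:bayes:db-master.example"
LOCAL = "DBI:mysql:bayes:localhost"

scan_lines = bayes_config("scan", MASTER, LOCAL)
learn_lines = bayes_config("learn", MASTER, LOCAL)
print("\n".join(scan_lines))
```

The web frontend's sa-learn calls would then use a config built with the "learn" role. One thing to verify before relying on this: scanning may also try to update token access times, which would be a write against the local copy, so a strictly read-only replica might log errors.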

Thanks list :)






Re: Shared bayes SQL between machines

Posted by Dave Warren <da...@hireahit.com>.
On 2013-10-22 07:52, Jonas Akrouh Larsen wrote:
> Mmm well I'd like to use it, but it's also the manual learning.
>
> We use a web frontend which lists quarantined messages and both users and myself do manual training from there (it just calls sa-learn)

How do you distribute load among your SpamAssassin servers? Is it 
relatively balanced, or do different groups of users hit different servers?

If it's balanced, consider just autolearning on the master and forget 
about autolearning on the rest. Sure, you'll get a little less data 
overall, but you'll still get a representative flow of both spam and 
ham, and you can still perform manual training against the master 
database from other servers.

-- 
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren


RE: Shared bayes SQL between machines

Posted by Jonas Akrouh Larsen <jo...@vrt.dk>.
Mmm, well, I'd like to use it, but there's also the manual learning.

We use a web frontend which lists quarantined messages, and both users and I do manual training from there (it just calls sa-learn).




> -----Original Message-----
> From: John Hardin [mailto:jhardin@impsec.org]
> Sent: 22. oktober 2013 16:50
> To: SpamAssassin Users List
> Subject: RE: Shared bayes SQL between machines
> 
> On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:
> 
> >> Use standard master-slave replication. Learn and expire only the
> >> master database.
> >>
> >> I *think* we support separate credentials for learning to a different
> >> database than you scan from, but I don't follow the Bayes SQL
> >> interface too closely so I'm not sure.
> >
> > If that was supported it would solve my problem, but the only thing
> > I've seen was somebody who wrote his own postgresql based module for
> > splitting it up.
> 
> Do you *have* to use autolearn?
> 

RE: Shared bayes SQL between machines

Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:

>> Use standard master-slave replication. Learn and expire only the master
>> database.
>>
>> I *think* we support separate credentials for learning to a different database
>> than you scan from, but I don't follow the Bayes SQL interface too closely so
>> I'm not sure.
>
> If that was supported it would solve my problem, but the only thing I've 
> seen was somebody who wrote his own postgresql based module for 
> splitting it up.

Do you *have* to use autolearn?


Re: Shared bayes SQL between machines

Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:

> I tried a couple of years ago to have my bayes DB stored in SQL replicate as MASTER-MASTER between 2 servers.
>
> It worked fine for starters, but it often broke down because of irregularities, something about duplicate data/keys if I recall correctly.

That's not surprising in a master-master topology. Bayes was not designed 
for that environment.
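
To illustrate the kind of duplicate-key failure that can occur, here is a minimal simulation using two in-memory SQLite databases as stand-ins for the two masters. It is a sketch under assumptions: the table is a cut-down version of a token table keyed on (id, token), and it assumes each node issues a plain INSERT for a token its local copy has not seen yet. The real SQL backend and MySQL replication differ in detail.

```python
import sqlite3

# Simplified, assumed schema: one row per (user id, token).
SCHEMA = """
CREATE TABLE bayes_token (
    id INTEGER NOT NULL,
    token TEXT NOT NULL,
    spam_count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (id, token)
)
"""


def new_master() -> sqlite3.Connection:
    """Create an empty stand-in for one master database."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def learn_new_token(db: sqlite3.Connection, token: str):
    """Insert a token the local copy lacks; return the statement for replay."""
    stmt = ("INSERT INTO bayes_token (id, token, spam_count) VALUES (1, ?, 1)",
            (token,))
    db.execute(*stmt)
    return stmt


def replicate(db: sqlite3.Connection, stmt) -> bool:
    """Replay the other master's statement; False means replication broke."""
    try:
        db.execute(*stmt)
        return True
    except sqlite3.IntegrityError:
        return False


a, b = new_master(), new_master()
# Both masters learn the same new token before either has replicated.
stmt_a = learn_new_token(a, "token-x")
stmt_b = learn_new_token(b, "token-x")
a_ok = replicate(a, stmt_b)  # the row already exists on a
b_ok = replicate(b, stmt_a)  # the row already exists on b
print(a_ok, b_ok)
```

With a single writer this race cannot happen, which is why learning and expiring on one master only sidesteps the problem.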

> As I would like 1 single shared bayes DB, what is everyone else doing?
>
> I would prefer the DB to be local, so 1 single shared db server isn't an option for me.
>
> Currently I'm contemplating splitting up reads and writes with a MySQL proxy, but I'm not sure that's the best option.
>
> So does anybody have any tips or advice to offer?

Use standard master-slave replication. Learn and expire only the master 
database.

I *think* we support separate credentials for learning to a different 
database than you scan from, but I don't follow the Bayes SQL interface 
too closely so I'm not sure.
