Posted to users@spamassassin.apache.org by Jonas Akrouh Larsen <jo...@vrt.dk> on 2013/10/22 16:07:41 UTC
Shared bayes SQL between machines
Hi List
A couple of years ago I tried having my SQL-stored Bayes DB replicate MASTER-MASTER between 2 servers.
It worked fine at first, but it often broke down because of irregularities; something about duplicate data/keys, if I recall correctly.
As I would like a single shared Bayes DB, what is everyone else doing?
I would prefer the DB to be local, so a single shared DB server isn't an option for me.
Currently I'm contemplating splitting reads and writes with a MySQL proxy, but I'm not sure that's the best option.
So does anybody have any tips or advice to offer?
Med venlig hilsen / Best regards
Jonas Akrouh Larsen
TechBiz ApS
Laplandsgade 4, 2. sal
2300 København S
Office: 7020 0979
Direct: 3336 9974
Mobile: 5120 1096
Web: www.techbiz.dk
Re: Shared bayes SQL between machines
Posted by Axb <ax...@gmail.com>.
On 10/22/2013 05:21 PM, Jonas Akrouh Larsen wrote:
> Interesting, my problem is timeouts as well. When the master SQL
> server is down, the other nodes slow down because they are waiting
> for Bayes to time out, so they build up a mail queue because they
> can't keep up (or at least they become a lot slower).
>
> Hence my need for something HA'ish.
>
> I can see how your setup would work fine, though it does seem a bit
> manual. Isn't Redis supposed to have built-in failover features,
> which would make your Zabbix fix unnecessary? Any particular reason
> you chose to script your own "failover" rather than use the
> Redis-based one?
Redis Cluster is a work in progress.
Haven't played with it yet...
http://redis.io/topics/cluster-spec
RE: Shared bayes SQL between machines
Posted by Jonas Akrouh Larsen <jo...@vrt.dk>.
> On 10/22/2013 04:40 PM, Jonas Akrouh Larsen wrote:
> >> as if Bayes would be mission critical. .-) I have 47 spamd boxes
> >> connecting to one Redis box and a slave as spare.
> >>
> >
> > Did you test what happens when you shut down the master/main Redis box?
> > Does it just fail over automatically? I'd love to hear about your
> > experience if you've been using it for a while.
>
> I don't have failover configured.
>
> [root@pyzord ~]# uptime
> 17:03:48 up 45 days, 6:53, 1 user, load average: 1.14, 0.38, 0.20
>
> If the Bayes box goes down, SA's Bayes checks time out.
> If it's down for more than 8 minutes, monitoring (Zabbix) is configured to replace
> sa_bayes.cf with one that points to the slave box and to reload spamd.
> When master comes back, Zabbix reverses the change.
>
> sa-learn --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 12456296 0 non-token data: nspam
> 0.000 0 3362288 0 non-token data: nham
> 0.000 0 0 0 non-token data: ntokens
>
> Bayes/SQL is just too slow for my traffic, and file-based backends were
> constantly locked.
>
> I was the one who requested the Redis backend after a long proof of
> concept "alpha" run period.
>
> Wouldn't go back for anything in the world :)
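The `sa-learn --dump magic` output quoted above is whitespace-separated, with the counter value in the third column and its description after "non-token data:". A minimal parser, assuming exactly that layout (inferred from the sample above, not from a format specification, so other versions may differ), makes it easy to feed nspam/nham into monitoring such as Zabbix:

```python
import re
from typing import Dict

# Matches lines such as "0.000 0 12456296 0 non-token data: nspam".
# The column layout is inferred from the sample output quoted above.
_MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s+(.+?)\s*$")


def parse_dump_magic(text: str) -> Dict[str, int]:
    """Return {description: value} for each non-token line of `sa-learn --dump magic`."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        match = _MAGIC_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 12456296 0 non-token data: nspam
0.000 0 3362288 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
"""

if __name__ == "__main__":
    stats = parse_dump_magic(SAMPLE)
    print(stats["nspam"], stats["nham"])  # prints "12456296 3362288"
```

Lines that don't match the pattern (token lines, blank lines) are simply skipped rather than treated as errors.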
Interesting, my problem is timeouts as well. When the master SQL server is down, the other nodes slow down because they are waiting for Bayes to time out, so they build up a mail queue because they can't keep up (or at least they become a lot slower).
Hence my need for something HA'ish.
I can see how your setup would work fine, though it does seem a bit manual. Isn't Redis supposed to have built-in failover features, which would make your Zabbix fix unnecessary? Any particular reason you chose to script your own "failover" rather than use the Redis-based one?
Thanks for sharing your setup btw.
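The monitoring-driven failover described in the quoted message boils down to one decision: if the master Bayes box has been unreachable for longer than a grace period (8 minutes in that setup), use the config that points at the slave, and switch back once the master returns. A sketch of that decision logic, assuming a plain TCP reachability probe; the config file names are hypothetical, and the actual file copy and spamd reload that Zabbix performs are left out:

```python
import socket
import time
from typing import Optional

GRACE_SECONDS = 8 * 60  # "down for more than 8 min" before failing over


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_config(master_up: bool, down_since: Optional[float], now: float) -> str:
    """Return which (hypothetical) source file should be copied over sa_bayes.cf.

    down_since is the timestamp of the first failed probe, tracked by the caller.
    """
    if master_up:
        return "sa_bayes_master.cf"  # master is back: reverse the change
    if down_since is not None and now - down_since > GRACE_SECONDS:
        return "sa_bayes_slave.cf"  # down past the grace period: fail over
    return "sa_bayes_master.cf"  # still within the grace period


if __name__ == "__main__":
    start = time.time()
    print(choose_config(False, start, start + 60))   # within grace period
    print(choose_config(False, start, start + 600))  # down for 10 minutes
```

Keeping the decision as a pure function makes the grace-period behaviour easy to test without touching spamd; during the grace period the nodes still see Bayes timeouts, which is the trade-off the quoted setup accepts.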
Re: Shared bayes SQL between machines
Posted by Axb <ax...@gmail.com>.
On 10/22/2013 04:07 PM, Jonas Akrouh Larsen wrote:
> Hi List
>
> A couple of years ago I tried having my SQL-stored Bayes DB replicate MASTER-MASTER between 2 servers.
>
> It worked fine at first, but it often broke down because of irregularities; something about duplicate data/keys, if I recall correctly.
>
> As I would like a single shared Bayes DB, what is everyone else doing?
>
> I would prefer the DB to be local, so a single shared DB server isn't an option for me.
>
> Currently I'm contemplating splitting reads and writes with a MySQL proxy, but I'm not sure that's the best option.
>
> So does anybody have any tips or advice to offer?
advice= none
tips= run SA 3.4 with Bayes on the Redis backend so all boxes access it.
VERY fast/robust/easy
totally set & forget.
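For reference, pointing every box at one Redis server comes down to two config lines. A minimal sketch, assuming SpamAssassin 3.4's Mail::SpamAssassin::BayesStore::Redis module, whose connection string is given via bayes_sql_dsn as semicolon-separated server/password/database fields; the host, port, and database number below are hypothetical placeholders, and the helper just renders the shared .cf fragment each spamd box would load:

```python
from typing import Optional


def render_redis_bayes_cf(host: str, port: int = 6379, database: int = 2,
                          password: Optional[str] = None) -> str:
    """Render a Bayes config fragment pointing SpamAssassin 3.4 at a shared Redis server.

    The DSN field names follow the BayesStore::Redis documentation; the values
    passed in are placeholders for your own setup.
    """
    fields = [f"server={host}:{port}"]
    if password:
        fields.append(f"password={password}")
    fields.append(f"database={database}")
    lines = [
        "bayes_store_module Mail::SpamAssassin::BayesStore::Redis",
        "bayes_sql_dsn " + ";".join(fields),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical address of the one Redis box that every spamd node shares.
    print(render_redis_bayes_cf("10.0.0.5"), end="")
```

Run as-is, it prints the two lines `bayes_store_module Mail::SpamAssassin::BayesStore::Redis` and `bayes_sql_dsn server=10.0.0.5:6379;database=2`.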
RE: Shared bayes SQL between machines
Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:
> Mmm, well, I'd like to use it, but there's also the manual learning.
>
> We use a web frontend that lists quarantined messages, and both users
> and I do manual training from there (it just calls sa-learn).
Manual learning is easy to target on the master database. It's the
distributed autolearn that's problematic.
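One way to target manual learning at the master is to give the web frontend's sa-learn call its own site configuration directory, whose .cf points bayes_sql_dsn at the master database. A sketch, assuming a hypothetical directory /etc/mail/spamassassin-master; `--siteconfigpath`, `--spam`, and `--ham` are standard sa-learn options. Because `--siteconfigpath` replaces the default site config location, any other site-wide settings would need to be present in that directory too:

```python
import subprocess
from typing import List

# Hypothetical site-config directory whose .cf points Bayes at the master DB.
MASTER_SITECONFIG = "/etc/mail/spamassassin-master"


def build_sa_learn_cmd(kind: str, message_path: str,
                       siteconfig: str = MASTER_SITECONFIG) -> List[str]:
    """Build an sa-learn command line that trains the master Bayes DB."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', not {kind!r}")
    return ["sa-learn", f"--siteconfigpath={siteconfig}", f"--{kind}", message_path]


def learn(kind: str, message_path: str) -> None:
    """Run sa-learn; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_sa_learn_cmd(kind, message_path), check=True)
```

The frontend would then call something like `learn("spam", path_to_quarantined_message)` instead of invoking sa-learn directly, so every node's manual training lands on the master and reaches the replicas through normal replication.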
>> -----Original Message-----
>> From: John Hardin [mailto:jhardin@impsec.org]
>> Sent: 22. oktober 2013 16:50
>> To: SpamAssassin Users List
>> Subject: RE: Shared bayes SQL between machines
>>
>> On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:
>>
>>>> Use standard master-slave replication. Learn and expire only the
>>>> master database.
>>>>
>>>> I *think* we support separate credentials for learning to a different
>>>> database than you scan from, but I don't follow the Bayes SQL
>>>> interface too closely so I'm not sure.
>>>
>>> If that were supported it would solve my problem, but the only thing
>>> I've seen was somebody who wrote his own PostgreSQL-based module for
>>> splitting it up.
>>
>> Do you *have* to use autolearn?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Maxim V: Close air support and friendly fire should be easier to
tell apart.
-----------------------------------------------------------------------
509 days since the first successful private support mission to ISS (SpaceX)
RE: Shared bayes SQL between machines
Posted by Jonas Akrouh Larsen <jo...@vrt.dk>.
> -----Original Message-----
> From: Dave Warren [mailto:davew@hireahit.com]
> Sent: 23. oktober 2013 09:51
> To: users@spamassassin.apache.org
> Subject: Re: Shared bayes SQL between machines
>
> On 2013-10-22 07:52, Jonas Akrouh Larsen wrote:
> > Mmm, well, I'd like to use it, but there's also the manual learning.
> >
> > We use a web frontend that lists quarantined messages, and both users
> > and I do manual training from there (it just calls sa-learn).
>
> How do you distribute load among your SpamAssassin servers? Is it relatively
> balanced, or do different groups of users hit different servers?
>
> If it's balanced, consider just autolearning on the master and forget about
> autolearning on the rest. Sure, you'll get a little less data in overall, but you'll
> still get a representative flow of both spam and ham, and you can still
> perform manual training against the master database from other servers.
>
It's pretty balanced, so actually it's a pretty good idea.
If I combine John's advice with pointing sa-learn's config at the master, and only autolearn on 1 node, I think everything should work.
With 1 master taking all the writes and the other nodes using their local read-only copy of the DB for better performance, it should work.
And it's a lot simpler than all sorts of failover mechanisms, master-master replication and whatnot.
Thanks list :)
Re: Shared bayes SQL between machines
Posted by Dave Warren <da...@hireahit.com>.
On 2013-10-22 07:52, Jonas Akrouh Larsen wrote:
> Mmm, well, I'd like to use it, but there's also the manual learning.
>
> We use a web frontend that lists quarantined messages, and both users and I do manual training from there (it just calls sa-learn).
How do you distribute load among your SpamAssassin servers? Is it
relatively balanced, or do different groups of users hit different servers?
If it's balanced, consider just autolearning on the master and forget
about autolearning on the rest. Sure, you'll get a little less data in
overall, but you'll still get a representative flow of both spam and
ham, and you can still perform manual training against the master
database from other servers.
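The suggestion above amounts to one config difference between nodes: only the designated node autolearns. A sketch that renders each node's Bayes fragment for the SQL backend, assuming MySQL via Mail::SpamAssassin::BayesStore::MySQL and a local database on every node (the master's local DB being the master itself); the DSN, user name, password, and node names are hypothetical:

```python
from typing import Dict, List

# Hypothetical connection details: every node scans against its local database.
LOCAL_DSN = "DBI:mysql:spamassassin:localhost"


def render_node_cf(node: str, learner: str, user: str = "sa_bayes",
                   password: str = "changeme") -> str:
    """Render a per-node Bayes SQL fragment; only `learner` autolearns and auto-expires."""
    writes = 1 if node == learner else 0
    return "\n".join([
        "bayes_store_module Mail::SpamAssassin::BayesStore::MySQL",
        f"bayes_sql_dsn {LOCAL_DSN}",
        f"bayes_sql_username {user}",
        f"bayes_sql_password {password}",
        f"bayes_auto_learn {writes}",
        f"bayes_auto_expire {writes}",
    ]) + "\n"


def render_cluster(nodes: List[str], learner: str) -> Dict[str, str]:
    """Map each node name to its config fragment."""
    if learner not in nodes:
        raise ValueError(f"learner {learner!r} is not one of the nodes")
    return {node: render_node_cf(node, learner) for node in nodes}
```

For example, `render_cluster(["mx1", "mx2", "mx3"], "mx1")` gives mx1 `bayes_auto_learn 1` and the other two `bayes_auto_learn 0`. One thing worth testing before relying on read-only replicas: scanning may not be entirely write-free (token access times, for instance, may be updated during scans), so check how your SpamAssassin version behaves against a replica.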
--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren
RE: Shared bayes SQL between machines
Posted by Jonas Akrouh Larsen <jo...@vrt.dk>.
Mmm, well, I'd like to use it, but there's also the manual learning.
We use a web frontend that lists quarantined messages, and both users and I do manual training from there (it just calls sa-learn).
> -----Original Message-----
> From: John Hardin [mailto:jhardin@impsec.org]
> Sent: 22. oktober 2013 16:50
> To: SpamAssassin Users List
> Subject: RE: Shared bayes SQL between machines
>
> On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:
>
> >> Use standard master-slave replication. Learn and expire only the
> >> master database.
> >>
> >> I *think* we support separate credentials for learning to a different
> >> database than you scan from, but I don't follow the Bayes SQL
> >> interface too closely so I'm not sure.
> >
> > If that were supported it would solve my problem, but the only thing
> > I've seen was somebody who wrote his own PostgreSQL-based module for
> > splitting it up.
>
> Do you *have* to use autolearn?
RE: Shared bayes SQL between machines
Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:
>> Use standard master-slave replication. Learn and expire only the master
>> database.
>>
>> I *think* we support separate credentials for learning to a different database
>> than you scan from, but I don't follow the Bayes SQL interface too closely so
>> I'm not sure.
>
> If that were supported it would solve my problem, but the only thing I've
> seen was somebody who wrote his own PostgreSQL-based module for
> splitting it up.
Do you *have* to use autolearn?
Re: Shared bayes SQL between machines
Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 Oct 2013, Jonas Akrouh Larsen wrote:
> A couple of years ago I tried having my SQL-stored Bayes DB replicate MASTER-MASTER between 2 servers.
>
> It worked fine at first, but it often broke down because of irregularities; something about duplicate data/keys, if I recall correctly.
That's not surprising in a master-master topology. Bayes was not designed
for that environment.
> As I would like a single shared Bayes DB, what is everyone else doing?
>
> I would prefer the DB to be local, so a single shared DB server isn't an option for me.
>
> Currently I'm contemplating splitting reads and writes with a MySQL proxy, but I'm not sure that's the best option.
>
> So does anybody have any tips or advice to offer?
Use standard master-slave replication. Learn and expire only the master
database.
I *think* we support separate credentials for learning to a different
database than you scan from, but I don't follow the Bayes SQL interface
too closely so I'm not sure.
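"Expire only the master" can be enforced with a small guard around a scheduled expiry job, paired with bayes_auto_expire 0 on the replicas so that no scan triggers an expiry there. A sketch, assuming a hypothetical master host name; `sa-learn --force-expire` is a standard sa-learn option, and everything else here is illustrative:

```python
import socket
import subprocess
from typing import List, Optional

# Hypothetical name of the host that runs the master Bayes database.
MASTER_HOST = "bayes-master.example.net"


def expire_cmd() -> List[str]:
    """Command that forces a Bayes expiry run."""
    return ["sa-learn", "--force-expire"]


def should_expire(this_host: str, master_host: str = MASTER_HOST) -> bool:
    """Only the master expires; replicas receive the result through replication."""
    return this_host == master_host


def run_expiry_if_master(this_host: Optional[str] = None) -> bool:
    """Run the expiry on the master only; return True if it ran."""
    host = this_host or socket.getfqdn()
    if not should_expire(host):
        return False
    subprocess.run(expire_cmd(), check=True)
    return True
```

The same job can then be deployed to every node (e.g. from cron), and it only does anything on the master.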