Posted to dev@spamassassin.apache.org by Jeroen Koekkoek <j....@perrit.nl> on 2012/02/14 08:38:04 UTC

bayes module master-slave

Hi,

I have a question regarding a BayesStore module I'm writing. First let me explain what I'm trying to accomplish.

I want to build a setup of three servers: one master database and two spam gateways. I want the Bayes database to be replicated so that the Bayes check produces the same result on both servers (more might be added in the future). To do this, I want to use a master SQL database that replicates to both mail servers. The BayesStore module should do all writes on the master and all reads on the slave. Although this doesn't necessarily improve overall performance, it does allow the master to go down without the slaves being interrupted.
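
A minimal sketch of the read/write split described above, in Python rather than as a real BayesStore module. The class name `SplitBayesStore`, the two connection factories, and the simplified `bayes_token` table layout are assumptions made for illustration; SQLite stands in for the real SQL backend, and replication itself is not modelled.

```python
import sqlite3


class SplitBayesStore:
    """Hypothetical store: reads use the local replica, writes use the master."""

    def __init__(self, connect_replica, connect_master):
        # Both arguments are callables that return a DB-API connection.
        self._connect_replica = connect_replica
        self._connect_master = connect_master

    def token_counts(self, token):
        """Read (spam_count, ham_count) from the replica only."""
        conn = self._connect_replica()
        row = conn.execute(
            "SELECT spam_count, ham_count FROM bayes_token WHERE token = ?",
            (token,),
        ).fetchone()
        return row if row is not None else (0, 0)

    def learn(self, token, is_spam):
        """Write to the master only; return False if the master is unreachable."""
        try:
            conn = self._connect_master()
        except ConnectionError:
            # Scanning keeps working from the replica; only learning is skipped.
            return False
        # The column name comes from a boolean, never from user input.
        column = "spam_count" if is_spam else "ham_count"
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT OR IGNORE INTO bayes_token (token, spam_count, ham_count) "
                "VALUES (?, 0, 0)",
                (token,),
            )
            conn.execute(
                f"UPDATE bayes_token SET {column} = {column} + 1 WHERE token = ?",
                (token,),
            )
        return True
```

With this shape, a master outage only affects `learn`; `token_counts` never touches the master, which is the property the setup is after.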

My questions:
1. Is this even a good idea?
2. Splitting purely on read/write actions might not be a good idea. It might be better to read only the tokens locally and to read the token expiration delta etc. from the master, so that we're always working with the most up-to-date information?
3. Other pointers?
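
The alternative in question 2 could be sketched roughly as follows: tokens stay on the local replica, while expiry metadata is taken from the master when it is reachable and from the replica otherwise. The function name, the two fetch callables, and the returned source label are hypothetical and only illustrate the trade-off.

```python
def read_expiry_metadata(fetch_from_master, fetch_from_replica):
    """Return (metadata, source), preferring the master's copy."""
    try:
        return fetch_from_master(), "master"
    except ConnectionError:
        # Master unreachable: fall back to the replicated (possibly stale) copy.
        return fetch_from_replica(), "replica"
```

Note that without the fallback branch this variant would reintroduce the master as a single point of failure for reads.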

Best regards,
Jeroen Koekkoek

RE: bayes module master-slave

Posted by Jeroen Koekkoek <j....@perrit.nl>.
Hi,

I also created a master-slave module for the auto whitelist feature. The name of the repository has been changed and sources can be found in the following location.

https://github.com/perrit/spamassassin-ms

Just thought I'd let you know. Again, if there's something I can do to get it into trunk, please let me know. I still think this feature can be very useful to others as well. I'm willing to spend time updating or modifying the sources, or to come up with a patch.

Best regards,
Jeroen Koekkoek


RE: bayes module master-slave

Posted by Jeroen Koekkoek <j....@perrit.nl>.
Hi,

The module is finished and works as expected. It actually wasn't that much work. I'm hoping that someone is willing to have a quick look and check it for (obvious) errors. I put the sources here:

https://github.com/perrit/Mail-SpamAssassin-BayesStore-PgSQL-Slave

Maybe this functionality can be implemented directly in the BayesStore::SQL module someday. I think it is very useful, as it makes sure that there's no single point of failure while keeping a single consistent copy of the data. Anyway, let me know what you think. I'm willing to help port this feature into SpamAssassin's sources if people are interested.

Best regards,
Jeroen Koekkoek


Re: bayes module master-slave

Posted by Michael Parker <pa...@pobox.com>.
On Feb 27, 2012, at 9:53 AM, Jeroen Koekkoek wrote:

> Hi,
> 
> I'm testing my module now, and I wanted to build in some kind of timeout for reconnecting to the master database. But as far as I can tell, SpamAssassin creates a new connection per incoming message, which would make the timeout functionality useless. Is this conclusion correct? Does SpamAssassin create a new connection per incoming message?

Yes, but there is a plugin you can use for persistent connections (it hasn't been updated in a while, but I believe that it still works); check the wiki page.

Michael


RE: bayes module master-slave

Posted by Jeroen Koekkoek <j....@perrit.nl>.
Hi,

I'm testing my module now, and I wanted to build in some kind of timeout for reconnecting to the master database. But as far as I can tell, SpamAssassin creates a new connection per incoming message, which would make the timeout functionality useless. Is this conclusion correct? Does SpamAssassin create a new connection per incoming message?
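
For reference, the kind of reconnect timeout meant here could look like the sketch below. It only pays off when the process keeps state across messages (for example with persistent connections); with one fresh connection per message there is nothing to remember. The class name `MasterLink`, the `retry_after` parameter, and the injectable clock are assumptions for illustration, not part of SpamAssassin.

```python
import time


class MasterLink:
    """Hypothetical holder for a master connection with a reconnect back-off."""

    def __init__(self, connect, retry_after=30.0, clock=time.monotonic):
        self._connect = connect          # callable returning a connection
        self._retry_after = retry_after  # seconds to wait after a failure
        self._clock = clock              # injectable so it can be tested
        self._conn = None
        self._last_failure = None

    def get(self):
        """Return a connection, or None while the master is considered down."""
        if self._conn is not None:
            return self._conn
        now = self._clock()
        if (
            self._last_failure is not None
            and now - self._last_failure < self._retry_after
        ):
            # Still inside the back-off window: do not hit the master again.
            return None
        try:
            self._conn = self._connect()
        except ConnectionError:
            self._last_failure = now
            return None
        self._last_failure = None
        return self._conn
```

A caller would treat `None` as "skip the write for now" instead of blocking every message on a connection attempt to a master that is known to be down.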

Best regards,
Jeroen
