Posted to user@couchdb.apache.org by Mike Fedyk <mf...@mikefedyk.com> on 2010/11/15 02:01:20 UTC

How to keep from sending more than one email from multiple replicated couchdb instances

node.js + CouchDB == Crazy Delicious by Mikeal Rogers
http://jsconf.eu/2010/speaker/nodejs_couchdb_crazy_delicious.html

I watched this talk a couple of days ago, and I've been thinking about
how to deal with instance and service failures (think of sending
emails as a "service").  It's easy to make sure that only one email is
sent if only one server sends emails, but if that machine fails, no
emails get sent at all.

You compose an email while offline and save it to your local couch
instance.  Then later it gets replicated to one of the couchdb
instances in your cloud.  And then:

1. You have the date when it was saved on the phone, etc.  If you also
had a timestamp for when that replication happened, you could have a
chain of couchdb instances try to send the email, each one waiting
longer after the email was replicated to your cloud of couchdb
instances.  instance_a would try immediately, instance_b tries if it
hasn't been taken after X minutes, and so on for instance_c.  See [A].

2. When instance_a wants to send the email, it updates the state to
"taking" and then waits for instance_b and instance_c to ack the
taking by adding fields to the current document.  Oops: instance_b and
instance_c will race more often than not and you'll get a conflict, so
the acks need to go in separate temporary state tracking documents.
You still need [A]; otherwise, if there are no other instances, you'll
wait forever for acks that won't happen.

3. You have one instance that sends emails and you deal with the
downtime if that instance fails or some other failure happens that
prevents email from being sent.

4. You send periodic test emails to make sure they are being sent, and
if they are not, you take over the function on instance_$self.  See
[B].
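
Option 1 could be sketched roughly like this in Python.  This is only
an illustration under assumptions: the "replicated_at" and "state"
fields, the SEND_ORDER list, and the 5 minute step standing in for "X"
are made-up names and values, and it assumes the replication timestamp
is already stored on the document somehow.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical priority order; the instance names follow the thread.
SEND_ORDER = ["instance_a", "instance_b", "instance_c"]


def may_attempt_send(doc, instance, now, step=timedelta(minutes=5)):
    """Return True if `instance` may try to send `doc` at time `now`.

    The first instance may try immediately; each later instance waits
    one more `step` after the (assumed) replication timestamp.
    """
    if doc.get("state") != "new":
        return False  # someone already took or sent it
    rank = SEND_ORDER.index(instance)
    replicated_at = datetime.fromisoformat(doc["replicated_at"])
    return now >= replicated_at + rank * step


doc = {"_id": "email-1", "state": "new",
       "replicated_at": "2010-11-15T02:00:00+00:00"}
t0 = datetime(2010, 11, 15, 2, 0, tzinfo=timezone.utc)
print(may_attempt_send(doc, "instance_a", t0))
print(may_attempt_send(doc, "instance_b", t0 + timedelta(minutes=4)))
```

Note that this check only looks at the local copy of the document, so
it is only as good as the replication between the instances.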

A) This only works if all of your cloud couchdb instances are
replicating to each other correctly at that moment.  If they aren't,
an instance can't see that another one has already taken the email,
and now you have N > 1 emails sent out.  (And imagine the action is
something more important than an email, where doing it more than once
really matters.)  To keep this from happening you need a couchdb
instance heartbeat (maybe have an app update a document that describes
that instance's "registration" in the system with the current
timestamp every 60 seconds) and a STONITH system to kill any instances
of couchdb that stop updating their document.
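
The heartbeat part of [A] might look something like the sketch below.
The document layout ("registration:<id>", "heartbeat" as epoch
seconds) and the "three missed beats" threshold are assumptions for
illustration, and the actual STONITH action is left out.

```python
import time

HEARTBEAT_INTERVAL = 60  # seconds, as suggested above
MAX_MISSED = 3           # assumption: tolerate a few missed beats


def heartbeat_doc(instance_id, now=None):
    """Build the registration document an instance would save each interval."""
    return {
        "_id": "registration:" + instance_id,  # hypothetical id scheme
        "instance": instance_id,
        "heartbeat": time.time() if now is None else now,
    }


def stale_instances(registrations, now):
    """Return the instances whose heartbeat is too old (STONITH candidates)."""
    limit = HEARTBEAT_INTERVAL * MAX_MISSED
    return sorted(doc["instance"] for doc in registrations
                  if now - doc["heartbeat"] > limit)


regs = [
    heartbeat_doc("instance_a", now=1000),
    heartbeat_doc("instance_b", now=700),
    heartbeat_doc("instance_c", now=820),
]
print(stale_instances(regs, now=1000))
```

Since the heartbeat documents arrive by replication too, a stale
heartbeat can mean either a dead instance or broken replication; the
check can't tell the two apart on its own.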

B) Do you still need [A]?  Maybe it's good enough that the test email
didn't get back to you, but maybe the instance is still sending emails
to other places, so it seems [A] is still needed.  Now you also need a
service registration system (to make sure this and other services like
it are only running on one instance).

So these are some of the ideas that I'm coming up with on this issue.
I'm looking for more input.  What would you do?

Re: How to keep from sending more than one email from multiple replicated couchdb instances

Posted by Mike Fedyk <mf...@mikefedyk.com>.
Or... (I just thought of this idea)

5. When you write the update that changes the state machine status
from NEW to TAKING (and adds a field with your instance id), you write
it to any couchdb instance other than $self.  Then when the write
replicates back to you and the instance id matches $self, you send the
email.

C) This way you naturally test the instance you write to, and no other
instance will race with you to send the email.  You can either keep a
list of the other instances and use them round-robin, or possibly let
DNS RR do it for you, in which case you depend on the quality of the
DNS resolver.  With this you should be able to do away with [A] and
[B].
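
A rough sketch of option 5 with the round-robin peer list from [C].
The "taken_by" field name and the helper names are made up for the
example, and the actual HTTP write to the peer and the replication
back are left out; only the decision logic is shown.

```python
from itertools import cycle


def make_peer_picker(instances, self_id):
    """Return a function that yields every instance except self, round-robin."""
    peers = [i for i in instances if i != self_id]
    if not peers:
        raise ValueError("option 5 needs at least one other instance")
    ring = cycle(peers)
    return lambda: next(ring)


def build_claim(doc, self_id):
    """Build the NEW -> TAKING update to write to a peer, or None if taken."""
    if doc.get("state") != "NEW":
        return None
    return {**doc, "state": "TAKING", "taken_by": self_id}


def should_send(doc, self_id):
    """True once the claim has replicated back and names this instance."""
    return doc.get("state") == "TAKING" and doc.get("taken_by") == self_id


pick = make_peer_picker(["instance_a", "instance_b", "instance_c"],
                        "instance_a")
doc = {"_id": "email-1", "state": "NEW"}
claim = build_claim(doc, "instance_a")  # would be written to pick()
print(pick(), should_send(claim, "instance_a"))
```

The sketch does not cover what happens when two instances write their
claims to different peers at the same time.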

What do you think?

Re: How to keep from sending more than one email from multiple replicated couchdb instances

Posted by Wout Mertens <wo...@gmail.com>.
I think you need to decouple the database from the replication. Replication management is not a first-class citizen in CouchDB (yet?) and the problems you present show that.

Basically what you're looking at is a message board service, where clients post requests ("send this email") and servers take requests and execute them. If you add a board monitor to the mix, it can be responsible for putting taken requests back on the board if the server that took them isn't responding.
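
As a sketch of what the board monitor would do (the request fields "state" and "taken_by" and the way the monitor learns which servers are alive are assumptions, not part of any CouchDB API):

```python
def requeue_abandoned(requests, alive_servers):
    """Put taken requests back on the board if their server is not alive.

    `requests` is a list of request documents (plain dicts here) and
    `alive_servers` is the set of server ids the monitor still hears from.
    Returns the ids of the requests that were put back.
    """
    requeued = []
    for req in requests:
        if req["state"] == "taken" and req["taken_by"] not in alive_servers:
            req["state"] = "open"
            req["taken_by"] = None
            requeued.append(req["_id"])
    return requeued


board = [
    {"_id": "req-1", "state": "taken", "taken_by": "server_a"},
    {"_id": "req-2", "state": "taken", "taken_by": "server_b"},
    {"_id": "req-3", "state": "open", "taken_by": None},
]
print(requeue_abandoned(board, alive_servers={"server_a"}))
```

A server that was only slow, not dead, may still finish a request after it has been put back, so this gives at-least-once execution rather than exactly-once.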

The CouchDB servers would host this message board database and a replication monitor makes sure that all servers are up to date.

The monitors can be made resilient by running several of them that communicate with heartbeats. Only one monitor is the master, which does the rescheduling, warning, etc.; the others stand by until it stops responding.
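
One simple way to pick the master, assuming each monitor has a fixed priority and a last-heartbeat time in seconds (both invented for this sketch), is to let the highest-priority monitor that is still heartbeating do the work:

```python
def current_master(monitors, now, timeout=30):
    """Return the id of the monitor that should act as master, or None.

    A monitor counts as alive if its last heartbeat is at most `timeout`
    seconds old; among the alive ones, the lowest priority number wins.
    """
    alive = [m for m in monitors if now - m["heartbeat"] <= timeout]
    if not alive:
        return None
    return min(alive, key=lambda m: m["priority"])["_id"]


monitors = [
    {"_id": "monitor_1", "priority": 1, "heartbeat": 100},
    {"_id": "monitor_2", "priority": 2, "heartbeat": 125},
]
print(current_master(monitors, now=130))
print(current_master(monitors, now=140))
```

If the monitors disagree about who is alive (for example during a network split), two of them could both decide they are master, so this does not remove the need for some kind of fencing.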

How does this model sound?

Note that the requests put on the board should be "transactional", in that they have to be retry-able if their server fails. If need be, a request can probably be split up into smaller parts, but then you need an extra monitor that follows a recipe and posts these parts in execution order.

Wout.
