Posted to dev@couchdb.apache.org by Randall Leeds <ra...@gmail.com> on 2010/12/18 22:46:16 UTC

Re: How fast do CouchDB propagate changes to other nodes?

On Sat, Dec 18, 2010 at 04:00, Robert Dionne
<di...@dionne-associates.com> wrote:
>
> On Dec 17, 2010, at 6:07 PM, Randall Leeds wrote:
>>
>> keeping cluster information and database metadata up to date around
>> the cluster, but that information tends to be small and changes
>> infrequently.
>>
>> However, to me this sounds like a lot of work for something that might
>> be better solved using technologies like ZeroMQ, particularly if
>> logging all messages is optional.
>>
>> Anyway, I'm happy to talk about all of this further since I think it's
>> kind of fascinating. I've been thinking a lot recently about how flood
>
> I'm curious, is flood replication what the name implies? Broadcasting?
>

I'll throw this at dev@, too.

Yes, broadcasting.

I've been thinking about alternative checkpoint schemes that take the
source and destination host out of the equation and figure out some
other way to verify common history. I imagine it's going to have to
involve a hash tree.

With the ability to resolve common history without having *directly*
exchanged checkpoints, hosts could receive incremental update batches
from different hosts if the replication graph changes over time.

Anyway, it's just a little infant of a thought, but I think it's a
good one to have in our collective consciousness.

Randall

Re: How fast do CouchDB propagate changes to other nodes?

Posted by Paul Davis <pa...@gmail.com>.
On Sat, Dec 18, 2010 at 8:23 PM, Randall Leeds <ra...@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 16:41, Paul Davis <pa...@gmail.com> wrote:
>> Right, I probably jumped a couple steps there:
>>
>> The unique datums we have to work with here are the _id/_rev pairs.
>> Theoretically (if we ignore rev_stemming) the ordering with which
>> these come to us is roughly unimportant.
>>
>> So the issue with our history (append only) is that there's no real
>> way to order it such that we can efficiently seek through it to see
>> what we have in common (that I can think of). I.e., replication still
>> needs a way to say "I only need to send these bits". Right now it's the
>> src/dst/seq triple that lets us zip through only new edits.
>>
>> Well, theoretically, we could keep a Merkle tree of all edits we've
>> ever seen and go that way, but that'd require keeping a history of
>> every edit ever seen, which could never be removed.
>>
>> Granted this is just quick thinking. I could definitely be missing
>> something clever.
>>
>
> We're on the same page. I don't have anything clever yet either.
> The only other thing that's crossed my mind is some way to exchange
> information about checkpoints each participant has with a third party.
> You'd have to somehow verify that the checkpoint being presented to
> you is actually one created by the third party, which involves trust
> or verification. I like the verification route because I'd still love
> to decouple the endpoint from its hostname, the idea that I was
> stabbing quite horribly at when I prematurely proposed a couple
> patches to give databases uuids. But back to the point, something like
> "you got a bunch of edits since last we spoke, but I got these edits
> from this other endpoint, are they the same ones?" Even then, I'm not
> sure how this works without the Merkle tree.
>

Your last bit there was exactly my idea about the Bloom filter. Just
populate the filter with hashes of the uniquely identifying bits of an
edit and then send the filter around.
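The filter idea above can be sketched concretely. This is a hedged
illustration only: the class name, sizes, and hashing scheme below are
invented for the example and are not CouchDB internals.

```python
import hashlib

class EditFilter:
    """Toy Bloom filter over (doc_id, rev) pairs; m and k are illustrative."""
    def __init__(self, m=8192, k=4):
        self.m, self.k = m, k
        self.bits = 0  # bit array packed into one int

    def _indexes(self, doc_id, rev):
        # Derive k bit positions from non-overlapping slices of one digest.
        digest = hashlib.sha256(f"{doc_id}/{rev}".encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, doc_id, rev):
        for i in self._indexes(doc_id, rev):
            self.bits |= 1 << i

    def probably_contains(self, doc_id, rev):
        # False positives are possible; false negatives are not.
        return all(self.bits >> i & 1 for i in self._indexes(doc_id, rev))

f = EditFilter()
f.add("doc-1", "1-abc")
assert f.probably_contains("doc-1", "1-abc")
```

One caveat worth noting: a false positive would cause an edit to be
skipped entirely, so in practice a filter like this could only serve as a
hint to be confirmed by a real diff step, not as the final word.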

Re: How fast do CouchDB propagate changes to other nodes?

Posted by Randall Leeds <ra...@gmail.com>.
On Sat, Dec 18, 2010 at 16:41, Paul Davis <pa...@gmail.com> wrote:
> Right, I probably jumped a couple steps there:
>
> The unique datums we have to work with here are the _id/_rev pairs.
> Theoretically (if we ignore rev_stemming) the ordering with which
> these come to us is roughly unimportant.
>
> So the issue with our history (append only) is that there's no real
> way to order it such that we can efficiently seek through it to see
> what we have in common (that I can think of). I.e., replication still
> needs a way to say "I only need to send these bits". Right now it's the
> src/dst/seq triple that lets us zip through only new edits.
>
> Well, theoretically, we could keep a Merkle tree of all edits we've
> ever seen and go that way, but that'd require keeping a history of
> every edit ever seen, which could never be removed.
>
> Granted this is just quick thinking. I could definitely be missing
> something clever.
>

We're on the same page. I don't have anything clever yet either.
The only other thing that's crossed my mind is some way to exchange
information about checkpoints each participant has with a third party.
You'd have to somehow verify that the checkpoint being presented to
you is actually one created by the third party, which involves trust
or verification. I like the verification route because I'd still love
to decouple the endpoint from its hostname, the idea that I was
stabbing quite horribly at when I prematurely proposed a couple
patches to give databases uuids. But back to the point, something like
"you got a bunch of edits since last we spoke, but I got these edits
from this other endpoint, are they the same ones?" Even then, I'm not
sure how this works without the Merkle tree.

Re: How fast do CouchDB propagate changes to other nodes?

Posted by Paul Davis <pa...@gmail.com>.
Right, I probably jumped a couple steps there:

The unique datums we have to work with here are the _id/_rev pairs.
Theoretically (if we ignore rev_stemming) the ordering with which
these come to us is roughly unimportant.

So the issue with our history (append only) is that there's no real
way to order it such that we can efficiently seek through it to see
what we have in common (that I can think of). I.e., replication still
needs a way to say "I only need to send these bits". Right now it's the
src/dst/seq triple that lets us zip through only new edits.

Well, theoretically, we could keep a Merkle tree of all edits we've
ever seen and go that way, but that'd require keeping a history of
every edit ever seen, which could never be removed.

Granted this is just quick thinking. I could definitely be missing
something clever.
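The src/dst/seq scheme described above can be sketched roughly as
follows. Every name and data shape here is made up for illustration;
this is not CouchDB's actual replicator code.

```python
# Checkpoints are keyed on the (source, target) pair and record the last
# source sequence processed, so the next run reads only newer edits.
checkpoints = {}  # (source_name, target_name) -> last processed update seq

def changes_since(source_db, seq):
    """Stand-in for a _changes feed: (seq, doc_id, rev) tuples after seq."""
    return [c for c in source_db if c[0] > seq]

def replicate(source_name, source_db, target_name, target_db):
    key = (source_name, target_name)
    since = checkpoints.get(key, 0)
    for seq, doc_id, rev in changes_since(source_db, since):
        target_db[doc_id] = rev   # ship the edit
        since = seq
    checkpoints[key] = since      # checkpoint so the next run skips these
```

The limitation under discussion falls straight out of the sketch: the
checkpoint is meaningful only to that exact source/target pairing, so a
third host arriving with the same edits has no way to prove it.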

On Sat, Dec 18, 2010 at 7:31 PM, Randall Leeds <ra...@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 16:14, Paul Davis <pa...@gmail.com> wrote:
>> On Sat, Dec 18, 2010 at 4:46 PM, Randall Leeds <ra...@gmail.com> wrote:
>>> On Sat, Dec 18, 2010 at 04:00, Robert Dionne
>>> <di...@dionne-associates.com> wrote:
>>>>
>>>> On Dec 17, 2010, at 6:07 PM, Randall Leeds wrote:
>>>>>
>>>>> keeping cluster information and database metadata up to date around
>>>>> the cluster, but that information tends to be small and changes
>>>>> infrequently.
>>>>>
>>>>> However, to me this sounds like a lot of work for something that might
>>>>> be better solved using technologies like ZeroMQ, particularly if
>>>>> logging all messages is optional.
>>>>>
>>>>> Anyway, I'm happy to talk about all of this further since I think it's
>>>>> kind of fascinating. I've been thinking a lot recently about how flood
>>>>
>>>> I'm curious, is flood replication what the name implies? Broadcasting?
>>>>
>>>
>>> I'll throw this at dev@, too.
>>>
>>> Yes, broadcasting.
>>>
>>> I've been thinking about alternative checkpoint schemes that take the
>>> source and destination host out of the equation and figure out some
>>> other way to verify common history. I imagine it's going to have to
>>> involve a hash tree.
>>>
>>> With the ability to resolve common history without having *directly*
>>> exchanged checkpoints, hosts could receive incremental update batches
>>> from different hosts if the replication graph changes over time.
>>>
>>> Anyway, it's just a little infant of a thought, but I think it's a
>>> good one to have in our collective consciousness.
>>>
>>> Randall
>>>
>>
>> Random off the top of my head response:
>>
>> I don't see anything immediately following from what you describe.
>> Even if you had a way of saying "I already have this revision" there's
>> no real way to figure out where to start once you get rid of the
>> src/dst/seq triplet (that I can think of).
>>
>> An interesting observation, though, is that replication never really
>> deletes anything from a history. As a quick optimization that could
>> lead to where you're wanting to go, you may check out storing a Bloom
>> filter for the database that stores a hash of the docid/rev pair for
>> all incoming edits. Then the replicator could use that to speed up
>> replication when it's already got edits from the source db.
>>
>> Assuming you update that filter in real time and can update
>> in-progress replications, you should be able to get interesting
>> patterns of edits moving through a cluster.
>>
>> Or something to that effect.
>>
>> Paul
>>
>
> Maybe I wasn't clear. There may be a place for a Bloom filter here, but
> I was thinking something along the lines of "Hey, we both have
> history up to this point that's common, even if we didn't receive
> those edits from the same place." If you imagine we had a hash tree of
> every edit you could maybe do some back and forth bisection and
> compare what your histories look like to find a common ancestor.
>
> Anyway, the problem is definitely hard, but I'm glad to talk about it whenever.
>

Re: How fast do CouchDB propagate changes to other nodes?

Posted by Randall Leeds <ra...@gmail.com>.
On Sat, Dec 18, 2010 at 16:14, Paul Davis <pa...@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 4:46 PM, Randall Leeds <ra...@gmail.com> wrote:
>> On Sat, Dec 18, 2010 at 04:00, Robert Dionne
>> <di...@dionne-associates.com> wrote:
>>>
>>> On Dec 17, 2010, at 6:07 PM, Randall Leeds wrote:
>>>>
>>>> keeping cluster information and database metadata up to date around
>>>> the cluster, but that information tends to be small and changes
>>>> infrequently.
>>>>
>>>> However, to me this sounds like a lot of work for something that might
>>>> be better solved using technologies like ZeroMQ, particularly if
>>>> logging all messages is optional.
>>>>
>>>> Anyway, I'm happy to talk about all of this further since I think it's
>>>> kind of fascinating. I've been thinking a lot recently about how flood
>>>
>>> I'm curious, is flood replication what the name implies? Broadcasting?
>>>
>>
>> I'll throw this at dev@, too.
>>
>> Yes, broadcasting.
>>
>> I've been thinking about alternative checkpoint schemes that take the
>> source and destination host out of the equation and figure out some
>> other way to verify common history. I imagine it's going to have to
>> involve a hash tree.
>>
>> With the ability to resolve common history without having *directly*
>> exchanged checkpoints, hosts could receive incremental update batches
>> from different hosts if the replication graph changes over time.
>>
>> Anyway, it's just a little infant of a thought, but I think it's a
>> good one to have in our collective consciousness.
>>
>> Randall
>>
>
> Random off the top of my head response:
>
> I don't see anything immediately following from what you describe.
> Even if you had a way of saying "I already have this revision" there's
> no real way to figure out where to start once you get rid of the
> src/dst/seq triplet (that I can think of).
>
> An interesting observation, though, is that replication never really
> deletes anything from a history. As a quick optimization that could
> lead to where you're wanting to go, you may check out storing a Bloom
> filter for the database that stores a hash of the docid/rev pair for
> all incoming edits. Then the replicator could use that to speed up
> replication when it's already got edits from the source db.
>
> Assuming you update that filter in real time and can update
> in-progress replications, you should be able to get interesting
> patterns of edits moving through a cluster.
>
> Or something to that effect.
>
> Paul
>

Maybe I wasn't clear. There may be a place for a Bloom filter here, but
I was thinking something along the lines of "Hey, we both have
history up to this point that's common, even if we didn't receive
those edits from the same place." If you imagine we had a hash tree of
every edit you could maybe do some back and forth bisection and
compare what your histories look like to find a common ancestor.

Anyway, the problem is definitely hard, but I'm glad to talk about it whenever.
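The bisection idea can be sketched as a minimal illustration, assuming
both histories are ordered lists of edit identifiers held locally; in a
real exchange each side would keep only its own list and compare subtree
hashes over the wire. All names here are invented for the example.

```python
import hashlib

def tree_hash(edits):
    """Merkle-style hash of an ordered, non-empty list of edit identifiers."""
    if len(edits) == 1:
        return hashlib.sha256(edits[0].encode()).digest()
    mid = len(edits) // 2
    combined = tree_hash(edits[:mid]) + tree_hash(edits[mid:])
    return hashlib.sha256(combined).digest()

def common_prefix(ours, theirs):
    """Binary-search for the longest shared prefix of two edit histories."""
    lo, hi = 0, min(len(ours), len(theirs))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if tree_hash(ours[:mid]) == tree_hash(theirs[:mid]):
            lo = mid       # prefixes agree; common ancestor is at or past mid
        else:
            hi = mid - 1   # divergence lies inside the first mid edits
    return lo

a = ["d1/1-a", "d2/1-b", "d3/1-c"]
b = ["d1/1-a", "d2/1-b", "d9/1-z"]
assert common_prefix(a, b) == 2  # replication could resume after edit 2
```

Because each probe is a single subtree-hash comparison, two histories of
length n would converge on a common ancestor in O(log n) exchanges
rather than a full replay, regardless of which hosts produced the edits.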

Re: How fast do CouchDB propagate changes to other nodes?

Posted by Paul Davis <pa...@gmail.com>.
On Sat, Dec 18, 2010 at 4:46 PM, Randall Leeds <ra...@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 04:00, Robert Dionne
> <di...@dionne-associates.com> wrote:
>>
>> On Dec 17, 2010, at 6:07 PM, Randall Leeds wrote:
>>>
>>> keeping cluster information and database metadata up to date around
>>> the cluster, but that information tends to be small and changes
>>> infrequently.
>>>
>>> However, to me this sounds like a lot of work for something that might
>>> be better solved using technologies like ZeroMQ, particularly if
>>> logging all messages is optional.
>>>
>>> Anyway, I'm happy to talk about all of this further since I think it's
>>> kind of fascinating. I've been thinking a lot recently about how flood
>>
>> I'm curious, is flood replication what the name implies? Broadcasting?
>>
>
> I'll throw this at dev@, too.
>
> Yes, broadcasting.
>
> I've been thinking about alternative checkpoint schemes that take the
> source and destination host out of the equation and figure out some
> other way to verify common history. I imagine it's going to have to
> involve a hash tree.
>
> With the ability to resolve common history without having *directly*
> exchanged checkpoints, hosts could receive incremental update batches
> from different hosts if the replication graph changes over time.
>
> Anyway, it's just a little infant of a thought, but I think it's a
> good one to have in our collective consciousness.
>
> Randall
>

Random off the top of my head response:

I don't see anything immediately following from what you describe.
Even if you had a way of saying "I already have this revision" there's
no real way to figure out where to start once you get rid of the
src/dst/seq triplet (that I can think of).

An interesting observation, though, is that replication never really
deletes anything from a history. As a quick optimization that could
lead to where you're wanting to go, you may check out storing a Bloom
filter for the database that stores a hash of the docid/rev pair for
all incoming edits. Then the replicator could use that to speed up
replication when it's already got edits from the source db.

Assuming you update that filter in real time and can update
in-progress replications, you should be able to get interesting
patterns of edits moving through a cluster.

Or something to that effect.

Paul