Posted to replication@couchdb.apache.org by Chris Anderson <jc...@couchbase.com> on 2014/10/17 13:38:10 UTC

rev hash stability

Wouldn't it be cool if we all generated interoperable revision hashes?

I can't point any fingers because as far as I can tell, Couchbase is
promulgating at least 3 different revision hash generation schemes.

I've filed an issue for us here:
https://github.com/couchbase/mobile/issues/3

Part of the problem is CouchDB's use of term_to_binary:
https://github.com/apache/couchdb-couch/blob/d28af185295d4618b489c050bcc71407e89891f1/src/couch_db.erl#L820
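
For concreteness, here is a minimal sketch of what a portable digest could look like instead: a hash over a canonical JSON encoding of the revision metadata and body rather than an Erlang-specific serialization. The canonicalization rules and field ordering below are illustrative assumptions, not CouchDB's current algorithm or a concrete proposal.

    import hashlib
    import json

    def portable_rev(generation, parent_rev, deleted, body):
        """Hypothetical portable rev of the form N-<md5 hex digest>.

        The canonical form (sorted keys, compact separators, UTF-8) is
        an assumption for illustration, not CouchDB's actual scheme.
        """
        payload = json.dumps(
            [generation, parent_rev, deleted, body],
            sort_keys=True, separators=(",", ":"), ensure_ascii=False,
        ).encode("utf-8")
        return "%d-%s" % (generation, hashlib.md5(payload).hexdigest())

    # Any two implementations following the same rules would agree:
    print(portable_rev(2, "1-967a00dff5e02add41819138abb3284d", False, {"checked": True}))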

I've seen this discussed informally, but I don't know if anyone has a
tractable plan to get us there.

Cheers,
Chris

-- 
Chris Anderson  @jchris
http://www.couchbase.com

Re: rev hash stability

Posted by Dale Harvey <da...@arandomurl.com>.
I think 2 users making the same change to a document is a perfectly good
time to generate a conflict, however

" * Some actual conflict is created in a doc, then multiple client apps
detect that they can automatically resolve it and do so, each creating
their own identical merged revisions that then all propagate."

is a fairly convincing reason. Although the likelihood of that happening
seems fairly low, when it does happen it can potentially become a
never-ending race.

The only other reason, which Alexander somewhat referenced, is that it makes
it easier to have copy-and-pasteable demos / instructions / guides, etc.

Fairly sure everyone is on the same page that if a client does generate
random revs, the worst thing that can happen is potentially a few extra
conflicts, and it can still be interoperable.

So for that, yeah: if we take out the term_to_binary and pick a rev generation
algorithm that is going to be fairly light for us to implement in the
browser (taking into account attachments), then we would likely try to
implement it as well.
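
One hedged possibility for the attachment side is to fold only each attachment's existing content digest and length into the rev input, so a browser client never has to re-hash attachment bodies it has already streamed. A sketch, with made-up field choices rather than an agreed scheme:

    import hashlib
    import json

    def rev_with_attachments(generation, parent_rev, deleted, body, atts):
        # atts: {name: {"digest": ..., "length": ...}} as already known to
        # the client; only these summaries are hashed, never the raw bytes.
        att_summary = sorted(
            (name, meta.get("digest"), meta.get("length"))
            for name, meta in atts.items()
        )
        payload = json.dumps(
            [generation, parent_rev, deleted, body, att_summary],
            sort_keys=True, separators=(",", ":"),
        ).encode("utf-8")
        return "%d-%s" % (generation, hashlib.md5(payload).hexdigest())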

On 17 October 2014 22:50, Jan Lehnardt <ja...@apache.org> wrote:

> I’m with Chris on this one. The replication protocol should define a
> portable way of creating deterministic rev ids while leaving room for
> random or other schemes where applicable.
>
> On 17 Oct 2014, at 22:47 , Chris Anderson <jc...@couchbase.com> wrote:
>
> > I would never suggest that a random rev or other style rev shouldn't be
> > functional/expected. It's just that if you do want to generate the same
> > revs as somebody else right now, it's hard. Making it less hard it would
> be
> > good for everyone.
> >
> > Chris
> >
> > On Friday, October 17, 2014, Brian Mitchell <br...@standardanalytics.io>
> > wrote:
> >
> >>
> >>> On Oct 17, 2014, at 3:41 PM, Jens Alfke <jens@couchbase.com
> >> <javascript:;>> wrote:
> >>>
> >>>
> >>>> On Oct 17, 2014, at 12:15 PM, Brian Mitchell <
> >> brian@standardanalytics.io <javascript:;>> wrote:
> >>>>
> >>>> Simply put: if and only if the revs match we should assume some
> >> optimism just like we
> >>>> do with things like atts_since. There’s already a lot of trust between
> >> two nodes for replication
> >>>> and we should assume that matching revs were either unique (or random)
> >> or based on some
> >>>> deterministic property that isn’t likely to collide unless it was an
> >> equivalent operation.
> >>>
> >>> I'm sorry, I've read this a few times and I can't figure out exactly
> >> what your meaning is. Could you elaborate? Particularly, what does "if
> the
> >> revs match" mean, exactly?
> >>>
> >>> Also, I don't think your statement "there’s already a lot of trust
> >> between two nodes for replication" is accurate in all cases. You seem
> to be
> >> thinking of a server cluster (a la BigCouch) but CouchDB-style
> replication
> >> is often used in a more distributed way. Both PouchDB and Couchbase Lite
> >> use replication between servers and clients. A client can be trusted to
> be
> >> acting on behalf of a user, but not beyond that.
> >>>
> >>> —Jens
> >>
> >> No problem. I probably kept the message too short.
> >>
> >> The issue is that requiring revs to match is a bit assuming about the
> >> context
> >> different implementations are designed to operate in. The case of the
> >> optimization
> >> makes a lot of sense in some cases (clustering for availability being
> the
> >> most
> >> obvious).
> >>
> >> This implies there is a contract to how any implementation should treat
> >> revisions:
> >>
> >> 1. Any revs that match between two documents should be assumed to be the
> >> same
> >> revision of the document. This is important outside of optimization
> (N-way
> >> replications
> >> for example).
> >>
> >> 2. Each implementation must be trusted to generate unique revisions.
> >>
> >> 3. Optionally: revisions can be generated deterministically to allow
> >> idempotent
> >> operations. This is really important for clusters (non-optional in
> >> practice) but
> >> has very little important for PouchDB.
> >>
> >> I’d urge implementations to document what guarantees their revs have but
> >> I would stop short in exposing the implementation (like the digest used
> or
> >> RNG function) as that is out of scope for the _rev contract for
> compatible
> >> implementations.
> >>
> >> There are many reasons to settle at this level of detail, backwards
> >> compatibility
> >> being the most important. The other is that it could allow other sorts
> of
> >> rev
> >> encoding in the future for some implementations (cheaper tree merges
> being
> >> one thing worth revisiting).
> >>
> >> So PouchDB should generate revs that make sense for PouchDB’s
> >> implementation.
> >> The contract of how these revs are interpreted shouldn’t constrain it to
> >> implementing
> >> the same JSON normalization and digest that others do. Same goes for
> other
> >> Couch’s.
> >>
> >> Brian.
> >>
> >>
> >
> > --
> > Chris Anderson  @jchris
> > http://www.couchbase.com
>
>

Re: rev hash stability

Posted by Jan Lehnardt <ja...@apache.org>.
I’m with Chris on this one. The replication protocol should define a portable way of creating deterministic rev ids while leaving room for random or other schemes where applicable.

On 17 Oct 2014, at 22:47 , Chris Anderson <jc...@couchbase.com> wrote:

> I would never suggest that a random rev or other style rev shouldn't be
> functional/expected. It's just that if you do want to generate the same
> revs as somebody else right now, it's hard. Making it less hard it would be
> good for everyone.
> 
> Chris
> 
> On Friday, October 17, 2014, Brian Mitchell <br...@standardanalytics.io>
> wrote:
> 
>> 
>>> On Oct 17, 2014, at 3:41 PM, Jens Alfke <jens@couchbase.com
>> <javascript:;>> wrote:
>>> 
>>> 
>>>> On Oct 17, 2014, at 12:15 PM, Brian Mitchell <
>> brian@standardanalytics.io <javascript:;>> wrote:
>>>> 
>>>> Simply put: if and only if the revs match we should assume some
>> optimism just like we
>>>> do with things like atts_since. There’s already a lot of trust between
>> two nodes for replication
>>>> and we should assume that matching revs were either unique (or random)
>> or based on some
>>>> deterministic property that isn’t likely to collide unless it was an
>> equivalent operation.
>>> 
>>> I'm sorry, I've read this a few times and I can't figure out exactly
>> what your meaning is. Could you elaborate? Particularly, what does "if the
>> revs match" mean, exactly?
>>> 
>>> Also, I don't think your statement "there’s already a lot of trust
>> between two nodes for replication" is accurate in all cases. You seem to be
>> thinking of a server cluster (a la BigCouch) but CouchDB-style replication
>> is often used in a more distributed way. Both PouchDB and Couchbase Lite
>> use replication between servers and clients. A client can be trusted to be
>> acting on behalf of a user, but not beyond that.
>>> 
>>> —Jens
>> 
>> No problem. I probably kept the message too short.
>> 
>> The issue is that requiring revs to match is a bit assuming about the
>> context
>> different implementations are designed to operate in. The case of the
>> optimization
>> makes a lot of sense in some cases (clustering for availability being the
>> most
>> obvious).
>> 
>> This implies there is a contract to how any implementation should treat
>> revisions:
>> 
>> 1. Any revs that match between two documents should be assumed to be the
>> same
>> revision of the document. This is important outside of optimization (N-way
>> replications
>> for example).
>> 
>> 2. Each implementation must be trusted to generate unique revisions.
>> 
>> 3. Optionally: revisions can be generated deterministically to allow
>> idempotent
>> operations. This is really important for clusters (non-optional in
>> practice) but
>> has very little important for PouchDB.
>> 
>> I’d urge implementations to document what guarantees their revs have but
>> I would stop short in exposing the implementation (like the digest used or
>> RNG function) as that is out of scope for the _rev contract for compatible
>> implementations.
>> 
>> There are many reasons to settle at this level of detail, backwards
>> compatibility
>> being the most important. The other is that it could allow other sorts of
>> rev
>> encoding in the future for some implementations (cheaper tree merges being
>> one thing worth revisiting).
>> 
>> So PouchDB should generate revs that make sense for PouchDB’s
>> implementation.
>> The contract of how these revs are interpreted shouldn’t constrain it to
>> implementing
>> the same JSON normalization and digest that others do. Same goes for other
>> Couch’s.
>> 
>> Brian.
>> 
>> 
> 
> -- 
> Chris Anderson  @jchris
> http://www.couchbase.com


Re: rev hash stability

Posted by Jan Lehnardt <ja...@apache.org>.
> On 19 Oct 2014, at 20:45 , Brian Mitchell <br...@standardanalytics.io> wrote:
> 
> 
>> On Oct 19, 2014, at 2:22 PM, Jan Lehnardt <ja...@apache.org> wrote:
>> 
>> 
>>> On 19 Oct 2014, at 20:15 , Brian Mitchell <br...@standardanalytics.io> wrote:
>>> 
>>> 
>>>> On Oct 19, 2014, at 1:49 PM, Jan Lehnardt <ja...@apache.org> wrote:
>>>> 
>>>> 
>>>>> On 18 Oct 2014, at 01:17 , Jens Alfke <je...@couchbase.com> wrote:
>>>>> 
>>>>> 
>>>>>> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
>>>>>> 
>>>>>> Giving revs meaning outside of this scope is likely to bring up more meta
>>>>>> discussion about the CouchDB data model and a long history of
>>>>>> undocumented choices which only manifest in the particular
>>>>>> implementation we have today.
>>>>> 
>>>>> That does appear to be a danger. I'm not interested in bike-shedding; if the Apache CouchDB community can't make progress on this issue then we can discuss it elsewhere to come up with solutions. I can't speak for Chris, but I'm here as a courtesy and because I believe interoperability is important. But I believe making progress is more important.
>>>> 
>>>> +1000. I think so far we’ve had a brief chatter about this and we are ready to move on.
>>>> 
>>>> How does moving this to a strawperson proposal sound? E.g. have a ticket, or pad, or gist somewhere where we can hammer out the details of this and what the various trade-offs of open decisions are?
>>>> 
>>>> JIRA obviously preferred, but happy to start this elsewhere if it provides less friction.
>>> 
>>> My primary point is that interoperation does *not* require the rev hashes be done the same. Clustering does but I can’t see why we’d encourage people to write the same thing to two slightly different systems simultaneously. Doing that, I can guarantee that rev problems will not be the only thing to fix.
>>> 
>>> If we want to define rev interoperation in terms of the minimal and the stronger case, that might work just fine but defining interoperation as the latter is excludes a variety of strategies that implementations can have and will likely mean different versions of CouchDB don’t “interoperate” under this very definition, which is simply not a useful way to describe the situation.
>> 
>> I can’t parse this, can you rephrase? :)
> 
> I’m basically saying that they don’t need to be generated the same way to be defined as interoperable. There are a few invariants required and a specific digest algorithm isn’t one of them. Creating a bogus rev 1-abcfoobaz using new_edits=false shows exactly how this works. The foundation for interoperation should only assume some definition of “match” which I mean, intuitively, that 1-abcfoobaz = 1-abcfoobaz, 2-abcfoobaz /= 1-abcfoobaz, 1-xyz /= 1-abc.
> 
> The need for a stronger set of rules is specific to how the implementation is *intended* on being used. In an eventually consistent cluster, it’s quite useful to have idempotents to repair via replication or to even duplicate writes to redundant nodes which replicate between one another. I don’t see a problem with defining rules to make this work well but it’s a very specific and demanding kind of interoperability.
> 
> Of course, revs matching are not going to solve cluster coherence between implementations on their own. For example, the abstraction still leaks in the multi-node replication case if there is replication lag (quite easily achieved, at least with how things work now). One can’t simply just write to two places and hope that my “idempotent operation” works. It’s a huge assumption of what was written prior to that and it relies on minimal knowledge being replication. It’s just a bad practice to assume that two distributed systems will always have the same view of things in relation to a third client. Clustering modes go through quite a bit of work to make it usable but it’s certainly far from automatic and not something that I’d put on the table for the definition of general interoperation. [1]
> 
> Thus a middle ground might be allowing two levels of interoperation to be defined. I still don’t see the value in focusing on this specific case. It’s my opinion that if there is something that breaks between vendors because of this, there are likely other assumptions to visit far before this one. I could be wrong as I don’t know what others are planning on doing.

Thanks for elaborating! I think I still don’t fully get your point, and without examples I’ll be lost, but don’t worry too much; there are smart enough people on this list to move the discussion forward.

> 
>>> Finally, if we really want to define a stable digest, I’d suggest that a reference implementation be created and proposed rather than forced upon the implementations before it materializes. This could possibly be made an option in the CouchDB configuration or build allowing it to be an experimental feature.
>> 
>> Hence my strawperson proposal that we can work on. I envision all implementors getting a say in what works for them and what doesn’t and that we find a consensus and a solution that we can roll this out harmlessly.
> 
> I agree but there seems to be a dismissal of the idea that we don’t need this rather than it really being a matter of just finding the right implementation that fits every useless. [2]

I don’t think there is a dismissal :) — I think we heard from everyone the broad sketches of why people want to come to an agreement and we’ve heard what pitfalls we need to be looking out for. There will be more pitfalls as we go along, but that’s expected.


> Brian.
> 
> [1]: I also alluded to the 409 issue in another email which shows the growing problem of how the old revision system isn’t well designed for anything but single node systems. I’d vote to remove this in 3.x since conflicts on write mean nothing in an eventually consistent system and the 409 actually makes it harder to test code in this case. It’s just trivial to poke holes in the setup and I don’t see how revs can possibly be the wall people actually hit.

Dale has brought this up before, I think there is a rough consensus on making 409 returns optional, e.g. the client can treat a couch as a single node, if it wants to, but only actually gets guarantees *if* it is in fact a single node. 3.x and beyond is a good time frame for that, IMHO.

> [2]: I think there is a better need for revision control that applications can leverage more significantly. There’s a long history of, rightly, discouraging people of using the MVCC implementation for application concerns, but that’s a limitation of the API, not of the idea. I could easily see revs being a richer entity in some systems, which makes this whole digest thing seem so specific and low level, that we’re really just locking ourselves in rather than opening the protocol up. It depends on where one might want to go, I guess.

I’m all for exploring this as well, but I don’t think we are painting ourselves into a corner here. What I roughly envision is that we agree on a baseline rev handling that all implementations must adhere to (the random rev case as today, possibly with refinements); we can also devise deterministic revs (like CouchDB today, but interoperable across all protocol implementations), and maybe even deterministic revs with stronger properties (like sha256 instead of md5), and so on. As long as the base property is handled correctly (any rev will do; identical revs mean same content after the same history), it doesn’t matter whether the rev carries more meaning on top of that. I’m not sure what implications using the MVCC model for actual version control has on revision ids, but in the above world, they would just be another specialisation, if that is needed at all.

Obviously, this might all fall apart because I haven’t thought this through 100% yet, but it *seems* straightforward to me.

If the only thing that comes out of this is retaining today’s system but changing the calculation of the revs so it doesn’t rely on Erlang internals, that’d be good enough as per the initial feature request, and it would leave us room to go further later on, so we don’t have to boil the ocean.

Moving CouchDB proper to new rev schemes would require a major version bump (IMHO), but we are already a lot less averse to that as we have been in the past.

In conclusion: does anyone volunteer to draw up the current state and proposed changes and their respective pros and cons for all to comment on?

Best
Jan
--
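
To make the tiers above concrete, here is a small sketch (the names and canonical form are illustrative assumptions): a baseline random scheme plus deterministic schemes that differ only in digest, all producing revs of the same "N-hash" shape, so the base contract (identical revs imply same content after the same history) holds whichever scheme a peer happens to use.

    import hashlib
    import json
    import os

    def _canonical(generation, parent_rev, body):
        # Illustrative canonical form: sorted keys, compact separators.
        return json.dumps([generation, parent_rev, body],
                          sort_keys=True, separators=(",", ":")).encode("utf-8")

    def random_rev(generation):
        # Baseline: any opaque token will do.
        return "%d-%s" % (generation, os.urandom(16).hex())

    def deterministic_rev(generation, parent_rev, body, algo="md5"):
        # Deterministic tiers differ only in the digest (md5, sha256, ...).
        digest = hashlib.new(algo, _canonical(generation, parent_rev, body)).hexdigest()
        return "%d-%s" % (generation, digest)

    r_md5 = deterministic_rev(2, "1-abc", {"checked": True})
    r_sha = deterministic_rev(2, "1-abc", {"checked": True}, algo="sha256")
    assert r_md5 != r_sha                                             # different tiers
    assert r_md5 == deterministic_rev(2, "1-abc", {"checked": True})  # each is stable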






Re: rev hash stability

Posted by Brian Mitchell <br...@standardanalytics.io>.
> On Oct 19, 2014, at 2:22 PM, Jan Lehnardt <ja...@apache.org> wrote:
> 
> 
>> On 19 Oct 2014, at 20:15 , Brian Mitchell <br...@standardanalytics.io> wrote:
>> 
>> 
>>> On Oct 19, 2014, at 1:49 PM, Jan Lehnardt <ja...@apache.org> wrote:
>>> 
>>> 
>>>> On 18 Oct 2014, at 01:17 , Jens Alfke <je...@couchbase.com> wrote:
>>>> 
>>>> 
>>>>> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
>>>>> 
>>>>> Giving revs meaning outside of this scope is likely to bring up more meta
>>>>> discussion about the CouchDB data model and a long history of
>>>>> undocumented choices which only manifest in the particular
>>>>> implementation we have today.
>>>> 
>>>> That does appear to be a danger. I'm not interested in bike-shedding; if the Apache CouchDB community can't make progress on this issue then we can discuss it elsewhere to come up with solutions. I can't speak for Chris, but I'm here as a courtesy and because I believe interoperability is important. But I believe making progress is more important.
>>> 
>>> +1000. I think so far we’ve had a brief chatter about this and we are ready to move on.
>>> 
>>> How does moving this to a strawperson proposal sound? E.g. have a ticket, or pad, or gist somewhere where we can hammer out the details of this and what the various trade-offs of open decisions are?
>>> 
>>> JIRA obviously preferred, but happy to start this elsewhere if it provides less friction.
>> 
>> My primary point is that interoperation does *not* require the rev hashes be done the same. Clustering does but I can’t see why we’d encourage people to write the same thing to two slightly different systems simultaneously. Doing that, I can guarantee that rev problems will not be the only thing to fix.
>> 
>> If we want to define rev interoperation in terms of the minimal and the stronger case, that might work just fine but defining interoperation as the latter is excludes a variety of strategies that implementations can have and will likely mean different versions of CouchDB don’t “interoperate” under this very definition, which is simply not a useful way to describe the situation.
> 
> I can’t parse this, can you rephrase? :)

I’m basically saying that they don’t need to be generated the same way to be defined as interoperable. There are a few invariants required, and a specific digest algorithm isn’t one of them. Creating a bogus rev 1-abcfoobaz using new_edits=false shows exactly how this works. The foundation for interoperation should only assume some definition of “match”, by which I mean, intuitively, that 1-abcfoobaz = 1-abcfoobaz, 2-abcfoobaz /= 1-abcfoobaz, 1-xyz /= 1-abc.
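
The bogus-rev case can be reproduced against any CouchDB-compatible server through the documented _bulk_docs new_edits=false interface; the server URL, database and document names below are placeholders, and an unauthenticated local server is assumed.

    import requests

    db = "http://localhost:5984/demo"
    requests.put(db)  # create the database; ignore 412 if it already exists

    # Store a revision whose id was never derived from any digest at all.
    resp = requests.post(
        db + "/_bulk_docs",
        json={"new_edits": False,
              "docs": [{"_id": "doc1", "_rev": "1-abcfoobaz", "value": 42}]},
    )
    resp.raise_for_status()

    # The server accepts it: revs are matched as opaque tokens, not recomputed.
    print(requests.get(db + "/doc1").json()["_rev"])  # -> 1-abcfoobaz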

The need for a stronger set of rules is specific to how the implementation is *intended* to be used. In an eventually consistent cluster, it’s quite useful to have idempotent writes to repair via replication, or even to duplicate writes to redundant nodes which replicate between one another. I don’t see a problem with defining rules to make this work well, but it’s a very specific and demanding kind of interoperability.

Of course, revs matching are not going to solve cluster coherence between implementations on their own. For example, the abstraction still leaks in the multi-node replication case if there is replication lag (quite easily achieved, at least with how things work now). One can’t simply write to two places and hope that an “idempotent operation” works. It makes a huge assumption about what was written prior to that, and it relies on replication being the only knowledge the systems share. It’s just a bad practice to assume that two distributed systems will always have the same view of things in relation to a third client. Clustering modes go through quite a bit of work to make it usable, but it’s certainly far from automatic and not something that I’d put on the table for the definition of general interoperation. [1]

Thus a middle ground might be allowing two levels of interoperation to be defined. I still don’t see the value in focusing on this specific case. It’s my opinion that if there is something that breaks between vendors because of this, there are likely other assumptions to visit far before this one. I could be wrong as I don’t know what others are planning on doing.

>> Finally, if we really want to define a stable digest, I’d suggest that a reference implementation be created and proposed rather than forced upon the implementations before it materializes. This could possibly be made an option in the CouchDB configuration or build allowing it to be an experimental feature.
> 
> Hence my strawperson proposal that we can work on. I envision all implementors getting a say in what works for them and what doesn’t and that we find a consensus and a solution that we can roll this out harmlessly.

I agree, but there seems to be a dismissal of the idea that we don’t need this at all, rather than it really being a matter of just finding the right implementation that fits every use case. [2]

Brian.

[1]: I also alluded to the 409 issue in another email which shows the growing problem of how the old revision system isn’t well designed for anything but single node systems. I’d vote to remove this in 3.x since conflicts on write mean nothing in an eventually consistent system and the 409 actually makes it harder to test code in this case. It’s just trivial to poke holes in the setup and I don’t see how revs can possibly be the wall people actually hit.

[2]: I think there is a better need for revision control that applications can leverage more significantly. There’s a long history of, rightly, discouraging people from using the MVCC implementation for application concerns, but that’s a limitation of the API, not of the idea. I could easily see revs being a richer entity in some systems, which makes this whole digest thing seem so specific and low level that we’re really just locking ourselves in rather than opening the protocol up. It depends on where one might want to go, I guess.


Re: rev hash stability

Posted by Jan Lehnardt <ja...@apache.org>.
> On 19 Oct 2014, at 20:15 , Brian Mitchell <br...@standardanalytics.io> wrote:
> 
> 
>> On Oct 19, 2014, at 1:49 PM, Jan Lehnardt <ja...@apache.org> wrote:
>> 
>> 
>>> On 18 Oct 2014, at 01:17 , Jens Alfke <je...@couchbase.com> wrote:
>>> 
>>> 
>>>> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
>>>> 
>>>> Giving revs meaning outside of this scope is likely to bring up more meta
>>>> discussion about the CouchDB data model and a long history of
>>>> undocumented choices which only manifest in the particular
>>>> implementation we have today.
>>> 
>>> That does appear to be a danger. I'm not interested in bike-shedding; if the Apache CouchDB community can't make progress on this issue then we can discuss it elsewhere to come up with solutions. I can't speak for Chris, but I'm here as a courtesy and because I believe interoperability is important. But I believe making progress is more important.
>> 
>> +1000. I think so far we’ve had a brief chatter about this and we are ready to move on.
>> 
>> How does moving this to a strawperson proposal sound? E.g. have a ticket, or pad, or gist somewhere where we can hammer out the details of this and what the various trade-offs of open decisions are?
>> 
>> JIRA obviously preferred, but happy to start this elsewhere if it provides less friction.
> 
> My primary point is that interoperation does *not* require the rev hashes be done the same. Clustering does but I can’t see why we’d encourage people to write the same thing to two slightly different systems simultaneously. Doing that, I can guarantee that rev problems will not be the only thing to fix.
> 
> If we want to define rev interoperation in terms of the minimal and the stronger case, that might work just fine but defining interoperation as the latter is excludes a variety of strategies that implementations can have and will likely mean different versions of CouchDB don’t “interoperate” under this very definition, which is simply not a useful way to describe the situation.

I can’t parse this, can you rephrase? :)

> Finally, if we really want to define a stable digest, I’d suggest that a reference implementation be created and proposed rather than forced upon the implementations before it materializes. This could possibly be made an option in the CouchDB configuration or build allowing it to be an experimental feature.

Hence my strawperson proposal that we can work on. I envision all implementors getting a say in what works for them and what doesn’t and that we find a consensus and a solution that we can roll this out harmlessly.

Best
Jan
--


Re: rev hash stability

Posted by Brian Mitchell <br...@standardanalytics.io>.
> On Oct 19, 2014, at 1:49 PM, Jan Lehnardt <ja...@apache.org> wrote:
> 
> 
>> On 18 Oct 2014, at 01:17 , Jens Alfke <je...@couchbase.com> wrote:
>> 
>> 
>>> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
>>> 
>>> Giving revs meaning outside of this scope is likely to bring up more meta
>>> discussion about the CouchDB data model and a long history of
>>> undocumented choices which only manifest in the particular
>>> implementation we have today.
>> 
>> That does appear to be a danger. I'm not interested in bike-shedding; if the Apache CouchDB community can't make progress on this issue then we can discuss it elsewhere to come up with solutions. I can't speak for Chris, but I'm here as a courtesy and because I believe interoperability is important. But I believe making progress is more important.
> 
> +1000. I think so far we’ve had a brief chatter about this and we are ready to move on.
> 
> How does moving this to a strawperson proposal sound? E.g. have a ticket, or pad, or gist somewhere where we can hammer out the details of this and what the various trade-offs of open decisions are?
> 
> JIRA obviously preferred, but happy to start this elsewhere if it provides less friction.

My primary point is that interoperation does *not* require the rev hashes be done the same. Clustering does but I can’t see why we’d encourage people to write the same thing to two slightly different systems simultaneously. Doing that, I can guarantee that rev problems will not be the only thing to fix.

If we want to define rev interoperation in terms of the minimal and the stronger case, that might work just fine, but defining interoperation as the latter excludes a variety of strategies that implementations can have, and will likely mean different versions of CouchDB don’t “interoperate” under this very definition, which is simply not a useful way to describe the situation.

Finally, if we really want to define a stable digest, I’d suggest that a reference implementation be created and proposed rather than forced upon the implementations before it materializes. This could possibly be made an option in the CouchDB configuration or build allowing it to be an experimental feature.

Brian.




Re: rev hash stability

Posted by Jan Lehnardt <ja...@apache.org>.
> On 18 Oct 2014, at 01:17 , Jens Alfke <je...@couchbase.com> wrote:
> 
> 
>> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
>> 
>> Giving revs meaning outside of this scope is likely to bring up more meta
>> discussion about the CouchDB data model and a long history of
>> undocumented choices which only manifest in the particular
>> implementation we have today.
> 
> That does appear to be a danger. I'm not interested in bike-shedding; if the Apache CouchDB community can't make progress on this issue then we can discuss it elsewhere to come up with solutions. I can't speak for Chris, but I'm here as a courtesy and because I believe interoperability is important. But I believe making progress is more important.

+1000. I think so far we’ve had a brief chatter about this and we are ready to move on.

How does moving this to a strawperson proposal sound? E.g. have a ticket, or pad, or gist somewhere where we can hammer out the details of this and what the various trade-offs of open decisions are?

JIRA obviously preferred, but happy to start this elsewhere if it provides less friction.

Best
Jan
--



> Back to the matter at hand: experience from a long line of P2P systems (from FreeNet onwards) shows the value of giving pieces of distributed content their own unique and unforgeable IDs. CouchDB-style revision IDs partly meet this need, except that:
> (a) there are interoperability issues because every implementation has its own algorithm for generating the IDs;
> (b) none of the current ones are very unforgeable because they use the broken MD5 hash instead of something like SHA256;
> (c) the unforgeability isn't verified because the replicator doesn't check that a revision's ID matches its contents.
> 
> At some point — Couchbase would like to build P2P systems in the future — we may need to take this more seriously, at which point it becomes necessary to have a canonical rev-ID generation algorithm which is enforced by the replicator. That algorithm will need to be standardized for interoperability purposes, since otherwise two implementations would reject each other's revisions as forgeries.
> 
> That's why I see this issue as important.



Re: rev hash stability

Posted by Chris Anderson <jc...@couchbase.com>.
On Fri, Oct 17, 2014 at 4:17 PM, Jens Alfke <je...@couchbase.com> wrote:

>
> > On Oct 17, 2014, at 2:22 PM, Brian Mitchell <br...@standardanalytics.io>
> wrote:
> >
> > Giving revs meaning outside of this scope is likely to bring up more meta
> > discussion about the CouchDB data model and a long history of
> > undocumented choices which only manifest in the particular
> > implementation we have today.
>
> That does appear to be a danger. I'm not interested in bike-shedding; if
> the Apache CouchDB community can't make progress on this issue then we can
> discuss it elsewhere to come up with solutions. I can't speak for Chris,
> but I'm here as a courtesy and because I believe interoperability is
> important. But I believe making progress is more important.
>
>
My original motivation for raising the issue is that I expect to be writing an
integration suite to make sure the Couchbase rev generators on various
platforms all give the same answer. I'm hoping someone on this list has
thought more about the problem than I have, so that when we do move our stuff
to a uniform approach, it has at least a chance of being appealing to other
implementations.

But I do expect that there will always be cases where rev bodies are
random, and that replicators will always be able to handle that just fine.
(Except in the case Jens raises, which is akin to someone testing the
validity of rev hashes in a custom validation function -- that is, it'd
probably be an option you can turn on if you need unforgeability.)

Chris


> Back to the matter at hand: experience from a long line of P2P systems
> (from FreeNet onwards) shows the value of giving pieces of distributed
> content their own unique and unforgeable IDs. CouchDB-style revision IDs
> partly meet this need, except that:
> (a) there are interoperability issues because every implementation has its
> own algorithm for generating the IDs;
> (b) none of the current ones are very unforgeable because they use the
> broken MD5 hash instead of something like SHA256;
> (c) the unforgeability isn't verified because the replicator doesn't check
> that a revision's ID matches its contents.
>
> At some point — Couchbase would like to build P2P systems in the future —
> we may need to take this more seriously, at which point it becomes
> necessary to have a canonical rev-ID generation algorithm which is enforced
> by the replicator. That algorithm will need to be standardized for
> interoperability purposes, since otherwise two implementations would reject
> each other's revisions as forgeries.
>
> That's why I see this issue as important.
>
> —Jens




-- 
Chris Anderson  @jchris
http://www.couchbase.com

Re: rev hash stability

Posted by Jens Alfke <je...@couchbase.com>.
> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
> 
> Giving revs meaning outside of this scope is likely to bring up more meta
> discussion about the CouchDB data model and a long history of
> undocumented choices which only manifest in the particular
> implementation we have today.

That does appear to be a danger. I'm not interested in bike-shedding; if the Apache CouchDB community can't make progress on this issue then we can discuss it elsewhere to come up with solutions. I can't speak for Chris, but I'm here as a courtesy and because I believe interoperability is important. But I believe making progress is more important.

Back to the matter at hand: experience from a long line of P2P systems (from FreeNet onwards) shows the value of giving pieces of distributed content their own unique and unforgeable IDs. CouchDB-style revision IDs partly meet this need, except that:
(a) there are interoperability issues because every implementation has its own algorithm for generating the IDs;
(b) none of the current ones are very unforgeable because they use the broken MD5 hash instead of something like SHA256;
(c) the unforgeability isn't verified because the replicator doesn't check that a revision's ID matches its contents.

At some point — Couchbase would like to build P2P systems in the future — we may need to take this more seriously, at which point it becomes necessary to have a canonical rev-ID generation algorithm which is enforced by the replicator. That algorithm will need to be standardized for interoperability purposes, since otherwise two implementations would reject each other's revisions as forgeries.

That's why I see this issue as important.

—Jens
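
The enforcement described here would amount to the receiving side recomputing the digest for every pushed revision and rejecting mismatches. A sketch under an illustrative canonicalization and SHA-256; this is not an agreed or existing algorithm.

    import hashlib
    import json

    def expected_rev(generation, parent_rev, deleted, body):
        # Same kind of illustrative canonical form as the earlier sketches.
        payload = json.dumps([generation, parent_rev, deleted, body],
                             sort_keys=True, separators=(",", ":")).encode("utf-8")
        return "%d-%s" % (generation, hashlib.sha256(payload).hexdigest())

    def verify_incoming(rev, parent_rev, deleted, body):
        """Reject a pushed revision whose id does not match its contents."""
        generation = int(rev.split("-", 1)[0])
        if rev != expected_rev(generation, parent_rev, deleted, body):
            raise ValueError("revision id does not match contents: %s" % rev)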

Re: rev hash stability

Posted by Brian Mitchell <br...@standardanalytics.io>.
I don’t want to distract the conversation too much, but the semantics revs
carry are supposed to be pretty opaque. The lack of deterministic revs
doesn’t matter for an embedded database (at least in its current form). As
far as a user can see, a rev is just a token that represents some sort of
place in a document’s history.

Giving revs meaning outside of this scope is likely to bring up more meta
discussion about the CouchDB data model and a long history of
undocumented choices which only manifest in the particular
implementation we have today.

Beyond that, I’d like to hear what the specific use case is for standard rev
content.

As Dale says, multicasting to multiple databases asynchronously that are
only integrated via replication is a very likely place to find conflicts. Even
when revs are deterministic, there are problems with replication lag
which can cause the same problem anyway. It just doesn’t make sense
in an uncoordinated system.

Forcing revs to be computed exactly the same everywhere seems about
as complete a solution as throwing 409s on updates, which is basically
never enough to prevent conflicts and is probably one of the worst features
to rely on in the era of clustering, or in any case where replication is used.

Brian.

> On Oct 17, 2014, at 4:47 PM, Chris Anderson <jc...@couchbase.com> wrote:
> 
> I would never suggest that a random rev or other style rev shouldn't be
> functional/expected. It's just that if you do want to generate the same
> revs as somebody else right now, it's hard. Making it less hard it would be
> good for everyone.
> 
> Chris
> 
> On Friday, October 17, 2014, Brian Mitchell <br...@standardanalytics.io>
> wrote:
> 
>> 
>>> On Oct 17, 2014, at 3:41 PM, Jens Alfke <jens@couchbase.com
>> <javascript:;>> wrote:
>>> 
>>> 
>>>> On Oct 17, 2014, at 12:15 PM, Brian Mitchell <
>> brian@standardanalytics.io <javascript:;>> wrote:
>>>> 
>>>> Simply put: if and only if the revs match we should assume some
>> optimism just like we
>>>> do with things like atts_since. There’s already a lot of trust between
>> two nodes for replication
>>>> and we should assume that matching revs were either unique (or random)
>> or based on some
>>>> deterministic property that isn’t likely to collide unless it was an
>> equivalent operation.
>>> 
>>> I'm sorry, I've read this a few times and I can't figure out exactly
>> what your meaning is. Could you elaborate? Particularly, what does "if the
>> revs match" mean, exactly?
>>> 
>>> Also, I don't think your statement "there’s already a lot of trust
>> between two nodes for replication" is accurate in all cases. You seem to be
>> thinking of a server cluster (a la BigCouch) but CouchDB-style replication
>> is often used in a more distributed way. Both PouchDB and Couchbase Lite
>> use replication between servers and clients. A client can be trusted to be
>> acting on behalf of a user, but not beyond that.
>>> 
>>> —Jens
>> 
>> No problem. I probably kept the message too short.
>> 
>> The issue is that requiring revs to match is a bit assuming about the
>> context
>> different implementations are designed to operate in. The case of the
>> optimization
>> makes a lot of sense in some cases (clustering for availability being the
>> most
>> obvious).
>> 
>> This implies there is a contract to how any implementation should treat
>> revisions:
>> 
>> 1. Any revs that match between two documents should be assumed to be the
>> same
>> revision of the document. This is important outside of optimization (N-way
>> replications
>> for example).
>> 
>> 2. Each implementation must be trusted to generate unique revisions.
>> 
>> 3. Optionally: revisions can be generated deterministically to allow
>> idempotent
>> operations. This is really important for clusters (non-optional in
>> practice) but
>> has very little important for PouchDB.
>> 
>> I’d urge implementations to document what guarantees their revs have but
>> I would stop short in exposing the implementation (like the digest used or
>> RNG function) as that is out of scope for the _rev contract for compatible
>> implementations.
>> 
>> There are many reasons to settle at this level of detail, backwards
>> compatibility
>> being the most important. The other is that it could allow other sorts of
>> rev
>> encoding in the future for some implementations (cheaper tree merges being
>> one thing worth revisiting).
>> 
>> So PouchDB should generate revs that make sense for PouchDB’s
>> implementation.
>> The contract of how these revs are interpreted shouldn’t constrain it to
>> implementing
>> the same JSON normalization and digest that others do. Same goes for other
>> Couch’s.
>> 
>> Brian.
>> 
>> 
> 
> -- 
> Chris Anderson  @jchris
> http://www.couchbase.com


Re: rev hash stability

Posted by Chris Anderson <jc...@couchbase.com>.
I would never suggest that a random rev or other style rev shouldn't be
functional/expected. It's just that if you do want to generate the same
revs as somebody else right now, it's hard. Making it less hard would be
good for everyone.

Chris

On Friday, October 17, 2014, Brian Mitchell <br...@standardanalytics.io>
wrote:

>
> > On Oct 17, 2014, at 3:41 PM, Jens Alfke <jens@couchbase.com
> <javascript:;>> wrote:
> >
> >
> >> On Oct 17, 2014, at 12:15 PM, Brian Mitchell <
> brian@standardanalytics.io <javascript:;>> wrote:
> >>
> >> Simply put: if and only if the revs match we should assume some
> optimism just like we
> >> do with things like atts_since. There’s already a lot of trust between
> two nodes for replication
> >> and we should assume that matching revs were either unique (or random)
> or based on some
> >> deterministic property that isn’t likely to collide unless it was an
> equivalent operation.
> >
> > I'm sorry, I've read this a few times and I can't figure out exactly
> what your meaning is. Could you elaborate? Particularly, what does "if the
> revs match" mean, exactly?
> >
> > Also, I don't think your statement "there’s already a lot of trust
> between two nodes for replication" is accurate in all cases. You seem to be
> thinking of a server cluster (a la BigCouch) but CouchDB-style replication
> is often used in a more distributed way. Both PouchDB and Couchbase Lite
> use replication between servers and clients. A client can be trusted to be
> acting on behalf of a user, but not beyond that.
> >
> > —Jens
>
> No problem. I probably kept the message too short.
>
> The issue is that requiring revs to match is a bit assuming about the
> context
> different implementations are designed to operate in. The case of the
> optimization
> makes a lot of sense in some cases (clustering for availability being the
> most
> obvious).
>
> This implies there is a contract to how any implementation should treat
> revisions:
>
> 1. Any revs that match between two documents should be assumed to be the
> same
> revision of the document. This is important outside of optimization (N-way
> replications
> for example).
>
> 2. Each implementation must be trusted to generate unique revisions.
>
> 3. Optionally: revisions can be generated deterministically to allow
> idempotent
> operations. This is really important for clusters (non-optional in
> practice) but
> has very little important for PouchDB.
>
> I’d urge implementations to document what guarantees their revs have but
> I would stop short in exposing the implementation (like the digest used or
> RNG function) as that is out of scope for the _rev contract for compatible
> implementations.
>
> There are many reasons to settle at this level of detail, backwards
> compatibility
> being the most important. The other is that it could allow other sorts of
> rev
> encoding in the future for some implementations (cheaper tree merges being
> one thing worth revisiting).
>
> So PouchDB should generate revs that make sense for PouchDB’s
> implementation.
> The contract of how these revs are interpreted shouldn’t constrain it to
> implementing
> the same JSON normalization and digest that others do. Same goes for other
> Couch’s.
>
> Brian.
>
>

-- 
Chris Anderson  @jchris
http://www.couchbase.com

Re: rev hash stability

Posted by Jens Alfke <je...@couchbase.com>.
> On Oct 17, 2014, at 12:48 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
> 
> 1. Any revs that match between two documents should be assumed to be the same
> revision of the document. This is important outside of optimization (N-way replications
> for example).

Again, I'm not sure what you mean by "match". (Or by "between two documents" … none of this makes sense if there's more than one document involved. Was that a typo?)

If by "match" you mean "equal contents [aside from the _rev property]", I don't think any current implementation does what you said. First off, there's a very important piece of information that's not stored in a revision: its parent revision. Two revisions with identical contents but different parents should never be considered equal.

Even if you accept that, I don't think it's feasible to merge two revisions with different IDs but equal contents. A minor issue is that the replicator will have extra overhead in comparing the new revision against existing peer revisions to determine whether they're equal. But the big problem is that if it is equal to an already stored revision … then what? You can't throw one of them away, and there's no mechanism to record their equality, so keeping them both doesn't make sense.

> 3. Optionally: revisions can be generated deterministically to allow idempotent
> operations. This is really important for clusters (non-optional in practice) but
> has very little important for PouchDB.

No, it's very important for a distributed system, of which PouchDB is a likely client. The example I gave before, of two people checking off the same to-do list item, is an example of an idempotent operation. I could come up with a lot more.

—Jens
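
To make the to-do example concrete: if both clients derive the rev deterministically from the same parent revision and the same new body, they mint the same token, and the replicator ends up with one revision instead of a conflict. A sketch under the same illustrative digest assumptions as above:

    import hashlib
    import json

    def deterministic_rev(generation, parent_rev, body):
        payload = json.dumps([generation, parent_rev, body],
                             sort_keys=True, separators=(",", ":")).encode("utf-8")
        return "%d-%s" % (generation, hashlib.md5(payload).hexdigest())

    parent = "1-967a00dff5e02add41819138abb3284d"
    item = {"title": "buy milk", "checked": True}

    # Two clients make the identical edit independently...
    rev_on_phone = deterministic_rev(2, parent, item)
    rev_on_laptop = deterministic_rev(2, parent, item)

    # ...and converge on one revision instead of creating a conflict.
    assert rev_on_phone == rev_on_laptop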

Re: rev hash stability

Posted by Brian Mitchell <br...@standardanalytics.io>.
> On Oct 17, 2014, at 3:41 PM, Jens Alfke <je...@couchbase.com> wrote:
> 
> 
>> On Oct 17, 2014, at 12:15 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
>> 
>> Simply put: if and only if the revs match we should assume some optimism just like we
>> do with things like atts_since. There’s already a lot of trust between two nodes for replication
>> and we should assume that matching revs were either unique (or random) or based on some
>> deterministic property that isn’t likely to collide unless it was an equivalent operation.
> 
> I'm sorry, I've read this a few times and I can't figure out exactly what your meaning is. Could you elaborate? Particularly, what does "if the revs match" mean, exactly?
> 
> Also, I don't think your statement "there’s already a lot of trust between two nodes for replication" is accurate in all cases. You seem to be thinking of a server cluster (a la BigCouch) but CouchDB-style replication is often used in a more distributed way. Both PouchDB and Couchbase Lite use replication between servers and clients. A client can be trusted to be acting on behalf of a user, but not beyond that.
> 
> —Jens

No problem. I probably kept the message too short.

The issue is that requiring revs to match makes assumptions about the context
different implementations are designed to operate in. The optimization
makes a lot of sense in some cases (clustering for availability being the most
obvious).

This implies there is a contract to how any implementation should treat revisions:

1. Any revs that match between two documents should be assumed to be the same
revision of the document. This is important outside of optimization (N-way replications
for example).

2. Each implementation must be trusted to generate unique revisions.

3. Optionally: revisions can be generated deterministically to allow idempotent
operations. This is really important for clusters (non-optional in practice) but
has very little importance for PouchDB.

I’d urge implementations to document what guarantees their revs have, but
I would stop short of exposing the implementation (like the digest used or
RNG function), as that is out of scope for the _rev contract for compatible
implementations.

There are many reasons to settle at this level of detail, backwards compatibility
being the most important. The other is that it could allow other sorts of rev
encoding in the future for some implementations (cheaper tree merges being
one thing worth revisiting).

So PouchDB should generate revs that make sense for PouchDB’s implementation.
The contract of how these revs are interpreted shouldn’t constrain it to implementing
the same JSON normalization and digest that others do. Same goes for other Couch’s.

Brian.


Re: rev hash stability

Posted by Jens Alfke <je...@couchbase.com>.
> On Oct 17, 2014, at 12:15 PM, Brian Mitchell <br...@standardanalytics.io> wrote:
> 
> Simply put: if and only if the revs match we should assume some optimism just like we
> do with things like atts_since. There’s already a lot of trust between two nodes for replication
> and we should assume that matching revs were either unique (or random) or based on some
> deterministic property that isn’t likely to collide unless it was an equivalent operation.

I'm sorry, I've read this a few times and I can't figure out exactly what your meaning is. Could you elaborate? Particularly, what does "if the revs match" mean, exactly?

Also, I don't think your statement "there’s already a lot of trust between two nodes for replication" is accurate in all cases. You seem to be thinking of a server cluster (a la BigCouch) but CouchDB-style replication is often used in a more distributed way. Both PouchDB and Couchbase Lite use replication between servers and clients. A client can be trusted to be acting on behalf of a user, but not beyond that.

—Jens

Re: rev hash stability

Posted by Brian Mitchell <br...@standardanalytics.io>.
I agree on the anti-pattern. Concurrency control seems to be more of a cluster property. I
do think, however, that the optimization isn’t a “bad” thing. It just shouldn’t be a requirement
for operation and never be relied upon as an assumption.

Simply put: if and only if the revs match we should assume some optimism just like we
do with things like atts_since. There’s already a lot of trust between two nodes for replication
and we should assume that matching revs were either unique (or random) or based on some
deterministic property that isn’t likely to collide unless it was an equivalent operation.

I’d encourage PouchDB to keep random revs unless they have some internal need to change
it. Even then, I think it’s PouchDB’s choice to generate revs however it wants as long as it
follows these semantics.

Brian.

> On Oct 17, 2014, at 9:20 AM, Dale Harvey <da...@arandomurl.com> wrote:
> 
> Does anyone have a compelling reason for this optimisation existing?
> 
> I cant think of many reasons for a user to be sending the same writes to
> different servers then not wanting them to conflict, I feel like its an
> anti pattern and I feel like if I make seperate writes and they dont
> conflict when I replicate, something is broken. Considering where there are
> a lot of huge and fairly easy wins in replication, spending any time on
> this almost never touched case doesnt seem worth it.
> 
> PouchDB just uses random revs, the only people that have cared have known
> the inner working of couchdb, a pouchdb user has never been confused by the
> behaviour as far as I can remember.
> 
> On 17 October 2014 13:45, Alexander Shorin <kx...@gmail.com> wrote:
> 
>> Hi Chris,
>> 
>> I'd already opened an issue to track this down:
>> https://issues.apache.org/jira/browse/COUCHDB-2338
>> 
>> If anyone has some plan, it better to be there.
>> 
>> --
>> ,,,^..^,,,
>> 
>> 
>> On Fri, Oct 17, 2014 at 3:38 PM, Chris Anderson <jc...@couchbase.com>
>> wrote:
>>> Wouldn't it be cool if we all generated interoperable revision hashes?
>>> 
>>> I can't point any fingers because as far as I can tell, Couchbase is
>>> promulgating at least 3 different revision hash generation schemes.
>>> 
>>> I've filed an issue for us here:
>>> https://github.com/couchbase/mobile/issues/3
>>> 
>>> Part of the problem is CouchDB's use of term_to_binary:
>>> 
>> https://github.com/apache/couchdb-couch/blob/d28af185295d4618b489c050bcc71407e89891f1/src/couch_db.erl#L820
>>> 
>>> I've seen this discussed informally, but I don't know if anyone has a
>>> tractable plan to get us there.
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> --
>>> Chris Anderson  @jchris
>>> http://www.couchbase.com
>> 


Re: rev hash stability

Posted by Alexander Shorin <kx...@gmail.com>.
On Fri, Oct 17, 2014 at 5:20 PM, Dale Harvey <da...@arandomurl.com> wrote:
> Does anyone have a compelling reason for this optimisation existing?

I believe Robert Newson does (:

> I cant think of many reasons for a user to be sending the same writes to
> different servers then not wanting them to conflict, I feel like its an
> anti pattern and I feel like if I make seperate writes and they dont
> conflict when I replicate, something is broken. Considering where there are
> a lot of huge and fairly easy wins in replication, spending any time on
> this almost never touched case doesnt seem worth it.
>
> PouchDB just uses random revs, the only people that have cared have known
> the inner working of couchdb, a pouchdb user has never been confused by the
> behaviour as far as I can remember.

The answer is in the word "reproducibility", or consistency of your
actions. You'd probably find it awkward if repeating exactly the same
actions in the same order gave you different results, right? And
since the results are different, replication would consider them a
conflict, while there is no real conflict at all.
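
The spurious conflict is easy to reproduce by hand: make the same logical edit in two databases, letting each side mint its own random rev the way a random-rev client would (simulated here with new_edits=false), then replicate one into the other and inspect the conflict set. The server URL and database names are placeholders.

    import requests
    import uuid

    base = "http://localhost:5984"              # placeholder server
    a, b = base + "/demo_a", base + "/demo_b"
    for db in (a, b):
        requests.put(db)                        # create the databases

    # The same logical edit, but each side gets its own random rev id.
    doc = {"_id": "item1", "checked": True}
    for db in (a, b):
        rev = "1-" + uuid.uuid4().hex
        requests.post(db + "/_bulk_docs",
                      json={"new_edits": False, "docs": [dict(doc, _rev=rev)]})

    # Replicate b into a: two revisions of the "same" edit now sit side
    # by side even though nothing really conflicts.
    requests.post(base + "/_replicate", json={"source": b, "target": a})
    doc_a = requests.get(a + "/item1", params={"conflicts": "true"}).json()
    print(doc_a.get("_conflicts"))              # one spurious conflict rev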

--
,,,^..^,,,

Re: rev hash stability

Posted by Jens Alfke <je...@couchbase.com>.
> On Oct 17, 2014, at 6:20 AM, Dale Harvey <da...@arandomurl.com> wrote:
> 
> I cant think of many reasons for a user to be sending the same writes to
> different servers then not wanting them to conflict

It's not that. It's _multiple_ users making identical changes, and not wanting that to conflict. There are a bunch of use cases:

* Two people check off the same item in a shopping list, i.e. set "checked":true.
or,
* Some actual conflict is created in a doc, then multiple client apps detect that they can automatically resolve it and do so, each creating their own identical merged revisions that then all propagate.

(These are mobile-centric I realize, but I'm sure there are equivalent CouchDB-like scenarios.)

—Jens

Re: rev hash stability

Posted by Dale Harvey <da...@arandomurl.com>.
Does anyone have a compelling reason for this optimisation existing?

I can't think of many reasons for a user to be sending the same writes to
different servers and then not wanting them to conflict; I feel like it's an
anti-pattern, and if I make separate writes and they don't
conflict when I replicate, something is broken. Considering there are
a lot of huge and fairly easy wins in replication, spending any time on
this almost-never-touched case doesn't seem worth it.

PouchDB just uses random revs; the only people that have cared have known
the inner workings of CouchDB. A PouchDB user has never been confused by the
behaviour as far as I can remember.

On 17 October 2014 13:45, Alexander Shorin <kx...@gmail.com> wrote:

> Hi Chris,
>
> I'd already opened an issue to track this down:
> https://issues.apache.org/jira/browse/COUCHDB-2338
>
> If anyone has some plan, it better to be there.
>
> --
> ,,,^..^,,,
>
>
> On Fri, Oct 17, 2014 at 3:38 PM, Chris Anderson <jc...@couchbase.com>
> wrote:
> > Wouldn't it be cool if we all generated interoperable revision hashes?
> >
> > I can't point any fingers because as far as I can tell, Couchbase is
> > promulgating at least 3 different revision hash generation schemes.
> >
> > I've filed an issue for us here:
> > https://github.com/couchbase/mobile/issues/3
> >
> > Part of the problem is CouchDB's use of term_to_binary:
> >
> https://github.com/apache/couchdb-couch/blob/d28af185295d4618b489c050bcc71407e89891f1/src/couch_db.erl#L820
> >
> > I've seen this discussed informally, but I don't know if anyone has a
> > tractable plan to get us there.
> >
> > Cheers,
> > Chris
> >
> > --
> > Chris Anderson  @jchris
> > http://www.couchbase.com
>

Re: rev hash stability

Posted by Alexander Shorin <kx...@gmail.com>.
Hi Chris,

I'd already opened an issue to track this down:
https://issues.apache.org/jira/browse/COUCHDB-2338

If anyone has some plan, it better to be there.

--
,,,^..^,,,


On Fri, Oct 17, 2014 at 3:38 PM, Chris Anderson <jc...@couchbase.com> wrote:
> Wouldn't it be cool if we all generated interoperable revision hashes?
>
> I can't point any fingers because as far as I can tell, Couchbase is
> promulgating at least 3 different revision hash generation schemes.
>
> I've filed an issue for us here:
> https://github.com/couchbase/mobile/issues/3
>
> Part of the problem is CouchDB's use of term_to_binary:
> https://github.com/apache/couchdb-couch/blob/d28af185295d4618b489c050bcc71407e89891f1/src/couch_db.erl#L820
>
> I've seen this discussed informally, but I don't know if anyone has a
> tractable plan to get us there.
>
> Cheers,
> Chris
>
> --
> Chris Anderson  @jchris
> http://www.couchbase.com