Posted to user@couchdb.apache.org by Geoffrey Cox <re...@gmail.com> on 2017/06/04 04:04:20 UTC

Sequence number bug in CouchDB 2 or is there another way to compare sequence numbers?

I’m digging deeper into CouchDB 2 and I’m finding some unexpected ordering
with sequence numbers. In one case, I found that an early change in a
_changes feed has the sequence number



99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk



The last sequence number in my _changes feed, for the same DB, is



228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE



In a browser console, the following comparison evaluates to false:



'228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE'
>
'99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk'



Is this a bug or do I need to use some other method to compare sequence
numbers?



Looking at the other sequence numbers in my _changes feed, they generally
appear to be ordered as I would expect, but in this case it seems that when
the first number, e.g. 99, jumps from 2 digits to 3 digits, the ordering
breaks. The values are compared as strings, character by character, so if
you boil this down to a simple string comparison example, you can see that
'228' > '99' => false.
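
To make the pitfall concrete, here is a minimal sketch of the comparison semantics in
plain JavaScript (the seq prefixes below are truncated stand-ins for the full strings):

// JavaScript compares strings lexicographically, character by character,
// so a 3-digit prefix can sort "before" a 2-digit one.
'228' > '99'    // false: '2' < '9'
228 > 99        // true: numeric comparison
// Parsing the numeric prefix compares the numbers instead, though the replies
// below explain why even that ordering is not reliable in a cluster.
parseInt('228-g1AAAAJF', 10) > parseInt('99-g1AAAAI', 10)   // true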


Thanks.


Geoff

Re: Sequence number bug in CouchDB 2 or is there another way to compare sequence numbers?

Posted by Robert Samuel Newson <rn...@apache.org>.
:)



Re: Sequence number bug in CouchDB 2 or is there another way to compare sequence numbers?

Posted by Geoffrey Cox <re...@gmail.com>.
Hi Robert,

This last email is killer at explaining how the sequence numbers work.
Thanks!

I've modified my code to use the algorithm you mentioned, i.e. the seq
value of the last processed row in the _changes feed is used as the since
parameter in the next call, and it appears to be working well. I'm not sure where the
breakdown was before, but I suspect I probably have a small bug somewhere
else in my code. For now, I don't think there are any major issues with
this replication protocol, but I'll be sure to provide more details if I
find anything in the future.

Thanks again!

Geoff


Re: Sequence number bug in CouchDB 2 or is there another way to compare sequence numbers?

Posted by Robert Samuel Newson <rn...@apache.org>.
Nope, it won't. There is no single order to the changes feed, so even if you could decode these values you still couldn't decide if one seq was before another. I'll illustrate with unencoded values.

Imagine an empty (N=2, Q=2) database, then we add two documents at the same time that map to different shards. We make two requests to _changes at the same time but, critically, one changes feed is fed by node2 for the 81-ff range, which sees that update, and hits node1 for the 00-80 range, which has not yet seen the other update. The other changes feed sees the opposite.

1. [{node1, 00-80, 0}, {node2, 81-ff, 1}]
2. [{node1, 81-ff, 1}, {node2, 00-80, 0}]

Is seq 1 greater than seq 2? No. Is seq 2 greater than seq 1? No.

Fractionally later, of course, both nodes see both updates and so the last seq of any changes feed would converge to any of the following;

[{node1, 00-80, 1}, {node1, 81-ff, 1}]
[{node1, 00-80, 1}, {node2, 81-ff, 1}]
[{node2, 00-80, 1}, {node1, 81-ff, 1}]
[{node2, 00-80, 1}, {node2, 81-ff, 1}]

Note also that all four of the seq values above will encode to different string values (though the number on the front will be the same, 2).

The algorithm for correctly consuming the changes feed is described in the quoted posts but I'll pull it out for clarity;

1. read /dbname/_changes
2. process each row idempotently
3. periodically (every X seconds or every X rows) store the "seq" value of the last row you processed

If you ever crash, or if you weren't using continuous=true, you can do this same procedure again but modified in step 1;

revised 1. read /dbname/_changes?since=X

where X is the value you saved in step 3. If you're not using continuous mode then you could just record the "last_seq" value at the end of consuming the non-continuous response. You run the risk of reprocessing a lot more items, though.
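
As a rough sketch of that consume-and-checkpoint loop (assuming Node.js 18+ for the
global fetch, a CouchDB at http://localhost:5984, and a simple file-based checkpoint;
the names below are illustrative, not part of CouchDB):

const fs = require('node:fs/promises');

const COUCH = 'http://localhost:5984';   // assumed local CouchDB
const DB = 'dbname';
const CHECKPOINT_FILE = './changes-checkpoint.json';

async function loadCheckpoint() {
  try { return JSON.parse(await fs.readFile(CHECKPOINT_FILE, 'utf8')).seq; }
  catch { return 0; }                    // no checkpoint yet: start from the beginning
}

async function saveCheckpoint(seq) {
  await fs.writeFile(CHECKPOINT_FILE, JSON.stringify({ seq }));
}

async function consumeChanges(processRow) {
  let since = await loadCheckpoint();    // revised step 1: resume from the stored seq
  for (;;) {
    const url = `${COUCH}/${DB}/_changes?since=${encodeURIComponent(since)}&limit=100`;
    const body = await (await fetch(url)).json();
    for (const row of body.results) {
      await processRow(row);             // step 2: processing must be idempotent
    }
    since = body.last_seq;
    await saveCheckpoint(since);         // step 3: periodically store the seq
    if (body.results.length === 0) break;  // caught up (non-continuous mode)
  }
}

Whether the checkpoint lives in a file, a _local document, or elsewhere doesn't matter,
as long as it survives a crash.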

With this scheme (which the replicator and all indexers follow), you don't care if the results come out of order, and you don't need to compare any two seq values.

You _do_ need to ensure you can correctly process the same change multiple times. For an example of that, consider the replicator: when it sees a row from a changes feed, it asks the target database if it contains the _id and _rev values from that row. If it does, the replicator moves on to the next row. If it doesn't, it tries to write the document in that row to the target database. In the event of a crash, and therefore a call to _changes with a seq value from before processing that row, it will ask the target database if it has the _id/_rev again, only this time the target will say yes.
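
A sketch of that per-row check, reusing the COUCH constant above and a made-up target
database name (the replicator itself batches this kind of lookup via _revs_diff, but a
per-document HEAD is easier to read):

const TARGET = 'dbname-copy';            // made-up target database name

// True if the target already stores this exact _id/_rev pair.
async function targetHasRev(id, rev) {
  const res = await fetch(
    `${COUCH}/${TARGET}/${encodeURIComponent(id)}?rev=${encodeURIComponent(rev)}`,
    { method: 'HEAD' });
  return res.status === 200;             // 404 means the target has not seen this rev
}

async function processRow(row) {
  const rev = row.changes[0].rev;        // the rev reported for this change
  if (await targetHasRev(row.id, rev)) {
    return;                              // already applied: a replayed row becomes a no-op
  }
  // otherwise, fetch the document from the source and write it to the target here
}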

We are very interested in any evidence that this algorithm does not work, as it is fundamental to CouchDB replication and internal data redundancy too.

B.


> On 4 Jun 2017, at 16:06, Geoffrey Cox <re...@gmail.com> wrote:
> 
> Wow, thanks for all the great information, Robert and Alexander! I think it
> would be very valuable for this information to be included in the CouchDB
> docs. I want to create a StackOverflow post that summarizes some of this
> stuff once I understand it just a little better.
> 
> 
> 
> I actually came up with an algorithm that appears to work correctly when
> comparing sequence numbers to generally determine which sequence number is
> the later one.
> 
> 
> 
> var sequenceGreaterThan = function (s1, s2) {
>   if (typeof s1 === 'number') { // CouchDB 1?
>     return s1 > s2;
>   } else {
>     var parts1 = s1.split(/-/),
>       s1Int = parseInt(parts1[0], 10),
>       parts2 = s2.split(/-/),
>       s2Int = parseInt(parts2[0], 10);
>     return s1Int > s2Int || (s1Int === s2Int && parts1[1] > parts2[1]);
>   }
> };
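
For what it's worth, with the two seqs from the original post the helper above does
return the hoped-for answer (prefixes truncated here):

sequenceGreaterThan('228-g1AAAAJF', '99-g1AAAAI')   // true: 228 > 99
sequenceGreaterThan('99-g1AAAAI', '228-g1AAAAJF')   // false

The answer at the top of this message explains why the ordering this implies still
cannot be relied on across a cluster.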
> 
> 
> 
> Let me explain why it is necessary and hopefully someone can verify that it
> will work for my case. As part of implementing
> https://github.com/redgeoff/spiegel for CouchDB 2, I’ve had to port
> sequence number comparisons. There is a piece of Spiegel that listens to
> the changes feed on a database and then executes a function to process the
> change. As is mentioned, changes can be received out of order or even
> replayed and this is fine, but the listener needs to come back later and
> pick up from where it has left off to process the next set of changes. To
> do this, there is code that checks for the latest sequence number so that
> it can be used in the next call to _changes. (Continuous listening on the
> _changes feed is not desired here as one of the goals of Spiegel is to
> listen to many databases without using many database connections).
> 
> 
> 
> For my particular case, does sequenceGreaterThan appear to work as intended
> and guarantee that my listening on the _changes feed will generally move
> forward?
> 
> 
> 
> (In my testing, I’ve found that using the sequence number of the last item
> in a _changes feed in the next call to the _changes feed doesn’t
> necessarily return the next set of changes. It appears to be a fairly rare
> occurrence, but this is why I am using a sequence number comparison).
> 
> 
> Thanks!
> 
> 
> Geoff
> 
> 
> On Sun, Jun 4, 2017 at 3:07 AM Alexander Harm <co...@aharm.de> wrote:
> 
>> Hello Geoffrey,
>> 
>> some time ago Robert and Paul explained the ordering in the changes
>> feed. Maybe that helps:
>> 
>> Hey all,
>> 
>> Bob did a pretty good job explaining how the changes feed works most
>> of the time but I did want to call attention to an error condition
>> that might be of concern to the original question. There is a
>> situation where you can see old changes in the response depending on
>> some timing and error conditions.
>> 
>> For a bit of background on the since parameter, consider a clustered
>> since sequence like this:
>> 
>> 
>> 35-g1AAAAG1eJyNz0EKwjAQBdBoC2I3nkEPEGppSLOyV8k0U2pJE9C61pvpzWKaitBNyWYGhs9_jCaEpF2iyFFBY29YK-B0sNbcu6tB2mj7UNKM1OCofXQrCVycc32XyP3gDztExlkpYwqWTLHGQO0nPH_SJkh5i6IVTUzHUmJrkkn9JC-_PPaetCxoLYe8AhHTM2mnf-q8-tjMfWYuPHcIHIiyqDiPKuq_TDeP1A
>> 
>> What that actually contains is an encoded set of {shard, node,
>> update_seq} triples, kinda like this:
>> 
>> [
>>    {"shards/00000000-3fffffff/davisp/test-db.1384769918", db7,  { 9,
>> <<"ee5754a">>, db7}},
>>    {"shards/40000000-7fffffff/davisp/test-db.1384769918", db2,  { 1,
>> <<"0fe9f9c">>, db2}},
>>    {"shards/80000000-bfffffff/davisp/test-db.1384769918", db5,  {10,
>> <<"f7b08b9">>, db5}},
>>    {"shards/c0000000-ffffffff/davisp/test-db.1384769918", db12, {15,
>> <<"b942877">>, db12}}
>> ]
>> 
>> What's happening here is that when you specify that parameter, CouchDB
>> will decode it and then try and read the changes from each of those
>> shards, resuming at the given update seq. As an aside, the uuid
>> prefix and node name are there so that we can try and skip some old
>> changes; an optimization, but not important for this discussion.
>> 
>> Now, the important bit is that if we specify this since seq and one of
>> the nodes db2, db5, db7, or db12 happens to be down (or just gone if
>> you stored that since seq for a long time and say the cluster changed
>> size) then CouchDB has to choose another shard to replace the missing
>> node. When this happens you will see "old" changes that have already
>> been processed. Your application should be able to handle this using
>> the approaches that Bob listed in his email.
>> 
>> However, (there's always a however in distributed systems) there
>> exists a timing situation where you may be reading the changes feed,
>> an update comes in to the shard you're reading from, and you see the
>> change. Then say that node goes down (which would terminate the
>> changes feed). The client would then reconnect with their latest
>> update seq and get a different node. This node may (depending on a
>> whole bunch of timing and optimization things) send a change for the
>> doc that is technically "before" the change that was already
>> processed. So you do have to be able to recognize that you already
>> processed a change for the doc. CouchDB does this internally by
>> keeping the revision tree and checking revisions against that.
>> 
>> I get the feeling that my description may not be super clear so I'll
>> try and restate it as a sequence of events:
>> 
>> 1. Client requests _changes with since=35-g1AAA...
>> 2. Someone makes an update to doc foo
>> 3. The first shard that handles the update is part of the changes feed
>> from #1
>> 4. Client reading _changes sees update appear in its feed
>> 5. The node containing the shard with an update dies (an operator
>> rebooted the wrong node perhaps)
>> 6. Client has its changes feed disconnect and reconnects with
>> since=35-g1AAA.. (or even a newer sequence)
>> 7. The shard that responds as a replacement for the shard on the dead
>> node does not (yet) have the update
>> 8. The doc exists in the update sequence after where it starts reading
>> its changes feed
>> 9. Client sees the "old" change and must know that it's old
>> 
>> Note that this is all very unlikely and super sensitive to timing. For
>> instance, one node seeing the update, and then dying, and the other
>> two copies not receiving the update would require some very peculiar
>> network disconnections in the cluster while still being reachable from
>> the load balancer. But the possibility remains, which means apps will
>> want to be able to handle it.
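
One way an application can "know that it's old" is to remember the highest revision
generation it has processed for each document id and skip anything at or below it; a
simplified, in-memory sketch (it ignores conflict branches and durable storage):

const latestGen = new Map();             // docId -> highest rev generation processed

function isOldChange(row) {
  const rev = row.changes[0].rev;        // e.g. "2-967a00dff5e02add41819138abb3284d"
  const gen = parseInt(rev.split('-')[0], 10);
  const seen = latestGen.get(row.id) || 0;
  if (gen <= seen) return true;          // an "old" or replayed change: safe to skip
  latestGen.set(row.id, gen);
  return false;
}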
>> 
>> Paul
>> 
>> On Sat, Sep 3, 2016 at 12:09 PM, Robert Payne <ro...@me.com> wrote:
>>> Thanks for this,
>>> 
>>> Sorry it’s a bit light on details, it’s a very specific use case on iOS
>> where cpu/disk speed is constrained and we have a large data set. The full
>> replication protocol just requires too many reads/writes to be performant
>> and we’ve optimised it in various ways. We’re idempotent so long as
>> per-document changes are in order, which is what I was just checking.
>>> 
>>> I appreciate the more technical analysis and it certainly clears up what
>> I was asking.
>>> 
>>> Cheers,
>>> 
>>> Robert
>>> 
>>>> On 4/09/2016, at 4:41 AM, Robert Samuel Newson <rn...@apache.org>
>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> It is important to understand that the order of rows in the _changes
>> response is not important. In couchdb before 2.0 the response was totally
>> ordered, but this was never necessary for correctness. The essential
>> contract for _changes is that you are guaranteed to see all changes made
>> since the 'since' parameter you pass. The order of those changes is not
>> guaranteed and it is also not guaranteed that changes from _before_ that
>> 'since' value are _not_ also returned. The consequence of this contract is
>> that all consumers of the _changes response must apply each row
>> idempotently. This is true for the replicator, of course.
>>>> 
>>>> The changes response in 2.0 is partially ordered. The changes from any
>> given shard will be in a consistent order, but we merge the changes from
>> each shard range of your database as they are collected from the various
>> contributing nodes; we don't apply a total ordering over that. The reason
>> is simple; it's expensive and unnecessary. It's important to also remember
>> that replication, even before 2.0, would not replicate in strict source
>> update order either, due to (valuable) parallelism when reading changes and
>> applying them.
>>>> 
>>>> Your question: "Is it possible for the changes feed to send older
>> changes before newer changes for the same document ID across multiple
>> calls?" requires a little further background knowledge before answering.
>>>> 
>>>> While we call it a changes "feed" it's important to remember what it
>> really is, internally, first. Every database in couchdb, prior to 2.0, is a
>> single file with multiple b+trees recorded inside it that are kept in
>> absolute sync with each other. One b+tree allows you to look up a document
>> by the _id parameter. The other b+tree allows you to look up a document by
>> its update order. It is essential to note that these two b+trees have the
>> same number of key/value pairs in them at all times.
>>>> 
>>>> To illustrate this more clearly, consider an empty database. We add one
>> document to it. It is retrievable by its _id and is also visible in the
>> _changes response as change number 1. Now, we update that document. It is
>> now change number 2. Change number 1 will never again appear in the
>> _changes response. That is, every document appears in the _changes response
>> at its most recent update number.
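
One way to see this for yourself, assuming an unsecured CouchDB at the COUCH address
used in the sketches above and a throwaway database name:

// Create a doc, update it, then list _changes: the doc shows up once,
// at its most recent update, not at both positions.
async function demoSingleEntry() {
  const db = `${COUCH}/seq-demo`;        // throwaway database (illustrative name)
  await fetch(db, { method: 'PUT' });    // create the database
  const first = await (await fetch(`${db}/doc1`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ value: 1 }),
  })).json();
  await fetch(`${db}/doc1`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ value: 2, _rev: first.rev }),
  });
  const changes = await (await fetch(`${db}/_changes`)).json();
  console.log(changes.results.filter(r => r.id === 'doc1').length);   // prints 1
}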
>>>> 
>>>> When you call _changes without the continuous parameter, couchdb is
>> simply traversing that second b+tree and returning each row it finds. It
>> may do this from the beginning (which was 1 before our update and 2 after)
>> or it may do so from some update seq you supply with the 'since' parameter.
>>>> 
>>>> With that now understood, we can look at what changes when we do
>> continuous=true which is what makes it a "feed" (that is, a potentially
>> unending response of changes as they are made). This is sent in two phases.
>> The first is exactly as the previous paragraph. Once all those changes have
>> been sent, couchdb enters a loop where it returns updates as they happen
>> (or shortly after).
>>>> 
>>>> It is only in a continuous=true response in couchdb before 2.0 that you
>> would ever see more than one change for any given document.
>>>> 
>>>> So, to cut a long story short (too late), the answer to your question
>> is "no". The changes feed is not a permanent history of all changes made to
>> all documents. Once a document is updated, it is _moved_ to a newer
>> position and no longer appears in its old one (and no record of that
>> position is even preserved). Do note, though, that couchdb might return
>> 'Doc A change (seq: 2-XXXX)' even if your 'since' parameter is _after_ the
>> last change to doc A. We won't return 'Doc A change (seq: 1-XXXX)' at all
>> after it's updated to 2-XXXX.
>>>> 
>>>> The algorithm for correctly processing the changes response is as
>> follows, and any variation on this is likely broken;
>>>> 
>>>> 1) call /_changes?since=0
>>>> 2) for each returned row, ensure the target has the change in question
>> (either use _id + _rev to prevent duplicate application of the change or
>> apply the change in a way that is idempotent)
>>>> 3) periodically store the update seq of the last processed row to
>> stable storage (a _local document is a good choice)
>>>> 
>>>> If you wish to resume applying changes after a shutdown, reboot, or
>> crash, repeat the above process but substitute your stored update sequence
>> in the ?since= parameter.
>>>> 
>>>> There are many things that use the changes feed in this way. Within
>> couchdb, there's database replication (obviously) but also couchdb views.
>> Outside of the core, software like pouchdb and couchdb-lucene use the
>> changes feed to replicate data or update search indexes.
>>>> 
>>>> I hope this was useful, and I think it might expose some problems in
>> your couchdb-to-sqlite synchronisation protocol. Your email is obviously
>> silent on many details there, but if you've predicated its design on the
>> total ordering properties of couchdb < 2.0, you likely have some work to do.
>>>> 
>>>> B.
>>>> 
>>>> 
>>>>> On 3 Sep 2016, at 00:04, Robert Payne <ro...@me.com> wrote:
>>>>> 
>>>>> Hey Everyone,
>>>>> 
>>>>> Reading up on the CouchDB 2.0 migration guides and getting a bit antsy
>> around the mentions of out of order changes feed and sorts. Is it possible
>> for the changes feed to send older changes before newer changes for the
>> same document ID across multiple calls?
>>>>> 
>>>>> Assuming start at ?since=“” and always pass in the “last_seq” on every
>> additional call could a situation like this occur in a single or multiple
>> HTTP calls:
>>>>> 
>>>>> — Changes feed emits Doc A change (seq: 2-XXXX)
>>>>> — Changes feed emits Doc B change (seq: 3-XXXX)
>>>>> — Changes feed emits Doc A change (seq: 1-XXXX)
>>>>> 
>>>>> I’m really hoping the case is just that across different doc ids
>> changes can be out of order. Our use case on mobile is a bit particular as
>> we duplicate edits into a separate SQLite table and use the changes feed to
>> keep the local database up to date with winning revs from the server, it
>> just increases the performance of sync by a ton since there is only 1 check
>> and set in SQLite per change that comes in.
>>>>> 
>>>>> Cheers,
>>>>> Robert
>> 


Re: Sequence number bug in CouchDB 2 or is there another way to compare sequence numbers?

Posted by Geoffrey Cox <re...@gmail.com>.
Wow, thanks for all the great information, Robert and Alexander! I think it
would be very valuable for this information to be included in the CouchDB
docs. I want to create a StackOverflow post that summarizes some of this
stuff once I understand it just a little better.



I actually came up with an algorithm that appears to work correctly when
comparing sequence numbers to determine, in general, which sequence number
is the later one.



var sequenceGreaterThan = function (s1, s2) {
  if (typeof s1 === 'number') { // CouchDB 1: plain integer sequence numbers
    return s1 > s2;
  } else {
    // CouchDB 2: compare the integer prefix before the first '-'; on a tie,
    // fall back to comparing the next (opaque) chunk as a heuristic only
    var parts1 = s1.split(/-/),
      s1Int = parseInt(parts1[0], 10),
      parts2 = s2.split(/-/),
      s2Int = parseInt(parts2[0], 10);
    return s1Int > s2Int || (s1Int === s2Int && parts1[1] > parts2[1]);
  }
};
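
For example, with the two sequence strings from the original question
(shortened here with '...'), the integer prefixes are compared as numbers
rather than as strings:

sequenceGreaterThan('228-g1AAAAJF...', '99-g1AAAAI-...'); // => true
sequenceGreaterThan('99-g1AAAAI-...', '228-g1AAAAJF...'); // => false
sequenceGreaterThan(228, 99); // CouchDB 1 integers, => true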



Let me explain why it is necessary and hopefully someone can verify that it
will work for my case. As part of implementing
https://github.com/redgeoff/spiegel for CouchDB 2, I’ve had to port
sequence number comparisons. There is a piece of Spiegel that listens to
the changes feed on a database and then executes a function to process the
change. As mentioned above, changes can be received out of order or even
replayed, and this is fine, but the listener needs to come back later and
pick up where it left off to process the next set of changes. To
do this, there is code that checks for the latest sequence number so that
it can be used in the next call to _changes. (Continuous listening on the
_changes feed is not desired here as one of the goals of Spiegel is to
listen to many databases without using many database connections).
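
Roughly, that listener has the shape of the following sketch; this is not
Spiegel's actual code, fetch is assumed to be available, and processChange
and saveLastSeq are hypothetical stand-ins for the real handlers:

var listenOnce = function (dbUrl, since) {
  // One non-continuous poll of _changes, resuming from the stored seq
  var url = dbUrl + '/_changes?limit=100&since=' + encodeURIComponent(since || 0);
  return fetch(url)
    .then(function (res) { return res.json(); })
    .then(function (body) {
      var chain = Promise.resolve();
      body.results.forEach(function (change) {
        // Changes can be replayed, so processChange must be idempotent
        chain = chain.then(function () { return processChange(change); });
      });
      return chain.then(function () {
        // Remember where to resume; last_seq is reported by CouchDB itself
        return saveLastSeq(body.last_seq);
      });
    });
};

Calling listenOnce again with the saved seq then picks up the next batch,
accepting that some already-processed rows may show up again.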



For my particular case, does sequenceGreaterThan appear to work as intended
and guarantee that my listening on the _changes feed will generally move
forward?



(In my testing, I’ve found that using the sequence number of the last item
in a _changes feed in the next call to the _changes feed doesn’t
necessarily return the next set of changes. It appears to be a fairly rare
occurrence, but this is why I am using a sequence number comparison).


Thanks!


Geoff


On Sun, Jun 4, 2017 at 3:07 AM Alexander Harm <co...@aharm.de> wrote:

> Hello Geoffrey,
>
> some time ago Robert and Paul explained the ordering in the changes
> feed. Maybe that helps:
>
> Hey all,
>
> Bob did a pretty good job explaining how the changes feed works most
> of the time but I did want to call attention to an error condition
> that might be of concern to the original question. There is a
> situation where you can see old changes in the response depending on
> some timing and error conditions.
>
> For a bit of background on the since parameter, when we are looking at
> a clustered since sequence like such:
>
>
> 35-g1AAAAG1eJyNz0EKwjAQBdBoC2I3nkEPEGppSLOyV8k0U2pJE9C61pvpzWKaitBNyWYGhs9_jCaEpF2iyFFBY29YK-B0sNbcu6tB2mj7UNKM1OCofXQrCVycc32XyP3gDztExlkpYwqWTLHGQO0nPH_SJkh5i6IVTUzHUmJrkkn9JC-_PPaetCxoLYe8AhHTM2mnf-q8-tjMfWYuPHcIHIiyqDiPKuq_TDeP1A
>
> What that actually contains is an encoded set of {shard, node,
> update_seq} triples kinda like such:
>
> [
>     {"shards/00000000-3fffffff/davisp/test-db.1384769918", db7,  { 9,
> <<"ee5754a">>, db7}},
>     {"shards/40000000-7fffffff/davisp/test-db.1384769918", db2,  { 1,
> <<"0fe9f9c">>, db2}},
>     {"shards/80000000-bfffffff/davisp/test-db.1384769918", db5,  {10,
> <<"f7b08b9">>, db5}},
>     {"shards/c0000000-ffffffff/davisp/test-db.1384769918", db12, {15,
> <<"b942877">>, db12}}
> ]
>
> What's happening here is that when you specify that parameter, CouchDB
> will decode it and then try and read the changes from each of those
> shards resuming at the given update seq. As an aside the extra uuid
> prefix and node name are extra so that we can try and skip some old
> changes, an optimization but not important for this discussion.
>
> Now, the important bit is that if we specify this since seq and one of
> the nodes db2, db5, db7, or db12 happens to be down (or just gone if
> you stored that since seq for a long time and say the cluster changed
> size) then CouchDB has to choose another shard to replace the missing
> node. When this happens you will see "old" changes that have already
> been processed. Your application should be able to handle this using
> the approaches that Bob listed in his email.
>
> However (there's always a however in distributed systems) there
> exists a timing situation where you may be reading the changes feed,
> an update comes in to the shard you're reading from, and you see the
> change. Then say that node goes down (which would terminate the
> changes feed). The client would then reconnect with their latest
> update seq and get a different node. This node may (depending on a
> whole bunch of timing and optimization things) send a change for the
> doc that is technically "before" the change that was already
> processed. So you do have to be able to recognize that you already
> processed a change for the doc. CouchDB does this internally by
> keeping the revision tree and checking revisions against that.
>
> I get the feeling that my description may not be super clear so I'll
> try and restate it as a sequence of events:
>
> 1. Client requests _changes with since=35-g1AAA...
> 2. Someone makes an update to doc foo
> 3. The first shard that handles the update is part of the changes feed
> from #1
> 4. Client reading _changes sees update appear in its feed
> 5. The node containing the shard with an update dies (an operator
> rebooted the wrong node perhaps)
> 6. Client has its changes feed disconnect and reconnects with
> since=35-g1AAA.. (or even a newer sequence)
> 7. The shard that responds as a replacement for the shard on the dead
> node does not (yet) have the update
> 8. The doc exists in the update sequence after where it starts reading
> its changes feed
> 9. Client sees the "old" change and must know that it's old
>
> Note that this is all very unlikely and super sensitive to timing. For
> instance, one node seeing the update, and then dying, and the other
> two copies not receiving the update would require some very peculiar
> network disconnections in the cluster while still being reachable from
> the load balancer. But the possibility remains which means apps will
> want to be able to handle it.
>
> Paul
>
> On Sat, Sep 3, 2016 at 12:09 PM, Robert Payne <ro...@me.com> wrote:
> > Thanks for this,
> >
> > Sorry it’s a bit light on details, it’s a very specific use case on iOS
> where cpu/disk speed is constrained and we have a large data set. The full
> replication protocol just requires too many reads/writes to be performant
> and we’ve optimised it various ways. We’re idempotent so long as
> per-document changes are in-order which I was just checking.
> >
> > I appreciate the more technical analysis and it certainly clears up what
> I was asking.
> >
> > Cheers,
> >
> > Robert
> >
> >> On 4/09/2016, at 4:41 AM, Robert Samuel Newson <rn...@apache.org>
> wrote:
> >>
> >> Hi,
> >>
> >> It is important to understand that the order of rows in the _changes
> response is not important. In couchdb before 2.0 the response was totally
> ordered, but this was never necessary for correctness. The essential
> contract for _changes is that you are guaranteed to see all changes made
> since the 'since' parameter you pass. The order of those changes is not
> guaranteed and it is also not guaranteed that changes from _before_ that
> 'since' value are _not_ also returned. The consequence of this contract is
> that all consumers of the _changes response must apply each row
> idempotently. This is true for the replicator, of course.
> >>
> >> The changes response in 2.0 is partially ordered. The changes from any
> given shard will be in a consistent order, but we merge the changes from
> each shard range of your database as they are collected from the various
> contributing nodes, we don't apply a total ordering over that. The reason
> is simple; it's expensive and unnecessary. It's important to also remember
> that replication, even before 2.0, would not replicate in strict source
> update order either, due to (valuable) parallelism when reading changes and
> applying them.
> >>
> >> Your question: "Is it possible for the changes feed to send older
> changes before newer changes for the same document ID across multiple
> calls?" requires a little further background knowledge before answering.
> >>
> >> While we call it a changes "feed" it's important to remember what it
> really is, internally, first. Every database in couchdb, prior to 2.0, is a
> single file with multiple b+trees recorded inside it that are kept in
> absolute sync with each other. One b+tree allows you to look up a document
> by the _id parameter. The other b+tree allows you to look up a document by
> its update order. It is essential to note that these two b+trees have the
> same number of key/value pairs in them at all times.
> >>
> >> To illustrate this more clearly, consider an empty database. We add one
> document to it. It is retrievable by its _id and is also visible in the
> _changes response as change number 1. Now, we update that document. It is
> now change number 2. Change number 1 will never again appear in the
> _changes response. That is, every document appears in the _changes response
> at its most recent update number.
> >>
> >> When you call _changes without the continuous parameter, couchdb is
> simply traversing that second b+tree and returning each row it finds. It
> may do this from the beginning (which was 1 before our update and 2 after)
> or it may do so from some update seq you supply with the 'since' parameter.
> >>
> >> With that now understood, we can look at what changes when we do
> continuous=true which is what makes it a "feed" (that is, a potentially
> unending response of changes as they are made). This is sent in two phases.
> The first is exactly as the previous paragraph. Once all those changes have
> been sent, couchdb enters a loop where it returns updates as they happen
> (or shortly after).
> >>
> >> It is only in a continuous=true response in couchdb before 2.0 that you
> would ever see more than one change for any given document.
> >>
> >> So, to cut a long story short (too late), the answer to your question
> is "no". The changes feed is not a permanent history of all changes made to
> all documents. Once a document is updated, it is _moved_ to a newer
> position and no longer appears in its old one (and no record of that
> position is even preserved). Do note, though, that couchdb might return
> 'Doc A change (seq: 2-XXXX)' even if your 'since' parameter is _after_ the
> last change to doc A. We won't return 'Doc A change (seq: 1-XXXX)' at all
> after it's updated to 2-XXXX.
> >>
> >> The algorithm for correctly processing the changes response is as
> follows, and any variation on this is likely broken;
> >>
> >> 1) call /_changes?since=0
> >> 2) for each returned row, ensure the target has the change in question
> (either use _id + _rev to prevent duplicate application of the change or
> apply the change in a way that is idempotent)
> >> 3) periodically store the update seq of the last processed row to
> stable storage (a _local document is a good choice)
> >>
> >> If you wish to resume applying changes after a shutdown, reboot, or
> crash, repeat the above process but substitute your stored update sequence
> in the ?since= parameter.
> >>
> >> There are many things that use the changes feed in this way. Within
> couchdb, there's database replication (obviously) but also couchdb views.
> Outside of the core, software like pouchdb and couchdb-lucene use the
> changes feed to replicate data or update search indexes.
> >>
> >> I hope this was useful, and I think it might expose some problems in
> your couchdb-to-sqlite synchronisation protocol. Your email is obviously
> silent on many details there, but if you've predicated its design on the
> total ordering properties of couchdb < 2.0, you likely have some work to do.
> >>
> >> B.
> >>
> >>
> >>> On 3 Sep 2016, at 00:04, Robert Payne <ro...@me.com> wrote:
> >>>
> >>> Hey Everyone,
> >>>
> >>> Reading up on the CouchDB 2.0 migration guides and getting a bit antsy
> around the mentions of out of order changes feed and sorts. Is it possible
> for the changes feed to send older changes before newer changes for the
> same document ID across multiple calls?
> >>>
> >>> Assuming start at ?since=“” and always pass in the “last_seq” on every
> additional call could a situation like this occur in a single or multiple
> HTTP calls:
> >>>
> >>> — Changes feed emits Doc A change (seq: 2-XXXX)
> >>> — Changes feed emits Doc B change (seq: 3-XXXX)
> >>> — Changes feed emits Doc A change (seq: 1-XXXX)
> >>>
> >>> I’m really hoping the case is just that across different doc ids
> changes can be out of order. Our use case on mobile is a bit particular as
> we duplicate edits into a separate SQLite table and use the changes feed to
> keep the local database up to date with winning revs from the server, it
> just increases the performance of sync by a ton since there is only 1 check
> and set in SQLite per change that comes in.
> >>>
> >>> Cheers,
> >>> Robert
>
>
>

Re: Sequence number bug in CouchDB 2 or is there another way to compare sequence numbers?

Posted by Alexander Harm <co...@aharm.de>.
Hello Geoffrey,

some time ago Robert and Paul explained the ordering in the changes
feed. Maybe that helps:

Hey all,

Bob did a pretty good job explaining how the changes feed works most
of the time but I did want to call attention to an error condition
that might be of concern to the original question. There is a
situation where you can see old changes in the response depending on
some timing and error conditions.

For a bit of background on the since parameter, when we are looking at
a clustered since sequence like such:

35-g1AAAAG1eJyNz0EKwjAQBdBoC2I3nkEPEGppSLOyV8k0U2pJE9C61pvpzWKaitBNyWYGhs9_jCaEpF2iyFFBY29YK-B0sNbcu6tB2mj7UNKM1OCofXQrCVycc32XyP3gDztExlkpYwqWTLHGQO0nPH_SJkh5i6IVTUzHUmJrkkn9JC-_PPaetCxoLYe8AhHTM2mnf-q8-tjMfWYuPHcIHIiyqDiPKuq_TDeP1A

What that actually contains is an encoded set of {shard, node,
update_seq} triples kinda like such:

[
    {"shards/00000000-3fffffff/davisp/test-db.1384769918", db7,  { 9,
<<"ee5754a">>, db7}},
    {"shards/40000000-7fffffff/davisp/test-db.1384769918", db2,  { 1,
<<"0fe9f9c">>, db2}},
    {"shards/80000000-bfffffff/davisp/test-db.1384769918", db5,  {10,
<<"f7b08b9">>, db5}},
    {"shards/c0000000-ffffffff/davisp/test-db.1384769918", db12, {15,
<<"b942877">>, db12}}
]

What's happening here is that when you specify that parameter, CouchDB
will decode it and then try and read the changes from each of those
shards resuming at the given update seq. As an aside, the uuid prefix and
node name are there so that we can try to skip some old changes; this is an
optimization and not important for this discussion.

Now, the important bit is that if we specify this since seq and one of
the nodes db2, db5, db7, or db12 happens to be down (or just gone if
you stored that since seq for a long time and say the cluster changed
size) then CouchDB has to choose another shard to replace the missing
node. When this happens you will see "old" changes that have already
been processed. Your application should be able to handle this using
the approaches that Bob listed in his email.

However (there's always a however in distributed systems) there
exists a timing situation where you may be reading the changes feed,
an update comes in to the shard you're reading from, and you see the
change. Then say that node goes down (which would terminate the
changes feed). The client would then reconnect with their latest
update seq and get a different node. This node may (depending on a
whole bunch of timing and optimization things) send a change for the
doc that is technically "before" the change that was already
processed. So you do have to be able to recognize that you already
processed a change for the doc. CouchDB does this internally by
keeping the revision tree and checking revisions against that.
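
A client-side version of that check might look something like this sketch
(seenRevs is a hypothetical in-memory store; a real app would keep it in
durable storage):

// Track the revs already applied for each doc id so a replayed change row
// can be recognized and skipped.
var seenRevs = {};

var alreadyProcessed = function (change) {
  var rev = change.changes[0].rev; // the rev reported in the _changes row
  var revs = seenRevs[change.id] || (seenRevs[change.id] = {});
  if (revs[rev]) {
    return true; // this exact revision was handled before
  }
  revs[rev] = true;
  return false;
};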

I get the feeling that my description may not be super clear so I'll
try and restate it as a sequence of events:

1. Client requests _changes with since=35-g1AAA...
2. Someone makes an update to doc foo
3. The first shard that handles the update is part of the changes feed from #1
4. Client reading _changes sees update appear in its feed
5. The node containing the shard with an update dies (an operator
rebooted the wrong node perhaps)
6. Client has its changes feed disconnect and reconnects with
since=35-g1AAA.. (or even a newer sequence)
7. The shard that responds as a replacement for the shard on the dead
node does not (yet) have the update
8. The doc exists in the update sequence after where it starts reading
its changes feed
9. Client sees the "old" change and must know that it's old

Note that this is all very unlikely and super sensitive to timing. For
instance, one node seeing the update, and then dying, and the other
two copies not receiving the update would require some very peculiar
network disconnections in the cluster while still being reachable from
the load balancer. But the possibility remains which means apps will
want to be able to handle it.

Paul

On Sat, Sep 3, 2016 at 12:09 PM, Robert Payne <ro...@me.com> wrote:
> Thanks for this,
>
> Sorry it’s a bit light on details, it’s a very specific use case on iOS where cpu/disk speed is constrained and we have a large data set. The full replication protocol just requires too many reads/writes to be performant and we’ve optimised it various ways. We’re idempotent so long as per-document changes are in-order which I was just checking.
>
> I appreciate the more technical analysis and it certainly clears up what I was asking.
>
> Cheers,
>
> Robert
>
>> On 4/09/2016, at 4:41 AM, Robert Samuel Newson <rn...@apache.org> wrote:
>>
>> Hi,
>>
>> It is important to understand that the order of rows in the _changes response is not important. In couchdb before 2.0 the response was totally ordered, but this was never necessary for correctness. The essential contract for _changes is that you are guaranteed to see all changes made since the 'since' parameter you pass. The order of those changes is not guaranteed and it is also not guaranteed that changes from _before_ that 'since' value are _not_ also returned. The consequence of this contract is that all consumers of the _changes response must apply each row idempotently. This is true for the replicator, of course.
>>
>> The changes response in 2.0 is partially ordered. The changes from any given shard will be in a consistent order, but we merge the changes from each shard range of your database as they are collected from the various contributing nodes, we don't apply a total ordering over that. The reason is simple; it's expensive and unnecessary. It's important to also remember that replication, even before 2.0, would not replicate in strict source update order either, due to (valuable) parallelism when reading changes and applying them.
>>
>> Your question: "Is it possible for the changes feed to send older changes before newer changes for the same document ID across multiple calls?" requires a little further background knowledge before answering.
>>
>> While we call it a changes "feed" it's important to remember what it really is, internally, first. Every database in couchdb, prior to 2.0, is a single file with multiple b+trees recorded inside it that are kept in absolute sync with each other. One b+tree allows you to look up a document by the _id parameter. The other b+tree allows you to look up a document by its update order. It is essential to note that these two b+trees have the same number of key/value pairs in them at all times.
>>
>> To illustrate this more clearly, consider an empty database. We add one document to it. It is retrievable by its _id and is also visible in the _changes response as change number 1. Now, we update that document. It is now change number 2. Change number 1 will never again appear in the _changes response. That is, every document appears in the _changes response at its most recent update number.
>>
>> When you call _changes without the continuous parameter, couchdb is simply traversing that second b+tree and returning each row it finds. It may do this from the beginning (which was 1 before our update and 2 after) or it may do so from some update seq you supply with the 'since' parameter.
>>
>> With that now understood, we can look at what changes when we do continuous=true which is what makes it a "feed" (that is, a potentially unending response of changes as they are made). This is sent in two phases. The first is exactly as the previous paragraph. Once all those changes have been sent, couchdb enters a loop where it returns updates as they happen (or shortly after).
>>
>> It is only in a continuous=true response in couchdb before 2.0 that you would ever see more than one change for any given document.
>>
>> So, to cut a long story short (too late), the answer to your question is "no". The changes feed is not a permanent history of all changes made to all documents. Once a document is updated, it is _moved_ to a newer position and no longer appears in its old one (and no record of that position is even preserved). Do note, though, that couchdb might return 'Doc A change (seq: 2-XXXX)' even if your 'since' parameter is _after_ the last change to doc A. We won't return 'Doc A change (seq: 1-XXXX)' at all after it's updated to 2-XXXX.
>>
>> The algorithm for correctly processing the changes response is as follows, and any variation on this is likely broken;
>>
>> 1) call /_changes?since=0
>> 2) for each returned row, ensure the target has the change in question (either use _id + _rev to prevent duplicate application of the change or apply the change in a way that is idempotent)
>> 3) periodically store the update seq of the last processed row to stable storage (a _local document is a good choice)
>>
>> If you wish to resume applying changes after a shutdown, reboot, or crash, repeat the above process but substitute your stored update sequence in the ?since= parameter.
>>
>> There are many things that use the changes feed in this way. Within couchdb, there's database replication (obviously) but also couchdb views. Outside of the core, software like pouchdb and couchdb-lucene use the changes feed to replicate data or update search indexes.
>>
>> I hope this was useful, and I think it might expose some problems in your couchdb-to-sqlite synchronisation protocol. Your email is obviously silent on many details there, but if you've predicated its design on the total ordering properties of couchdb < 2.0, you likely have some work to do.
>>
>> B.
>>
>>
>>> On 3 Sep 2016, at 00:04, Robert Payne <ro...@me.com> wrote:
>>>
>>> Hey Everyone,
>>>
>>> Reading up on the CouchDB 2.0 migration guides and getting a bit antsy around the mentions of out of order changes feed and sorts. Is it possible for the changes feed to send older changes before newer changes for the same document ID across multiple calls?
>>>
>>> Assuming start at ?since=“” and always pass in the “last_seq” on every additional call could a situation like this occur in a single or multiple HTTP calls:
>>>
>>> — Changes feed emits Doc A change (seq: 2-XXXX)
>>> — Changes feed emits Doc B change (seq: 3-XXXX)
>>> — Changes feed emits Doc A change (seq: 1-XXXX)
>>>
>>> I’m really hoping the case is just that across different doc ids changes can be out of order. Our use case on mobile is a bit particular as we duplicate edits into a separate SQLite table and use the changes feed to keep the local database up to date with winning revs from the server, it just increases the performance of sync by a ton since there is only 1 check and set in SQLite per change that comes in.
>>>
>>> Cheers,
>>> Robert



Re: Sequence number bug in CouchDB 2 or is there another way to compare sequence numbers?

Posted by Robert Samuel Newson <rn...@apache.org>.
Hi,

There's no easy way to compare them in 2.0 and no requirement for them to be in order. They are not, in short, designed to be examined or compared outside of couchdb; treat them as opaque values.

The number on the front is the sum of the individual update sequences encoded in the second part and exists only to trick older versions of the couchdb replicator into making checkpoints.

The latter half of the sequence string is an encoded list of {node, range, seq} tuples (where seq is the integer value you know from pre-2.0 releases). When a sequence string is passed back in, as the since= parameter, couchdb decodes this string and passes the appropriate integer seq value to the individual shard.

All that said, in general the front number should increase. The full strings themselves are not comparable, since there's no defined order to the encoded list (so two strings could be generated that are encoded differently but decode to the same list of tuples, just in a different order).
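
If you do peek at that front number, treat it only as a rough hint and never
as something to base correctness on; for example, a heuristic sketch:

// Heuristic only: the numeric prefix generally grows over time, but CouchDB
// makes no guarantee about the ordering of the full sequence strings.
var seqPrefix = function (seq) {
  return typeof seq === 'number' ? seq : parseInt(String(seq).split('-')[0], 10);
};

seqPrefix('228-g1AAAAJF...'); // => 228
seqPrefix('99-g1AAAAI-...');  // => 99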

Another aspect to this is that the changes feed is not totally ordered. For a given shard it _is_ totally ordered (a shard being identical to a pre-2.0 database with an integer sequence); couchdb doesn't shuffle that output (though correctness of replication would be retained if it did). A clustered database comprises several shards, though (the 'q' value, defaulting to 4 iirc). The clustered changes feed combines those separate changes feeds into a single one, but makes no effort to impose a total order over that. We don't do it because it would be expensive and unnecessary.

The contract for /dbname/_changes?since=X is that the response is guaranteed to include all changes since your X value. The order is not defined, and you might get changes from before X.

Using binary_to_term(couch_util:decodeBase64Url(EncodedStringHere)).;

Your 99- string decodes to;

[{couchdb@couchdb2,[0,536870911],
                   {44,<<"2a9602a">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[536870912,1073741823],
                   {0,<<"a093878">>,undefined}},
 {couchdb@couchdb2,[1073741824,1610612735],
                   {4,<<"0979abc">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[1610612736,2147483647],
                   {6,<<"fa6ef9a">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[2147483648,2684354559],
                   {0,<<"b959591">>,undefined}},
 {couchdb@couchdb2,[2684354560,3221225471],
                   {43,<<"cef193e">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[3221225472,3758096383],
                   {0,<<"6d2f0e4">>,undefined}},
 {couchdb@couchdb2,[3758096384,4294967295],
                   {2,<<"497d682">>,couchdb@couchdb2}}]

and your 228- string decodes to;

[{couchdb@couchdb2,[0,536870911],
                   {105,<<"2a9602a">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[536870912,1073741823],
                   {0,<<"a093878">>,undefined}},
 {couchdb@couchdb2,[1073741824,1610612735],
                   {7,<<"0979abc">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[1610612736,2147483647],
                   {9,<<"fa6ef9a">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[2147483648,2684354559],
                   {3,<<"b959591">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[2684354560,3221225471],
                   {102,<<"cef193e">>,couchdb@couchdb2}},
 {couchdb@couchdb2,[3221225472,3758096383],
                   {0,<<"6d2f0e4">>,undefined}},
 {couchdb@couchdb2,[3758096384,4294967295],
                   {2,<<"497d682">>,couchdb@couchdb2}}].


B.


> On 4 Jun 2017, at 05:04, Geoffrey Cox <re...@gmail.com> wrote:
> 
> I’m digging deeper into CouchDB 2 and I’m finding some unexpected ordering
> with sequence numbers. In one case, I found that an early change in a
> _changes feed has the sequence number
> 
> 
> 
> *99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk*
> 
> 
> 
> The last sequence number in my _changes feed, for the same DB, is
> 
> 
> 
> *228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE*
> 
> 
> 
> In a browser console, the following is false
> 
> 
> 
> '228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE'
>> 
> '99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk'
> 
> 
> 
> Is this a bug or do I need to use some other method to compare sequence
> numbers?
> 
> 
> 
> In looking at the other sequence numbers in my _changes feed, it looks like
> they are generally ordered as I would expect, but in this case it appears
> that when the first number, e.g. 99, jumps from 2 digits to 3 digits, the
> ordering breaks. If you boil this down to a simple string comparison
> example, you can see that '228' > '99' => false
> 
> 
> Thanks.
> 
> 
> Geoff