Posted to user@couchdb.apache.org by Jens Alfke <je...@couchbase.com> on 2011/11/29 23:07:15 UTC

Interpreting status of replication from BigCouch

CouchCocoa attempts to provide progress information about replications, so the app can display a progress bar or similar UI, or at least indicate when replication has completed. This is surprisingly difficult. I got it working well with regular CouchDB, but now I’m re-opening the can of worms because BigCouch doesn’t use integral sequence numbers and so the status messages provided by the replicator contain binary goop, for example:

        "status": "Processed <<\"3264-g1AAAADzeJzLYWBgYMlgTmFQSElKzi9KdUhJMtLLTS3KLElMT9VLzskvTUnMK9HLSy3JAapkSmRIsv___39WEgMDsyGqNhM82pIcgGRSPUynDvEW5rEASYYGIAXUvB-s24F4eyG6D0B0Q-xWzgIA8ZZODg\">> / <<\"3264-g1AAAADzeJzLYWBgYMlgTmFQSElKzi9KdUhJMtLLTS3KLElMT9VLzskvTUnMK9HLSy3JAapkSmRIsv___39WEgMDsyGqNkM82pIcgGRSPUynDvEW5rEASYYGIAXUvB-s2wFVtwlB3QcguiF2K2cBAO-bTgs\">> changes"

I’ve updated my parser to be able to tweeze out the two binary blobs, and I figured I could at least compare them for string equality to detect when the last change is processed. Unfortunately this doesn’t appear to work. The example above is the last status message I got in a pull from Cloudant; apparently it’s completed, because nothing else has been copied over in the 15 minutes since, but the two blobs are not exactly equal. (They are _almost_ equal, up to the last 25 or so characters.)

The unfortunate result is that in the iOS GrocerySync app, the replication progress bar gets stuck and never goes away.

I realize that these blobs are probably deliberately opaque and their format Subject To Change Without Notice, but it would be helpful if I had some kind of heuristic about their current format that would at least let me detect when a replication has ended. Any suggestions?

—Jens
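
The blob extraction described above can be sketched roughly as follows. This is only an illustration in Python, not CouchCocoa's actual parser: the function name parse_status and the regular expression are assumptions made for this example, and the status format itself is, as noted, subject to change.

```python
import re

# Matches the two Erlang-binary-style sequences in a status string such as
#   Processed <<"3264-g1AAA...">> / <<"3264-g1AAA...">> changes
# The quotes may still be backslash-escaped if the JSON layer has not been
# decoded yet, so an optional backslash is allowed before each quote.
_STATUS_RE = re.compile(
    r'Processed\s+<<\\?"(?P<done>[^"\\]+)\\?">>\s*/\s*<<\\?"(?P<total>[^"\\]+)\\?">>'
)


def parse_status(status):
    """Return (processed_seq, total_seq), or None if the status has another shape."""
    m = _STATUS_RE.search(status)
    if m is None:
        return None
    return m.group("done"), m.group("total")
```

Returning None for anything unrecognised lets the caller fall back to an indeterminate progress indicator instead of a bar that never completes.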


Re: Interpreting status of replication from BigCouch

Posted by Robert Newson <rn...@apache.org>.
I did teach couchdb-lucene how to unpack and understand BigCouch
sequences (since it implements the 'block until up to date' feature
that couchdb views do), so you might find that code useful:

https://github.com/rnewson/couchdb-lucene/blob/master/src/main/java/com/github/rnewson/couchdb/lucene/couchdb/UpdateSequence.java#L30-48

https://github.com/rnewson/couchdb-lucene/blob/master/src/main/java/com/github/rnewson/couchdb/lucene/couchdb/UpdateSequence.java#L83-94

B.


Re: Interpreting status of replication from BigCouch

Posted by Adam Kocoloski <ko...@apache.org>.
Nifty!  I should also mention that the BigCouch sequence is zlib-compressed, which the Erlang binary_to_term implementation handles transparently.  Regards,

Adam
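
For clients that cannot call binary_to_term, the decompression step might look like the sketch below. It relies on the documented Erlang external term format: version byte 131, then tag 80 for a compressed term, then a 4-byte big-endian uncompressed size, then zlib data. The "g1AAA..." prefix in the example status decodes to exactly those leading bytes. The function name decode_opaque is an assumption for this example, and the result is still raw term bytes that need a separate binary_to_term-style parser.

```python
import base64
import zlib


def decode_opaque(opaque):
    """Base64url-decode the opaque part and undo term_to_binary compression.

    Returns raw external-term-format bytes starting with version byte 131.
    """
    # The sequences in the status string carry no '=' padding, so restore it.
    padded = opaque + "=" * (-len(opaque) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) < 2 or raw[0] != 131:
        raise ValueError("not an Erlang external term")
    if raw[1] != 80:
        # Not compressed: already a plain external term.
        return raw
    expected_len = int.from_bytes(raw[2:6], "big")
    body = zlib.decompress(raw[6:])
    if len(body) != expected_len:
        raise ValueError("unexpected uncompressed length")
    # The compressed payload omits the version byte, so put it back.
    return bytes([131]) + body
```

The length check is cheap and catches truncated or mangled strings before they reach a term parser.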



Re: Interpreting status of replication from BigCouch

Posted by Jason Smith <jh...@iriscouch.com>.
FWIW, this is a JavaScript term_to_binary implementation. It would be
straightforward to implement binary_to_term. (Patches welcome!)

https://github.com/iriscouch/erlang.js

The key table is this:
https://github.com/iriscouch/erlang.js/blob/master/lib.js#L1-26




-- 
Iris Couch

Re: Interpreting status of replication from BigCouch

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Jens, you're certainly right about the format being subject to change -- in BigCouch's master branch it's [3264, "g1AAA..."] to allow for sane sorting of the sequences.  Just an FYI.

The opaque portion of the sequence (the part after the numeric prefix) is a Base64url-encoded term_to_binary representation of a covering set of shards.  If you run binary_to_term(couch_util:decodeBase64Url("g1AAA...")) on the two sequences in that status string you'll see that the server replaced one of the shards in the denominator (the second sequence) with a replica.  Perhaps one of the nodes in the cluster was unavailable when the replication request was made.  I'm afraid there's no good way to do an equality comparison on the client when that happens.  Under normal operating conditions the two blobs should compare equal at the end of a replication.  Regards,

Adam
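
One possible client-side fallback for the case where the blobs differ is to compare only the numeric prefixes. The sketch below assumes that equal prefixes mean the replicator has caught up. Nothing in this thread confirms that; the only stated purpose of the number is sane sorting, so treat this as a guess that may report completion too early or too late. The names split_seq and looks_finished are hypothetical.

```python
def split_seq(seq):
    """Split "3264-g1AAA..." into (3264, "g1AAA...")."""
    # Split on the first '-' only: the Base64url part may itself contain '-'.
    prefix, sep, opaque = seq.partition("-")
    if not sep or not prefix.isdigit():
        raise ValueError("unrecognised sequence: %r" % (seq,))
    return int(prefix), opaque


def looks_finished(done_seq, total_seq):
    """Heuristic only: exact match first, then compare numeric prefixes."""
    if done_seq == total_seq:
        return True
    done_n, _ = split_seq(done_seq)
    total_n, _ = split_seq(total_seq)
    # ASSUMPTION: a processed prefix at or past the target means "caught up".
    return done_n >= total_n
```

With the two sequences from the status message above, both prefixes are 3264, so this heuristic would report the replication as finished even though the opaque parts differ.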
