Posted to dev@couchdb.apache.org by Dustin Sallings <du...@spy.net> on 2012/10/04 02:28:57 UTC

Fwd: replication problems

	I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" state and the "has duplicate replications" state.  I think the stuff below shows the "not replicating" part.

	Active tasks shows the other (these are based on replicator DB documents; example below):

[
    {
        "checkpointed_source_seq": 2022317, 
        "continuous": true, 
        "doc_id": "cbstats-from-dogbowl", 
        "doc_write_failures": 0, 
        "docs_read": 300, 
        "docs_written": 300, 
        "missing_revisions_found": 300, 
        "pid": "<0.10466.12>", 
        "progress": 100, 
        "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous", 
        "revisions_checked": 304, 
        "source": "http://dustin:*****@single.couchbase.net/cbstats/", 
        "source_seq": 2022317, 
        "started_on": 1349309457, 
        "target": "cbstats", 
        "type": "replication", 
        "updated_on": 1349310442
    }, 
    {
        "checkpointed_source_seq": 2022317, 
        "continuous": true, 
        "doc_id": "cbstats-from-dogbowl", 
        "doc_write_failures": 0, 
        "docs_read": 62, 
        "docs_written": 62, 
        "missing_revisions_found": 62, 
        "pid": "<0.11019.12>", 
        "progress": 100, 
        "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous", 
        "revisions_checked": 304, 
        "source": "http://dustin:*****@single.couchbase.net/cbstats/", 
        "source_seq": 2022317, 
        "started_on": 1349309471, 
        "target": "cbstats", 
        "type": "replication", 
        "updated_on": 1349310443
    }, 
    {
        "checkpointed_source_seq": 107068, 
        "continuous": true, 
        "doc_id": "gerrit-from-prod", 
        "doc_write_failures": 0, 
        "docs_read": 22, 
        "docs_written": 22, 
        "missing_revisions_found": 22, 
        "pid": "<0.11086.12>", 
        "progress": 100, 
        "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous", 
        "revisions_checked": 26, 
        "source": "http://dustinphoto.iriscouch.com/gerrit/", 
        "source_seq": 107068, 
        "started_on": 1349309487, 
        "target": "gerrit", 
        "type": "replication", 
        "updated_on": 1349310445
    }, 
    {
        "checkpointed_source_seq": 107068, 
        "continuous": true, 
        "doc_id": "gerrit-from-prod", 
        "doc_write_failures": 0, 
        "docs_read": 17, 
        "docs_written": 17, 
        "missing_revisions_found": 17, 
        "pid": "<0.11107.12>", 
        "progress": 100, 
        "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous", 
        "revisions_checked": 26, 
        "source": "http://dustinphoto.iriscouch.com/gerrit/", 
        "source_seq": 107068, 
        "started_on": 1349309488, 
        "target": "gerrit", 
        "type": "replication", 
        "updated_on": 1349310445
    }
]


	The replicator document for the latter, for example, is this:

{
   "_id": "gerrit-from-prod",
   "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
   "source": "http://dustinphoto.iriscouch.com/gerrit",
   "target": "gerrit",
   "continuous": true,
   "user_ctx": {
       "roles": [
           "_admin"
       ]
   },
   "_replication_state_time": "2012-10-03T17:11:27-07:00",
   "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
   "_replication_state": "triggered"
}
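
The duplication above can be spotted mechanically. What follows is a minimal sketch, not a CouchDB tool: it assumes the task list has already been fetched from /_active_tasks and parsed, and the names duplicate_replications and unrecorded_ids are made up for illustration. It groups replication tasks by doc_id, then compares the running ids against the _replication_id stored in the replicator document (the task ids carry a "+continuous" suffix that the document does not).

```python
from collections import defaultdict


def duplicate_replications(active_tasks):
    """Map each doc_id to its replication ids, keeping only doc_ids that
    have more than one replication task running."""
    by_doc = defaultdict(list)
    for task in active_tasks:
        # Tasks started from a _replicator document carry its doc_id.
        if task.get("type") == "replication" and task.get("doc_id"):
            by_doc[task["doc_id"]].append(task["replication_id"])
    return {doc: ids for doc, ids in by_doc.items() if len(ids) > 1}


def unrecorded_ids(task_rep_ids, replicator_doc):
    """Return the task ids that do not match the document's _replication_id.

    Task ids carry an option suffix such as "+continuous"; the document
    stores the bare id, so the suffix is dropped before comparing.
    """
    recorded = replicator_doc.get("_replication_id")
    return [rid for rid in task_rep_ids if rid.split("+", 1)[0] != recorded]


# Trimmed copies of the gerrit entries shown above (only the fields used here).
TASKS = [
    {"type": "replication", "doc_id": "gerrit-from-prod",
     "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous"},
    {"type": "replication", "doc_id": "gerrit-from-prod",
     "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous"},
]
DOC = {"_id": "gerrit-from-prod",
       "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9"}

dupes = duplicate_replications(TASKS)
for doc_id, rep_ids in dupes.items():
    print(doc_id, "not recorded in the document:", unrecorded_ids(rep_ids, DOC))
```

For the gerrit tasks above, this singles out the 4a21031d... task as the one the replicator document does not know about.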


Begin forwarded message:

> From: Dustin Sallings <du...@spy.net>
> Subject: Re: replication problems
> Date: June 15, 2012 0:10:04 PDT
> To: dev@couchdb.apache.org
> Reply-To: dev@couchdb.apache.org
> 
> 
> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
> 
>> Are you using _replicate or _replicator? Anything interesting in the logs?
> 
> 
> 	I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
> 
> 	Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
> 
> 
> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
> {gen_server,call,
>             [couch_server,
>              {open,<<"rpics">>,
>                    [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>              infinity]}}
> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating 
> ** Last message in was {'EXIT',<0.384.0>,
>                        {{timeout,
>                          {gen_server,call,
>                           [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>                         {gen_server,call,
>                          [couch_server,
>                           {open,<<"cbstats">>,
>                            [{user_ctx,
>                              {user_ctx,null,[<<"_admin">>],undefined}},
>                             {user_ctx,
>                              {user_ctx,null,[<<"_admin">>],undefined}}]},
>                           infinity]}}}
> 
> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>                         {httpdb,
>                          "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>                          nil,
>                          [{"Accept","application/json"},
>                           {"User-Agent","CouchDB/1.2.0"}],
>                          30000,
>                          [{socket_options,
>                            [{keepalive,true},{nodelay,false}]}],
>                          10,250,<0.273.0>,20},
>                         {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>                          <0.290.0>,<0.286.0>,<0.367.0>,
>                          {db_header,6,984356,0,
>                           {860345646,{737369,975,640891414},59433736},
>                           {860348005,738344,42056446},
>                           {860352635,[],5737},
>                           0,nil,nil,1000},
>                          984356,
>                          {btree,<0.286.0>,
>                           {860345646,{737369,975,640891414},59433736},
>                           #Fun<couch_db_updater.10.57960608>,
>                           #Fun<couch_db_updater.11.57960608>,
>                           #Fun<couch_btree.5.133731799>,
>                           #Fun<couch_db_updater.12.57960608>,snappy},
>                          {btree,<0.286.0>,
>                           {860348005,738344,42056446},
>                           #Fun<couch_db_updater.13.57960608>,
>                           #Fun<couch_db_updater.14.57960608>,
>                           #Fun<couch_btree.5.133731799>,
>                           #Fun<couch_db_updater.15.57960608>,snappy},
>                          {btree,<0.286.0>,
>                           {860352635,[],5737},
>                           #Fun<couch_btree.3.133731799>,
>                           #Fun<couch_btree.4.133731799>,
>                           #Fun<couch_btree.5.133731799>,nil,snappy},
>                          984356,<<"cbstats">>,
>                          "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>                          nil,
>                          {user_ctx,null,[<<"_admin">>],undefined},
>                          nil,1000,
>                          [before_header,after_header,on_file_open],
>                          [{user_ctx,
>                            {user_ctx,null,[<<"_admin">>],undefined}}],
>                          snappy,nil,nil},
>                         [],nil,nil,nil,
>                         {rep_stats,0,0,0,0,0},
>                         nil,<0.385.0>,
>                         {batch,[],0}}
> ** Reason for termination == 
> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
> 
> 
> 
> 
> 	Scrolling to the beginning of the errors, I find this:
> 
> 
> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating 
> ** Last message in was {update_docs,<0.272.0>,[],
>                           [{{doc,
>                                 <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>                                 {0,[<<"346185">>]},
>                                 {[{<<"session_id">>,
>                                    <<"9fb3475683d44bb1e151031dd42cc59f">>},
>                                   {<<"source_last_seq">>,1419004},
>                                   {<<"replication_id_version">>,2},
>                                   {<<"history">>,
>                                    [{[{<<"session_id">>,
>                                        <<"9fb3475683d44bb1e151031dd42cc59f">>},
>                                       {<<"start_time">>,
>                                        <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>                                       {<<"end_time">>,
>                                        <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>                                       {<<"start_last_seq">>,1410146},
>                                       {<<"end_last_seq">>,1419004},
>                                       {<<"recorded_seq">>,1419004},
>                                       {<<"missing_checked">>,8100},
>                                       {<<"missing_found">>,8100},
>                                       {<<"docs_read">>,8100},
>                                       {<<"docs_written">>,8100},
>                                       {<<"doc_write_failures">>,0}]},
>                                     {[{<<"session_id">>,
>                                        <<"3edd7c50327eab7ec0768451e34efa8b">>},
>                                       {<<"start_time">>,
>                                        <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>                                       {<<"end_time">>,
>                                        <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>                                       {<<"start_last_seq">>,1407186},
>                                       {<<"end_last_seq">>,1410146},
>                                       {<<"recorded_seq">>,1410146},
>                                       {<<"missing_checked">>,2583},
>                                       {<<"missing_found">>,2577},
>                                       {<<"docs_read">>,2577},
>                                       {<<"docs_written">>,2577},
>                                       {<<"doc_write_failures">>,0}]},
>                                     {[{<<"session_id">>,
>                                        <<"172de62044281a01b1584a9d099f42af">>},
>                                       {<<"start_time">>,
>                                        <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>                                       {<<"end_time">>,
>                                        <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>                                       {<<"start_last_seq">>,1405428},
>                                       {<<"end_last_seq">>,1407186},
>                                       {<<"recorded_seq">>,1407186},
>                                       {<<"missing_checked">>,1721},
>                                       {<<"missing_found">>,1721},
>                                       {<<"docs_read">>,1721},
>                                       {<<"docs_written">>,1721},
>                                       {<<"doc_write_failures">>,0}]},
>                                     {[{<<"session_id">>,
>                                        <<"e60a126a2036c5fab00a1249101820c8">>},
>                                       {<<"start_time">>,
>                                        <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>                                       {<<"end_time">>,
>                                        <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>                                       {<<"start_last_seq">>,1386289},
>                                       {<<"end_last_seq">>,1405428},
>                                       {<<"recorded_seq">>,1405428},
>                                       {<<"missing_checked">>,16977},
>                                       {<<"missing_found">>,16977},
>                                       {<<"docs_read">>,16977},
>                                       {<<"docs_written">>,16977},
>                                       {<<"doc_write_failures">>,0}]},
>                                     {[{<<"session_id">>,
>                                        <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>                                       {<<"start_time">>,
>                                        <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>                                       {<<"end_time">>,
>                                        <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>                                       {<<"start_last_seq">>,1384738},
>                                       {<<"end_last_seq">>,1386289},
>                                       {<<"recorded_seq">>,1386289},
>                                       {<<"missing_checked">>,1551},
>                                       {<<"missing_found">>,1550},
>                                       {<<"docs_read">>,1550},
>                                       {<<"docs_written">>,1550},
>                                       {<<"doc_write_failures">>,0}]},
>                                     {[{<<"session_id">>,
>                                        <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>                                       {<<"start_time">>,
>                                        <<"Wed, 30 May 2012 20:41:43 GMT">>},
>                                       {<<"end_time">>,
>                                        <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>                                       {<<"start_last_seq">>,1372404},
>                                       {<<"end_last_seq">>,1384738},
>                                       {<<"recorded_seq">>,1384738},
>                                       {<<"missing_checked">>,12334},
>                                       {<<"missing_found">>,12333},
>                                       {<<"docs_read">>,12333},
>                                       {<<"docs_written">>,12333},
>                                       {<<"doc_write_failures">>,0}]},
>                                     {[{<<"session_id">>,
>                                        <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>                                       {<<"start_time">>,
>                                        <<"Sun, 27 May 2012 23:36:41 GMT">>},
>                                       {<<"end_time">>,
>                                        <<"Wed, 30 May 2012 20:40:14 GMT">>},
>                                       {<<"start_last_seq">>,1361049},
>                                       {<<"end_last_seq">>,1372404},
>                                       {<<"recorded_seq">>,1372404},
>                                       {<<"missing_checked">>,11355},
>                                       {<<"missing_found">>,11355},
>                                       {<<"docs_read">>,11355},
>                                       {<<"docs_written">>,11355},
>                                       {<<"doc_write_failures">>,0}]},
> [...lots of these...]
> 
>                                 [],false,[]},
>                             #Ref<0.0.15.159973>}],
>                           false,false}
> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>                            <0.290.0>,<0.286.0>,<0.367.0>,
>                            {db_header,6,992456,0,
>                                {943280145,{744250,975,647546641},60017672},
>                                {943282327,745225,42485979},
>                                {943267963,[],5753},
>                                0,nil,nil,1000},
>                            992456,
>                            {btree,<0.286.0>,
>                                {943280145,{744250,975,647546641},60017672},
>                                #Fun<couch_db_updater.10.57960608>,
>                                #Fun<couch_db_updater.11.57960608>,
>                                #Fun<couch_btree.5.133731799>,
>                                #Fun<couch_db_updater.12.57960608>,snappy},
>                            {btree,<0.286.0>,
>                                {943282327,745225,42485979},
>                                #Fun<couch_db_updater.13.57960608>,
>                                #Fun<couch_db_updater.14.57960608>,
>                                #Fun<couch_btree.5.133731799>,
>                                #Fun<couch_db_updater.15.57960608>,snappy},
>                            {btree,<0.286.0>,
>                                {943267963,[],5753},
>                                #Fun<couch_btree.3.133731799>,
>                                #Fun<couch_btree.4.133731799>,
>                                #Fun<couch_btree.5.133731799>,nil,snappy},
>                            992456,<<"cbstats">>,
>                            "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>                            nil,
>                            {user_ctx,null,[],undefined},
>                            nil,1000,
>                            [before_header,after_header,on_file_open],
>                            [{user_ctx,
>                                 {user_ctx,null,[<<"_admin">>],undefined}}],
>                            snappy,nil,nil}
> ** Reason for termination == 
> ** {timeout,
>       {gen_server,call,
>           [<0.288.0>,
>            {db_updated,
>                {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>                    <0.286.0>,<0.367.0>,
>                    {db_header,6,992456,0,
>                        {943280145,{744250,975,647546641},60017672},
>                        {943282327,745225,42485979},
>                        {943267963,[],5753},
>                        0,nil,nil,1000},
>                    992456,
>                    {btree,<0.286.0>,
>                        {943280145,{744250,975,647546641},60017672},
>                        #Fun<couch_db_updater.10.57960608>,
>                        #Fun<couch_db_updater.11.57960608>,
>                        #Fun<couch_btree.5.133731799>,
>                        #Fun<couch_db_updater.12.57960608>,snappy},
>                    {btree,<0.286.0>,
>                        {943282327,745225,42485979},
>                        #Fun<couch_db_updater.13.57960608>,
>                        #Fun<couch_db_updater.14.57960608>,
>                        #Fun<couch_btree.5.133731799>,
>                        #Fun<couch_db_updater.15.57960608>,snappy},
>                    {btree,<0.286.0>,
>                        {943284347,[],5756},
>                        #Fun<couch_btree.3.133731799>,
>                        #Fun<couch_btree.4.133731799>,
>                        #Fun<couch_btree.5.133731799>,nil,snappy},
>                    992456,<<"cbstats">>,
>                    "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>                    {user_ctx,null,[],undefined},
>                    #Ref<0.0.15.160107>,1000,
>                    [before_header,after_header,on_file_open],
>                    [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>                    snappy,nil,nil}}]}}
> 
> 
> 
> 
> -- 
> dustin sallings
> 
> 
> 

-- 
dustin sallings




Re: replication problems

Posted by Robert Newson <ro...@gmail.com>.
With the caveat that we don't "need" id stability, yes. It's highly
desirable that we can find a checkpoint if one exists, though.
Consider a 3 node BigCouch cluster. Depending on which node handles
the replication, you'll get a different rep id, and therefore look for
a different checkpoint doc. If all 3 nodes generated the same rep
id for a task, the checkpoint doc would always be found, since each
node can find any document in the database, even if it's held remotely.
The dynamic port case from the ticket is another way to see that we
can have a checkpoint miss despite there being a perfectly good
checkpoint.
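
To make the checkpoint miss concrete, here is a toy model in Python. It is not CouchDB's real algorithm (which hashes Erlang terms); the toy_rep_id function, the JSON-plus-MD5 hashing, and the host names are assumptions chosen only to show that mixing the coordinating node's host:port into the id changes which _local checkpoint document gets looked up.

```python
import hashlib
import json


def toy_rep_id(node_host, node_port, source, target):
    """Toy replication id that mixes in the coordinating node's host:port."""
    payload = json.dumps([node_host, node_port, source, target])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def checkpoint_doc_id(rep_id):
    """Checkpoints are kept in a _local document named after the rep id."""
    return "_local/" + rep_id


# Made-up endpoints and node names, for illustration only.
SOURCE = "http://source.example.invalid/cbstats/"
TARGET = "cbstats"
NODES = ["node1.example.invalid", "node2.example.invalid", "node3.example.invalid"]

# One logical replication, coordinated by each of the three nodes in turn.
doc_ids = {checkpoint_doc_id(toy_rep_id(n, 5984, SOURCE, TARGET)) for n in NODES}
print(len(doc_ids), "distinct checkpoint docs for one logical replication")
```

Each node ends up looking for its own checkpoint document, so a checkpoint written while one node coordinated is invisible to the other two. The dynamic port case is the same effect with node_port varying instead of node_host.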

I'd like to hear more on the API approach as I can't see how it would
work, or at least not work better than the proposal for each node to
generate a uuid, report it in the welcome message, and create a
generation 3 rep id algo that would fetch the uuid and substitute it
for the unstable host:port values.
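
A rough sketch of the uuid-based proposal, under stated assumptions: the "uuid" field in the welcome message is what this thread proposes, the rep_id_v3 name and the JSON-plus-MD5 hashing are stand-ins invented here, and the uuid values are made up. The point is only that the id is built from what each server says about itself rather than from how it was reached.

```python
import hashlib
import json


def rep_id_v3(source_welcome, source_db, target_welcome, target_db):
    """Hypothetical generation 3 id: server uuids replace host:port."""
    base = [source_welcome["uuid"], source_db, target_welcome["uuid"], target_db]
    return hashlib.md5(json.dumps(base).encode("utf-8")).hexdigest()


# Made-up welcome messages, as GET / might return them under the proposal.
source_welcome = {"couchdb": "Welcome", "uuid": "0f3a9c1e5b7d4e2a8c6b1d2e3f4a5b6c"}
target_welcome = {"couchdb": "Welcome", "uuid": "9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4b"}

# The same source reached on two occasions (say, through two different
# dynamic ports) still reports the same uuid, so both attempts agree.
first = rep_id_v3(source_welcome, "cbstats", target_welcome, "cbstats")
second = rep_id_v3(dict(source_welcome), "cbstats", target_welcome, "cbstats")
print("same id on both attempts:", first == second)
```

Because neither host nor port enters the hash, any node that can read both welcome messages computes the same id and therefore looks for the same checkpoint document.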

One other thought occurs: if the port is equal to 5984, we can assume
it's stable and not do the GET / to find the uuid. Perhaps a port
whitelist too, for folks who move to 80 or 443.

Sent from the ocean floor

On 11 Oct 2012, at 15:18, Dave Cottlehuber <dc...@jsonified.com> wrote:

> On 11 October 2012 15:58, Bob Dionne <di...@dionne-associates.com> wrote:
>> Incorporating a unique id from the source and target seems like a good way to go but I'm wondering if an id from an ini file will
>> work in the clustered BigCouch case. Would an API level request work better? Something the replicator would interrogate
>> for both the source and the target.
>
> Making sure I understand the problem here:
>
> For couches that may not have a fixed IP/DNS/port combination, we need
> a sticky way of identifying replication endpoints, right?
>
> The API approach makes sense, perhaps something you get back in GET / ?
>
> I think this won't work however for anybody with a round-robin DNS or
> load balancer in front of multiple (possibly geographically spread)
> clusters (whether BC or normal couch). Not sure how common a scenario
> this is though.
>
> A+
> Dave

Re: replication problems

Posted by Dave Cottlehuber <dc...@jsonified.com>.
On 11 October 2012 15:58, Bob Dionne <di...@dionne-associates.com> wrote:
> Incorporating a unique id from the source and target seems like a good way to go but I'm wondering if an id from an ini file will
> work in the clustered BigCouch case. Would an API level request work better? Something the replicator would interrogate
> for both the source and the target.

Making sure I understand the problem here:

For couches that may not have a fixed IP/DNS/port combination, we need
a sticky way of identifying replication endpoints, right?

The API approach makes sense, perhaps something you get back in GET / ?

I think this won't work however for anybody with a round-robin DNS or
load balancer in front of multiple (possibly geographically spread)
clusters (whether BC or normal couch). Not sure how common a scenario
this is though.

A+
Dave

Re: replication problems

Posted by Robert Newson <ro...@gmail.com>.
Well, sure, if source or target are remote, the replication_id
function will have to make an http call to get the uuid. We can skip
that if the port is outside the dynamic range, so mostly we only
do these extra calls when needed. The upside of a stable checkpoint
outweighs the 1 or 2 http calls when starting a replication.

Easier to describe this as code, I think, than prose. Seems simple and easy.
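
In that spirit, a minimal Python sketch of the decision described above. Everything named here is an assumption for illustration: needs_uuid_lookup, endpoint_key, and the fake fetcher are invented, the dynamic range is taken to be the IANA one (49152 to 65535), and local database names are simply left as they are rather than mapped to the local node's uuid. The http call is passed in as a function so the sketch runs without a network.

```python
from urllib.parse import urlsplit

DYNAMIC_PORTS = range(49152, 65536)  # IANA dynamic/private port range


def needs_uuid_lookup(endpoint):
    """True only for remote URLs whose port falls in the dynamic range."""
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https"):
        return False  # local database name such as "gerrit": no http call
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return port in DYNAMIC_PORTS


def endpoint_key(endpoint, fetch_uuid):
    """Value that stands for `endpoint` when the replication id is built."""
    if not needs_uuid_lookup(endpoint):
        return endpoint  # stable address: keep it and skip the extra call
    # Unstable address: substitute the server's uuid for host:port.
    return fetch_uuid(endpoint) + urlsplit(endpoint).path


calls = []


def fake_fetch_uuid(endpoint):
    """Stand-in for GET / on the remote server; returns a made-up uuid."""
    calls.append(endpoint)
    return "0f3a9c1e5b7d4e2a8c6b1d2e3f4a5b6c"


print(endpoint_key("gerrit", fake_fetch_uuid))
print(endpoint_key("http://dustinphoto.iriscouch.com/gerrit/", fake_fetch_uuid))
print(endpoint_key("http://127.0.0.1:49213/gerrit/", fake_fetch_uuid))
print(len(calls), "uuid lookup(s) made")
```

Only the endpoint on a dynamic port triggers a lookup; the local name and the default-port URL pass through untouched, which is the "only do these extra calls when needed" behaviour.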

Sent from the ocean floor
On 11 Oct 2012, at 17:14, Bob Dionne <di...@dionne-associates.com> wrote:

>
> On Oct 11, 2012, at 10:27 AM, Robert Newson <ro...@gmail.com> wrote:
>
>> I figured we'd set the same replicator uuid for all nodes of a
>> cluster, same as for cookie auth. Removing host:port and the
>> inet:gethostname bits will be nice. We'll still find gen 2
>> checkpoints, replacing them with gen 3.
>
> Yes, I think that works. There still needs to be a way to surface that for both source and target, in case the replication is mediated on a third machine. Overloading the semantics of the welcome message seems ok.
>
>
>
>
>> Of course, it's possible all
>> this relaxing by the pool stuff has damaged my brain.
>>
>> Sent from the ocean floor
>>
>> On 11 Oct 2012, at 14:59, Bob Dionne <di...@dionne-associates.com> wrote:
>>
>>> Incorporating a unique id from the source and target seems like a good way to go but I'm wondering if an id from an ini file will
>>> work in the clustered BigCouch case. Would an API level request work better? Something the replicator would interrogate
>>> for both the source and the target.
>>>
>>>
>>> On Oct 11, 2012, at 5:42 AM, Robert Newson <ro...@gmail.com> wrote:
>>>
>>>> I'll note here that the attached patch is wrong. It uses a single uuid
>>>> from the node running replication, which might not be the source or
>>>> target. Instead, the uuid of source and target must be retrieved and
>>>> used instead of the host:port. Jason's suggestion to add the uuid
>>>> (stored in the ini file) to the welcome message sounds really good to
>>>> me.
>>>>
>>>> Can't attach this to the ticket today as I don't have my Jira creds.
>>>>
>>>> Sent from the ocean floor
>>>>
>>>> On 10 Oct 2012, at 21:40, Jan Lehnardt <ja...@apache.org> wrote:
>>>>
>>>>> flagged.
>>>>>
>>>>> On Oct 10, 2012, at 22:34 , Robert Newson <ro...@gmail.com> wrote:
>>>>>
>>>>>> Jan,
>>>>>>
>>>>>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>>>>>
>>>>>> I like the ini uuid idea best, modelled after the cookie with secret.
>>>>>> If we have the uuid, we'd omit host name as well as port, right?
>>>>>>
>>>>>> Sent from the ocean floor
>>>>>>
>>>>>> On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:
>>>>>>
>>>>>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>>>>>
>>>>>>> Cheers
>>>>>>> Jan
>>>>>>> --
>>>>>>>
>>>>>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" state and the "has duplicate replications" state.  I think the stuff below shows the "not replicating" part.
>>>>>>>>
>>>>>>>> Active tasks shows the other (these are based on replicator DB documents; example below):
>>>>>>>>
>>>>>>>> [
>>>>>>>> {
>>>>>>>> "checkpointed_source_seq": 2022317,
>>>>>>>> "continuous": true,
>>>>>>>> "doc_id": "cbstats-from-dogbowl",
>>>>>>>> "doc_write_failures": 0,
>>>>>>>> "docs_read": 300,
>>>>>>>> "docs_written": 300,
>>>>>>>> "missing_revisions_found": 300,
>>>>>>>> "pid": "<0.10466.12>",
>>>>>>>> "progress": 100,
>>>>>>>> "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>>>>> "revisions_checked": 304,
>>>>>>>> "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>>> "source_seq": 2022317,
>>>>>>>> "started_on": 1349309457,
>>>>>>>> "target": "cbstats",
>>>>>>>> "type": "replication",
>>>>>>>> "updated_on": 1349310442
>>>>>>>> },
>>>>>>>> {
>>>>>>>> "checkpointed_source_seq": 2022317,
>>>>>>>> "continuous": true,
>>>>>>>> "doc_id": "cbstats-from-dogbowl",
>>>>>>>> "doc_write_failures": 0,
>>>>>>>> "docs_read": 62,
>>>>>>>> "docs_written": 62,
>>>>>>>> "missing_revisions_found": 62,
>>>>>>>> "pid": "<0.11019.12>",
>>>>>>>> "progress": 100,
>>>>>>>> "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>>>>> "revisions_checked": 304,
>>>>>>>> "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>>> "source_seq": 2022317,
>>>>>>>> "started_on": 1349309471,
>>>>>>>> "target": "cbstats",
>>>>>>>> "type": "replication",
>>>>>>>> "updated_on": 1349310443
>>>>>>>> },
>>>>>>>> {
>>>>>>>> "checkpointed_source_seq": 107068,
>>>>>>>> "continuous": true,
>>>>>>>> "doc_id": "gerrit-from-prod",
>>>>>>>> "doc_write_failures": 0,
>>>>>>>> "docs_read": 22,
>>>>>>>> "docs_written": 22,
>>>>>>>> "missing_revisions_found": 22,
>>>>>>>> "pid": "<0.11086.12>",
>>>>>>>> "progress": 100,
>>>>>>>> "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>>>>> "revisions_checked": 26,
>>>>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>>> "source_seq": 107068,
>>>>>>>> "started_on": 1349309487,
>>>>>>>> "target": "gerrit",
>>>>>>>> "type": "replication",
>>>>>>>> "updated_on": 1349310445
>>>>>>>> },
>>>>>>>> {
>>>>>>>> "checkpointed_source_seq": 107068,
>>>>>>>> "continuous": true,
>>>>>>>> "doc_id": "gerrit-from-prod",
>>>>>>>> "doc_write_failures": 0,
>>>>>>>> "docs_read": 17,
>>>>>>>> "docs_written": 17,
>>>>>>>> "missing_revisions_found": 17,
>>>>>>>> "pid": "<0.11107.12>",
>>>>>>>> "progress": 100,
>>>>>>>> "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>>>>> "revisions_checked": 26,
>>>>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>>> "source_seq": 107068,
>>>>>>>> "started_on": 1349309488,
>>>>>>>> "target": "gerrit",
>>>>>>>> "type": "replication",
>>>>>>>> "updated_on": 1349310445
>>>>>>>> }
>>>>>>>> ]
>>>>>>>>
>>>>>>>>
>>>>>>>> The replicator document for the latter, for example, is this:
>>>>>>>>
>>>>>>>> {
>>>>>>>> "_id": "gerrit-from-prod",
>>>>>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>>>>>> "target": "gerrit",
>>>>>>>> "continuous": true,
>>>>>>>> "user_ctx": {
>>>>>>>> "roles": [
>>>>>>>>    "_admin"
>>>>>>>> ]
>>>>>>>> },
>>>>>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>>>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>>>>>> "_replication_state": "triggered"
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> Begin forwarded message:
>>>>>>>>
>>>>>>>>> From: Dustin Sallings <du...@spy.net>
>>>>>>>>> Subject: Re: replication problems
>>>>>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>>>>>> To: dev@couchdb.apache.org
>>>>>>>>> Reply-To: dev@couchdb.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>>>>>
>>>>>>>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>>>>>>>
>>>>>>>>> Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>>>> {gen_server,call,
>>>>>>>>>     [couch_server,
>>>>>>>>>      {open,<<"rpics">>,
>>>>>>>>>            [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>>>      infinity]}}
>>>>>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>>>>>                {{timeout,
>>>>>>>>>                  {gen_server,call,
>>>>>>>>>                   [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>>>>                 {gen_server,call,
>>>>>>>>>                  [couch_server,
>>>>>>>>>                   {open,<<"cbstats">>,
>>>>>>>>>                    [{user_ctx,
>>>>>>>>>                      {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>>>>>                     {user_ctx,
>>>>>>>>>                      {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>>>                   infinity]}}}
>>>>>>>>>
>>>>>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>>>>>                 {httpdb,
>>>>>>>>>                  "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>>>>>                  nil,
>>>>>>>>>                  [{"Accept","application/json"},
>>>>>>>>>                   {"User-Agent","CouchDB/1.2.0"}],
>>>>>>>>>                  30000,
>>>>>>>>>                  [{socket_options,
>>>>>>>>>                    [{keepalive,true},{nodelay,false}]}],
>>>>>>>>>                  10,250,<0.273.0>,20},
>>>>>>>>>                 {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>>>                  <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>>>                  {db_header,6,984356,0,
>>>>>>>>>                   {860345646,{737369,975,640891414},59433736},
>>>>>>>>>                   {860348005,738344,42056446},
>>>>>>>>>                   {860352635,[],5737},
>>>>>>>>>                   0,nil,nil,1000},
>>>>>>>>>                  984356,
>>>>>>>>>                  {btree,<0.286.0>,
>>>>>>>>>                   {860345646,{737369,975,640891414},59433736},
>>>>>>>>>                   #Fun<couch_db_updater.10.57960608>,
>>>>>>>>>                   #Fun<couch_db_updater.11.57960608>,
>>>>>>>>>                   #Fun<couch_btree.5.133731799>,
>>>>>>>>>                   #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>>>                  {btree,<0.286.0>,
>>>>>>>>>                   {860348005,738344,42056446},
>>>>>>>>>                   #Fun<couch_db_updater.13.57960608>,
>>>>>>>>>                   #Fun<couch_db_updater.14.57960608>,
>>>>>>>>>                   #Fun<couch_btree.5.133731799>,
>>>>>>>>>                   #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>>>                  {btree,<0.286.0>,
>>>>>>>>>                   {860352635,[],5737},
>>>>>>>>>                   #Fun<couch_btree.3.133731799>,
>>>>>>>>>                   #Fun<couch_btree.4.133731799>,
>>>>>>>>>                   #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>>>                  984356,<<"cbstats">>,
>>>>>>>>>                  "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>>>                  nil,
>>>>>>>>>                  {user_ctx,null,[<<"_admin">>],undefined},
>>>>>>>>>                  nil,1000,
>>>>>>>>>                  [before_header,after_header,on_file_open],
>>>>>>>>>                  [{user_ctx,
>>>>>>>>>                    {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>>>                  snappy,nil,nil},
>>>>>>>>>                 [],nil,nil,nil,
>>>>>>>>>                 {rep_stats,0,0,0,0,0},
>>>>>>>>>                 nil,<0.385.0>,
>>>>>>>>>                 {batch,[],0}}
>>>>>>>>> ** Reason for termination ==
>>>>>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Scrolling to the beginning of the errors, I find this:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>>>>>                   [{{doc,
>>>>>>>>>                         <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>>>>>                         {0,[<<"346185">>]},
>>>>>>>>>                         {[{<<"session_id">>,
>>>>>>>>>                            <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>>>                           {<<"source_last_seq">>,1419004},
>>>>>>>>>                           {<<"replication_id_version">>,2},
>>>>>>>>>                           {<<"history">>,
>>>>>>>>>                            [{[{<<"session_id">>,
>>>>>>>>>                                <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>>>                               {<<"start_time">>,
>>>>>>>>>                                <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>>>>>                               {<<"end_time">>,
>>>>>>>>>                                <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>>>>>                               {<<"start_last_seq">>,1410146},
>>>>>>>>>                               {<<"end_last_seq">>,1419004},
>>>>>>>>>                               {<<"recorded_seq">>,1419004},
>>>>>>>>>                               {<<"missing_checked">>,8100},
>>>>>>>>>                               {<<"missing_found">>,8100},
>>>>>>>>>                               {<<"docs_read">>,8100},
>>>>>>>>>                               {<<"docs_written">>,8100},
>>>>>>>>>                               {<<"doc_write_failures">>,0}]},
>>>>>>>>>                             {[{<<"session_id">>,
>>>>>>>>>                                <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>>>>>                               {<<"start_time">>,
>>>>>>>>>                                <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>>>>>                               {<<"end_time">>,
>>>>>>>>>                                <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>>>>>                               {<<"start_last_seq">>,1407186},
>>>>>>>>>                               {<<"end_last_seq">>,1410146},
>>>>>>>>>                               {<<"recorded_seq">>,1410146},
>>>>>>>>>                               {<<"missing_checked">>,2583},
>>>>>>>>>                               {<<"missing_found">>,2577},
>>>>>>>>>                               {<<"docs_read">>,2577},
>>>>>>>>>                               {<<"docs_written">>,2577},
>>>>>>>>>                               {<<"doc_write_failures">>,0}]},
>>>>>>>>>                             {[{<<"session_id">>,
>>>>>>>>>                                <<"172de62044281a01b1584a9d099f42af">>},
>>>>>>>>>                               {<<"start_time">>,
>>>>>>>>>                                <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>>>>>                               {<<"end_time">>,
>>>>>>>>>                                <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>>>>>                               {<<"start_last_seq">>,1405428},
>>>>>>>>>                               {<<"end_last_seq">>,1407186},
>>>>>>>>>                               {<<"recorded_seq">>,1407186},
>>>>>>>>>                               {<<"missing_checked">>,1721},
>>>>>>>>>                               {<<"missing_found">>,1721},
>>>>>>>>>                               {<<"docs_read">>,1721},
>>>>>>>>>                               {<<"docs_written">>,1721},
>>>>>>>>>                               {<<"doc_write_failures">>,0}]},
>>>>>>>>>                             {[{<<"session_id">>,
>>>>>>>>>                                <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>>>>>                               {<<"start_time">>,
>>>>>>>>>                                <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>>>>>                               {<<"end_time">>,
>>>>>>>>>                                <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>>>>>                               {<<"start_last_seq">>,1386289},
>>>>>>>>>                               {<<"end_last_seq">>,1405428},
>>>>>>>>>                               {<<"recorded_seq">>,1405428},
>>>>>>>>>                               {<<"missing_checked">>,16977},
>>>>>>>>>                               {<<"missing_found">>,16977},
>>>>>>>>>                               {<<"docs_read">>,16977},
>>>>>>>>>                               {<<"docs_written">>,16977},
>>>>>>>>>                               {<<"doc_write_failures">>,0}]},
>>>>>>>>>                             {[{<<"session_id">>,
>>>>>>>>>                                <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>>>>>                               {<<"start_time">>,
>>>>>>>>>                                <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>>>>>                               {<<"end_time">>,
>>>>>>>>>                                <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>>>>>                               {<<"start_last_seq">>,1384738},
>>>>>>>>>                               {<<"end_last_seq">>,1386289},
>>>>>>>>>                               {<<"recorded_seq">>,1386289},
>>>>>>>>>                               {<<"missing_checked">>,1551},
>>>>>>>>>                               {<<"missing_found">>,1550},
>>>>>>>>>                               {<<"docs_read">>,1550},
>>>>>>>>>                               {<<"docs_written">>,1550},
>>>>>>>>>                               {<<"doc_write_failures">>,0}]},
>>>>>>>>>                             {[{<<"session_id">>,
>>>>>>>>>                                <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>>>>>                               {<<"start_time">>,
>>>>>>>>>                                <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>>>>>                               {<<"end_time">>,
>>>>>>>>>                                <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>>>>>                               {<<"start_last_seq">>,1372404},
>>>>>>>>>                               {<<"end_last_seq">>,1384738},
>>>>>>>>>                               {<<"recorded_seq">>,1384738},
>>>>>>>>>                               {<<"missing_checked">>,12334},
>>>>>>>>>                               {<<"missing_found">>,12333},
>>>>>>>>>                               {<<"docs_read">>,12333},
>>>>>>>>>                               {<<"docs_written">>,12333},
>>>>>>>>>                               {<<"doc_write_failures">>,0}]},
>>>>>>>>>                             {[{<<"session_id">>,
>>>>>>>>>                                <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>>>>>                               {<<"start_time">>,
>>>>>>>>>                                <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>>>>>                               {<<"end_time">>,
>>>>>>>>>                                <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>>>>>                               {<<"start_last_seq">>,1361049},
>>>>>>>>>                               {<<"end_last_seq">>,1372404},
>>>>>>>>>                               {<<"recorded_seq">>,1372404},
>>>>>>>>>                               {<<"missing_checked">>,11355},
>>>>>>>>>                               {<<"missing_found">>,11355},
>>>>>>>>>                               {<<"docs_read">>,11355},
>>>>>>>>>                               {<<"docs_written">>,11355},
>>>>>>>>>                               {<<"doc_write_failures">>,0}]},
>>>>>>>>> [...lots of these...]
>>>>>>>>>
>>>>>>>>>                         [],false,[]},
>>>>>>>>>                     #Ref<0.0.15.159973>}],
>>>>>>>>>                   false,false}
>>>>>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>>>                    <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>>>                    {db_header,6,992456,0,
>>>>>>>>>                        {943280145,{744250,975,647546641},60017672},
>>>>>>>>>                        {943282327,745225,42485979},
>>>>>>>>>                        {943267963,[],5753},
>>>>>>>>>                        0,nil,nil,1000},
>>>>>>>>>                    992456,
>>>>>>>>>                    {btree,<0.286.0>,
>>>>>>>>>                        {943280145,{744250,975,647546641},60017672},
>>>>>>>>>                        #Fun<couch_db_updater.10.57960608>,
>>>>>>>>>                        #Fun<couch_db_updater.11.57960608>,
>>>>>>>>>                        #Fun<couch_btree.5.133731799>,
>>>>>>>>>                        #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>>>                    {btree,<0.286.0>,
>>>>>>>>>                        {943282327,745225,42485979},
>>>>>>>>>                        #Fun<couch_db_updater.13.57960608>,
>>>>>>>>>                        #Fun<couch_db_updater.14.57960608>,
>>>>>>>>>                        #Fun<couch_btree.5.133731799>,
>>>>>>>>>                        #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>>>                    {btree,<0.286.0>,
>>>>>>>>>                        {943267963,[],5753},
>>>>>>>>>                        #Fun<couch_btree.3.133731799>,
>>>>>>>>>                        #Fun<couch_btree.4.133731799>,
>>>>>>>>>                        #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>>>                    992456,<<"cbstats">>,
>>>>>>>>>                    "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>>>                    nil,
>>>>>>>>>                    {user_ctx,null,[],undefined},
>>>>>>>>>                    nil,1000,
>>>>>>>>>                    [before_header,after_header,on_file_open],
>>>>>>>>>                    [{user_ctx,
>>>>>>>>>                         {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>>>                    snappy,nil,nil}
>>>>>>>>> ** Reason for termination ==
>>>>>>>>> ** {timeout,
>>>>>>>>> {gen_server,call,
>>>>>>>>>   [<0.288.0>,
>>>>>>>>>    {db_updated,
>>>>>>>>>        {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>>>>>            <0.286.0>,<0.367.0>,
>>>>>>>>>            {db_header,6,992456,0,
>>>>>>>>>                {943280145,{744250,975,647546641},60017672},
>>>>>>>>>                {943282327,745225,42485979},
>>>>>>>>>                {943267963,[],5753},
>>>>>>>>>                0,nil,nil,1000},
>>>>>>>>>            992456,
>>>>>>>>>            {btree,<0.286.0>,
>>>>>>>>>                {943280145,{744250,975,647546641},60017672},
>>>>>>>>>                #Fun<couch_db_updater.10.57960608>,
>>>>>>>>>                #Fun<couch_db_updater.11.57960608>,
>>>>>>>>>                #Fun<couch_btree.5.133731799>,
>>>>>>>>>                #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>>>            {btree,<0.286.0>,
>>>>>>>>>                {943282327,745225,42485979},
>>>>>>>>>                #Fun<couch_db_updater.13.57960608>,
>>>>>>>>>                #Fun<couch_db_updater.14.57960608>,
>>>>>>>>>                #Fun<couch_btree.5.133731799>,
>>>>>>>>>                #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>>>            {btree,<0.286.0>,
>>>>>>>>>                {943284347,[],5756},
>>>>>>>>>                #Fun<couch_btree.3.133731799>,
>>>>>>>>>                #Fun<couch_btree.4.133731799>,
>>>>>>>>>                #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>>>            992456,<<"cbstats">>,
>>>>>>>>>            "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>>>>>            {user_ctx,null,[],undefined},
>>>>>>>>>            #Ref<0.0.15.160107>,1000,
>>>>>>>>>            [before_header,after_header,on_file_open],
>>>>>>>>>            [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>>>            snappy,nil,nil}}]}}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> dustin sallings
>>>>>>>>
>>>>>>>> --
>>>>>>>> dustin sallings
>

Re: replication problems

Posted by Bob Dionne <di...@dionne-associates.com>.
On Oct 11, 2012, at 10:27 AM, Robert Newson <ro...@gmail.com> wrote:

> I figured we'd set the same replicator uuid for all nodes of a
> cluster, same as for cookie auth. Removing host:port and the
> inet:gethostname bits will be nice. We'll still find gen 2
> checkpoints, replacing them with gen 3.

Yes, I think that works. There still needs to be a way to surface that for both source and target, in case the replication is mediated by a third machine. Overloading the semantics of the welcome message seems OK.
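
As a rough illustration of the idea discussed in this thread (not the attached patch, and not CouchDB's actual replication ID algorithm), the sketch below derives a replication ID from the UUIDs of the source and target rather than from the host:port of whichever node runs the replication. The function names, the field order, and the use of MD5 are assumptions made for illustration only; how each endpoint's UUID would be fetched (for example, from the welcome message) is left out.

```python
import hashlib


def _digest(parts, continuous):
    # Join with NUL, which cannot occur in a hostname, UUID, or database
    # name, so different field combinations cannot collide by concatenation.
    joined = "\x00".join(str(p) for p in parts)
    suffix = "+continuous" if continuous else ""
    return hashlib.md5(joined.encode("utf-8")).hexdigest() + suffix


def host_based_id(host, port, source, target, continuous=False):
    """Hypothetical ID that includes the mediating node's host and port."""
    return _digest([host, port, source, target], continuous)


def uuid_based_id(source_uuid, source_db, target_uuid, target_db,
                  continuous=False):
    """Hypothetical ID built only from the endpoints' own UUIDs."""
    return _digest([source_uuid, source_db, target_uuid, target_db],
                   continuous)


if __name__ == "__main__":
    # Same replication, mediated by two different nodes: the IDs differ,
    # so a checkpoint written under one ID is not found under the other.
    print(host_based_id("node-a", 5984, "cbstats", "cbstats", True))
    print(host_based_id("node-b", 5984, "cbstats", "cbstats", True))
    # The UUID-based ID does not depend on the mediating node at all.
    print(uuid_based_id("src-uuid", "cbstats", "tgt-uuid", "cbstats", True))
```

In this sketch, the UUID-based ID takes no input from the mediating node, so it stays the same wherever the replication runs, which is the property the host:port scheme lacks.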




> Of course, it's possible all
> this relaxing by the pool stuff has damaged my brain.
> 
> Sent from the ocean floor
> 
> On 11 Oct 2012, at 14:59, Bob Dionne <di...@dionne-associates.com> wrote:
> 
>> Incorporating a unique id from the source and target seems like a good way to go but I'm wondering if an id from an ini file will
>> work in the clustered BigCouch case. Would an API level request work better? Something the replicator would interrogate
>> for both the source and the target.
>> 
>> 
>> On Oct 11, 2012, at 5:42 AM, Robert Newson <ro...@gmail.com> wrote:
>> 
>>> I'll note here that the attached patch is wrong. It uses a single uuid
>>> from the node running replication, which might not be the source or
>>> target. Instead, the uuid of source and target must be retrieved and
>>> used instead of the host:port. Jason's suggestion to add the uuid
>>> (stored in the ini file) to the welcome message sounds really good to
>>> me.
>>> 
>>> Can't attach this to the ticket today as I don't have my Jira creds.
>>> 
>>> Sent from the ocean floor
>>> 
>>> On 10 Oct 2012, at 21:40, Jan Lehnardt <ja...@apache.org> wrote:
>>> 
>>>> flagged.
>>>> 
>>>> On Oct 10, 2012, at 22:34 , Robert Newson <ro...@gmail.com> wrote:
>>>> 
>>>>> Jan,
>>>>> 
>>>>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>>>> 
>>>>> I like the ini uuid idea best, modelled after the cookie with secret.
>>>>> If we have the uuid, we'd omit host name as well as port, right?
>>>>> 
>>>>> Sent from the ocean floor
>>>>> 
>>>>> On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:
>>>>> 
>>>>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>>>> 
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>> 
>>>>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" state and the "has duplicate replications" state.  I think the stuff below shows the "not replicating" stuff.
>>>>>>> 
>>>>>>> Active tasks shows the other (these are based on replicator DB documents; example below):
>>>>>>> 
>>>>>>> [
>>>>>>> {
>>>>>>>  "checkpointed_source_seq": 2022317,
>>>>>>>  "continuous": true,
>>>>>>>  "doc_id": "cbstats-from-dogbowl",
>>>>>>>  "doc_write_failures": 0,
>>>>>>>  "docs_read": 300,
>>>>>>>  "docs_written": 300,
>>>>>>>  "missing_revisions_found": 300,
>>>>>>>  "pid": "<0.10466.12>",
>>>>>>>  "progress": 100,
>>>>>>>  "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>>>>  "revisions_checked": 304,
>>>>>>>  "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>>  "source_seq": 2022317,
>>>>>>>  "started_on": 1349309457,
>>>>>>>  "target": "cbstats",
>>>>>>>  "type": "replication",
>>>>>>>  "updated_on": 1349310442
>>>>>>> },
>>>>>>> {
>>>>>>>  "checkpointed_source_seq": 2022317,
>>>>>>>  "continuous": true,
>>>>>>>  "doc_id": "cbstats-from-dogbowl",
>>>>>>>  "doc_write_failures": 0,
>>>>>>>  "docs_read": 62,
>>>>>>>  "docs_written": 62,
>>>>>>>  "missing_revisions_found": 62,
>>>>>>>  "pid": "<0.11019.12>",
>>>>>>>  "progress": 100,
>>>>>>>  "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>>>>  "revisions_checked": 304,
>>>>>>>  "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>>  "source_seq": 2022317,
>>>>>>>  "started_on": 1349309471,
>>>>>>>  "target": "cbstats",
>>>>>>>  "type": "replication",
>>>>>>>  "updated_on": 1349310443
>>>>>>> },
>>>>>>> {
>>>>>>>  "checkpointed_source_seq": 107068,
>>>>>>>  "continuous": true,
>>>>>>>  "doc_id": "gerrit-from-prod",
>>>>>>>  "doc_write_failures": 0,
>>>>>>>  "docs_read": 22,
>>>>>>>  "docs_written": 22,
>>>>>>>  "missing_revisions_found": 22,
>>>>>>>  "pid": "<0.11086.12>",
>>>>>>>  "progress": 100,
>>>>>>>  "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>>>>  "revisions_checked": 26,
>>>>>>>  "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>>  "source_seq": 107068,
>>>>>>>  "started_on": 1349309487,
>>>>>>>  "target": "gerrit",
>>>>>>>  "type": "replication",
>>>>>>>  "updated_on": 1349310445
>>>>>>> },
>>>>>>> {
>>>>>>>  "checkpointed_source_seq": 107068,
>>>>>>>  "continuous": true,
>>>>>>>  "doc_id": "gerrit-from-prod",
>>>>>>>  "doc_write_failures": 0,
>>>>>>>  "docs_read": 17,
>>>>>>>  "docs_written": 17,
>>>>>>>  "missing_revisions_found": 17,
>>>>>>>  "pid": "<0.11107.12>",
>>>>>>>  "progress": 100,
>>>>>>>  "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>>>>  "revisions_checked": 26,
>>>>>>>  "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>>  "source_seq": 107068,
>>>>>>>  "started_on": 1349309488,
>>>>>>>  "target": "gerrit",
>>>>>>>  "type": "replication",
>>>>>>>  "updated_on": 1349310445
>>>>>>> }
>>>>>>> ]
>>>>>>> 
>>>>>>> 
>>>>>>> The replicator document for the latter, for example, is this:
>>>>>>> 
>>>>>>> {
>>>>>>>     "_id": "gerrit-from-prod",
>>>>>>>     "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>>>>>     "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>>>>>     "target": "gerrit",
>>>>>>>     "continuous": true,
>>>>>>>     "user_ctx": {
>>>>>>>         "roles": [
>>>>>>>             "_admin"
>>>>>>>         ]
>>>>>>>     },
>>>>>>>     "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>>>>>     "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>>>>>     "_replication_state": "triggered"
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> Begin forwarded message:
>>>>>>> 
>>>>>>>> From: Dustin Sallings <du...@spy.net>
>>>>>>>> Subject: Re: replication problems
>>>>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>>>>> To: dev@couchdb.apache.org
>>>>>>>> Reply-To: dev@couchdb.apache.org
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>>>> 
>>>>>>>>> Are you using _replicate or _replicator? Anything interesting in logs?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>>>>>> 
>>>>>>>> Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>>> {gen_server,call,
>>>>>>>>      [couch_server,
>>>>>>>>       {open,<<"rpics">>,
>>>>>>>>             [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>>       infinity]}}
>>>>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>>>>                 {{timeout,
>>>>>>>>                   {gen_server,call,
>>>>>>>>                    [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>>>                  {gen_server,call,
>>>>>>>>                   [couch_server,
>>>>>>>>                    {open,<<"cbstats">>,
>>>>>>>>                     [{user_ctx,
>>>>>>>>                       {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>>>>                      {user_ctx,
>>>>>>>>                       {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>>                    infinity]}}}
>>>>>>>> 
>>>>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>>>>                  {httpdb,
>>>>>>>>                   "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>>>>                   nil,
>>>>>>>>                   [{"Accept","application/json"},
>>>>>>>>                    {"User-Agent","CouchDB/1.2.0"}],
>>>>>>>>                   30000,
>>>>>>>>                   [{socket_options,
>>>>>>>>                     [{keepalive,true},{nodelay,false}]}],
>>>>>>>>                   10,250,<0.273.0>,20},
>>>>>>>>                  {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>>                   <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>>                   {db_header,6,984356,0,
>>>>>>>>                    {860345646,{737369,975,640891414},59433736},
>>>>>>>>                    {860348005,738344,42056446},
>>>>>>>>                    {860352635,[],5737},
>>>>>>>>                    0,nil,nil,1000},
>>>>>>>>                   984356,
>>>>>>>>                   {btree,<0.286.0>,
>>>>>>>>                    {860345646,{737369,975,640891414},59433736},
>>>>>>>>                    #Fun<couch_db_updater.10.57960608>,
>>>>>>>>                    #Fun<couch_db_updater.11.57960608>,
>>>>>>>>                    #Fun<couch_btree.5.133731799>,
>>>>>>>>                    #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>>                   {btree,<0.286.0>,
>>>>>>>>                    {860348005,738344,42056446},
>>>>>>>>                    #Fun<couch_db_updater.13.57960608>,
>>>>>>>>                    #Fun<couch_db_updater.14.57960608>,
>>>>>>>>                    #Fun<couch_btree.5.133731799>,
>>>>>>>>                    #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>>                   {btree,<0.286.0>,
>>>>>>>>                    {860352635,[],5737},
>>>>>>>>                    #Fun<couch_btree.3.133731799>,
>>>>>>>>                    #Fun<couch_btree.4.133731799>,
>>>>>>>>                    #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>>                   984356,<<"cbstats">>,
>>>>>>>>                   "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>>                   nil,
>>>>>>>>                   {user_ctx,null,[<<"_admin">>],undefined},
>>>>>>>>                   nil,1000,
>>>>>>>>                   [before_header,after_header,on_file_open],
>>>>>>>>                   [{user_ctx,
>>>>>>>>                     {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>>                   snappy,nil,nil},
>>>>>>>>                  [],nil,nil,nil,
>>>>>>>>                  {rep_stats,0,0,0,0,0},
>>>>>>>>                  nil,<0.385.0>,
>>>>>>>>                  {batch,[],0}}
>>>>>>>> ** Reason for termination ==
>>>>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Scrolling to the beginning of the errors, I find this:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>>>>                    [{{doc,
>>>>>>>>                          <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>>>>                          {0,[<<"346185">>]},
>>>>>>>>                          {[{<<"session_id">>,
>>>>>>>>                             <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>>                            {<<"source_last_seq">>,1419004},
>>>>>>>>                            {<<"replication_id_version">>,2},
>>>>>>>>                            {<<"history">>,
>>>>>>>>                             [{[{<<"session_id">>,
>>>>>>>>                                 <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>>                                {<<"start_time">>,
>>>>>>>>                                 <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>>>>                                {<<"end_time">>,
>>>>>>>>                                 <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>>>>                                {<<"start_last_seq">>,1410146},
>>>>>>>>                                {<<"end_last_seq">>,1419004},
>>>>>>>>                                {<<"recorded_seq">>,1419004},
>>>>>>>>                                {<<"missing_checked">>,8100},
>>>>>>>>                                {<<"missing_found">>,8100},
>>>>>>>>                                {<<"docs_read">>,8100},
>>>>>>>>                                {<<"docs_written">>,8100},
>>>>>>>>                                {<<"doc_write_failures">>,0}]},
>>>>>>>>                              {[{<<"session_id">>,
>>>>>>>>                                 <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>>>>                                {<<"start_time">>,
>>>>>>>>                                 <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>>>>                                {<<"end_time">>,
>>>>>>>>                                 <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>>>>                                {<<"start_last_seq">>,1407186},
>>>>>>>>                                {<<"end_last_seq">>,1410146},
>>>>>>>>                                {<<"recorded_seq">>,1410146},
>>>>>>>>                                {<<"missing_checked">>,2583},
>>>>>>>>                                {<<"missing_found">>,2577},
>>>>>>>>                                {<<"docs_read">>,2577},
>>>>>>>>                                {<<"docs_written">>,2577},
>>>>>>>>                                {<<"doc_write_failures">>,0}]},
>>>>>>>>                              {[{<<"session_id">>,
>>>>>>>>                                 <<"172de62044281a01b1584a9d099f42af">>},
>>>>>>>>                                {<<"start_time">>,
>>>>>>>>                                 <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>>>>                                {<<"end_time">>,
>>>>>>>>                                 <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>>>>                                {<<"start_last_seq">>,1405428},
>>>>>>>>                                {<<"end_last_seq">>,1407186},
>>>>>>>>                                {<<"recorded_seq">>,1407186},
>>>>>>>>                                {<<"missing_checked">>,1721},
>>>>>>>>                                {<<"missing_found">>,1721},
>>>>>>>>                                {<<"docs_read">>,1721},
>>>>>>>>                                {<<"docs_written">>,1721},
>>>>>>>>                                {<<"doc_write_failures">>,0}]},
>>>>>>>>                              {[{<<"session_id">>,
>>>>>>>>                                 <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>>>>                                {<<"start_time">>,
>>>>>>>>                                 <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>>>>                                {<<"end_time">>,
>>>>>>>>                                 <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>>>>                                {<<"start_last_seq">>,1386289},
>>>>>>>>                                {<<"end_last_seq">>,1405428},
>>>>>>>>                                {<<"recorded_seq">>,1405428},
>>>>>>>>                                {<<"missing_checked">>,16977},
>>>>>>>>                                {<<"missing_found">>,16977},
>>>>>>>>                                {<<"docs_read">>,16977},
>>>>>>>>                                {<<"docs_written">>,16977},
>>>>>>>>                                {<<"doc_write_failures">>,0}]},
>>>>>>>>                              {[{<<"session_id">>,
>>>>>>>>                                 <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>>>>                                {<<"start_time">>,
>>>>>>>>                                 <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>>>>                                {<<"end_time">>,
>>>>>>>>                                 <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>>>>                                {<<"start_last_seq">>,1384738},
>>>>>>>>                                {<<"end_last_seq">>,1386289},
>>>>>>>>                                {<<"recorded_seq">>,1386289},
>>>>>>>>                                {<<"missing_checked">>,1551},
>>>>>>>>                                {<<"missing_found">>,1550},
>>>>>>>>                                {<<"docs_read">>,1550},
>>>>>>>>                                {<<"docs_written">>,1550},
>>>>>>>>                                {<<"doc_write_failures">>,0}]},
>>>>>>>>                              {[{<<"session_id">>,
>>>>>>>>                                 <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>>>>                                {<<"start_time">>,
>>>>>>>>                                 <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>>>>                                {<<"end_time">>,
>>>>>>>>                                 <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>>>>                                {<<"start_last_seq">>,1372404},
>>>>>>>>                                {<<"end_last_seq">>,1384738},
>>>>>>>>                                {<<"recorded_seq">>,1384738},
>>>>>>>>                                {<<"missing_checked">>,12334},
>>>>>>>>                                {<<"missing_found">>,12333},
>>>>>>>>                                {<<"docs_read">>,12333},
>>>>>>>>                                {<<"docs_written">>,12333},
>>>>>>>>                                {<<"doc_write_failures">>,0}]},
>>>>>>>>                              {[{<<"session_id">>,
>>>>>>>>                                 <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>>>>                                {<<"start_time">>,
>>>>>>>>                                 <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>>>>                                {<<"end_time">>,
>>>>>>>>                                 <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>>>>                                {<<"start_last_seq">>,1361049},
>>>>>>>>                                {<<"end_last_seq">>,1372404},
>>>>>>>>                                {<<"recorded_seq">>,1372404},
>>>>>>>>                                {<<"missing_checked">>,11355},
>>>>>>>>                                {<<"missing_found">>,11355},
>>>>>>>>                                {<<"docs_read">>,11355},
>>>>>>>>                                {<<"docs_written">>,11355},
>>>>>>>>                                {<<"doc_write_failures">>,0}]},
>>>>>>>> [...lots of these...]
>>>>>>>> 
>>>>>>>>                          [],false,[]},
>>>>>>>>                      #Ref<0.0.15.159973>}],
>>>>>>>>                    false,false}
>>>>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>>                     <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>>                     {db_header,6,992456,0,
>>>>>>>>                         {943280145,{744250,975,647546641},60017672},
>>>>>>>>                         {943282327,745225,42485979},
>>>>>>>>                         {943267963,[],5753},
>>>>>>>>                         0,nil,nil,1000},
>>>>>>>>                     992456,
>>>>>>>>                     {btree,<0.286.0>,
>>>>>>>>                         {943280145,{744250,975,647546641},60017672},
>>>>>>>>                         #Fun<couch_db_updater.10.57960608>,
>>>>>>>>                         #Fun<couch_db_updater.11.57960608>,
>>>>>>>>                         #Fun<couch_btree.5.133731799>,
>>>>>>>>                         #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>>                     {btree,<0.286.0>,
>>>>>>>>                         {943282327,745225,42485979},
>>>>>>>>                         #Fun<couch_db_updater.13.57960608>,
>>>>>>>>                         #Fun<couch_db_updater.14.57960608>,
>>>>>>>>                         #Fun<couch_btree.5.133731799>,
>>>>>>>>                         #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>>                     {btree,<0.286.0>,
>>>>>>>>                         {943267963,[],5753},
>>>>>>>>                         #Fun<couch_btree.3.133731799>,
>>>>>>>>                         #Fun<couch_btree.4.133731799>,
>>>>>>>>                         #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>>                     992456,<<"cbstats">>,
>>>>>>>>                     "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>>                     nil,
>>>>>>>>                     {user_ctx,null,[],undefined},
>>>>>>>>                     nil,1000,
>>>>>>>>                     [before_header,after_header,on_file_open],
>>>>>>>>                     [{user_ctx,
>>>>>>>>                          {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>>                     snappy,nil,nil}
>>>>>>>> ** Reason for termination ==
>>>>>>>> ** {timeout,
>>>>>>>> {gen_server,call,
>>>>>>>>    [<0.288.0>,
>>>>>>>>     {db_updated,
>>>>>>>>         {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>>>>             <0.286.0>,<0.367.0>,
>>>>>>>>             {db_header,6,992456,0,
>>>>>>>>                 {943280145,{744250,975,647546641},60017672},
>>>>>>>>                 {943282327,745225,42485979},
>>>>>>>>                 {943267963,[],5753},
>>>>>>>>                 0,nil,nil,1000},
>>>>>>>>             992456,
>>>>>>>>             {btree,<0.286.0>,
>>>>>>>>                 {943280145,{744250,975,647546641},60017672},
>>>>>>>>                 #Fun<couch_db_updater.10.57960608>,
>>>>>>>>                 #Fun<couch_db_updater.11.57960608>,
>>>>>>>>                 #Fun<couch_btree.5.133731799>,
>>>>>>>>                 #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>>             {btree,<0.286.0>,
>>>>>>>>                 {943282327,745225,42485979},
>>>>>>>>                 #Fun<couch_db_updater.13.57960608>,
>>>>>>>>                 #Fun<couch_db_updater.14.57960608>,
>>>>>>>>                 #Fun<couch_btree.5.133731799>,
>>>>>>>>                 #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>>             {btree,<0.286.0>,
>>>>>>>>                 {943284347,[],5756},
>>>>>>>>                 #Fun<couch_btree.3.133731799>,
>>>>>>>>                 #Fun<couch_btree.4.133731799>,
>>>>>>>>                 #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>>             992456,<<"cbstats">>,
>>>>>>>>             "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>>>>             {user_ctx,null,[],undefined},
>>>>>>>>             #Ref<0.0.15.160107>,1000,
>>>>>>>>             [before_header,after_header,on_file_open],
>>>>>>>>             [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>>             snappy,nil,nil}}]}}
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> dustin sallings
>>>>>>> 
>>>>>>> --
>>>>>>> dustin sallings
>> 


Re: replication problems

Posted by Robert Newson <ro...@gmail.com>.
I figured we'd set the same replicator uuid for all nodes of a
cluster, same as for cookie auth. Removing host:port and the
inet:gethostname bits will be nice. We'll still find gen 2
checkpoints, replacing them with gen 3. Of course, it's possible all
this relaxing by the pool stuff has damaged my brain.
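
To make the idea concrete, here is a rough sketch in Python (not the
actual Erlang in the replicator). It assumes each server exposes a uuid,
and derives the replication id from the two uuids plus the db names, with
no host name or port involved. The function name, the JSON encoding and
the field order are all made up for illustration; only the "+continuous"
suffix is taken from the ids shown in active tasks.

```python
import hashlib
import json


def replication_id(source_uuid, source_db, target_uuid, target_db, continuous=False):
    """Hypothetical id scheme: hash server uuids and db names only."""
    # Nothing host- or port-specific goes into the hash, so the id stays
    # the same after a port change or when one cluster node stands in
    # for another (assuming all nodes share the same uuid).
    payload = json.dumps([source_uuid, source_db, target_uuid, target_db])
    base = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return base + ("+continuous" if continuous else "")
```

Calling it twice with the same uuids and db names gives the same id no
matter which host or port the request went through, which is the property
we want.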

Sent from the ocean floor

On 11 Oct 2012, at 14:59, Bob Dionne <di...@dionne-associates.com> wrote:

> Incorporating a unique id from the source and target seems like a good way to go but I'm wondering if an id from an ini file will
> work in the clustered BigCouch case. Would an API level request work better? Something the replicator would interrogate
> for both the source and the target.
>
>
> On Oct 11, 2012, at 5:42 AM, Robert Newson <ro...@gmail.com> wrote:
>
>> I'll note here that the attached patch is wrong. It uses a single uuid
>> from the node running replication, which might not be the source or
>> target. Instead, the uuid of source and target must be retrieved and
>> used instead of the host:port. Jason's suggestion to add the uuid
>> (stored in the ini file) to the welcome message sounds really good to
>> me.
>>
>> Can't attach this to the ticket today as I don't have my Jira creds.
>>
>> Sent from the ocean floor
>>
>> On 10 Oct 2012, at 21:40, Jan Lehnardt <ja...@apache.org> wrote:
>>
>>> flagged.
>>>
>>> On Oct 10, 2012, at 22:34 , Robert Newson <ro...@gmail.com> wrote:
>>>
>>>> Jan,
>>>>
>>>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>>>
>>>> I like the ini uuid idea best, modelled after the cookie with secret.
>>>> If we have the uuid, we'd omit host name as well as port, right?
>>>>
>>>> Sent from the ocean floor
>>>>
>>>> On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:
>>>>
>>>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>>>
>>>>> Cheers
>>>>> Jan
>>>>> --
>>>>>
>>>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>>>>>
>>>>>>
>>>>>> I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" and "has duplicate replicates" states at once.  I think the stuff below shows the "not replicating" part.
>>>>>>
>>>>>> Active tasks shows the other (these are based on replicator DB documents; example below):
>>>>>>
>>>>>> [
>>>>>> {
>>>>>>   "checkpointed_source_seq": 2022317,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "cbstats-from-dogbowl",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 300,
>>>>>>   "docs_written": 300,
>>>>>>   "missing_revisions_found": 300,
>>>>>>   "pid": "<0.10466.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>>>   "revisions_checked": 304,
>>>>>>   "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>   "source_seq": 2022317,
>>>>>>   "started_on": 1349309457,
>>>>>>   "target": "cbstats",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310442
>>>>>> },
>>>>>> {
>>>>>>   "checkpointed_source_seq": 2022317,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "cbstats-from-dogbowl",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 62,
>>>>>>   "docs_written": 62,
>>>>>>   "missing_revisions_found": 62,
>>>>>>   "pid": "<0.11019.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>>>   "revisions_checked": 304,
>>>>>>   "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>   "source_seq": 2022317,
>>>>>>   "started_on": 1349309471,
>>>>>>   "target": "cbstats",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310443
>>>>>> },
>>>>>> {
>>>>>>   "checkpointed_source_seq": 107068,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "gerrit-from-prod",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 22,
>>>>>>   "docs_written": 22,
>>>>>>   "missing_revisions_found": 22,
>>>>>>   "pid": "<0.11086.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>>>   "revisions_checked": 26,
>>>>>>   "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>   "source_seq": 107068,
>>>>>>   "started_on": 1349309487,
>>>>>>   "target": "gerrit",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310445
>>>>>> },
>>>>>> {
>>>>>>   "checkpointed_source_seq": 107068,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "gerrit-from-prod",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 17,
>>>>>>   "docs_written": 17,
>>>>>>   "missing_revisions_found": 17,
>>>>>>   "pid": "<0.11107.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>>>   "revisions_checked": 26,
>>>>>>   "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>   "source_seq": 107068,
>>>>>>   "started_on": 1349309488,
>>>>>>   "target": "gerrit",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310445
>>>>>> }
>>>>>> ]
>>>>>>
>>>>>>
>>>>>> The replicator document for the latter, for example, is this:
>>>>>>
>>>>>> {
>>>>>> "_id": "gerrit-from-prod",
>>>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>>>> "target": "gerrit",
>>>>>> "continuous": true,
>>>>>> "user_ctx": {
>>>>>>  "roles": [
>>>>>>      "_admin"
>>>>>>  ]
>>>>>> },
>>>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>>>> "_replication_state": "triggered"
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Begin forwarded message:
>>>>>>
>>>>>>> From: Dustin Sallings <du...@spy.net>
>>>>>>> Subject: Re: replication problems
>>>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>>>> To: dev@couchdb.apache.org
>>>>>>> Reply-To: dev@couchdb.apache.org
>>>>>>>
>>>>>>>
>>>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>>>
>>>>>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>>>>>
>>>>>>>
>>>>>>> I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>>>>>
>>>>>>> Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>>>>>
>>>>>>>
>>>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>> {gen_server,call,
>>>>>>>       [couch_server,
>>>>>>>        {open,<<"rpics">>,
>>>>>>>              [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>        infinity]}}
>>>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>>>                  {{timeout,
>>>>>>>                    {gen_server,call,
>>>>>>>                     [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>>                   {gen_server,call,
>>>>>>>                    [couch_server,
>>>>>>>                     {open,<<"cbstats">>,
>>>>>>>                      [{user_ctx,
>>>>>>>                        {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>>>                       {user_ctx,
>>>>>>>                        {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>                     infinity]}}}
>>>>>>>
>>>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>>>                   {httpdb,
>>>>>>>                    "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>>>                    nil,
>>>>>>>                    [{"Accept","application/json"},
>>>>>>>                     {"User-Agent","CouchDB/1.2.0"}],
>>>>>>>                    30000,
>>>>>>>                    [{socket_options,
>>>>>>>                      [{keepalive,true},{nodelay,false}]}],
>>>>>>>                    10,250,<0.273.0>,20},
>>>>>>>                   {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>                    <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>                    {db_header,6,984356,0,
>>>>>>>                     {860345646,{737369,975,640891414},59433736},
>>>>>>>                     {860348005,738344,42056446},
>>>>>>>                     {860352635,[],5737},
>>>>>>>                     0,nil,nil,1000},
>>>>>>>                    984356,
>>>>>>>                    {btree,<0.286.0>,
>>>>>>>                     {860345646,{737369,975,640891414},59433736},
>>>>>>>                     #Fun<couch_db_updater.10.57960608>,
>>>>>>>                     #Fun<couch_db_updater.11.57960608>,
>>>>>>>                     #Fun<couch_btree.5.133731799>,
>>>>>>>                     #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>                    {btree,<0.286.0>,
>>>>>>>                     {860348005,738344,42056446},
>>>>>>>                     #Fun<couch_db_updater.13.57960608>,
>>>>>>>                     #Fun<couch_db_updater.14.57960608>,
>>>>>>>                     #Fun<couch_btree.5.133731799>,
>>>>>>>                     #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>                    {btree,<0.286.0>,
>>>>>>>                     {860352635,[],5737},
>>>>>>>                     #Fun<couch_btree.3.133731799>,
>>>>>>>                     #Fun<couch_btree.4.133731799>,
>>>>>>>                     #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>                    984356,<<"cbstats">>,
>>>>>>>                    "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>                    nil,
>>>>>>>                    {user_ctx,null,[<<"_admin">>],undefined},
>>>>>>>                    nil,1000,
>>>>>>>                    [before_header,after_header,on_file_open],
>>>>>>>                    [{user_ctx,
>>>>>>>                      {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>                    snappy,nil,nil},
>>>>>>>                   [],nil,nil,nil,
>>>>>>>                   {rep_stats,0,0,0,0,0},
>>>>>>>                   nil,<0.385.0>,
>>>>>>>                   {batch,[],0}}
>>>>>>> ** Reason for termination ==
>>>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Scrolling to the beginning of the errors, I find this:
>>>>>>>
>>>>>>>
>>>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>>>                     [{{doc,
>>>>>>>                           <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>>>                           {0,[<<"346185">>]},
>>>>>>>                           {[{<<"session_id">>,
>>>>>>>                              <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>                             {<<"source_last_seq">>,1419004},
>>>>>>>                             {<<"replication_id_version">>,2},
>>>>>>>                             {<<"history">>,
>>>>>>>                              [{[{<<"session_id">>,
>>>>>>>                                  <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1410146},
>>>>>>>                                 {<<"end_last_seq">>,1419004},
>>>>>>>                                 {<<"recorded_seq">>,1419004},
>>>>>>>                                 {<<"missing_checked">>,8100},
>>>>>>>                                 {<<"missing_found">>,8100},
>>>>>>>                                 {<<"docs_read">>,8100},
>>>>>>>                                 {<<"docs_written">>,8100},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1407186},
>>>>>>>                                 {<<"end_last_seq">>,1410146},
>>>>>>>                                 {<<"recorded_seq">>,1410146},
>>>>>>>                                 {<<"missing_checked">>,2583},
>>>>>>>                                 {<<"missing_found">>,2577},
>>>>>>>                                 {<<"docs_read">>,2577},
>>>>>>>                                 {<<"docs_written">>,2577},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"172de62044281a01b1584a9d099f42af">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1405428},
>>>>>>>                                 {<<"end_last_seq">>,1407186},
>>>>>>>                                 {<<"recorded_seq">>,1407186},
>>>>>>>                                 {<<"missing_checked">>,1721},
>>>>>>>                                 {<<"missing_found">>,1721},
>>>>>>>                                 {<<"docs_read">>,1721},
>>>>>>>                                 {<<"docs_written">>,1721},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1386289},
>>>>>>>                                 {<<"end_last_seq">>,1405428},
>>>>>>>                                 {<<"recorded_seq">>,1405428},
>>>>>>>                                 {<<"missing_checked">>,16977},
>>>>>>>                                 {<<"missing_found">>,16977},
>>>>>>>                                 {<<"docs_read">>,16977},
>>>>>>>                                 {<<"docs_written">>,16977},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1384738},
>>>>>>>                                 {<<"end_last_seq">>,1386289},
>>>>>>>                                 {<<"recorded_seq">>,1386289},
>>>>>>>                                 {<<"missing_checked">>,1551},
>>>>>>>                                 {<<"missing_found">>,1550},
>>>>>>>                                 {<<"docs_read">>,1550},
>>>>>>>                                 {<<"docs_written">>,1550},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1372404},
>>>>>>>                                 {<<"end_last_seq">>,1384738},
>>>>>>>                                 {<<"recorded_seq">>,1384738},
>>>>>>>                                 {<<"missing_checked">>,12334},
>>>>>>>                                 {<<"missing_found">>,12333},
>>>>>>>                                 {<<"docs_read">>,12333},
>>>>>>>                                 {<<"docs_written">>,12333},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1361049},
>>>>>>>                                 {<<"end_last_seq">>,1372404},
>>>>>>>                                 {<<"recorded_seq">>,1372404},
>>>>>>>                                 {<<"missing_checked">>,11355},
>>>>>>>                                 {<<"missing_found">>,11355},
>>>>>>>                                 {<<"docs_read">>,11355},
>>>>>>>                                 {<<"docs_written">>,11355},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>> [...lots of these...]
>>>>>>>
>>>>>>>                           [],false,[]},
>>>>>>>                       #Ref<0.0.15.159973>}],
>>>>>>>                     false,false}
>>>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>                      <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>                      {db_header,6,992456,0,
>>>>>>>                          {943280145,{744250,975,647546641},60017672},
>>>>>>>                          {943282327,745225,42485979},
>>>>>>>                          {943267963,[],5753},
>>>>>>>                          0,nil,nil,1000},
>>>>>>>                      992456,
>>>>>>>                      {btree,<0.286.0>,
>>>>>>>                          {943280145,{744250,975,647546641},60017672},
>>>>>>>                          #Fun<couch_db_updater.10.57960608>,
>>>>>>>                          #Fun<couch_db_updater.11.57960608>,
>>>>>>>                          #Fun<couch_btree.5.133731799>,
>>>>>>>                          #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>                      {btree,<0.286.0>,
>>>>>>>                          {943282327,745225,42485979},
>>>>>>>                          #Fun<couch_db_updater.13.57960608>,
>>>>>>>                          #Fun<couch_db_updater.14.57960608>,
>>>>>>>                          #Fun<couch_btree.5.133731799>,
>>>>>>>                          #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>                      {btree,<0.286.0>,
>>>>>>>                          {943267963,[],5753},
>>>>>>>                          #Fun<couch_btree.3.133731799>,
>>>>>>>                          #Fun<couch_btree.4.133731799>,
>>>>>>>                          #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>                      992456,<<"cbstats">>,
>>>>>>>                      "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>                      nil,
>>>>>>>                      {user_ctx,null,[],undefined},
>>>>>>>                      nil,1000,
>>>>>>>                      [before_header,after_header,on_file_open],
>>>>>>>                      [{user_ctx,
>>>>>>>                           {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>                      snappy,nil,nil}
>>>>>>> ** Reason for termination ==
>>>>>>> ** {timeout,
>>>>>>> {gen_server,call,
>>>>>>>     [<0.288.0>,
>>>>>>>      {db_updated,
>>>>>>>          {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>>>              <0.286.0>,<0.367.0>,
>>>>>>>              {db_header,6,992456,0,
>>>>>>>                  {943280145,{744250,975,647546641},60017672},
>>>>>>>                  {943282327,745225,42485979},
>>>>>>>                  {943267963,[],5753},
>>>>>>>                  0,nil,nil,1000},
>>>>>>>              992456,
>>>>>>>              {btree,<0.286.0>,
>>>>>>>                  {943280145,{744250,975,647546641},60017672},
>>>>>>>                  #Fun<couch_db_updater.10.57960608>,
>>>>>>>                  #Fun<couch_db_updater.11.57960608>,
>>>>>>>                  #Fun<couch_btree.5.133731799>,
>>>>>>>                  #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>              {btree,<0.286.0>,
>>>>>>>                  {943282327,745225,42485979},
>>>>>>>                  #Fun<couch_db_updater.13.57960608>,
>>>>>>>                  #Fun<couch_db_updater.14.57960608>,
>>>>>>>                  #Fun<couch_btree.5.133731799>,
>>>>>>>                  #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>              {btree,<0.286.0>,
>>>>>>>                  {943284347,[],5756},
>>>>>>>                  #Fun<couch_btree.3.133731799>,
>>>>>>>                  #Fun<couch_btree.4.133731799>,
>>>>>>>                  #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>              992456,<<"cbstats">>,
>>>>>>>              "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>>>              {user_ctx,null,[],undefined},
>>>>>>>              #Ref<0.0.15.160107>,1000,
>>>>>>>              [before_header,after_header,on_file_open],
>>>>>>>              [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>              snappy,nil,nil}}]}}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> dustin sallings
>>>>>>
>>>>>> --
>>>>>> dustin sallings
>

Re: replication problems

Posted by Robert Newson <ro...@gmail.com>.
Allowing it to be specified directly, falling back to gen 3, 2, 1
calculation seems ok, but getting baroque. Davisp had a suggestion for
a layered replicator where each optimisation was a separate feature
subject to negotiation. The ability to checkpoint and methods to find
recorded checkpoints are candidate features.
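
As a sketch of what the fallback could look like (Python for brevity, not
the real replicator code): try the id from each generation, newest first,
and use the first _local checkpoint doc that exists. The dict standing in
for the database and the function name are assumptions; the "_local/"
prefix and "source_last_seq" field are as they appear in the logs quoted
in this thread.

```python
def find_checkpoint(local_docs, candidate_ids):
    """Return (generation, checkpoint_doc) for the newest scheme with a doc.

    local_docs:    dict mapping "_local/<id>" to a checkpoint dict
                   (a stand-in for looking the doc up in the database).
    candidate_ids: list of (generation, replication_id), newest first.
    """
    for generation, rep_id in candidate_ids:
        doc = local_docs.get("_local/" + rep_id)
        if doc is not None:
            return generation, doc
    # No checkpoint under any id scheme: replication starts from scratch.
    return None, None
```

A replication that finds only an older-generation checkpoint could resume
from its source_last_seq and write its next checkpoint under the newest
id.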

Stabilising the replication id with a per-server uuid seems a modest step.
It helps two cases: the dynamic port one and the cluster one (where
any node can be substituted for another). In the latter case it is a
tidier solution than the one in BigCouch today, so this tidies something
just in time for the merge.

Sent from the ocean floor

On 11 Oct 2012, at 14:59, Bob Dionne <di...@dionne-associates.com> wrote:

> Incorporating a unique id from the source and target seems like a good way to go but I'm wondering if an id from an ini file will
> work in the clustered BigCouch case. Would an API level request work better? Something the replicator would interrogate
> for both the source and the target.
>
>
> On Oct 11, 2012, at 5:42 AM, Robert Newson <ro...@gmail.com> wrote:
>
>> I'll note here that the attached patch is wrong. It uses a single uuid
>> from the node running replication, which might not be the source or
>> target. Instead, the uuid of source and target must be retrieved and
>> used instead of the host:port. Jason's suggestion to add the uuid
>> (stored in the ini file) to the welcome message sounds really good to
>> me.
>>
>> Can't attach this to the ticket today as I don't have my Jira creds.
>>
>> Sent from the ocean floor
>>
>> On 10 Oct 2012, at 21:40, Jan Lehnardt <ja...@apache.org> wrote:
>>
>>> flagged.
>>>
>>> On Oct 10, 2012, at 22:34 , Robert Newson <ro...@gmail.com> wrote:
>>>
>>>> Jan,
>>>>
>>>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>>>
>>>> I like the ini uuid idea best, modelled after the cookie with secret.
>>>> If we have the uuid, we'd omit host name as well as port, right?
>>>>
>>>> Sent from the ocean floor
>>>>
>>>> On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:
>>>>
>>>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>>>
>>>>> Cheers
>>>>> Jan
>>>>> --
>>>>>
>>>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>>>>>
>>>>>>
>>>>>> I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" and "has duplicate replicates" states at once.  I think the stuff below shows the "not replicating" part.
>>>>>>
>>>>>> Active tasks shows the other (these are based on replicator DB documents; example below):
>>>>>>
>>>>>> [
>>>>>> {
>>>>>>   "checkpointed_source_seq": 2022317,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "cbstats-from-dogbowl",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 300,
>>>>>>   "docs_written": 300,
>>>>>>   "missing_revisions_found": 300,
>>>>>>   "pid": "<0.10466.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>>>   "revisions_checked": 304,
>>>>>>   "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>   "source_seq": 2022317,
>>>>>>   "started_on": 1349309457,
>>>>>>   "target": "cbstats",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310442
>>>>>> },
>>>>>> {
>>>>>>   "checkpointed_source_seq": 2022317,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "cbstats-from-dogbowl",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 62,
>>>>>>   "docs_written": 62,
>>>>>>   "missing_revisions_found": 62,
>>>>>>   "pid": "<0.11019.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>>>   "revisions_checked": 304,
>>>>>>   "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>>   "source_seq": 2022317,
>>>>>>   "started_on": 1349309471,
>>>>>>   "target": "cbstats",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310443
>>>>>> },
>>>>>> {
>>>>>>   "checkpointed_source_seq": 107068,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "gerrit-from-prod",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 22,
>>>>>>   "docs_written": 22,
>>>>>>   "missing_revisions_found": 22,
>>>>>>   "pid": "<0.11086.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>>>   "revisions_checked": 26,
>>>>>>   "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>   "source_seq": 107068,
>>>>>>   "started_on": 1349309487,
>>>>>>   "target": "gerrit",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310445
>>>>>> },
>>>>>> {
>>>>>>   "checkpointed_source_seq": 107068,
>>>>>>   "continuous": true,
>>>>>>   "doc_id": "gerrit-from-prod",
>>>>>>   "doc_write_failures": 0,
>>>>>>   "docs_read": 17,
>>>>>>   "docs_written": 17,
>>>>>>   "missing_revisions_found": 17,
>>>>>>   "pid": "<0.11107.12>",
>>>>>>   "progress": 100,
>>>>>>   "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>>>   "revisions_checked": 26,
>>>>>>   "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>>   "source_seq": 107068,
>>>>>>   "started_on": 1349309488,
>>>>>>   "target": "gerrit",
>>>>>>   "type": "replication",
>>>>>>   "updated_on": 1349310445
>>>>>> }
>>>>>> ]
>>>>>>
>>>>>>
>>>>>> The replicator document for the latter, for example, is this:
>>>>>>
>>>>>> {
>>>>>> "_id": "gerrit-from-prod",
>>>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>>>> "target": "gerrit",
>>>>>> "continuous": true,
>>>>>> "user_ctx": {
>>>>>>  "roles": [
>>>>>>      "_admin"
>>>>>>  ]
>>>>>> },
>>>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>>>> "_replication_state": "triggered"
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Begin forwarded message:
>>>>>>
>>>>>>> From: Dustin Sallings <du...@spy.net>
>>>>>>> Subject: Re: replication problems
>>>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>>>> To: dev@couchdb.apache.org
>>>>>>> Reply-To: dev@couchdb.apache.org
>>>>>>>
>>>>>>>
>>>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>>>
>>>>>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>>>>>
>>>>>>>
>>>>>>> I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>>>>>
>>>>>>> Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>>>>>
>>>>>>>
>>>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>> {gen_server,call,
>>>>>>>       [couch_server,
>>>>>>>        {open,<<"rpics">>,
>>>>>>>              [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>        infinity]}}
>>>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>>>                  {{timeout,
>>>>>>>                    {gen_server,call,
>>>>>>>                     [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>>                   {gen_server,call,
>>>>>>>                    [couch_server,
>>>>>>>                     {open,<<"cbstats">>,
>>>>>>>                      [{user_ctx,
>>>>>>>                        {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>>>                       {user_ctx,
>>>>>>>                        {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>>                     infinity]}}}
>>>>>>>
>>>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>>>                   {httpdb,
>>>>>>>                    "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>>>                    nil,
>>>>>>>                    [{"Accept","application/json"},
>>>>>>>                     {"User-Agent","CouchDB/1.2.0"}],
>>>>>>>                    30000,
>>>>>>>                    [{socket_options,
>>>>>>>                      [{keepalive,true},{nodelay,false}]}],
>>>>>>>                    10,250,<0.273.0>,20},
>>>>>>>                   {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>                    <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>                    {db_header,6,984356,0,
>>>>>>>                     {860345646,{737369,975,640891414},59433736},
>>>>>>>                     {860348005,738344,42056446},
>>>>>>>                     {860352635,[],5737},
>>>>>>>                     0,nil,nil,1000},
>>>>>>>                    984356,
>>>>>>>                    {btree,<0.286.0>,
>>>>>>>                     {860345646,{737369,975,640891414},59433736},
>>>>>>>                     #Fun<couch_db_updater.10.57960608>,
>>>>>>>                     #Fun<couch_db_updater.11.57960608>,
>>>>>>>                     #Fun<couch_btree.5.133731799>,
>>>>>>>                     #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>                    {btree,<0.286.0>,
>>>>>>>                     {860348005,738344,42056446},
>>>>>>>                     #Fun<couch_db_updater.13.57960608>,
>>>>>>>                     #Fun<couch_db_updater.14.57960608>,
>>>>>>>                     #Fun<couch_btree.5.133731799>,
>>>>>>>                     #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>                    {btree,<0.286.0>,
>>>>>>>                     {860352635,[],5737},
>>>>>>>                     #Fun<couch_btree.3.133731799>,
>>>>>>>                     #Fun<couch_btree.4.133731799>,
>>>>>>>                     #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>                    984356,<<"cbstats">>,
>>>>>>>                    "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>                    nil,
>>>>>>>                    {user_ctx,null,[<<"_admin">>],undefined},
>>>>>>>                    nil,1000,
>>>>>>>                    [before_header,after_header,on_file_open],
>>>>>>>                    [{user_ctx,
>>>>>>>                      {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>                    snappy,nil,nil},
>>>>>>>                   [],nil,nil,nil,
>>>>>>>                   {rep_stats,0,0,0,0,0},
>>>>>>>                   nil,<0.385.0>,
>>>>>>>                   {batch,[],0}}
>>>>>>> ** Reason for termination ==
>>>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Scrolling to the beginning of the errors, I find this:
>>>>>>>
>>>>>>>
>>>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>>>                     [{{doc,
>>>>>>>                           <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>>>                           {0,[<<"346185">>]},
>>>>>>>                           {[{<<"session_id">>,
>>>>>>>                              <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>                             {<<"source_last_seq">>,1419004},
>>>>>>>                             {<<"replication_id_version">>,2},
>>>>>>>                             {<<"history">>,
>>>>>>>                              [{[{<<"session_id">>,
>>>>>>>                                  <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1410146},
>>>>>>>                                 {<<"end_last_seq">>,1419004},
>>>>>>>                                 {<<"recorded_seq">>,1419004},
>>>>>>>                                 {<<"missing_checked">>,8100},
>>>>>>>                                 {<<"missing_found">>,8100},
>>>>>>>                                 {<<"docs_read">>,8100},
>>>>>>>                                 {<<"docs_written">>,8100},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1407186},
>>>>>>>                                 {<<"end_last_seq">>,1410146},
>>>>>>>                                 {<<"recorded_seq">>,1410146},
>>>>>>>                                 {<<"missing_checked">>,2583},
>>>>>>>                                 {<<"missing_found">>,2577},
>>>>>>>                                 {<<"docs_read">>,2577},
>>>>>>>                                 {<<"docs_written">>,2577},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"172de62044281a01b1584a9d099f42af">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1405428},
>>>>>>>                                 {<<"end_last_seq">>,1407186},
>>>>>>>                                 {<<"recorded_seq">>,1407186},
>>>>>>>                                 {<<"missing_checked">>,1721},
>>>>>>>                                 {<<"missing_found">>,1721},
>>>>>>>                                 {<<"docs_read">>,1721},
>>>>>>>                                 {<<"docs_written">>,1721},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1386289},
>>>>>>>                                 {<<"end_last_seq">>,1405428},
>>>>>>>                                 {<<"recorded_seq">>,1405428},
>>>>>>>                                 {<<"missing_checked">>,16977},
>>>>>>>                                 {<<"missing_found">>,16977},
>>>>>>>                                 {<<"docs_read">>,16977},
>>>>>>>                                 {<<"docs_written">>,16977},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1384738},
>>>>>>>                                 {<<"end_last_seq">>,1386289},
>>>>>>>                                 {<<"recorded_seq">>,1386289},
>>>>>>>                                 {<<"missing_checked">>,1551},
>>>>>>>                                 {<<"missing_found">>,1550},
>>>>>>>                                 {<<"docs_read">>,1550},
>>>>>>>                                 {<<"docs_written">>,1550},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1372404},
>>>>>>>                                 {<<"end_last_seq">>,1384738},
>>>>>>>                                 {<<"recorded_seq">>,1384738},
>>>>>>>                                 {<<"missing_checked">>,12334},
>>>>>>>                                 {<<"missing_found">>,12333},
>>>>>>>                                 {<<"docs_read">>,12333},
>>>>>>>                                 {<<"docs_written">>,12333},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>>                               {[{<<"session_id">>,
>>>>>>>                                  <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>>>                                 {<<"start_time">>,
>>>>>>>                                  <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>>>                                 {<<"end_time">>,
>>>>>>>                                  <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>>>                                 {<<"start_last_seq">>,1361049},
>>>>>>>                                 {<<"end_last_seq">>,1372404},
>>>>>>>                                 {<<"recorded_seq">>,1372404},
>>>>>>>                                 {<<"missing_checked">>,11355},
>>>>>>>                                 {<<"missing_found">>,11355},
>>>>>>>                                 {<<"docs_read">>,11355},
>>>>>>>                                 {<<"docs_written">>,11355},
>>>>>>>                                 {<<"doc_write_failures">>,0}]},
>>>>>>> [...lots of these...]
>>>>>>>
>>>>>>>                           [],false,[]},
>>>>>>>                       #Ref<0.0.15.159973>}],
>>>>>>>                     false,false}
>>>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>>                      <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>>                      {db_header,6,992456,0,
>>>>>>>                          {943280145,{744250,975,647546641},60017672},
>>>>>>>                          {943282327,745225,42485979},
>>>>>>>                          {943267963,[],5753},
>>>>>>>                          0,nil,nil,1000},
>>>>>>>                      992456,
>>>>>>>                      {btree,<0.286.0>,
>>>>>>>                          {943280145,{744250,975,647546641},60017672},
>>>>>>>                          #Fun<couch_db_updater.10.57960608>,
>>>>>>>                          #Fun<couch_db_updater.11.57960608>,
>>>>>>>                          #Fun<couch_btree.5.133731799>,
>>>>>>>                          #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>                      {btree,<0.286.0>,
>>>>>>>                          {943282327,745225,42485979},
>>>>>>>                          #Fun<couch_db_updater.13.57960608>,
>>>>>>>                          #Fun<couch_db_updater.14.57960608>,
>>>>>>>                          #Fun<couch_btree.5.133731799>,
>>>>>>>                          #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>                      {btree,<0.286.0>,
>>>>>>>                          {943267963,[],5753},
>>>>>>>                          #Fun<couch_btree.3.133731799>,
>>>>>>>                          #Fun<couch_btree.4.133731799>,
>>>>>>>                          #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>                      992456,<<"cbstats">>,
>>>>>>>                      "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>>                      nil,
>>>>>>>                      {user_ctx,null,[],undefined},
>>>>>>>                      nil,1000,
>>>>>>>                      [before_header,after_header,on_file_open],
>>>>>>>                      [{user_ctx,
>>>>>>>                           {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>                      snappy,nil,nil}
>>>>>>> ** Reason for termination ==
>>>>>>> ** {timeout,
>>>>>>> {gen_server,call,
>>>>>>>     [<0.288.0>,
>>>>>>>      {db_updated,
>>>>>>>          {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>>>              <0.286.0>,<0.367.0>,
>>>>>>>              {db_header,6,992456,0,
>>>>>>>                  {943280145,{744250,975,647546641},60017672},
>>>>>>>                  {943282327,745225,42485979},
>>>>>>>                  {943267963,[],5753},
>>>>>>>                  0,nil,nil,1000},
>>>>>>>              992456,
>>>>>>>              {btree,<0.286.0>,
>>>>>>>                  {943280145,{744250,975,647546641},60017672},
>>>>>>>                  #Fun<couch_db_updater.10.57960608>,
>>>>>>>                  #Fun<couch_db_updater.11.57960608>,
>>>>>>>                  #Fun<couch_btree.5.133731799>,
>>>>>>>                  #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>>              {btree,<0.286.0>,
>>>>>>>                  {943282327,745225,42485979},
>>>>>>>                  #Fun<couch_db_updater.13.57960608>,
>>>>>>>                  #Fun<couch_db_updater.14.57960608>,
>>>>>>>                  #Fun<couch_btree.5.133731799>,
>>>>>>>                  #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>>              {btree,<0.286.0>,
>>>>>>>                  {943284347,[],5756},
>>>>>>>                  #Fun<couch_btree.3.133731799>,
>>>>>>>                  #Fun<couch_btree.4.133731799>,
>>>>>>>                  #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>>              992456,<<"cbstats">>,
>>>>>>>              "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>>>              {user_ctx,null,[],undefined},
>>>>>>>              #Ref<0.0.15.160107>,1000,
>>>>>>>              [before_header,after_header,on_file_open],
>>>>>>>              [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>>              snappy,nil,nil}}]}}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> dustin sallings
>>>>>>
>>>>>> --
>>>>>> dustin sallings
>

Re: replication problems

Posted by Bob Dionne <di...@dionne-associates.com>.
Incorporating a unique id from the source and target seems like a good way to go, but I'm wondering whether an id from an ini file
will work in the clustered BigCouch case. Would an API-level request work better? That is, something the replicator would
interrogate for both the source and the target.
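
As a rough sketch of the idea under discussion, the snippet below contrasts a replication id derived from the replicating node's host:port with one derived from UUIDs reported by the source and target. This is not CouchDB's actual id algorithm: the function names, the fields that get hashed, the MD5-over-JSON scheme, and the example hosts and UUIDs are all assumptions chosen only to show why a UUID-based id would stay stable when an address changes.

```python
import hashlib
import json


def _digest(fields):
    # Hash a JSON encoding of the fields; illustrative only, not CouchDB's scheme.
    return hashlib.md5(json.dumps(fields).encode("utf-8")).hexdigest()


def rep_id_from_address(local_host, local_port, source, target, continuous):
    """Hypothetical id that mixes in the replicating node's host and port."""
    return _digest([local_host, local_port, source, target, continuous])


def rep_id_from_uuids(source_uuid, target_uuid, source_db, target_db, continuous):
    """Hypothetical id built from UUIDs obtained from the source and the target."""
    return _digest([source_uuid, target_uuid, source_db, target_db, continuous])


# Placeholder values, not taken from any real deployment.
SOURCE_URL = "http://source.example/gerrit"

# Same replication, but the local node's port changed between restarts.
addr_id_before = rep_id_from_address("node.example", 5984, SOURCE_URL, "gerrit", True)
addr_id_after = rep_id_from_address("node.example", 5985, SOURCE_URL, "gerrit", True)

# Same replication identified by endpoint UUIDs; host and port play no part.
uuid_id_before = rep_id_from_uuids("uuid-of-source", "uuid-of-target", "gerrit", "gerrit", True)
uuid_id_after = rep_id_from_uuids("uuid-of-source", "uuid-of-target", "gerrit", "gerrit", True)

print("address-based id stable:", addr_id_before == addr_id_after)
print("uuid-based id stable:   ", uuid_id_before == uuid_id_after)
```

In this sketch the address-based id changes as soon as the port does, which would make an existing checkpoint look like it belongs to a different replication, while the UUID-based id is unaffected. How the replicator would obtain the two UUIDs (ini file, welcome message, or some other API-level request) is exactly the open question above and is deliberately left out.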


On Oct 11, 2012, at 5:42 AM, Robert Newson <ro...@gmail.com> wrote:

> I'll note here that the attached patch is wrong. It uses a single uuid
> from the node running replication, which might not be the source or
> target. Instead, the uuid of source and target must be retrieved and
> used instead of the host:port. Jason's suggestion to add the uuid
> (stored in the ini file) to the welcome message sounds really good to
> me.
> 
> Can't attach this to the ticket today as I don't have my Jira creds.
> 
> Sent from the ocean floor
> 
> On 10 Oct 2012, at 21:40, Jan Lehnardt <ja...@apache.org> wrote:
> 
>> flagged.
>> 
>> On Oct 10, 2012, at 22:34 , Robert Newson <ro...@gmail.com> wrote:
>> 
>>> Jan,
>>> 
>>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>> 
>>> I like the ini uuid idea best, modelled after the cookie with secret.
>>> If we have the uuid, we'd omit host name as well as port, right?
>>> 
>>> Sent from the ocean floor
>>> 
>>> On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:
>>> 
>>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>> 
>>>> Cheers
>>>> Jan
>>>> --
>>>> 
>>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>>>> 
>>>>> 
>>>>> I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" state and the "has duplicate replications" state.  I think the stuff below shows the "not replicating" part.
>>>>> 
>>>>> Active tasks shows the other (these are based on replicator DB documents; example below):
>>>>> 
>>>>> [
>>>>> {
>>>>>    "checkpointed_source_seq": 2022317,
>>>>>    "continuous": true,
>>>>>    "doc_id": "cbstats-from-dogbowl",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 300,
>>>>>    "docs_written": 300,
>>>>>    "missing_revisions_found": 300,
>>>>>    "pid": "<0.10466.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>>    "revisions_checked": 304,
>>>>>    "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>    "source_seq": 2022317,
>>>>>    "started_on": 1349309457,
>>>>>    "target": "cbstats",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310442
>>>>> },
>>>>> {
>>>>>    "checkpointed_source_seq": 2022317,
>>>>>    "continuous": true,
>>>>>    "doc_id": "cbstats-from-dogbowl",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 62,
>>>>>    "docs_written": 62,
>>>>>    "missing_revisions_found": 62,
>>>>>    "pid": "<0.11019.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>>    "revisions_checked": 304,
>>>>>    "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>    "source_seq": 2022317,
>>>>>    "started_on": 1349309471,
>>>>>    "target": "cbstats",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310443
>>>>> },
>>>>> {
>>>>>    "checkpointed_source_seq": 107068,
>>>>>    "continuous": true,
>>>>>    "doc_id": "gerrit-from-prod",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 22,
>>>>>    "docs_written": 22,
>>>>>    "missing_revisions_found": 22,
>>>>>    "pid": "<0.11086.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>>    "revisions_checked": 26,
>>>>>    "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>    "source_seq": 107068,
>>>>>    "started_on": 1349309487,
>>>>>    "target": "gerrit",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310445
>>>>> },
>>>>> {
>>>>>    "checkpointed_source_seq": 107068,
>>>>>    "continuous": true,
>>>>>    "doc_id": "gerrit-from-prod",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 17,
>>>>>    "docs_written": 17,
>>>>>    "missing_revisions_found": 17,
>>>>>    "pid": "<0.11107.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>>    "revisions_checked": 26,
>>>>>    "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>    "source_seq": 107068,
>>>>>    "started_on": 1349309488,
>>>>>    "target": "gerrit",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310445
>>>>> }
>>>>> ]
>>>>> 
>>>>> 
>>>>> The replicator document for the latter, for example, is this:
>>>>> 
>>>>> {
>>>>> "_id": "gerrit-from-prod",
>>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>>> "target": "gerrit",
>>>>> "continuous": true,
>>>>> "user_ctx": {
>>>>>   "roles": [
>>>>>       "_admin"
>>>>>   ]
>>>>> },
>>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>>> "_replication_state": "triggered"
>>>>> }
>>>>> 
>>>>> 
>>>>> Begin forwarded message:
>>>>> 
>>>>>> From: Dustin Sallings <du...@spy.net>
>>>>>> Subject: Re: replication problems
>>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>>> To: dev@couchdb.apache.org
>>>>>> Reply-To: dev@couchdb.apache.org
>>>>>> 
>>>>>> 
>>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>> 
>>>>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>>>> 
>>>>>> 
>>>>>> I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>>>> 
>>>>>> Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>>>> 
>>>>>> 
>>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>> {gen_server,call,
>>>>>>        [couch_server,
>>>>>>         {open,<<"rpics">>,
>>>>>>               [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>         infinity]}}
>>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>>                   {{timeout,
>>>>>>                     {gen_server,call,
>>>>>>                      [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>                    {gen_server,call,
>>>>>>                     [couch_server,
>>>>>>                      {open,<<"cbstats">>,
>>>>>>                       [{user_ctx,
>>>>>>                         {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>>                        {user_ctx,
>>>>>>                         {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>                      infinity]}}}
>>>>>> 
>>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>>                    {httpdb,
>>>>>>                     "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>>                     nil,
>>>>>>                     [{"Accept","application/json"},
>>>>>>                      {"User-Agent","CouchDB/1.2.0"}],
>>>>>>                     30000,
>>>>>>                     [{socket_options,
>>>>>>                       [{keepalive,true},{nodelay,false}]}],
>>>>>>                     10,250,<0.273.0>,20},
>>>>>>                    {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>                     <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>                     {db_header,6,984356,0,
>>>>>>                      {860345646,{737369,975,640891414},59433736},
>>>>>>                      {860348005,738344,42056446},
>>>>>>                      {860352635,[],5737},
>>>>>>                      0,nil,nil,1000},
>>>>>>                     984356,
>>>>>>                     {btree,<0.286.0>,
>>>>>>                      {860345646,{737369,975,640891414},59433736},
>>>>>>                      #Fun<couch_db_updater.10.57960608>,
>>>>>>                      #Fun<couch_db_updater.11.57960608>,
>>>>>>                      #Fun<couch_btree.5.133731799>,
>>>>>>                      #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>                     {btree,<0.286.0>,
>>>>>>                      {860348005,738344,42056446},
>>>>>>                      #Fun<couch_db_updater.13.57960608>,
>>>>>>                      #Fun<couch_db_updater.14.57960608>,
>>>>>>                      #Fun<couch_btree.5.133731799>,
>>>>>>                      #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>                     {btree,<0.286.0>,
>>>>>>                      {860352635,[],5737},
>>>>>>                      #Fun<couch_btree.3.133731799>,
>>>>>>                      #Fun<couch_btree.4.133731799>,
>>>>>>                      #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>                     984356,<<"cbstats">>,
>>>>>>                     "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>                     nil,
>>>>>>                     {user_ctx,null,[<<"_admin">>],undefined},
>>>>>>                     nil,1000,
>>>>>>                     [before_header,after_header,on_file_open],
>>>>>>                     [{user_ctx,
>>>>>>                       {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>                     snappy,nil,nil},
>>>>>>                    [],nil,nil,nil,
>>>>>>                    {rep_stats,0,0,0,0,0},
>>>>>>                    nil,<0.385.0>,
>>>>>>                    {batch,[],0}}
>>>>>> ** Reason for termination ==
>>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Scrolling to the beginning of the errors, I find this:
>>>>>> 
>>>>>> 
>>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>>                      [{{doc,
>>>>>>                            <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>>                            {0,[<<"346185">>]},
>>>>>>                            {[{<<"session_id">>,
>>>>>>                               <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>                              {<<"source_last_seq">>,1419004},
>>>>>>                              {<<"replication_id_version">>,2},
>>>>>>                              {<<"history">>,
>>>>>>                               [{[{<<"session_id">>,
>>>>>>                                   <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1410146},
>>>>>>                                  {<<"end_last_seq">>,1419004},
>>>>>>                                  {<<"recorded_seq">>,1419004},
>>>>>>                                  {<<"missing_checked">>,8100},
>>>>>>                                  {<<"missing_found">>,8100},
>>>>>>                                  {<<"docs_read">>,8100},
>>>>>>                                  {<<"docs_written">>,8100},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1407186},
>>>>>>                                  {<<"end_last_seq">>,1410146},
>>>>>>                                  {<<"recorded_seq">>,1410146},
>>>>>>                                  {<<"missing_checked">>,2583},
>>>>>>                                  {<<"missing_found">>,2577},
>>>>>>                                  {<<"docs_read">>,2577},
>>>>>>                                  {<<"docs_written">>,2577},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"172de62044281a01b1584a9d099f42af">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1405428},
>>>>>>                                  {<<"end_last_seq">>,1407186},
>>>>>>                                  {<<"recorded_seq">>,1407186},
>>>>>>                                  {<<"missing_checked">>,1721},
>>>>>>                                  {<<"missing_found">>,1721},
>>>>>>                                  {<<"docs_read">>,1721},
>>>>>>                                  {<<"docs_written">>,1721},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1386289},
>>>>>>                                  {<<"end_last_seq">>,1405428},
>>>>>>                                  {<<"recorded_seq">>,1405428},
>>>>>>                                  {<<"missing_checked">>,16977},
>>>>>>                                  {<<"missing_found">>,16977},
>>>>>>                                  {<<"docs_read">>,16977},
>>>>>>                                  {<<"docs_written">>,16977},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1384738},
>>>>>>                                  {<<"end_last_seq">>,1386289},
>>>>>>                                  {<<"recorded_seq">>,1386289},
>>>>>>                                  {<<"missing_checked">>,1551},
>>>>>>                                  {<<"missing_found">>,1550},
>>>>>>                                  {<<"docs_read">>,1550},
>>>>>>                                  {<<"docs_written">>,1550},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1372404},
>>>>>>                                  {<<"end_last_seq">>,1384738},
>>>>>>                                  {<<"recorded_seq">>,1384738},
>>>>>>                                  {<<"missing_checked">>,12334},
>>>>>>                                  {<<"missing_found">>,12333},
>>>>>>                                  {<<"docs_read">>,12333},
>>>>>>                                  {<<"docs_written">>,12333},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1361049},
>>>>>>                                  {<<"end_last_seq">>,1372404},
>>>>>>                                  {<<"recorded_seq">>,1372404},
>>>>>>                                  {<<"missing_checked">>,11355},
>>>>>>                                  {<<"missing_found">>,11355},
>>>>>>                                  {<<"docs_read">>,11355},
>>>>>>                                  {<<"docs_written">>,11355},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>> [...lots of these...]
>>>>>> 
>>>>>>                            [],false,[]},
>>>>>>                        #Ref<0.0.15.159973>}],
>>>>>>                      false,false}
>>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>                       <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>                       {db_header,6,992456,0,
>>>>>>                           {943280145,{744250,975,647546641},60017672},
>>>>>>                           {943282327,745225,42485979},
>>>>>>                           {943267963,[],5753},
>>>>>>                           0,nil,nil,1000},
>>>>>>                       992456,
>>>>>>                       {btree,<0.286.0>,
>>>>>>                           {943280145,{744250,975,647546641},60017672},
>>>>>>                           #Fun<couch_db_updater.10.57960608>,
>>>>>>                           #Fun<couch_db_updater.11.57960608>,
>>>>>>                           #Fun<couch_btree.5.133731799>,
>>>>>>                           #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>                       {btree,<0.286.0>,
>>>>>>                           {943282327,745225,42485979},
>>>>>>                           #Fun<couch_db_updater.13.57960608>,
>>>>>>                           #Fun<couch_db_updater.14.57960608>,
>>>>>>                           #Fun<couch_btree.5.133731799>,
>>>>>>                           #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>                       {btree,<0.286.0>,
>>>>>>                           {943267963,[],5753},
>>>>>>                           #Fun<couch_btree.3.133731799>,
>>>>>>                           #Fun<couch_btree.4.133731799>,
>>>>>>                           #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>                       992456,<<"cbstats">>,
>>>>>>                       "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>                       nil,
>>>>>>                       {user_ctx,null,[],undefined},
>>>>>>                       nil,1000,
>>>>>>                       [before_header,after_header,on_file_open],
>>>>>>                       [{user_ctx,
>>>>>>                            {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>                       snappy,nil,nil}
>>>>>> ** Reason for termination ==
>>>>>> ** {timeout,
>>>>>>  {gen_server,call,
>>>>>>      [<0.288.0>,
>>>>>>       {db_updated,
>>>>>>           {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>>               <0.286.0>,<0.367.0>,
>>>>>>               {db_header,6,992456,0,
>>>>>>                   {943280145,{744250,975,647546641},60017672},
>>>>>>                   {943282327,745225,42485979},
>>>>>>                   {943267963,[],5753},
>>>>>>                   0,nil,nil,1000},
>>>>>>               992456,
>>>>>>               {btree,<0.286.0>,
>>>>>>                   {943280145,{744250,975,647546641},60017672},
>>>>>>                   #Fun<couch_db_updater.10.57960608>,
>>>>>>                   #Fun<couch_db_updater.11.57960608>,
>>>>>>                   #Fun<couch_btree.5.133731799>,
>>>>>>                   #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>               {btree,<0.286.0>,
>>>>>>                   {943282327,745225,42485979},
>>>>>>                   #Fun<couch_db_updater.13.57960608>,
>>>>>>                   #Fun<couch_db_updater.14.57960608>,
>>>>>>                   #Fun<couch_btree.5.133731799>,
>>>>>>                   #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>               {btree,<0.286.0>,
>>>>>>                   {943284347,[],5756},
>>>>>>                   #Fun<couch_btree.3.133731799>,
>>>>>>                   #Fun<couch_btree.4.133731799>,
>>>>>>                   #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>               992456,<<"cbstats">>,
>>>>>>               "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>>               {user_ctx,null,[],undefined},
>>>>>>               #Ref<0.0.15.160107>,1000,
>>>>>>               [before_header,after_header,on_file_open],
>>>>>>               [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>               snappy,nil,nil}}]}}
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> dustin sallings
>>>>> 
>>>>> --
>>>>> dustin sallings
>> 


Re: replication problems

Posted by Robert Newson <ro...@gmail.com>.
I'll note here that the attached patch is wrong. It uses a single UUID
from the node running the replication, which might be neither the source
nor the target. The UUIDs of the source and target must be retrieved and
used in place of host:port. Jason's suggestion to add the UUID
(stored in the ini file) to the welcome message sounds really good to
me.
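
To make the difference concrete, here is a minimal Python sketch. It is not the patch and not CouchDB's actual replication ID code. It assumes each server reports a UUID and that the ID is an MD5 over the endpoint identities and database names; the function names, UUID values, and addresses are all hypothetical.

```python
import hashlib
import json


def endpoint_key(endpoint, use_uuid):
    # Identify a server either by its (assumed) UUID or by its host:port.
    return endpoint["uuid"] if use_uuid else endpoint["address"]


def replication_id(source, source_db, target, target_db, use_uuid, continuous=False):
    # Hypothetical scheme: hash both endpoint identities and both db names.
    payload = json.dumps([
        endpoint_key(source, use_uuid), source_db,
        endpoint_key(target, use_uuid), target_db,
    ])
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return (digest + "+continuous") if continuous else digest


# The same source server, reached under two different addresses (made-up values).
source_by_name = {"address": "db.example.com:5984", "uuid": "5f2c1a9e7b3d4c68a0e1d2f3b4c5a697"}
source_by_ip = {"address": "10.0.0.5:5984", "uuid": "5f2c1a9e7b3d4c68a0e1d2f3b4c5a697"}
target = {"address": "localhost:5984", "uuid": "c7d8e9f0a1b24c3d9e8f7a6b5c4d3e2f"}

host_ids = {
    replication_id(src, "gerrit", target, "gerrit", use_uuid=False, continuous=True)
    for src in (source_by_name, source_by_ip)
}
uuid_ids = {
    replication_id(src, "gerrit", target, "gerrit", use_uuid=True, continuous=True)
    for src in (source_by_name, source_by_ip)
}

print(len(host_ids))  # 2: keyed on host:port, one replication gets two IDs
print(len(uuid_ids))  # 1: keyed on source and target UUIDs, the ID is stable
```

Keyed on host:port, the same pair of databases ends up with two IDs as soon as an address changes; keyed on the two endpoint UUIDs, it keeps one.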

Can't attach this to the ticket today as I don't have my Jira creds.

Sent from the ocean floor

On 10 Oct 2012, at 21:40, Jan Lehnardt <ja...@apache.org> wrote:

> flagged.
>
> On Oct 10, 2012, at 22:34 , Robert Newson <ro...@gmail.com> wrote:
>
>> Jan,
>>
>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>
>> I like the ini uuid idea best, modelled after the cookie with secret.
>> If we have the uuid, we'd omit host name as well as port, right?
>>
>> Sent from the ocean floor
>>
>> On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:
>>
>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>>>
>>>>
>>>>  I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" state and the "has duplicate replications" state at the same time.  I think the logs below show the "not replicating" problem.
>>>>
>>>>  Active tasks shows the other (these are based on replicator DB documents; example below):
>>>>
>>>> [
>>>> {
>>>>     "checkpointed_source_seq": 2022317,
>>>>     "continuous": true,
>>>>     "doc_id": "cbstats-from-dogbowl",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 300,
>>>>     "docs_written": 300,
>>>>     "missing_revisions_found": 300,
>>>>     "pid": "<0.10466.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>     "revisions_checked": 304,
>>>>     "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>     "source_seq": 2022317,
>>>>     "started_on": 1349309457,
>>>>     "target": "cbstats",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310442
>>>> },
>>>> {
>>>>     "checkpointed_source_seq": 2022317,
>>>>     "continuous": true,
>>>>     "doc_id": "cbstats-from-dogbowl",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 62,
>>>>     "docs_written": 62,
>>>>     "missing_revisions_found": 62,
>>>>     "pid": "<0.11019.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>     "revisions_checked": 304,
>>>>     "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>     "source_seq": 2022317,
>>>>     "started_on": 1349309471,
>>>>     "target": "cbstats",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310443
>>>> },
>>>> {
>>>>     "checkpointed_source_seq": 107068,
>>>>     "continuous": true,
>>>>     "doc_id": "gerrit-from-prod",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 22,
>>>>     "docs_written": 22,
>>>>     "missing_revisions_found": 22,
>>>>     "pid": "<0.11086.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>     "revisions_checked": 26,
>>>>     "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>     "source_seq": 107068,
>>>>     "started_on": 1349309487,
>>>>     "target": "gerrit",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310445
>>>> },
>>>> {
>>>>     "checkpointed_source_seq": 107068,
>>>>     "continuous": true,
>>>>     "doc_id": "gerrit-from-prod",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 17,
>>>>     "docs_written": 17,
>>>>     "missing_revisions_found": 17,
>>>>     "pid": "<0.11107.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>     "revisions_checked": 26,
>>>>     "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>     "source_seq": 107068,
>>>>     "started_on": 1349309488,
>>>>     "target": "gerrit",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310445
>>>> }
>>>> ]
>>>>
>>>>
>>>>  The replicator document for the latter, for example, is this:
>>>>
>>>> {
>>>> "_id": "gerrit-from-prod",
>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>> "target": "gerrit",
>>>> "continuous": true,
>>>> "user_ctx": {
>>>>    "roles": [
>>>>        "_admin"
>>>>    ]
>>>> },
>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>> "_replication_state": "triggered"
>>>> }
>>>>
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: Dustin Sallings <du...@spy.net>
>>>>> Subject: Re: replication problems
>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>> To: dev@couchdb.apache.org
>>>>> Reply-To: dev@couchdb.apache.org
>>>>>
>>>>>
>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>
>>>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>>>
>>>>>
>>>>>  I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>>>
>>>>>  Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>>>
>>>>>
>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>> {gen_server,call,
>>>>>         [couch_server,
>>>>>          {open,<<"rpics">>,
>>>>>                [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>          infinity]}}
>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>                    {{timeout,
>>>>>                      {gen_server,call,
>>>>>                       [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>                     {gen_server,call,
>>>>>                      [couch_server,
>>>>>                       {open,<<"cbstats">>,
>>>>>                        [{user_ctx,
>>>>>                          {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>                         {user_ctx,
>>>>>                          {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>                       infinity]}}}
>>>>>
>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>                     {httpdb,
>>>>>                      "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>                      nil,
>>>>>                      [{"Accept","application/json"},
>>>>>                       {"User-Agent","CouchDB/1.2.0"}],
>>>>>                      30000,
>>>>>                      [{socket_options,
>>>>>                        [{keepalive,true},{nodelay,false}]}],
>>>>>                      10,250,<0.273.0>,20},
>>>>>                     {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>                      <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>                      {db_header,6,984356,0,
>>>>>                       {860345646,{737369,975,640891414},59433736},
>>>>>                       {860348005,738344,42056446},
>>>>>                       {860352635,[],5737},
>>>>>                       0,nil,nil,1000},
>>>>>                      984356,
>>>>>                      {btree,<0.286.0>,
>>>>>                       {860345646,{737369,975,640891414},59433736},
>>>>>                       #Fun<couch_db_updater.10.57960608>,
>>>>>                       #Fun<couch_db_updater.11.57960608>,
>>>>>                       #Fun<couch_btree.5.133731799>,
>>>>>                       #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>                      {btree,<0.286.0>,
>>>>>                       {860348005,738344,42056446},
>>>>>                       #Fun<couch_db_updater.13.57960608>,
>>>>>                       #Fun<couch_db_updater.14.57960608>,
>>>>>                       #Fun<couch_btree.5.133731799>,
>>>>>                       #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>                      {btree,<0.286.0>,
>>>>>                       {860352635,[],5737},
>>>>>                       #Fun<couch_btree.3.133731799>,
>>>>>                       #Fun<couch_btree.4.133731799>,
>>>>>                       #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>                      984356,<<"cbstats">>,
>>>>>                      "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>                      nil,
>>>>>                      {user_ctx,null,[<<"_admin">>],undefined},
>>>>>                      nil,1000,
>>>>>                      [before_header,after_header,on_file_open],
>>>>>                      [{user_ctx,
>>>>>                        {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>                      snappy,nil,nil},
>>>>>                     [],nil,nil,nil,
>>>>>                     {rep_stats,0,0,0,0,0},
>>>>>                     nil,<0.385.0>,
>>>>>                     {batch,[],0}}
>>>>> ** Reason for termination ==
>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  Scrolling to the beginning of the errors, I find this:
>>>>>
>>>>>
>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>                       [{{doc,
>>>>>                             <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>                             {0,[<<"346185">>]},
>>>>>                             {[{<<"session_id">>,
>>>>>                                <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>                               {<<"source_last_seq">>,1419004},
>>>>>                               {<<"replication_id_version">>,2},
>>>>>                               {<<"history">>,
>>>>>                                [{[{<<"session_id">>,
>>>>>                                    <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1410146},
>>>>>                                   {<<"end_last_seq">>,1419004},
>>>>>                                   {<<"recorded_seq">>,1419004},
>>>>>                                   {<<"missing_checked">>,8100},
>>>>>                                   {<<"missing_found">>,8100},
>>>>>                                   {<<"docs_read">>,8100},
>>>>>                                   {<<"docs_written">>,8100},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1407186},
>>>>>                                   {<<"end_last_seq">>,1410146},
>>>>>                                   {<<"recorded_seq">>,1410146},
>>>>>                                   {<<"missing_checked">>,2583},
>>>>>                                   {<<"missing_found">>,2577},
>>>>>                                   {<<"docs_read">>,2577},
>>>>>                                   {<<"docs_written">>,2577},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"172de62044281a01b1584a9d099f42af">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1405428},
>>>>>                                   {<<"end_last_seq">>,1407186},
>>>>>                                   {<<"recorded_seq">>,1407186},
>>>>>                                   {<<"missing_checked">>,1721},
>>>>>                                   {<<"missing_found">>,1721},
>>>>>                                   {<<"docs_read">>,1721},
>>>>>                                   {<<"docs_written">>,1721},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1386289},
>>>>>                                   {<<"end_last_seq">>,1405428},
>>>>>                                   {<<"recorded_seq">>,1405428},
>>>>>                                   {<<"missing_checked">>,16977},
>>>>>                                   {<<"missing_found">>,16977},
>>>>>                                   {<<"docs_read">>,16977},
>>>>>                                   {<<"docs_written">>,16977},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1384738},
>>>>>                                   {<<"end_last_seq">>,1386289},
>>>>>                                   {<<"recorded_seq">>,1386289},
>>>>>                                   {<<"missing_checked">>,1551},
>>>>>                                   {<<"missing_found">>,1550},
>>>>>                                   {<<"docs_read">>,1550},
>>>>>                                   {<<"docs_written">>,1550},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1372404},
>>>>>                                   {<<"end_last_seq">>,1384738},
>>>>>                                   {<<"recorded_seq">>,1384738},
>>>>>                                   {<<"missing_checked">>,12334},
>>>>>                                   {<<"missing_found">>,12333},
>>>>>                                   {<<"docs_read">>,12333},
>>>>>                                   {<<"docs_written">>,12333},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1361049},
>>>>>                                   {<<"end_last_seq">>,1372404},
>>>>>                                   {<<"recorded_seq">>,1372404},
>>>>>                                   {<<"missing_checked">>,11355},
>>>>>                                   {<<"missing_found">>,11355},
>>>>>                                   {<<"docs_read">>,11355},
>>>>>                                   {<<"docs_written">>,11355},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>> [...lots of these...]
>>>>>
>>>>>                             [],false,[]},
>>>>>                         #Ref<0.0.15.159973>}],
>>>>>                       false,false}
>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>                        <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>                        {db_header,6,992456,0,
>>>>>                            {943280145,{744250,975,647546641},60017672},
>>>>>                            {943282327,745225,42485979},
>>>>>                            {943267963,[],5753},
>>>>>                            0,nil,nil,1000},
>>>>>                        992456,
>>>>>                        {btree,<0.286.0>,
>>>>>                            {943280145,{744250,975,647546641},60017672},
>>>>>                            #Fun<couch_db_updater.10.57960608>,
>>>>>                            #Fun<couch_db_updater.11.57960608>,
>>>>>                            #Fun<couch_btree.5.133731799>,
>>>>>                            #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>                        {btree,<0.286.0>,
>>>>>                            {943282327,745225,42485979},
>>>>>                            #Fun<couch_db_updater.13.57960608>,
>>>>>                            #Fun<couch_db_updater.14.57960608>,
>>>>>                            #Fun<couch_btree.5.133731799>,
>>>>>                            #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>                        {btree,<0.286.0>,
>>>>>                            {943267963,[],5753},
>>>>>                            #Fun<couch_btree.3.133731799>,
>>>>>                            #Fun<couch_btree.4.133731799>,
>>>>>                            #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>                        992456,<<"cbstats">>,
>>>>>                        "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>                        nil,
>>>>>                        {user_ctx,null,[],undefined},
>>>>>                        nil,1000,
>>>>>                        [before_header,after_header,on_file_open],
>>>>>                        [{user_ctx,
>>>>>                             {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>                        snappy,nil,nil}
>>>>> ** Reason for termination ==
>>>>> ** {timeout,
>>>>>   {gen_server,call,
>>>>>       [<0.288.0>,
>>>>>        {db_updated,
>>>>>            {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>                <0.286.0>,<0.367.0>,
>>>>>                {db_header,6,992456,0,
>>>>>                    {943280145,{744250,975,647546641},60017672},
>>>>>                    {943282327,745225,42485979},
>>>>>                    {943267963,[],5753},
>>>>>                    0,nil,nil,1000},
>>>>>                992456,
>>>>>                {btree,<0.286.0>,
>>>>>                    {943280145,{744250,975,647546641},60017672},
>>>>>                    #Fun<couch_db_updater.10.57960608>,
>>>>>                    #Fun<couch_db_updater.11.57960608>,
>>>>>                    #Fun<couch_btree.5.133731799>,
>>>>>                    #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>                {btree,<0.286.0>,
>>>>>                    {943282327,745225,42485979},
>>>>>                    #Fun<couch_db_updater.13.57960608>,
>>>>>                    #Fun<couch_db_updater.14.57960608>,
>>>>>                    #Fun<couch_btree.5.133731799>,
>>>>>                    #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>                {btree,<0.286.0>,
>>>>>                    {943284347,[],5756},
>>>>>                    #Fun<couch_btree.3.133731799>,
>>>>>                    #Fun<couch_btree.4.133731799>,
>>>>>                    #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>                992456,<<"cbstats">>,
>>>>>                "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>                {user_ctx,null,[],undefined},
>>>>>                #Ref<0.0.15.160107>,1000,
>>>>>                [before_header,after_header,on_file_open],
>>>>>                [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>                snappy,nil,nil}}]}}
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> dustin sallings
>>>>
>>>> --
>>>> dustin sallings
>

Re: replication problems

Posted by Jan Lehnardt <ja...@apache.org>.
flagged.

On Oct 10, 2012, at 22:34 , Robert Newson <ro...@gmail.com> wrote:

> Jan,
> 
> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
> 
> I like the ini uuid idea best, modelled after the cookie with secret.
> If we have the uuid, we'd omit host name as well as port, right?
> 
> Sent from the ocean floor
> 
> On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:
> 
>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>> 
>> Cheers
>> Jan
>> --
>> 
>> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>> 
>>> 
>>>   I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" state and the "has duplicate replications" state at the same time.  I think the logs below show the "not replicating" problem.
>>> 
>>>   Active tasks shows the other (these are based on replicator DB documents; example below):
>>> 
>>> [
>>>  {
>>>      "checkpointed_source_seq": 2022317,
>>>      "continuous": true,
>>>      "doc_id": "cbstats-from-dogbowl",
>>>      "doc_write_failures": 0,
>>>      "docs_read": 300,
>>>      "docs_written": 300,
>>>      "missing_revisions_found": 300,
>>>      "pid": "<0.10466.12>",
>>>      "progress": 100,
>>>      "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>      "revisions_checked": 304,
>>>      "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>      "source_seq": 2022317,
>>>      "started_on": 1349309457,
>>>      "target": "cbstats",
>>>      "type": "replication",
>>>      "updated_on": 1349310442
>>>  },
>>>  {
>>>      "checkpointed_source_seq": 2022317,
>>>      "continuous": true,
>>>      "doc_id": "cbstats-from-dogbowl",
>>>      "doc_write_failures": 0,
>>>      "docs_read": 62,
>>>      "docs_written": 62,
>>>      "missing_revisions_found": 62,
>>>      "pid": "<0.11019.12>",
>>>      "progress": 100,
>>>      "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>      "revisions_checked": 304,
>>>      "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>      "source_seq": 2022317,
>>>      "started_on": 1349309471,
>>>      "target": "cbstats",
>>>      "type": "replication",
>>>      "updated_on": 1349310443
>>>  },
>>>  {
>>>      "checkpointed_source_seq": 107068,
>>>      "continuous": true,
>>>      "doc_id": "gerrit-from-prod",
>>>      "doc_write_failures": 0,
>>>      "docs_read": 22,
>>>      "docs_written": 22,
>>>      "missing_revisions_found": 22,
>>>      "pid": "<0.11086.12>",
>>>      "progress": 100,
>>>      "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>      "revisions_checked": 26,
>>>      "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>      "source_seq": 107068,
>>>      "started_on": 1349309487,
>>>      "target": "gerrit",
>>>      "type": "replication",
>>>      "updated_on": 1349310445
>>>  },
>>>  {
>>>      "checkpointed_source_seq": 107068,
>>>      "continuous": true,
>>>      "doc_id": "gerrit-from-prod",
>>>      "doc_write_failures": 0,
>>>      "docs_read": 17,
>>>      "docs_written": 17,
>>>      "missing_revisions_found": 17,
>>>      "pid": "<0.11107.12>",
>>>      "progress": 100,
>>>      "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>      "revisions_checked": 26,
>>>      "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>      "source_seq": 107068,
>>>      "started_on": 1349309488,
>>>      "target": "gerrit",
>>>      "type": "replication",
>>>      "updated_on": 1349310445
>>>  }
>>> ]
>>> 
>>> 
>>>   The replicator document for the latter, for example, is this:
>>> 
>>> {
>>> "_id": "gerrit-from-prod",
>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>> "target": "gerrit",
>>> "continuous": true,
>>> "user_ctx": {
>>>     "roles": [
>>>         "_admin"
>>>     ]
>>> },
>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>> "_replication_state": "triggered"
>>> }
>>> 
>>> 
>>> Begin forwarded message:
>>> 
>>>> From: Dustin Sallings <du...@spy.net>
>>>> Subject: Re: replication problems
>>>> Date: June 15, 2012 0:10:04 PDT
>>>> To: dev@couchdb.apache.org
>>>> Reply-To: dev@couchdb.apache.org
>>>> 
>>>> 
>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>> 
>>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>> 
>>>> 
>>>>   I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>> 
>>>>   Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>> 
>>>> 
>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>> {gen_server,call,
>>>>          [couch_server,
>>>>           {open,<<"rpics">>,
>>>>                 [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>           infinity]}}
>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>                     {{timeout,
>>>>                       {gen_server,call,
>>>>                        [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>                      {gen_server,call,
>>>>                       [couch_server,
>>>>                        {open,<<"cbstats">>,
>>>>                         [{user_ctx,
>>>>                           {user_ctx,null,[<<"_admin">>],undefined}},
>>>>                          {user_ctx,
>>>>                           {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>                        infinity]}}}
>>>> 
>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>                      {httpdb,
>>>>                       "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>                       nil,
>>>>                       [{"Accept","application/json"},
>>>>                        {"User-Agent","CouchDB/1.2.0"}],
>>>>                       30000,
>>>>                       [{socket_options,
>>>>                         [{keepalive,true},{nodelay,false}]}],
>>>>                       10,250,<0.273.0>,20},
>>>>                      {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>                       <0.290.0>,<0.286.0>,<0.367.0>,
>>>>                       {db_header,6,984356,0,
>>>>                        {860345646,{737369,975,640891414},59433736},
>>>>                        {860348005,738344,42056446},
>>>>                        {860352635,[],5737},
>>>>                        0,nil,nil,1000},
>>>>                       984356,
>>>>                       {btree,<0.286.0>,
>>>>                        {860345646,{737369,975,640891414},59433736},
>>>>                        #Fun<couch_db_updater.10.57960608>,
>>>>                        #Fun<couch_db_updater.11.57960608>,
>>>>                        #Fun<couch_btree.5.133731799>,
>>>>                        #Fun<couch_db_updater.12.57960608>,snappy},
>>>>                       {btree,<0.286.0>,
>>>>                        {860348005,738344,42056446},
>>>>                        #Fun<couch_db_updater.13.57960608>,
>>>>                        #Fun<couch_db_updater.14.57960608>,
>>>>                        #Fun<couch_btree.5.133731799>,
>>>>                        #Fun<couch_db_updater.15.57960608>,snappy},
>>>>                       {btree,<0.286.0>,
>>>>                        {860352635,[],5737},
>>>>                        #Fun<couch_btree.3.133731799>,
>>>>                        #Fun<couch_btree.4.133731799>,
>>>>                        #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>                       984356,<<"cbstats">>,
>>>>                       "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>                       nil,
>>>>                       {user_ctx,null,[<<"_admin">>],undefined},
>>>>                       nil,1000,
>>>>                       [before_header,after_header,on_file_open],
>>>>                       [{user_ctx,
>>>>                         {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>                       snappy,nil,nil},
>>>>                      [],nil,nil,nil,
>>>>                      {rep_stats,0,0,0,0,0},
>>>>                      nil,<0.385.0>,
>>>>                      {batch,[],0}}
>>>> ** Reason for termination ==
>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>> 
>>>> 
>>>> 
>>>> 
>>>>   Scrolling to the beginning of the errors, I find this:
>>>> 
>>>> 
>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>                        [{{doc,
>>>>                              <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>                              {0,[<<"346185">>]},
>>>>                              {[{<<"session_id">>,
>>>>                                 <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>                                {<<"source_last_seq">>,1419004},
>>>>                                {<<"replication_id_version">>,2},
>>>>                                {<<"history">>,
>>>>                                 [{[{<<"session_id">>,
>>>>                                     <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>                                    {<<"start_time">>,
>>>>                                     <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>                                    {<<"end_time">>,
>>>>                                     <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>                                    {<<"start_last_seq">>,1410146},
>>>>                                    {<<"end_last_seq">>,1419004},
>>>>                                    {<<"recorded_seq">>,1419004},
>>>>                                    {<<"missing_checked">>,8100},
>>>>                                    {<<"missing_found">>,8100},
>>>>                                    {<<"docs_read">>,8100},
>>>>                                    {<<"docs_written">>,8100},
>>>>                                    {<<"doc_write_failures">>,0}]},
>>>>                                  {[{<<"session_id">>,
>>>>                                     <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>                                    {<<"start_time">>,
>>>>                                     <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>                                    {<<"end_time">>,
>>>>                                     <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>                                    {<<"start_last_seq">>,1407186},
>>>>                                    {<<"end_last_seq">>,1410146},
>>>>                                    {<<"recorded_seq">>,1410146},
>>>>                                    {<<"missing_checked">>,2583},
>>>>                                    {<<"missing_found">>,2577},
>>>>                                    {<<"docs_read">>,2577},
>>>>                                    {<<"docs_written">>,2577},
>>>>                                    {<<"doc_write_failures">>,0}]},
>>>>                                  {[{<<"session_id">>,
>>>>                                     <<"172de62044281a01b1584a9d099f42af">>},
>>>>                                    {<<"start_time">>,
>>>>                                     <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>                                    {<<"end_time">>,
>>>>                                     <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>                                    {<<"start_last_seq">>,1405428},
>>>>                                    {<<"end_last_seq">>,1407186},
>>>>                                    {<<"recorded_seq">>,1407186},
>>>>                                    {<<"missing_checked">>,1721},
>>>>                                    {<<"missing_found">>,1721},
>>>>                                    {<<"docs_read">>,1721},
>>>>                                    {<<"docs_written">>,1721},
>>>>                                    {<<"doc_write_failures">>,0}]},
>>>>                                  {[{<<"session_id">>,
>>>>                                     <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>                                    {<<"start_time">>,
>>>>                                     <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>                                    {<<"end_time">>,
>>>>                                     <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>                                    {<<"start_last_seq">>,1386289},
>>>>                                    {<<"end_last_seq">>,1405428},
>>>>                                    {<<"recorded_seq">>,1405428},
>>>>                                    {<<"missing_checked">>,16977},
>>>>                                    {<<"missing_found">>,16977},
>>>>                                    {<<"docs_read">>,16977},
>>>>                                    {<<"docs_written">>,16977},
>>>>                                    {<<"doc_write_failures">>,0}]},
>>>>                                  {[{<<"session_id">>,
>>>>                                     <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>                                    {<<"start_time">>,
>>>>                                     <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>                                    {<<"end_time">>,
>>>>                                     <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>                                    {<<"start_last_seq">>,1384738},
>>>>                                    {<<"end_last_seq">>,1386289},
>>>>                                    {<<"recorded_seq">>,1386289},
>>>>                                    {<<"missing_checked">>,1551},
>>>>                                    {<<"missing_found">>,1550},
>>>>                                    {<<"docs_read">>,1550},
>>>>                                    {<<"docs_written">>,1550},
>>>>                                    {<<"doc_write_failures">>,0}]},
>>>>                                  {[{<<"session_id">>,
>>>>                                     <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>                                    {<<"start_time">>,
>>>>                                     <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>                                    {<<"end_time">>,
>>>>                                     <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>                                    {<<"start_last_seq">>,1372404},
>>>>                                    {<<"end_last_seq">>,1384738},
>>>>                                    {<<"recorded_seq">>,1384738},
>>>>                                    {<<"missing_checked">>,12334},
>>>>                                    {<<"missing_found">>,12333},
>>>>                                    {<<"docs_read">>,12333},
>>>>                                    {<<"docs_written">>,12333},
>>>>                                    {<<"doc_write_failures">>,0}]},
>>>>                                  {[{<<"session_id">>,
>>>>                                     <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>                                    {<<"start_time">>,
>>>>                                     <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>                                    {<<"end_time">>,
>>>>                                     <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>                                    {<<"start_last_seq">>,1361049},
>>>>                                    {<<"end_last_seq">>,1372404},
>>>>                                    {<<"recorded_seq">>,1372404},
>>>>                                    {<<"missing_checked">>,11355},
>>>>                                    {<<"missing_found">>,11355},
>>>>                                    {<<"docs_read">>,11355},
>>>>                                    {<<"docs_written">>,11355},
>>>>                                    {<<"doc_write_failures">>,0}]},
>>>> [...lots of these...]
>>>> 
>>>>                              [],false,[]},
>>>>                          #Ref<0.0.15.159973>}],
>>>>                        false,false}
>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>                         <0.290.0>,<0.286.0>,<0.367.0>,
>>>>                         {db_header,6,992456,0,
>>>>                             {943280145,{744250,975,647546641},60017672},
>>>>                             {943282327,745225,42485979},
>>>>                             {943267963,[],5753},
>>>>                             0,nil,nil,1000},
>>>>                         992456,
>>>>                         {btree,<0.286.0>,
>>>>                             {943280145,{744250,975,647546641},60017672},
>>>>                             #Fun<couch_db_updater.10.57960608>,
>>>>                             #Fun<couch_db_updater.11.57960608>,
>>>>                             #Fun<couch_btree.5.133731799>,
>>>>                             #Fun<couch_db_updater.12.57960608>,snappy},
>>>>                         {btree,<0.286.0>,
>>>>                             {943282327,745225,42485979},
>>>>                             #Fun<couch_db_updater.13.57960608>,
>>>>                             #Fun<couch_db_updater.14.57960608>,
>>>>                             #Fun<couch_btree.5.133731799>,
>>>>                             #Fun<couch_db_updater.15.57960608>,snappy},
>>>>                         {btree,<0.286.0>,
>>>>                             {943267963,[],5753},
>>>>                             #Fun<couch_btree.3.133731799>,
>>>>                             #Fun<couch_btree.4.133731799>,
>>>>                             #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>                         992456,<<"cbstats">>,
>>>>                         "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>                         nil,
>>>>                         {user_ctx,null,[],undefined},
>>>>                         nil,1000,
>>>>                         [before_header,after_header,on_file_open],
>>>>                         [{user_ctx,
>>>>                              {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>                         snappy,nil,nil}
>>>> ** Reason for termination ==
>>>> ** {timeout,
>>>>    {gen_server,call,
>>>>        [<0.288.0>,
>>>>         {db_updated,
>>>>             {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>                 <0.286.0>,<0.367.0>,
>>>>                 {db_header,6,992456,0,
>>>>                     {943280145,{744250,975,647546641},60017672},
>>>>                     {943282327,745225,42485979},
>>>>                     {943267963,[],5753},
>>>>                     0,nil,nil,1000},
>>>>                 992456,
>>>>                 {btree,<0.286.0>,
>>>>                     {943280145,{744250,975,647546641},60017672},
>>>>                     #Fun<couch_db_updater.10.57960608>,
>>>>                     #Fun<couch_db_updater.11.57960608>,
>>>>                     #Fun<couch_btree.5.133731799>,
>>>>                     #Fun<couch_db_updater.12.57960608>,snappy},
>>>>                 {btree,<0.286.0>,
>>>>                     {943282327,745225,42485979},
>>>>                     #Fun<couch_db_updater.13.57960608>,
>>>>                     #Fun<couch_db_updater.14.57960608>,
>>>>                     #Fun<couch_btree.5.133731799>,
>>>>                     #Fun<couch_db_updater.15.57960608>,snappy},
>>>>                 {btree,<0.286.0>,
>>>>                     {943284347,[],5756},
>>>>                     #Fun<couch_btree.3.133731799>,
>>>>                     #Fun<couch_btree.4.133731799>,
>>>>                     #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>                 992456,<<"cbstats">>,
>>>>                 "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>                 {user_ctx,null,[],undefined},
>>>>                 #Ref<0.0.15.160107>,1000,
>>>>                 [before_header,after_header,on_file_open],
>>>>                 [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>                 snappy,nil,nil}}]}}
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> dustin sallings
>>> 
>>> --
>>> dustin sallings
>> 


Re: replication problems

Posted by Robert Newson <ro...@gmail.com>.
Jan,

Flag that as fix-for 1.3? I don't have my creds on my phone to do it.

I like the ini uuid idea best, modelled after the cookie with secret.
If we have the uuid, we'd omit host name as well as port, right?
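
For anyone following along, here is a minimal sketch of why that would help. It is not CouchDB's actual ID algorithm: the function names, the MD5-over-joined-strings scheme, the host name, and the UUID value are all assumptions made up for illustration. It only shows that an ID derived from host name and port changes when the port changes, while one derived from a per-server uuid stays put.

```python
import hashlib


def replication_id(identity_parts, source, target, continuous=False):
    """Hypothetical helper: hash whatever identifies the local server
    together with the source and target of the replication."""
    payload = "\x00".join([*identity_parts, source, target])
    rep_id = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return rep_id + ("+continuous" if continuous else "")


def id_from_host_port(hostname, port, source, target, continuous=False):
    # Identity tied to where the server happens to be listening.
    return replication_id([hostname, str(port)], source, target, continuous)


def id_from_uuid(server_uuid, source, target, continuous=False):
    # Identity tied to a uuid kept in the server's ini file (assumed).
    return replication_id([server_uuid], source, target, continuous)


if __name__ == "__main__":
    src = "http://dustinphoto.iriscouch.com/gerrit"
    tgt = "gerrit"

    # Same server, but the port differs between two starts (made-up values).
    a = id_from_host_port("example-host", 5984, src, tgt, continuous=True)
    b = id_from_host_port("example-host", 5985, src, tgt, continuous=True)
    print("host/port ids differ:", a != b)

    # With a uuid the ID does not depend on host name or port at all.
    uuid = "hypothetical-server-uuid"
    c = id_from_uuid(uuid, src, tgt, continuous=True)
    d = id_from_uuid(uuid, src, tgt, continuous=True)
    print("uuid ids match:", c == d)
```

Two different IDs for the same replicator document would be one way to end up with two entries per doc_id in active tasks, as in the listing quoted below.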

Sent from the ocean floor

On 10 Oct 2012, at 21:12, Jan Lehnardt <ja...@apache.org> wrote:

> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>
> Cheers
> Jan
> --
>
> On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:
>
>>
>>    I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" and "has duplicate replications" states at once.  I think the logs below show the "not replicating" part.
>>
>>    Active tasks shows the other (these are based on replicator DB documents; example below):
>>
>> [
>>   {
>>       "checkpointed_source_seq": 2022317,
>>       "continuous": true,
>>       "doc_id": "cbstats-from-dogbowl",
>>       "doc_write_failures": 0,
>>       "docs_read": 300,
>>       "docs_written": 300,
>>       "missing_revisions_found": 300,
>>       "pid": "<0.10466.12>",
>>       "progress": 100,
>>       "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>       "revisions_checked": 304,
>>       "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>       "source_seq": 2022317,
>>       "started_on": 1349309457,
>>       "target": "cbstats",
>>       "type": "replication",
>>       "updated_on": 1349310442
>>   },
>>   {
>>       "checkpointed_source_seq": 2022317,
>>       "continuous": true,
>>       "doc_id": "cbstats-from-dogbowl",
>>       "doc_write_failures": 0,
>>       "docs_read": 62,
>>       "docs_written": 62,
>>       "missing_revisions_found": 62,
>>       "pid": "<0.11019.12>",
>>       "progress": 100,
>>       "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>       "revisions_checked": 304,
>>       "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>       "source_seq": 2022317,
>>       "started_on": 1349309471,
>>       "target": "cbstats",
>>       "type": "replication",
>>       "updated_on": 1349310443
>>   },
>>   {
>>       "checkpointed_source_seq": 107068,
>>       "continuous": true,
>>       "doc_id": "gerrit-from-prod",
>>       "doc_write_failures": 0,
>>       "docs_read": 22,
>>       "docs_written": 22,
>>       "missing_revisions_found": 22,
>>       "pid": "<0.11086.12>",
>>       "progress": 100,
>>       "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>       "revisions_checked": 26,
>>       "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>       "source_seq": 107068,
>>       "started_on": 1349309487,
>>       "target": "gerrit",
>>       "type": "replication",
>>       "updated_on": 1349310445
>>   },
>>   {
>>       "checkpointed_source_seq": 107068,
>>       "continuous": true,
>>       "doc_id": "gerrit-from-prod",
>>       "doc_write_failures": 0,
>>       "docs_read": 17,
>>       "docs_written": 17,
>>       "missing_revisions_found": 17,
>>       "pid": "<0.11107.12>",
>>       "progress": 100,
>>       "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>       "revisions_checked": 26,
>>       "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>       "source_seq": 107068,
>>       "started_on": 1349309488,
>>       "target": "gerrit",
>>       "type": "replication",
>>       "updated_on": 1349310445
>>   }
>> ]
>>
>>
>>    The replicator document for the latter, for example, is this:
>>
>> {
>>  "_id": "gerrit-from-prod",
>>  "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>  "source": "http://dustinphoto.iriscouch.com/gerrit",
>>  "target": "gerrit",
>>  "continuous": true,
>>  "user_ctx": {
>>      "roles": [
>>          "_admin"
>>      ]
>>  },
>>  "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>  "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>  "_replication_state": "triggered"
>> }
>>
>>
>> Begin forwarded message:
>>
>>> From: Dustin Sallings <du...@spy.net>
>>> Subject: Re: replication problems
>>> Date: June 15, 2012 0:10:04 PDT
>>> To: dev@couchdb.apache.org
>>> Reply-To: dev@couchdb.apache.org
>>>
>>>
>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>
>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>
>>>
>>>    I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>
>>>    Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>
>>>
>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>> {gen_server,call,
>>>           [couch_server,
>>>            {open,<<"rpics">>,
>>>                  [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>            infinity]}}
>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>> ** Last message in was {'EXIT',<0.384.0>,
>>>                      {{timeout,
>>>                        {gen_server,call,
>>>                         [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>                       {gen_server,call,
>>>                        [couch_server,
>>>                         {open,<<"cbstats">>,
>>>                          [{user_ctx,
>>>                            {user_ctx,null,[<<"_admin">>],undefined}},
>>>                           {user_ctx,
>>>                            {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>                         infinity]}}}
>>>
>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>                       {httpdb,
>>>                        "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>                        nil,
>>>                        [{"Accept","application/json"},
>>>                         {"User-Agent","CouchDB/1.2.0"}],
>>>                        30000,
>>>                        [{socket_options,
>>>                          [{keepalive,true},{nodelay,false}]}],
>>>                        10,250,<0.273.0>,20},
>>>                       {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>                        <0.290.0>,<0.286.0>,<0.367.0>,
>>>                        {db_header,6,984356,0,
>>>                         {860345646,{737369,975,640891414},59433736},
>>>                         {860348005,738344,42056446},
>>>                         {860352635,[],5737},
>>>                         0,nil,nil,1000},
>>>                        984356,
>>>                        {btree,<0.286.0>,
>>>                         {860345646,{737369,975,640891414},59433736},
>>>                         #Fun<couch_db_updater.10.57960608>,
>>>                         #Fun<couch_db_updater.11.57960608>,
>>>                         #Fun<couch_btree.5.133731799>,
>>>                         #Fun<couch_db_updater.12.57960608>,snappy},
>>>                        {btree,<0.286.0>,
>>>                         {860348005,738344,42056446},
>>>                         #Fun<couch_db_updater.13.57960608>,
>>>                         #Fun<couch_db_updater.14.57960608>,
>>>                         #Fun<couch_btree.5.133731799>,
>>>                         #Fun<couch_db_updater.15.57960608>,snappy},
>>>                        {btree,<0.286.0>,
>>>                         {860352635,[],5737},
>>>                         #Fun<couch_btree.3.133731799>,
>>>                         #Fun<couch_btree.4.133731799>,
>>>                         #Fun<couch_btree.5.133731799>,nil,snappy},
>>>                        984356,<<"cbstats">>,
>>>                        "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>                        nil,
>>>                        {user_ctx,null,[<<"_admin">>],undefined},
>>>                        nil,1000,
>>>                        [before_header,after_header,on_file_open],
>>>                        [{user_ctx,
>>>                          {user_ctx,null,[<<"_admin">>],undefined}}],
>>>                        snappy,nil,nil},
>>>                       [],nil,nil,nil,
>>>                       {rep_stats,0,0,0,0,0},
>>>                       nil,<0.385.0>,
>>>                       {batch,[],0}}
>>> ** Reason for termination ==
>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>
>>>
>>>
>>>
>>>    Scrolling to the beginning of the errors, I find this:
>>>
>>>
>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>                         [{{doc,
>>>                               <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>                               {0,[<<"346185">>]},
>>>                               {[{<<"session_id">>,
>>>                                  <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>                                 {<<"source_last_seq">>,1419004},
>>>                                 {<<"replication_id_version">>,2},
>>>                                 {<<"history">>,
>>>                                  [{[{<<"session_id">>,
>>>                                      <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>                                     {<<"start_time">>,
>>>                                      <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>                                     {<<"end_time">>,
>>>                                      <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>                                     {<<"start_last_seq">>,1410146},
>>>                                     {<<"end_last_seq">>,1419004},
>>>                                     {<<"recorded_seq">>,1419004},
>>>                                     {<<"missing_checked">>,8100},
>>>                                     {<<"missing_found">>,8100},
>>>                                     {<<"docs_read">>,8100},
>>>                                     {<<"docs_written">>,8100},
>>>                                     {<<"doc_write_failures">>,0}]},
>>>                                   {[{<<"session_id">>,
>>>                                      <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>                                     {<<"start_time">>,
>>>                                      <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>                                     {<<"end_time">>,
>>>                                      <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>                                     {<<"start_last_seq">>,1407186},
>>>                                     {<<"end_last_seq">>,1410146},
>>>                                     {<<"recorded_seq">>,1410146},
>>>                                     {<<"missing_checked">>,2583},
>>>                                     {<<"missing_found">>,2577},
>>>                                     {<<"docs_read">>,2577},
>>>                                     {<<"docs_written">>,2577},
>>>                                     {<<"doc_write_failures">>,0}]},
>>>                                   {[{<<"session_id">>,
>>>                                      <<"172de62044281a01b1584a9d099f42af">>},
>>>                                     {<<"start_time">>,
>>>                                      <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>                                     {<<"end_time">>,
>>>                                      <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>                                     {<<"start_last_seq">>,1405428},
>>>                                     {<<"end_last_seq">>,1407186},
>>>                                     {<<"recorded_seq">>,1407186},
>>>                                     {<<"missing_checked">>,1721},
>>>                                     {<<"missing_found">>,1721},
>>>                                     {<<"docs_read">>,1721},
>>>                                     {<<"docs_written">>,1721},
>>>                                     {<<"doc_write_failures">>,0}]},
>>>                                   {[{<<"session_id">>,
>>>                                      <<"e60a126a2036c5fab00a1249101820c8">>},
>>>                                     {<<"start_time">>,
>>>                                      <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>                                     {<<"end_time">>,
>>>                                      <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>                                     {<<"start_last_seq">>,1386289},
>>>                                     {<<"end_last_seq">>,1405428},
>>>                                     {<<"recorded_seq">>,1405428},
>>>                                     {<<"missing_checked">>,16977},
>>>                                     {<<"missing_found">>,16977},
>>>                                     {<<"docs_read">>,16977},
>>>                                     {<<"docs_written">>,16977},
>>>                                     {<<"doc_write_failures">>,0}]},
>>>                                   {[{<<"session_id">>,
>>>                                      <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>                                     {<<"start_time">>,
>>>                                      <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>                                     {<<"end_time">>,
>>>                                      <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>                                     {<<"start_last_seq">>,1384738},
>>>                                     {<<"end_last_seq">>,1386289},
>>>                                     {<<"recorded_seq">>,1386289},
>>>                                     {<<"missing_checked">>,1551},
>>>                                     {<<"missing_found">>,1550},
>>>                                     {<<"docs_read">>,1550},
>>>                                     {<<"docs_written">>,1550},
>>>                                     {<<"doc_write_failures">>,0}]},
>>>                                   {[{<<"session_id">>,
>>>                                      <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>                                     {<<"start_time">>,
>>>                                      <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>                                     {<<"end_time">>,
>>>                                      <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>                                     {<<"start_last_seq">>,1372404},
>>>                                     {<<"end_last_seq">>,1384738},
>>>                                     {<<"recorded_seq">>,1384738},
>>>                                     {<<"missing_checked">>,12334},
>>>                                     {<<"missing_found">>,12333},
>>>                                     {<<"docs_read">>,12333},
>>>                                     {<<"docs_written">>,12333},
>>>                                     {<<"doc_write_failures">>,0}]},
>>>                                   {[{<<"session_id">>,
>>>                                      <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>                                     {<<"start_time">>,
>>>                                      <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>                                     {<<"end_time">>,
>>>                                      <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>                                     {<<"start_last_seq">>,1361049},
>>>                                     {<<"end_last_seq">>,1372404},
>>>                                     {<<"recorded_seq">>,1372404},
>>>                                     {<<"missing_checked">>,11355},
>>>                                     {<<"missing_found">>,11355},
>>>                                     {<<"docs_read">>,11355},
>>>                                     {<<"docs_written">>,11355},
>>>                                     {<<"doc_write_failures">>,0}]},
>>> [...lots of these...]
>>>
>>>                               [],false,[]},
>>>                           #Ref<0.0.15.159973>}],
>>>                         false,false}
>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>                          <0.290.0>,<0.286.0>,<0.367.0>,
>>>                          {db_header,6,992456,0,
>>>                              {943280145,{744250,975,647546641},60017672},
>>>                              {943282327,745225,42485979},
>>>                              {943267963,[],5753},
>>>                              0,nil,nil,1000},
>>>                          992456,
>>>                          {btree,<0.286.0>,
>>>                              {943280145,{744250,975,647546641},60017672},
>>>                              #Fun<couch_db_updater.10.57960608>,
>>>                              #Fun<couch_db_updater.11.57960608>,
>>>                              #Fun<couch_btree.5.133731799>,
>>>                              #Fun<couch_db_updater.12.57960608>,snappy},
>>>                          {btree,<0.286.0>,
>>>                              {943282327,745225,42485979},
>>>                              #Fun<couch_db_updater.13.57960608>,
>>>                              #Fun<couch_db_updater.14.57960608>,
>>>                              #Fun<couch_btree.5.133731799>,
>>>                              #Fun<couch_db_updater.15.57960608>,snappy},
>>>                          {btree,<0.286.0>,
>>>                              {943267963,[],5753},
>>>                              #Fun<couch_btree.3.133731799>,
>>>                              #Fun<couch_btree.4.133731799>,
>>>                              #Fun<couch_btree.5.133731799>,nil,snappy},
>>>                          992456,<<"cbstats">>,
>>>                          "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>                          nil,
>>>                          {user_ctx,null,[],undefined},
>>>                          nil,1000,
>>>                          [before_header,after_header,on_file_open],
>>>                          [{user_ctx,
>>>                               {user_ctx,null,[<<"_admin">>],undefined}}],
>>>                          snappy,nil,nil}
>>> ** Reason for termination ==
>>> ** {timeout,
>>>     {gen_server,call,
>>>         [<0.288.0>,
>>>          {db_updated,
>>>              {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>                  <0.286.0>,<0.367.0>,
>>>                  {db_header,6,992456,0,
>>>                      {943280145,{744250,975,647546641},60017672},
>>>                      {943282327,745225,42485979},
>>>                      {943267963,[],5753},
>>>                      0,nil,nil,1000},
>>>                  992456,
>>>                  {btree,<0.286.0>,
>>>                      {943280145,{744250,975,647546641},60017672},
>>>                      #Fun<couch_db_updater.10.57960608>,
>>>                      #Fun<couch_db_updater.11.57960608>,
>>>                      #Fun<couch_btree.5.133731799>,
>>>                      #Fun<couch_db_updater.12.57960608>,snappy},
>>>                  {btree,<0.286.0>,
>>>                      {943282327,745225,42485979},
>>>                      #Fun<couch_db_updater.13.57960608>,
>>>                      #Fun<couch_db_updater.14.57960608>,
>>>                      #Fun<couch_btree.5.133731799>,
>>>                      #Fun<couch_db_updater.15.57960608>,snappy},
>>>                  {btree,<0.286.0>,
>>>                      {943284347,[],5756},
>>>                      #Fun<couch_btree.3.133731799>,
>>>                      #Fun<couch_btree.4.133731799>,
>>>                      #Fun<couch_btree.5.133731799>,nil,snappy},
>>>                  992456,<<"cbstats">>,
>>>                  "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>                  {user_ctx,null,[],undefined},
>>>                  #Ref<0.0.15.160107>,1000,
>>>                  [before_header,after_header,on_file_open],
>>>                  [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>                  snappy,nil,nil}}]}}
>>>
>>>
>>>
>>>
>>> --
>>> dustin sallings
>>
>> --
>> dustin sallings
>

Re: replication problems

Posted by Jan Lehnardt <ja...@apache.org>.
Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259

Cheers
Jan
--

On Oct 4, 2012, at 02:28 , Dustin Sallings <du...@spy.net> wrote:

> 
> 	I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" state and the "has duplicate replications" state at the same time.  I think the stuff below shows the "not replicating" part.
> 
> 	Active tasks shows the other (these are based on replicator DB documents; example below):
> 
> [
>    {
>        "checkpointed_source_seq": 2022317, 
>        "continuous": true, 
>        "doc_id": "cbstats-from-dogbowl", 
>        "doc_write_failures": 0, 
>        "docs_read": 300, 
>        "docs_written": 300, 
>        "missing_revisions_found": 300, 
>        "pid": "<0.10466.12>", 
>        "progress": 100, 
>        "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous", 
>        "revisions_checked": 304, 
>        "source": "http://dustin:*****@single.couchbase.net/cbstats/", 
>        "source_seq": 2022317, 
>        "started_on": 1349309457, 
>        "target": "cbstats", 
>        "type": "replication", 
>        "updated_on": 1349310442
>    }, 
>    {
>        "checkpointed_source_seq": 2022317, 
>        "continuous": true, 
>        "doc_id": "cbstats-from-dogbowl", 
>        "doc_write_failures": 0, 
>        "docs_read": 62, 
>        "docs_written": 62, 
>        "missing_revisions_found": 62, 
>        "pid": "<0.11019.12>", 
>        "progress": 100, 
>        "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous", 
>        "revisions_checked": 304, 
>        "source": "http://dustin:*****@single.couchbase.net/cbstats/", 
>        "source_seq": 2022317, 
>        "started_on": 1349309471, 
>        "target": "cbstats", 
>        "type": "replication", 
>        "updated_on": 1349310443
>    }, 
>    {
>        "checkpointed_source_seq": 107068, 
>        "continuous": true, 
>        "doc_id": "gerrit-from-prod", 
>        "doc_write_failures": 0, 
>        "docs_read": 22, 
>        "docs_written": 22, 
>        "missing_revisions_found": 22, 
>        "pid": "<0.11086.12>", 
>        "progress": 100, 
>        "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous", 
>        "revisions_checked": 26, 
>        "source": "http://dustinphoto.iriscouch.com/gerrit/", 
>        "source_seq": 107068, 
>        "started_on": 1349309487, 
>        "target": "gerrit", 
>        "type": "replication", 
>        "updated_on": 1349310445
>    }, 
>    {
>        "checkpointed_source_seq": 107068, 
>        "continuous": true, 
>        "doc_id": "gerrit-from-prod", 
>        "doc_write_failures": 0, 
>        "docs_read": 17, 
>        "docs_written": 17, 
>        "missing_revisions_found": 17, 
>        "pid": "<0.11107.12>", 
>        "progress": 100, 
>        "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous", 
>        "revisions_checked": 26, 
>        "source": "http://dustinphoto.iriscouch.com/gerrit/", 
>        "source_seq": 107068, 
>        "started_on": 1349309488, 
>        "target": "gerrit", 
>        "type": "replication", 
>        "updated_on": 1349310445
>    }
> ]
> 
> 
> 	The replicator document for the latter, for example, is this:
> 
> {
>   "_id": "gerrit-from-prod",
>   "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>   "source": "http://dustinphoto.iriscouch.com/gerrit",
>   "target": "gerrit",
>   "continuous": true,
>   "user_ctx": {
>       "roles": [
>           "_admin"
>       ]
>   },
>   "_replication_state_time": "2012-10-03T17:11:27-07:00",
>   "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>   "_replication_state": "triggered"
> }
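
A rough sketch of how the duplication quoted above could be spotted mechanically. It assumes the active tasks list and the replicator document have already been fetched and parsed as JSON; the sample data is trimmed from the message, and the helper names (`base_id`, `duplicated_docs`, `unrecorded_ids`) are made up for illustration. Stripping everything after the `+` is also an assumption, based only on the `+continuous` suffix visible in the task list.

```python
from collections import defaultdict

# Trimmed copies of the entries quoted above; only the fields used here.
ACTIVE_TASKS = [
    {"type": "replication", "doc_id": "cbstats-from-dogbowl",
     "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous"},
    {"type": "replication", "doc_id": "cbstats-from-dogbowl",
     "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous"},
    {"type": "replication", "doc_id": "gerrit-from-prod",
     "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous"},
    {"type": "replication", "doc_id": "gerrit-from-prod",
     "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous"},
]

REPLICATOR_DOC = {
    "_id": "gerrit-from-prod",
    "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
}


def base_id(replication_id):
    """Drop the "+continuous" style suffix seen in the task list (assumed format)."""
    return replication_id.split("+", 1)[0]


def duplicated_docs(tasks):
    """Map each replicator doc_id to its running replication ids, if more than one."""
    by_doc = defaultdict(list)
    for task in tasks:
        if task.get("type") == "replication" and "doc_id" in task:
            by_doc[task["doc_id"]].append(base_id(task["replication_id"]))
    return {doc_id: ids for doc_id, ids in by_doc.items() if len(ids) > 1}


def unrecorded_ids(doc, tasks):
    """Replication ids running for this doc that its _replication_id does not record."""
    recorded = doc.get("_replication_id")
    return [
        base_id(task["replication_id"])
        for task in tasks
        if task.get("doc_id") == doc["_id"]
        and base_id(task["replication_id"]) != recorded
    ]


if __name__ == "__main__":
    print(duplicated_docs(ACTIVE_TASKS))
    print(unrecorded_ids(REPLICATOR_DOC, ACTIVE_TASKS))
```

With the data above, both replicator documents show up with two running replications each, and for gerrit-from-prod the id starting 4a21031d is the one the document does not record. This only identifies the extra task; it says nothing about why it was started.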
> 
> 
> Begin forwarded message:
> 
>> From: Dustin Sallings <du...@spy.net>
>> Subject: Re: replication problems
>> Date: June 15, 2012 0:10:04 PDT
>> To: dev@couchdb.apache.org
>> Reply-To: dev@couchdb.apache.org
>> 
>> 
>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>> 
>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>> 
>> 
>> 	I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>> 
>> 	Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>> 
>> 
>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>> {gen_server,call,
>>            [couch_server,
>>             {open,<<"rpics">>,
>>                   [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>             infinity]}}
>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating 
>> ** Last message in was {'EXIT',<0.384.0>,
>>                       {{timeout,
>>                         {gen_server,call,
>>                          [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>                        {gen_server,call,
>>                         [couch_server,
>>                          {open,<<"cbstats">>,
>>                           [{user_ctx,
>>                             {user_ctx,null,[<<"_admin">>],undefined}},
>>                            {user_ctx,
>>                             {user_ctx,null,[<<"_admin">>],undefined}}]},
>>                          infinity]}}}
>> 
>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>                        {httpdb,
>>                         "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>                         nil,
>>                         [{"Accept","application/json"},
>>                          {"User-Agent","CouchDB/1.2.0"}],
>>                         30000,
>>                         [{socket_options,
>>                           [{keepalive,true},{nodelay,false}]}],
>>                         10,250,<0.273.0>,20},
>>                        {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>                         <0.290.0>,<0.286.0>,<0.367.0>,
>>                         {db_header,6,984356,0,
>>                          {860345646,{737369,975,640891414},59433736},
>>                          {860348005,738344,42056446},
>>                          {860352635,[],5737},
>>                          0,nil,nil,1000},
>>                         984356,
>>                         {btree,<0.286.0>,
>>                          {860345646,{737369,975,640891414},59433736},
>>                          #Fun<couch_db_updater.10.57960608>,
>>                          #Fun<couch_db_updater.11.57960608>,
>>                          #Fun<couch_btree.5.133731799>,
>>                          #Fun<couch_db_updater.12.57960608>,snappy},
>>                         {btree,<0.286.0>,
>>                          {860348005,738344,42056446},
>>                          #Fun<couch_db_updater.13.57960608>,
>>                          #Fun<couch_db_updater.14.57960608>,
>>                          #Fun<couch_btree.5.133731799>,
>>                          #Fun<couch_db_updater.15.57960608>,snappy},
>>                         {btree,<0.286.0>,
>>                          {860352635,[],5737},
>>                          #Fun<couch_btree.3.133731799>,
>>                          #Fun<couch_btree.4.133731799>,
>>                          #Fun<couch_btree.5.133731799>,nil,snappy},
>>                         984356,<<"cbstats">>,
>>                         "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>                         nil,
>>                         {user_ctx,null,[<<"_admin">>],undefined},
>>                         nil,1000,
>>                         [before_header,after_header,on_file_open],
>>                         [{user_ctx,
>>                           {user_ctx,null,[<<"_admin">>],undefined}}],
>>                         snappy,nil,nil},
>>                        [],nil,nil,nil,
>>                        {rep_stats,0,0,0,0,0},
>>                        nil,<0.385.0>,
>>                        {batch,[],0}}
>> ** Reason for termination == 
>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>> 
>> 
>> 
>> 
>> 	Scrolling to the beginning of the errors, I find this:
>> 
>> 
>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating 
>> ** Last message in was {update_docs,<0.272.0>,[],
>>                          [{{doc,
>>                                <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>                                {0,[<<"346185">>]},
>>                                {[{<<"session_id">>,
>>                                   <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>                                  {<<"source_last_seq">>,1419004},
>>                                  {<<"replication_id_version">>,2},
>>                                  {<<"history">>,
>>                                   [{[{<<"session_id">>,
>>                                       <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>                                      {<<"start_time">>,
>>                                       <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>                                      {<<"end_time">>,
>>                                       <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>                                      {<<"start_last_seq">>,1410146},
>>                                      {<<"end_last_seq">>,1419004},
>>                                      {<<"recorded_seq">>,1419004},
>>                                      {<<"missing_checked">>,8100},
>>                                      {<<"missing_found">>,8100},
>>                                      {<<"docs_read">>,8100},
>>                                      {<<"docs_written">>,8100},
>>                                      {<<"doc_write_failures">>,0}]},
>>                                    {[{<<"session_id">>,
>>                                       <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>                                      {<<"start_time">>,
>>                                       <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>                                      {<<"end_time">>,
>>                                       <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>                                      {<<"start_last_seq">>,1407186},
>>                                      {<<"end_last_seq">>,1410146},
>>                                      {<<"recorded_seq">>,1410146},
>>                                      {<<"missing_checked">>,2583},
>>                                      {<<"missing_found">>,2577},
>>                                      {<<"docs_read">>,2577},
>>                                      {<<"docs_written">>,2577},
>>                                      {<<"doc_write_failures">>,0}]},
>>                                    {[{<<"session_id">>,
>>                                       <<"172de62044281a01b1584a9d099f42af">>},
>>                                      {<<"start_time">>,
>>                                       <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>                                      {<<"end_time">>,
>>                                       <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>                                      {<<"start_last_seq">>,1405428},
>>                                      {<<"end_last_seq">>,1407186},
>>                                      {<<"recorded_seq">>,1407186},
>>                                      {<<"missing_checked">>,1721},
>>                                      {<<"missing_found">>,1721},
>>                                      {<<"docs_read">>,1721},
>>                                      {<<"docs_written">>,1721},
>>                                      {<<"doc_write_failures">>,0}]},
>>                                    {[{<<"session_id">>,
>>                                       <<"e60a126a2036c5fab00a1249101820c8">>},
>>                                      {<<"start_time">>,
>>                                       <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>                                      {<<"end_time">>,
>>                                       <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>                                      {<<"start_last_seq">>,1386289},
>>                                      {<<"end_last_seq">>,1405428},
>>                                      {<<"recorded_seq">>,1405428},
>>                                      {<<"missing_checked">>,16977},
>>                                      {<<"missing_found">>,16977},
>>                                      {<<"docs_read">>,16977},
>>                                      {<<"docs_written">>,16977},
>>                                      {<<"doc_write_failures">>,0}]},
>>                                    {[{<<"session_id">>,
>>                                       <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>                                      {<<"start_time">>,
>>                                       <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>                                      {<<"end_time">>,
>>                                       <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>                                      {<<"start_last_seq">>,1384738},
>>                                      {<<"end_last_seq">>,1386289},
>>                                      {<<"recorded_seq">>,1386289},
>>                                      {<<"missing_checked">>,1551},
>>                                      {<<"missing_found">>,1550},
>>                                      {<<"docs_read">>,1550},
>>                                      {<<"docs_written">>,1550},
>>                                      {<<"doc_write_failures">>,0}]},
>>                                    {[{<<"session_id">>,
>>                                       <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>                                      {<<"start_time">>,
>>                                       <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>                                      {<<"end_time">>,
>>                                       <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>                                      {<<"start_last_seq">>,1372404},
>>                                      {<<"end_last_seq">>,1384738},
>>                                      {<<"recorded_seq">>,1384738},
>>                                      {<<"missing_checked">>,12334},
>>                                      {<<"missing_found">>,12333},
>>                                      {<<"docs_read">>,12333},
>>                                      {<<"docs_written">>,12333},
>>                                      {<<"doc_write_failures">>,0}]},
>>                                    {[{<<"session_id">>,
>>                                       <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>                                      {<<"start_time">>,
>>                                       <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>                                      {<<"end_time">>,
>>                                       <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>                                      {<<"start_last_seq">>,1361049},
>>                                      {<<"end_last_seq">>,1372404},
>>                                      {<<"recorded_seq">>,1372404},
>>                                      {<<"missing_checked">>,11355},
>>                                      {<<"missing_found">>,11355},
>>                                      {<<"docs_read">>,11355},
>>                                      {<<"docs_written">>,11355},
>>                                      {<<"doc_write_failures">>,0}]},
>> [...lots of these...]
>> 
>>                                [],false,[]},
>>                            #Ref<0.0.15.159973>}],
>>                          false,false}
>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>                           <0.290.0>,<0.286.0>,<0.367.0>,
>>                           {db_header,6,992456,0,
>>                               {943280145,{744250,975,647546641},60017672},
>>                               {943282327,745225,42485979},
>>                               {943267963,[],5753},
>>                               0,nil,nil,1000},
>>                           992456,
>>                           {btree,<0.286.0>,
>>                               {943280145,{744250,975,647546641},60017672},
>>                               #Fun<couch_db_updater.10.57960608>,
>>                               #Fun<couch_db_updater.11.57960608>,
>>                               #Fun<couch_btree.5.133731799>,
>>                               #Fun<couch_db_updater.12.57960608>,snappy},
>>                           {btree,<0.286.0>,
>>                               {943282327,745225,42485979},
>>                               #Fun<couch_db_updater.13.57960608>,
>>                               #Fun<couch_db_updater.14.57960608>,
>>                               #Fun<couch_btree.5.133731799>,
>>                               #Fun<couch_db_updater.15.57960608>,snappy},
>>                           {btree,<0.286.0>,
>>                               {943267963,[],5753},
>>                               #Fun<couch_btree.3.133731799>,
>>                               #Fun<couch_btree.4.133731799>,
>>                               #Fun<couch_btree.5.133731799>,nil,snappy},
>>                           992456,<<"cbstats">>,
>>                           "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>                           nil,
>>                           {user_ctx,null,[],undefined},
>>                           nil,1000,
>>                           [before_header,after_header,on_file_open],
>>                           [{user_ctx,
>>                                {user_ctx,null,[<<"_admin">>],undefined}}],
>>                           snappy,nil,nil}
>> ** Reason for termination == 
>> ** {timeout,
>>      {gen_server,call,
>>          [<0.288.0>,
>>           {db_updated,
>>               {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>                   <0.286.0>,<0.367.0>,
>>                   {db_header,6,992456,0,
>>                       {943280145,{744250,975,647546641},60017672},
>>                       {943282327,745225,42485979},
>>                       {943267963,[],5753},
>>                       0,nil,nil,1000},
>>                   992456,
>>                   {btree,<0.286.0>,
>>                       {943280145,{744250,975,647546641},60017672},
>>                       #Fun<couch_db_updater.10.57960608>,
>>                       #Fun<couch_db_updater.11.57960608>,
>>                       #Fun<couch_btree.5.133731799>,
>>                       #Fun<couch_db_updater.12.57960608>,snappy},
>>                   {btree,<0.286.0>,
>>                       {943282327,745225,42485979},
>>                       #Fun<couch_db_updater.13.57960608>,
>>                       #Fun<couch_db_updater.14.57960608>,
>>                       #Fun<couch_btree.5.133731799>,
>>                       #Fun<couch_db_updater.15.57960608>,snappy},
>>                   {btree,<0.286.0>,
>>                       {943284347,[],5756},
>>                       #Fun<couch_btree.3.133731799>,
>>                       #Fun<couch_btree.4.133731799>,
>>                       #Fun<couch_btree.5.133731799>,nil,snappy},
>>                   992456,<<"cbstats">>,
>>                   "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>                   {user_ctx,null,[],undefined},
>>                   #Ref<0.0.15.160107>,1000,
>>                   [before_header,after_header,on_file_open],
>>                   [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>                   snappy,nil,nil}}]}}
>> 
>> 
>> 
>> 
>> -- 
>> dustin sallings
>> 
>> 
>> 
> 
> -- 
> dustin sallings
> 
> 
>