Posted to user@couchdb.apache.org by Jeff Hinrichs - DM&T <je...@dundeemt.com> on 2009/02/27 15:13:25 UTC

Replication is Failing is this a known problem?

Attempting to replicate a database with largish attachments (<= ~18MB
of attachments in a doc, fewer than 200 docs) from one machine to
another fails consistently and at the same point.

Scenario:
Both servers are running from HEAD, which I've been tracking for some
time.  This problem has been around as long as I've been using couch.

Machine A holds the original database; Machine B is the server that is
doing a PULL replication.
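
For reference, a pull replication like this one is started by POSTing to
/_replicate on Machine B with Machine A's database URL as the source. The
sketch below only builds that request and does not send it. Machine B's
address (localhost:5984) and the helper name are assumptions; the source
address and database name are taken from the log lines in this report.

```python
import json
import urllib.request


def build_pull_replication(target_server, source_db_url, target_db):
    """Build (but do not send) a POST /_replicate request.

    The request is addressed to the server that pulls (Machine B);
    the source is the full URL of the remote database (Machine A).
    """
    body = json.dumps({"source": source_db_url, "target": target_db})
    return urllib.request.Request(
        url=target_server.rstrip("/") + "/_replicate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_pull_replication(
    "http://localhost:5984",  # assumed address of Machine B
    "http://192.168.2.52:5984/delasco-invoices",  # Machine A, from the logs
    "delasco-invoices",
)
# Passing req to urllib.request.urlopen() would start the replication.
```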

During the replication, Machine A starts showing the following
sporadically in the log:
[Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5902.3>] 'GET'
/delasco-invoices/INV00652429?revs=true&attachments=true&latest=true&open_revs=["425644723"]
{1,1}
Headers: [{'Host',"192.168.2.52:5984"}]

[Fri, 27 Feb 2009 14:02:48 GMT] [error] [<0.5901.3>] Uncaught error in
HTTP request: {exit,normal}

[Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] Stacktrace:
[{mochiweb_request,send,2},
             {couch_httpd,send_chunk,2},
             {couch_httpd_db,db_doc_req,3},
             {couch_httpd_db,do_db_req,2},
             {couch_httpd,handle_request,3},
             {mochiweb_http,headers,5},
             {proc_lib,init_p,5}]

[Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] HTTPd 500 error response:
 {"error":"error","reason":"normal"}

As the replication continues, the frequency of these "Uncaught error
in HTTP request: {exit,normal}" errors increases until the error is
being repeated constantly.  Then Machine B stops sending requests: no
more log output, no errors.  The last thing in Machine B's log file is:
[Fri, 27 Feb 2009 14:03:24 GMT] [info] [<0.20893.1>] retrying
couch_rep HTTP get request due to {error, req_timedout}:
[104,116,116,112,58,47,47,49,57,50,46,49,54,56,46,50,46,53,50,58,53,57,
 56,52,47,100,101,108,97,115,99,111,45,105,110,118,111,105,99,101,115,
 47,73,78,86,48,48,54,53,50,49,51,56,63,114,101,118,115,61,116,114,117,
 101,38,97,116,116,97,99,104,109,101,110,116,115,61,116,114,117,101,38,
 108,97,116,101,115,116,61,116,114,117,101,38,111,112,101,110,95,114,
 101,118,115,61,91,34,<<"3070455362">>,34,93]
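
The integer list in that log line is an Erlang string (a list of character
codes) with one binary embedded in it. A minimal sketch in Python that
decodes it back into the URL being retried; the function name is
illustrative, and the binary is written here as a Python bytes literal.

```python
def decode_iolist(items):
    """Join a flat list of character codes and byte strings into one str."""
    parts = []
    for item in items:
        if isinstance(item, int):
            parts.append(chr(item))  # a single character code
        else:
            parts.append(item.decode("utf-8"))  # an embedded binary
    return "".join(parts)


# Values copied from the Machine B log line above.
logged = [
    104, 116, 116, 112, 58, 47, 47, 49, 57, 50, 46, 49, 54, 56, 46, 50, 46,
    53, 50, 58, 53, 57, 56, 52, 47, 100, 101, 108, 97, 115, 99, 111, 45,
    105, 110, 118, 111, 105, 99, 101, 115, 47, 73, 78, 86, 48, 48, 54, 53,
    50, 49, 51, 56, 63, 114, 101, 118, 115, 61, 116, 114, 117, 101, 38, 97,
    116, 116, 97, 99, 104, 109, 101, 110, 116, 115, 61, 116, 114, 117, 101,
    38, 108, 97, 116, 101, 115, 116, 61, 116, 114, 117, 101, 38, 111, 112,
    101, 110, 95, 114, 101, 118, 115, 61, 91, 34, b"3070455362", 34, 93,
]

url = decode_iolist(logged)
print(url)
# http://192.168.2.52:5984/delasco-invoices/INV00652138?revs=true&attachments=true&latest=true&open_revs=["3070455362"]
```

So the last request Machine B retried was for document INV00652138.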

A request for status from the couchdb init.d script returns nothing
and checking the processes returns:
(demo-couchdb)jlh@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep cou
29281 pts/2    S+     0:00 grep cou
(demo-couchdb)jlh@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep beam
29305 pts/2    R+     0:00 grep beam
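
The only lines ps returns are the grep commands matching themselves, so
neither couch nor an Erlang beam process is running. The same check can be
sketched in Python so that the grep self-matches are filtered out. The
helper name and the beam path in the second sample are hypothetical.

```python
def matching_processes(ps_output, needle):
    """Return `ps ax` lines whose command contains needle, skipping grep."""
    matches = []
    for line in ps_output.splitlines():
        # Columns of `ps ax`: PID, TTY, STAT, TIME, COMMAND
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        command = fields[4]
        if needle in command and not command.startswith("grep "):
            matches.append(line)
    return matches


# Output copied from the two ps commands above.
dead = (
    "29281 pts/2    S+     0:00 grep cou\n"
    "29305 pts/2    R+     0:00 grep beam\n"
)

# Hypothetical output for a server that is still running.
alive = dead + " 1234 ?        Sl     0:05 /usr/lib/erlang/bin/beam -Bd\n"

print(matching_processes(dead, "beam"))
print(matching_processes(alive, "beam"))
```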

In fact, couch has gone away completely on Machine B.  Its death is so
quick that it can't even log why.

Attempts to incrementally replicate after the first failure die at
exactly the same place.

I can replicate this same database on the same machine from one
database to another without issue.  I can dump and reload the database
with no problems.

I reported this earlier and no one seemed to have an answer.  Is
there a specific issue in JIRA that addresses this problem?  If not,
is what I have here enough to open one, and should I?

Regards,

Jeff Hinrichs

Re: Replication is Failing is this a known problem?

Posted by Jeff Hinrichs - DM&T <du...@gmail.com>.
On Fri, Feb 27, 2009 at 8:57 AM, Adam Kocoloski
<ad...@gmail.com> wrote:
> Hi Jeff, I can pick this one up, but not before Monday. We do have some
> replicating-attachment JIRA tickets open and active, but it looks like
> there's some new stuff in this report too.  Feel free to file another one.
>  Best,
>
> Adam

Adam,
https://issues.apache.org/jira/browse/COUCHDB-270

Thanks again,

Jeff

Re: Replication is Failing is this a known problem?

Posted by Jeff Hinrichs - DM&T <du...@gmail.com>.
On Fri, Feb 27, 2009 at 8:57 AM, Adam Kocoloski
<ad...@gmail.com> wrote:
> Hi Jeff, I can pick this one up, but not before Monday. We do have some
> replicating-attachment JIRA tickets open and active, but it looks like
> there's some new stuff in this report too.  Feel free to file another one.
>  Best,
>
> Adam
I'll review the current JIRA tickets to avoid filing a dupe.  I'll
also work on building a reproducible test case for you.  I hope a
Python script is OK with you.

Regards,

Jeff
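
As a rough idea of what such a test case could start from (this is a sketch
under assumptions, not the actual script): generate documents that carry
large inline attachments, using CouchDB's _attachments field with
base64-encoded data. The document ids, attachment name, and sizes are all
made up; each document would then be PUT to the source database before
running the replication.

```python
import base64
import json
import os


def make_doc(doc_id, attachment_bytes, name="blob.bin"):
    """Build a document body with one inline (base64) attachment."""
    return {
        "_id": doc_id,
        "_attachments": {
            name: {
                "content_type": "application/octet-stream",
                "data": base64.b64encode(attachment_bytes).decode("ascii"),
            }
        },
    }


def make_docs(count, attachment_size):
    """Yield count documents, each with attachment_size random bytes."""
    for i in range(count):
        yield make_doc("TEST%08d" % i, os.urandom(attachment_size))


# Small numbers for a quick check; the report involves fewer than
# 200 docs with up to ~18MB of attachments each.
docs = list(make_docs(3, 1024))
body = json.dumps(docs[0])  # JSON body for a PUT of the first document
```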

Re: Replication is Failing is this a known problem?

Posted by Adam Kocoloski <ad...@gmail.com>.
Hi Jeff, I can pick this one up, but not before Monday. We do have
some replicating-attachment JIRA tickets open and active, but it looks
like there's some new stuff in this report too.  Feel free to file
another one.

Best,

Adam
