Posted to dev@couchdb.apache.org by J Chris Anderson <jc...@apache.org> on 2010/06/10 02:33:52 UTC

replicator test hanging

Devs,

Is anyone else seeing the replicator test hang and never finish?

It never hangs the first few runs, but after running ten or so times, I'll end up with the test suite waiting for a replication that never finishes. It's the same story on 0.11.0, 0.11.x, and trunk.

Is anyone else able to reproduce this? Am I crazy?

I'd like to get to the bottom of this before we cut 1.0

Thanks!

Chris

Re: replicator test hanging

Posted by J Chris Anderson <jc...@gmail.com>.
On Jun 9, 2010, at 6:08 PM, Jan Lehnardt wrote:

> You're not crazy :)
> 
> I'm seeing this too, Mac OS X, Firefox. I haven't yet found a pattern on when it works vs. it doesn't. FWIW, the replication test alone usually doesn't hang, only during "run all".
> 

I've found once it gets in the mood to hang, I can hang it just fine by running the test alone. Firebug or not, doesn't seem to make a difference.

> Cheers
> Jan
> --
> 
> 
> On 10 Jun 2010, at 01:33, J Chris Anderson wrote:
> 
>> Devs,
>> 
>> Is anyone else seeing the replicator test hang and never finish?
>> 
>> It never hangs the first few runs, but after running ten or so times, I'll end up with the test suite waiting for a replication that never finishes. It's the same story on 0.11.0, 0.11.x, and trunk.
>> 
>> Is anyone else able to reproduce this? Am I crazy?
>> 
>> I'd like to get to the bottom of this before we cut 1.0
>> 
>> Thanks!
>> 
>> Chris
> 


Re: replicator test hanging

Posted by Jan Lehnardt <ja...@apache.org>.
You're not crazy :)

I'm seeing this too, Mac OS X, Firefox. I haven't yet found a pattern on when it works vs. it doesn't. FWIW, the replication test alone usually doesn't hang, only during "run all".

Cheers
Jan
--


On 10 Jun 2010, at 01:33, J Chris Anderson wrote:

> Devs,
> 
> Is anyone else seeing the replicator test hang and never finish?
> 
> It never hangs the first few runs, but after running ten or so times, I'll end up with the test suite waiting for a replication that never finishes. It's the same story on 0.11.0, 0.11.x, and trunk.
> 
> Is anyone else able to reproduce this? Am I crazy?
> 
> I'd like to get to the bottom of this before we cut 1.0
> 
> Thanks!
> 
> Chris


Re: replicator test hanging

Posted by J Chris Anderson <jc...@gmail.com>.
On Jun 9, 2010, at 6:26 PM, Paul Bonser wrote:

> Oh, I should also mention that I got the exact same error in multiple
> freezes. Twice it was in the same exact order, and once it was in this
> order:
> 

yep. It looks like you are seeing what I'm seeing.

Now that we know I'm not crazy ... what to do about it?

Chris

> -- 
> Paul Bonser
> http://probablyprogramming.com


Re: replicator test hanging

Posted by Robert Dionne <di...@dionne-associates.com>.
same here, I can reproduce it every time on OS X with chrome.

Oddly for me, it works when I do a run all




On Jun 10, 2010, at 5:41 AM, Filipe David Manana wrote:

> I have the problem in non-SSD machines, both Linux and OS X
> 
> On Thu, Jun 10, 2010 at 10:39 AM, Jan Lehnardt <ja...@apache.org> wrote:
> 
>> Hi Paul,
>> 
>> thanks for the report. Out of curiosity, are you running an SSD drive in
>> the box that reproduces the hangs?
>> 
>> And anyone: Can you reproduce this on non-SSD machines?
>> 
>> Cheers
>> Jan
>> --
>> 
>> On 10 Jun 2010, at 02:26, Paul Bonser wrote:
>> 
>>> Oh, I should also mention that I got the exact same error in multiple
>>> freezes. Twice it was in the same exact order, and once it was in this
>>> order:
>>> 
>>> [info] [<0.95.0>] starting replication "15c25eda4ea6308af6bea9864d5319ef"
>> at
>>> <0.1845.0>
>>> [debug] [<0.1207.0>] OAuth Params: [{"att_encoding_info","true"}]
>>> [info] [<0.1207.0>] 127.0.0.1 - - 'GET'
>>> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
>>> [debug] [<0.1207.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
>>> Headers: [{'Accept',"application/json"},
>>>         {'Accept-Encoding',"gzip"},
>>>         {'Content-Length',"167"},
>>>         {'Host',"localhost:5985"},
>>>         {'User-Agent',"CouchDB/0.12.0a953193"},
>>>         {"X-Couch-Full-Commit","false"}]
>>> [debug] [<0.1207.0>] OAuth Params: []
>>> [info] [<0.1207.0>] 127.0.0.1 - - 'POST'
>>> /test_suite_rep_docs_db_b/_bulk_docs 201
>>> [debug] [<0.1076.0>] 'GET'
>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
>>> Headers: [{'Accept',"application/json"},
>>>         {'Accept-Encoding',"gzip"},
>>>         {'Host',"localhost:5985"},
>>>         {'User-Agent',"CouchDB/0.12.0a953193"}]
>>> [debug] [<0.1076.0>] OAuth Params: [{"att_encoding_info","true"}]
>>> [debug] [<0.1076.0>] Minor error in HTTP request: {not_found,missing}
>>> [debug] [<0.1076.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
>>>            {couch_httpd_db,db_doc_req,3},
>>>            {couch_httpd_db,do_db_req,2},
>>>            {couch_httpd,handle_request_int,5},
>>>            {mochiweb_http,headers,5},
>>>            {proc_lib,init_p_do_apply,3}]
>>> [info] [<0.1076.0>] 127.0.0.1 - - 'GET'
>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
>>> [debug] [<0.1076.0>] httpd 404 error response:
>>> {"error":"not_found","reason":"missing"}
>>> 
>>> 
>>> Could it be some sort of race condition?
>>> 
>>> 
>>> 
>>> On Wed, Jun 9, 2010 at 8:22 PM, Paul Bonser <mi...@gmail.com> wrote:
>>> 
>>>> 
>>>> 
>>>> On Wed, Jun 9, 2010 at 7:33 PM, J Chris Anderson <jchris@apache.org
>>> wrote:
>>>> 
>>>>> Devs,
>>>>> 
>>>>> Is anyone else seeing the replicator test hang and never finish?
>>>>> 
>>>>> It never hangs the first few runs, but after running ten or so times,
>> I'll
>>>>> end up with the test suite waiting for a replication that never
>> finishes.
>>>>> It's the same story on 0.11.0, 0.11.x, and trunk.
>>>>> 
>>>>> Is anyone else able to reproduce this? Am I crazy?
>>>>> 
>>>> 
>>>> It just froze for me on the first try, using 0.12.0a935298, then ran
>>>> successfully 3 times, then froze again. The last thing logged the first
>> time
>>>> was a _bulk_docs request, the last thing logged this time was a PUT to
>>>> /test_suite_db_b/_local%2F6598a76aa55cd39645e4730b4cb28c00
>>>> 
>>>> I'm running a Firefox 3.6 nightly build on Linux. For me, it froze the
>>>> first time when I did a "run all" and the second time when just directly
>>>> running the replication test.
>>>> 
>>>> After svn up-ing to the latest in trunk, it froze on the first try,
>>>> directly running the replication test.
>>>> 
>>>> Here's the debug output for the last _replicate request where it
>> freezes.
>>>> It's requesting a document that isn't there.
>>>> 
>>>> 
>>>> [info] [<0.95.0>] starting new replication
>>>> "15c25eda4ea6308af6bea9864d5319ef" at <0.848.0>
>>>> [debug] [<0.191.0>] 'GET'
>>>> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true {1,1}
>>>> Headers: [{'Accept',"application/json"},
>>>>         {'Accept-Encoding',"gzip"},
>>>>         {'Host',"localhost:5985"},
>>>>         {'User-Agent',"CouchDB/0.12.0a953193"}]
>>>> [debug] [<0.191.0>] OAuth Params: [{"att_encoding_info","true"}]
>>>> [info] [<0.191.0>] 127.0.0.1 - - 'GET'
>>>> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
>>>> [debug] [<0.189.0>] 'GET'
>>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
>>>> Headers: [{'Accept',"application/json"},
>>>>         {'Accept-Encoding',"gzip"},
>>>>         {'Host',"localhost:5985"},
>>>>         {'User-Agent',"CouchDB/0.12.0a953193"}]
>>>> [debug] [<0.189.0>] OAuth Params: [{"att_encoding_info","true"}]
>>>> [debug] [<0.189.0>] Minor error in HTTP request: {not_found,missing}
>>>> [debug] [<0.189.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
>>>>            {couch_httpd_db,db_doc_req,3},
>>>>            {couch_httpd_db,do_db_req,2},
>>>>            {couch_httpd,handle_request_int,5},
>>>>            {mochiweb_http,headers,5},
>>>>            {proc_lib,init_p_do_apply,3}]
>>>> [info] [<0.189.0>] 127.0.0.1 - - 'GET'
>>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
>>>> [debug] [<0.189.0>] httpd 404 error response:
>>>> {"error":"not_found","reason":"missing"}
>>>> 
>>>> [debug] [<0.191.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
>>>> Headers: [{'Accept',"application/json"},
>>>>         {'Accept-Encoding',"gzip"},
>>>>         {'Content-Length',"167"},
>>>>         {'Host',"localhost:5985"},
>>>>         {'User-Agent',"CouchDB/0.12.0a953193"},
>>>>         {"X-Couch-Full-Commit","false"}]
>>>> [debug] [<0.191.0>] OAuth Params: []
>>>> [info] [<0.191.0>] 127.0.0.1 - - 'POST'
>>>> /test_suite_rep_docs_db_b/_bulk_docs 201
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Paul Bonser
>>>> http://probablyprogramming.com
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Paul Bonser
>>> http://probablyprogramming.com
>> 
>> 
> 
> 
> -- 
> Filipe David Manana,
> fdmanana@gmail.com
> 
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."


Re: replicator test hanging

Posted by Filipe David Manana <fd...@gmail.com>.
I have the problem in non-SSD machines, both Linux and OS X

On Thu, Jun 10, 2010 at 10:39 AM, Jan Lehnardt <ja...@apache.org> wrote:

> Hi Paul,
>
> thanks for the report. Out of curiosity, are you running an SSD drive in
> the box that reproduces the hangs?
>
> And anyone: Can you reproduce this on non-SSD machines?
>
> Cheers
> Jan
> --
>
> On 10 Jun 2010, at 02:26, Paul Bonser wrote:
>
> > Oh, I should also mention that I got the exact same error in multiple
> > freezes. Twice it was in the same exact order, and once it was in this
> > order:
> >
> > [info] [<0.95.0>] starting replication "15c25eda4ea6308af6bea9864d5319ef"
> at
> > <0.1845.0>
> > [debug] [<0.1207.0>] OAuth Params: [{"att_encoding_info","true"}]
> > [info] [<0.1207.0>] 127.0.0.1 - - 'GET'
> > /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
> > [debug] [<0.1207.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
> > Headers: [{'Accept',"application/json"},
> >          {'Accept-Encoding',"gzip"},
> >          {'Content-Length',"167"},
> >          {'Host',"localhost:5985"},
> >          {'User-Agent',"CouchDB/0.12.0a953193"},
> >          {"X-Couch-Full-Commit","false"}]
> > [debug] [<0.1207.0>] OAuth Params: []
> > [info] [<0.1207.0>] 127.0.0.1 - - 'POST'
> > /test_suite_rep_docs_db_b/_bulk_docs 201
> > [debug] [<0.1076.0>] 'GET'
> > /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
> > Headers: [{'Accept',"application/json"},
> >          {'Accept-Encoding',"gzip"},
> >          {'Host',"localhost:5985"},
> >          {'User-Agent',"CouchDB/0.12.0a953193"}]
> > [debug] [<0.1076.0>] OAuth Params: [{"att_encoding_info","true"}]
> > [debug] [<0.1076.0>] Minor error in HTTP request: {not_found,missing}
> > [debug] [<0.1076.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
> >             {couch_httpd_db,db_doc_req,3},
> >             {couch_httpd_db,do_db_req,2},
> >             {couch_httpd,handle_request_int,5},
> >             {mochiweb_http,headers,5},
> >             {proc_lib,init_p_do_apply,3}]
> > [info] [<0.1076.0>] 127.0.0.1 - - 'GET'
> > /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
> > [debug] [<0.1076.0>] httpd 404 error response:
> > {"error":"not_found","reason":"missing"}
> >
> >
> > Could it be some sort of race condition?
> >
> >
> >
> > On Wed, Jun 9, 2010 at 8:22 PM, Paul Bonser <mi...@gmail.com> wrote:
> >
> >>
> >>
> >> On Wed, Jun 9, 2010 at 7:33 PM, J Chris Anderson <jchris@apache.org
> >wrote:
> >>
> >>> Devs,
> >>>
> >>> Is anyone else seeing the replicator test hang and never finish?
> >>>
> >>> It never hangs the first few runs, but after running ten or so times,
> I'll
> >>> end up with the test suite waiting for a replication that never
> finishes.
> >>> It's the same story on 0.11.0, 0.11.x, and trunk.
> >>>
> >>> Is anyone else able to reproduce this? Am I crazy?
> >>>
> >>
> >> It just froze for me on the first try, using 0.12.0a935298, then ran
> >> successfully 3 times, then froze again. The last thing logged the first
> time
> >> was a _bulk_docs request, the last thing logged this time was a PUT to
> >> /test_suite_db_b/_local%2F6598a76aa55cd39645e4730b4cb28c00
> >>
> >> I'm running a Firefox 3.6 nightly build on Linux. For me, it froze the
> >> first time when I did a "run all" and the second time when just directly
> >> running the replication test.
> >>
> >> After svn up-ing to the latest in trunk, it froze on the first try,
> >> directly running the replication test.
> >>
> >> Here's the debug output for the last _replicate request where it
> freezes.
> >> It's requesting a document that isn't there.
> >>
> >>
> >> [info] [<0.95.0>] starting new replication
> >> "15c25eda4ea6308af6bea9864d5319ef" at <0.848.0>
> >> [debug] [<0.191.0>] 'GET'
> >> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true {1,1}
> >> Headers: [{'Accept',"application/json"},
> >>          {'Accept-Encoding',"gzip"},
> >>          {'Host',"localhost:5985"},
> >>          {'User-Agent',"CouchDB/0.12.0a953193"}]
> >> [debug] [<0.191.0>] OAuth Params: [{"att_encoding_info","true"}]
> >> [info] [<0.191.0>] 127.0.0.1 - - 'GET'
> >> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
> >> [debug] [<0.189.0>] 'GET'
> >> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
> >> Headers: [{'Accept',"application/json"},
> >>          {'Accept-Encoding',"gzip"},
> >>          {'Host',"localhost:5985"},
> >>          {'User-Agent',"CouchDB/0.12.0a953193"}]
> >> [debug] [<0.189.0>] OAuth Params: [{"att_encoding_info","true"}]
> >> [debug] [<0.189.0>] Minor error in HTTP request: {not_found,missing}
> >> [debug] [<0.189.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
> >>             {couch_httpd_db,db_doc_req,3},
> >>             {couch_httpd_db,do_db_req,2},
> >>             {couch_httpd,handle_request_int,5},
> >>             {mochiweb_http,headers,5},
> >>             {proc_lib,init_p_do_apply,3}]
> >> [info] [<0.189.0>] 127.0.0.1 - - 'GET'
> >> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
> >> [debug] [<0.189.0>] httpd 404 error response:
> >> {"error":"not_found","reason":"missing"}
> >>
> >> [debug] [<0.191.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
> >> Headers: [{'Accept',"application/json"},
> >>          {'Accept-Encoding',"gzip"},
> >>          {'Content-Length',"167"},
> >>          {'Host',"localhost:5985"},
> >>          {'User-Agent',"CouchDB/0.12.0a953193"},
> >>          {"X-Couch-Full-Commit","false"}]
> >> [debug] [<0.191.0>] OAuth Params: []
> >> [info] [<0.191.0>] 127.0.0.1 - - 'POST'
> >> /test_suite_rep_docs_db_b/_bulk_docs 201
> >>
> >>
> >>
> >>
> >> --
> >> Paul Bonser
> >> http://probablyprogramming.com
> >>
> >
> >
> >
> > --
> > Paul Bonser
> > http://probablyprogramming.com
>
>


-- 
Filipe David Manana,
fdmanana@gmail.com

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

Re: replicator test hanging

Posted by Adam Kocoloski <ko...@apache.org>.
On Jun 10, 2010, at 1:39 PM, Robert Dionne wrote:

> On Jun 10, 2010, at 1:29 PM, J Chris Anderson wrote:
> 
>> 
>> On Jun 10, 2010, at 10:27 AM, Adam Kocoloski wrote:
>> 
>>> Thanks Paul!  Good sleuthing.  We'll get it fixed,
>>> 
>> 
>> I believe Filipe Manana has a fix for the replicator hang. He's told me he's having trouble with his emails getting rejected as spam by dev@.
> 
> must be all those patches he's been attaching :)

Thanks to Filipe, this should be fixed in 0.10.x, 0.11.x, and trunk.  Best,

Adam

Re: replicator test hanging

Posted by Robert Dionne <di...@dionne-associates.com>.




On Jun 10, 2010, at 1:29 PM, J Chris Anderson wrote:

> 
> On Jun 10, 2010, at 10:27 AM, Adam Kocoloski wrote:
> 
>> Thanks Paul!  Good sleuthing.  We'll get it fixed,
>> 
> 
> I believe Filipe Manana has a fix for the replicator hang. He's told me he's having trouble with his emails getting rejected as spam by dev@.

must be all those patches he's been attaching :)


> 
> Chris
> 
> 


Re: replicator test hanging

Posted by J Chris Anderson <jc...@apache.org>.
On Jun 10, 2010, at 10:27 AM, Adam Kocoloski wrote:

> Thanks Paul!  Good sleuthing.  We'll get it fixed,
> 

I believe Filipe Manana has a fix for the replicator hang. He's told me he's having trouble with his emails getting rejected as spam by dev@.

Chris



Re: replicator test hanging

Posted by Adam Kocoloski <ko...@apache.org>.
Thanks Paul!  Good sleuthing.  We'll get it fixed,

Adam

On Jun 10, 2010, at 11:43 AM, Paul Bonser wrote:

> Ok, so I've tracked it down to the specific location where it happens
> 
> - couch_rep_reader:spawn_document_request/2 is called
> - in the SpawnFun defined in there, it calls couch_rep_reader:open_doc
> - open_doc gets an error, not_found response (not sure why, shouldn't the
> doc be there already?)
> - open_doc returns [] back to the SpawnFun
> - SpawnFun calls gen_server:call(Server, {add_docs, nil, Results}... with
> Results being []
> - handle_call(add_docs) calls handle_add_docs, which increments the document
> count..by 0..
> - and then returns {noreply,...}
> - then everything just sits there, because each part is waiting for another
> part to do something
> 
> It seems the solution here is to either add a retry into
> spawn_document_request's SpawnFun, or at the very least, fail when open_doc
> returns [], rather than continuing on, since that results in a set of
> deadlocked processes.
> 
> On Thu, Jun 10, 2010 at 9:28 AM, Paul Bonser <mi...@gmail.com> wrote:
> 
>> Nope, just a regular 7200RPM SATA drive.
>> 
>> So you guys may already know this, but I've tracked it down to a couch_rep
>> gen_server never terminating, and thus not calling do_terminate, and thus
>> the call to gen_server:call(Server, get_result, infinity) in
>> couch_rep:get_result just hangs forever.
>> 
>> 
>> On Thu, Jun 10, 2010 at 4:39 AM, Jan Lehnardt <ja...@apache.org> wrote:
>> 
>>> Hi Paul,
>>> 
>>> thanks for the report. Out of curiosity, are you running an SSD drive in
>>> the box that reproduces the hangs?
>>> 
>>> And anyone: Can you reproduce this on non-SSD machines?
>>> 
>>> Cheers
>>> Jan
>>> --
>>> 
>>> On 10 Jun 2010, at 02:26, Paul Bonser wrote:
>>> 
>>>> Oh, I should also mention that I got the exact same error in multiple
>>>> freezes. Twice it was in the same exact order, and once it was in this
>>>> order:
>>>> 
>>>> [info] [<0.95.0>] starting replication
>>> "15c25eda4ea6308af6bea9864d5319ef" at
>>>> <0.1845.0>
>>>> [debug] [<0.1207.0>] OAuth Params: [{"att_encoding_info","true"}]
>>>> [info] [<0.1207.0>] 127.0.0.1 - - 'GET'
>>>> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
>>>> [debug] [<0.1207.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
>>>> Headers: [{'Accept',"application/json"},
>>>>         {'Accept-Encoding',"gzip"},
>>>>         {'Content-Length',"167"},
>>>>         {'Host',"localhost:5985"},
>>>>         {'User-Agent',"CouchDB/0.12.0a953193"},
>>>>         {"X-Couch-Full-Commit","false"}]
>>>> [debug] [<0.1207.0>] OAuth Params: []
>>>> [info] [<0.1207.0>] 127.0.0.1 - - 'POST'
>>>> /test_suite_rep_docs_db_b/_bulk_docs 201
>>>> [debug] [<0.1076.0>] 'GET'
>>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
>>>> Headers: [{'Accept',"application/json"},
>>>>         {'Accept-Encoding',"gzip"},
>>>>         {'Host',"localhost:5985"},
>>>>         {'User-Agent',"CouchDB/0.12.0a953193"}]
>>>> [debug] [<0.1076.0>] OAuth Params: [{"att_encoding_info","true"}]
>>>> [debug] [<0.1076.0>] Minor error in HTTP request: {not_found,missing}
>>>> [debug] [<0.1076.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
>>>>            {couch_httpd_db,db_doc_req,3},
>>>>            {couch_httpd_db,do_db_req,2},
>>>>            {couch_httpd,handle_request_int,5},
>>>>            {mochiweb_http,headers,5},
>>>>            {proc_lib,init_p_do_apply,3}]
>>>> [info] [<0.1076.0>] 127.0.0.1 - - 'GET'
>>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
>>>> [debug] [<0.1076.0>] httpd 404 error response:
>>>> {"error":"not_found","reason":"missing"}
>>>> 
>>>> 
>>>> Could it be some sort of race condition?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Jun 9, 2010 at 8:22 PM, Paul Bonser <mi...@gmail.com>
>>> wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Jun 9, 2010 at 7:33 PM, J Chris Anderson <jchris@apache.org
>>>> wrote:
>>>>> 
>>>>>> Devs,
>>>>>> 
>>>>>> Is anyone else seeing the replicator test hang and never finish?
>>>>>> 
>>>>>> It never hangs the first few runs, but after running ten or so times,
>>> I'll
>>>>>> end up with the test suite waiting for a replication that never
>>> finishes.
>>>>>> It's the same story on 0.11.0, 0.11.x, and trunk.
>>>>>> 
>>>>>> Is anyone else able to reproduce this? Am I crazy?
>>>>>> 
>>>>> 
>>>>> It just froze for me on the first try, using 0.12.0a935298, then ran
>>>>> successfully 3 times, then froze again. The last thing logged the first
>>> time
>>>>> was a _bulk_docs request, the last thing logged this time was a PUT to
>>>>> /test_suite_db_b/_local%2F6598a76aa55cd39645e4730b4cb28c00
>>>>> 
>>>>> I'm running a Firefox 3.6 nightly build on Linux. For me, it froze the
>>>>> first time when I did a "run all" and the second time when just
>>> directly
>>>>> running the replication test.
>>>>> 
>>>>> After svn up-ing to the latest in trunk, it froze on the first try,
>>>>> directly running the replication test.
>>>>> 
>>>>> Here's the debug output for the last _replicate request where it
>>> freezes.
>>>>> It's requesting a document that isn't there.
>>>>> 
>>>>> 
>>>>> [info] [<0.95.0>] starting new replication
>>>>> "15c25eda4ea6308af6bea9864d5319ef" at <0.848.0>
>>>>> [debug] [<0.191.0>] 'GET'
>>>>> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true {1,1}
>>>>> Headers: [{'Accept',"application/json"},
>>>>>         {'Accept-Encoding',"gzip"},
>>>>>         {'Host',"localhost:5985"},
>>>>>         {'User-Agent',"CouchDB/0.12.0a953193"}]
>>>>> [debug] [<0.191.0>] OAuth Params: [{"att_encoding_info","true"}]
>>>>> [info] [<0.191.0>] 127.0.0.1 - - 'GET'
>>>>> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
>>>>> [debug] [<0.189.0>] 'GET'
>>>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
>>>>> Headers: [{'Accept',"application/json"},
>>>>>         {'Accept-Encoding',"gzip"},
>>>>>         {'Host',"localhost:5985"},
>>>>>         {'User-Agent',"CouchDB/0.12.0a953193"}]
>>>>> [debug] [<0.189.0>] OAuth Params: [{"att_encoding_info","true"}]
>>>>> [debug] [<0.189.0>] Minor error in HTTP request: {not_found,missing}
>>>>> [debug] [<0.189.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
>>>>>            {couch_httpd_db,db_doc_req,3},
>>>>>            {couch_httpd_db,do_db_req,2},
>>>>>            {couch_httpd,handle_request_int,5},
>>>>>            {mochiweb_http,headers,5},
>>>>>            {proc_lib,init_p_do_apply,3}]
>>>>> [info] [<0.189.0>] 127.0.0.1 - - 'GET'
>>>>> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
>>>>> [debug] [<0.189.0>] httpd 404 error response:
>>>>> {"error":"not_found","reason":"missing"}
>>>>> 
>>>>> [debug] [<0.191.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
>>>>> Headers: [{'Accept',"application/json"},
>>>>>         {'Accept-Encoding',"gzip"},
>>>>>         {'Content-Length',"167"},
>>>>>         {'Host',"localhost:5985"},
>>>>>         {'User-Agent',"CouchDB/0.12.0a953193"},
>>>>>         {"X-Couch-Full-Commit","false"}]
>>>>> [debug] [<0.191.0>] OAuth Params: []
>>>>> [info] [<0.191.0>] 127.0.0.1 - - 'POST'
>>>>> /test_suite_rep_docs_db_b/_bulk_docs 201
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Paul Bonser
>>>>> http://probablyprogramming.com
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Paul Bonser
>>>> http://probablyprogramming.com
>>> 
>>> 
>> 
>> 
>> --
>> Paul Bonser
>> http://probablyprogramming.com
>> 
> 
> 
> 
> -- 
> Paul Bonser
> http://probablyprogramming.com


Re: replicator test hanging

Posted by Paul Bonser <mi...@gmail.com>.
Ok, so I've tracked it down to the specific location where it happens

- couch_rep_reader:spawn_document_request/2 is called
- in the SpawnFun defined in there, it calls couch_rep_reader:open_doc
- open_doc gets an error, not_found response (not sure why, shouldn't the
doc be there already?)
- open_doc returns [] back to the SpawnFun
- SpawnFun calls gen_server:call(Server, {add_docs, nil, Results}... with
Results being []
- handle_call(add_docs) calls handle_add_docs, which increments the document
count..by 0..
- and then returns {noreply,...}
- then everything just sits there, because each part is waiting for another
part to do something

It seems the solution here is to either add a retry into
spawn_document_request's SpawnFun, or at the very least, fail when open_doc
returns [], rather than continuing on, since that results in a set of
deadlocked processes.
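
A minimal sketch of that idea (retry a few times, then fail instead of passing an empty result along), written in Python rather than Erlang purely for illustration. The names fetch_with_retry, DocumentNotFound and the attempts parameter are hypothetical and not part of couch_rep_reader; open_doc here is a stand-in callable assumed to return a list of documents, or [] on not_found, as described above.

```python
class DocumentNotFound(Exception):
    """Raised when the source keeps answering not_found for a document."""


def fetch_with_retry(open_doc, doc_id, attempts=3):
    """Call open_doc(doc_id) up to `attempts` times.

    open_doc is a hypothetical stand-in: it is assumed to return a list of
    documents, or [] when the source answers not_found.
    """
    for _ in range(attempts):
        results = open_doc(doc_id)
        if results:
            return results
    # Failing loudly lets the caller abort the replication instead of
    # handing an empty result to a server that then waits forever.
    raise DocumentNotFound(doc_id)
```

Whether a retry is actually appropriate depends on why the not_found happens in the first place, which is still an open question here; the point is only that an empty result should not be treated as success.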

On Thu, Jun 10, 2010 at 9:28 AM, Paul Bonser <mi...@gmail.com> wrote:

> Nope, just a regular 7200RPM SATA drive.
>
> So you guys may already know this, but I've tracked it down to a couch_rep
> gen_server never terminating, and thus not calling do_terminate, and thus
> the call to gen_server:call(Server, get_result, infinity) in
> couch_rep:get_result just hangs forever.
>
>
> On Thu, Jun 10, 2010 at 4:39 AM, Jan Lehnardt <ja...@apache.org> wrote:
>
>> Hi Paul,
>>
>> thanks for the report. Out of curiosity, are you running an SSD drive in
>> the box that reproduces the hangs?
>>
>> And anyone: Can you reproduce this on non-SSD machines?
>>
>> Cheers
>> Jan
>> --
>>
>> On 10 Jun 2010, at 02:26, Paul Bonser wrote:
>>
>> > Oh, I should also mention that I got the exact same error in multiple
>> > freezes. Twice it was in the same exact order, and once it was in this
>> > order:
>> >
>> > [info] [<0.95.0>] starting replication
>> "15c25eda4ea6308af6bea9864d5319ef" at
>> > <0.1845.0>
>> > [debug] [<0.1207.0>] OAuth Params: [{"att_encoding_info","true"}]
>> > [info] [<0.1207.0>] 127.0.0.1 - - 'GET'
>> > /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
>> > [debug] [<0.1207.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
>> > Headers: [{'Accept',"application/json"},
>> >          {'Accept-Encoding',"gzip"},
>> >          {'Content-Length',"167"},
>> >          {'Host',"localhost:5985"},
>> >          {'User-Agent',"CouchDB/0.12.0a953193"},
>> >          {"X-Couch-Full-Commit","false"}]
>> > [debug] [<0.1207.0>] OAuth Params: []
>> > [info] [<0.1207.0>] 127.0.0.1 - - 'POST'
>> > /test_suite_rep_docs_db_b/_bulk_docs 201
>> > [debug] [<0.1076.0>] 'GET'
>> > /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
>> > Headers: [{'Accept',"application/json"},
>> >          {'Accept-Encoding',"gzip"},
>> >          {'Host',"localhost:5985"},
>> >          {'User-Agent',"CouchDB/0.12.0a953193"}]
>> > [debug] [<0.1076.0>] OAuth Params: [{"att_encoding_info","true"}]
>> > [debug] [<0.1076.0>] Minor error in HTTP request: {not_found,missing}
>> > [debug] [<0.1076.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
>> >             {couch_httpd_db,db_doc_req,3},
>> >             {couch_httpd_db,do_db_req,2},
>> >             {couch_httpd,handle_request_int,5},
>> >             {mochiweb_http,headers,5},
>> >             {proc_lib,init_p_do_apply,3}]
>> > [info] [<0.1076.0>] 127.0.0.1 - - 'GET'
>> > /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
>> > [debug] [<0.1076.0>] httpd 404 error response:
>> > {"error":"not_found","reason":"missing"}
>> >
>> >
>> > Could it be some sort of race condition?
>> >
>> >
>> >
>> > On Wed, Jun 9, 2010 at 8:22 PM, Paul Bonser <mi...@gmail.com>
>> wrote:
>> >
>> >>
>> >>
>> >> On Wed, Jun 9, 2010 at 7:33 PM, J Chris Anderson <jchris@apache.org
>> >wrote:
>> >>
>> >>> Devs,
>> >>>
>> >>> Is anyone else seeing the replicator test hang and never finish?
>> >>>
>> >>> It never hangs the first few runs, but after running ten or so times,
>> I'll
>> >>> end up with the test suite waiting for a replication that never
>> finishes.
>> >>> It's the same story on 0.11.0, 0.11.x, and trunk.
>> >>>
>> >>> Is anyone else able to reproduce this? Am I crazy?
>> >>>
>> >>
>> >> It just froze for me on the first try, using 0.12.0a935298, then ran
>> >> successfully 3 times, then froze again. The last thing logged the first
>> time
>> >> was a _bulk_docs request, the last thing logged this time was a PUT to
>> >> /test_suite_db_b/_local%2F6598a76aa55cd39645e4730b4cb28c00
>> >>
>> >> I'm running a Firefox 3.6 nightly build on Linux. For me, it froze the
>> >> first time when I did a "run all" and the second time when just
>> directly
>> >> running the replication test.
>> >>
>> >> After svn up-ing to the latest in trunk, it froze on the first try,
>> >> directly running the replication test.
>> >>
>> >> Here's the debug output for the last _replicate request where it
>> freezes.
>> >> It's requesting a document that isn't there.
>> >>
>> >>
>> >> [info] [<0.95.0>] starting new replication
>> >> "15c25eda4ea6308af6bea9864d5319ef" at <0.848.0>
>> >> [debug] [<0.191.0>] 'GET'
>> >> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true {1,1}
>> >> Headers: [{'Accept',"application/json"},
>> >>          {'Accept-Encoding',"gzip"},
>> >>          {'Host',"localhost:5985"},
>> >>          {'User-Agent',"CouchDB/0.12.0a953193"}]
>> >> [debug] [<0.191.0>] OAuth Params: [{"att_encoding_info","true"}]
>> >> [info] [<0.191.0>] 127.0.0.1 - - 'GET'
>> >> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
>> >> [debug] [<0.189.0>] 'GET'
>> >> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
>> >> Headers: [{'Accept',"application/json"},
>> >>          {'Accept-Encoding',"gzip"},
>> >>          {'Host',"localhost:5985"},
>> >>          {'User-Agent',"CouchDB/0.12.0a953193"}]
>> >> [debug] [<0.189.0>] OAuth Params: [{"att_encoding_info","true"}]
>> >> [debug] [<0.189.0>] Minor error in HTTP request: {not_found,missing}
>> >> [debug] [<0.189.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
>> >>             {couch_httpd_db,db_doc_req,3},
>> >>             {couch_httpd_db,do_db_req,2},
>> >>             {couch_httpd,handle_request_int,5},
>> >>             {mochiweb_http,headers,5},
>> >>             {proc_lib,init_p_do_apply,3}]
>> >> [info] [<0.189.0>] 127.0.0.1 - - 'GET'
>> >> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
>> >> [debug] [<0.189.0>] httpd 404 error response:
>> >> {"error":"not_found","reason":"missing"}
>> >>
>> >> [debug] [<0.191.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
>> >> Headers: [{'Accept',"application/json"},
>> >>          {'Accept-Encoding',"gzip"},
>> >>          {'Content-Length',"167"},
>> >>          {'Host',"localhost:5985"},
>> >>          {'User-Agent',"CouchDB/0.12.0a953193"},
>> >>          {"X-Couch-Full-Commit","false"}]
>> >> [debug] [<0.191.0>] OAuth Params: []
>> >> [info] [<0.191.0>] 127.0.0.1 - - 'POST'
>> >> /test_suite_rep_docs_db_b/_bulk_docs 201
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Paul Bonser
>> >> http://probablyprogramming.com
>> >>
>> >
>> >
>> >
>> > --
>> > Paul Bonser
>> > http://probablyprogramming.com
>>
>>
>
>
> --
> Paul Bonser
> http://probablyprogramming.com
>



-- 
Paul Bonser
http://probablyprogramming.com

Re: replicator test hanging

Posted by Paul Bonser <mi...@gmail.com>.
Nope, just a regular 7200RPM SATA drive.

So you guys may already know this, but I've tracked it down to a couch_rep
gen_server never terminating, and thus not calling do_terminate, and thus
the call to gen_server:call(Server, get_result, infinity) in
couch_rep:get_result just hangs forever.
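
A rough model of that hang, in Python rather than Erlang and not taken from the CouchDB source: a toy server thread that never answers one kind of request, and a caller that waits for the reply. The names server_loop, call, inbox and the "ping"/"get_result" request strings are made up for the sketch. With a bounded wait the caller gets a timeout back; with an unbounded wait (the analogue of infinity) it would block forever.

```python
import queue
import threading


def server_loop(inbox):
    """Toy server: replies to 'ping', but never replies to 'get_result'."""
    while True:
        request, reply_box = inbox.get()
        if request == "stop":
            return
        if request == "ping":
            reply_box.put("pong")
        # 'get_result' is deliberately left unanswered, like a handler that
        # defers its reply and then never sends it.


def call(inbox, request, timeout):
    """Send a request and wait up to `timeout` seconds for the reply."""
    reply_box = queue.Queue()
    inbox.put((request, reply_box))
    try:
        return reply_box.get(timeout=timeout)
    except queue.Empty:
        return "timeout"


inbox = queue.Queue()
worker = threading.Thread(target=server_loop, args=(inbox,), daemon=True)
worker.start()

ping_reply = call(inbox, "ping", timeout=1.0)          # answered
result_reply = call(inbox, "get_result", timeout=0.2)  # never answered

inbox.put(("stop", None))
worker.join(timeout=1.0)
```

A finite timeout would not fix the underlying problem, but it would turn a silent hang into a visible error.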

On Thu, Jun 10, 2010 at 4:39 AM, Jan Lehnardt <ja...@apache.org> wrote:

> Hi Paul,
>
> thanks for the report. Out of curiosity, are you running an SSD drive in
> the box that reproduces the hangs?
>
> And anyone: Can you reproduce this on non-SSD machines?
>
> Cheers
> Jan
> --


-- 
Paul Bonser
http://probablyprogramming.com

Re: replicator test hanging

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Paul,

thanks for the report. Out of curiosity, are you running an SSD drive in
the box that reproduces the hangs?

And anyone: Can you reproduce this on non-SSD machines?

Cheers
Jan
-- 

On 10 Jun 2010, at 02:26, Paul Bonser wrote:

> Oh, I should also mention that I got the exact same error in multiple
> freezes. Twice it was in the same exact order, and once it was in this
> order:
> 
> [info] [<0.95.0>] starting replication "15c25eda4ea6308af6bea9864d5319ef" at
> <0.1845.0>
> [debug] [<0.1207.0>] OAuth Params: [{"att_encoding_info","true"}]
> [info] [<0.1207.0>] 127.0.0.1 - - 'GET'
> /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
> [debug] [<0.1207.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
> Headers: [{'Accept',"application/json"},
>          {'Accept-Encoding',"gzip"},
>          {'Content-Length',"167"},
>          {'Host',"localhost:5985"},
>          {'User-Agent',"CouchDB/0.12.0a953193"},
>          {"X-Couch-Full-Commit","false"}]
> [debug] [<0.1207.0>] OAuth Params: []
> [info] [<0.1207.0>] 127.0.0.1 - - 'POST'
> /test_suite_rep_docs_db_b/_bulk_docs 201
> [debug] [<0.1076.0>] 'GET'
> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
> Headers: [{'Accept',"application/json"},
>          {'Accept-Encoding',"gzip"},
>          {'Host',"localhost:5985"},
>          {'User-Agent',"CouchDB/0.12.0a953193"}]
> [debug] [<0.1076.0>] OAuth Params: [{"att_encoding_info","true"}]
> [debug] [<0.1076.0>] Minor error in HTTP request: {not_found,missing}
> [debug] [<0.1076.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
>             {couch_httpd_db,db_doc_req,3},
>             {couch_httpd_db,do_db_req,2},
>             {couch_httpd,handle_request_int,5},
>             {mochiweb_http,headers,5},
>             {proc_lib,init_p_do_apply,3}]
> [info] [<0.1076.0>] 127.0.0.1 - - 'GET'
> /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
> [debug] [<0.1076.0>] httpd 404 error response:
> {"error":"not_found","reason":"missing"}
> 
> 
> Could it be some sort of race condition?


Re: replicator test hanging

Posted by Paul Bonser <mi...@gmail.com>.
Oh, I should also mention that I got the exact same error in multiple
freezes. Twice it was in the same exact order, and once it was in this
order:

[info] [<0.95.0>] starting replication "15c25eda4ea6308af6bea9864d5319ef" at
<0.1845.0>
[debug] [<0.1207.0>] OAuth Params: [{"att_encoding_info","true"}]
[info] [<0.1207.0>] 127.0.0.1 - - 'GET'
/test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
[debug] [<0.1207.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
Headers: [{'Accept',"application/json"},
          {'Accept-Encoding',"gzip"},
          {'Content-Length',"167"},
          {'Host',"localhost:5985"},
          {'User-Agent',"CouchDB/0.12.0a953193"},
          {"X-Couch-Full-Commit","false"}]
[debug] [<0.1207.0>] OAuth Params: []
[info] [<0.1207.0>] 127.0.0.1 - - 'POST'
/test_suite_rep_docs_db_b/_bulk_docs 201
[debug] [<0.1076.0>] 'GET'
/test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
Headers: [{'Accept',"application/json"},
          {'Accept-Encoding',"gzip"},
          {'Host',"localhost:5985"},
          {'User-Agent',"CouchDB/0.12.0a953193"}]
[debug] [<0.1076.0>] OAuth Params: [{"att_encoding_info","true"}]
[debug] [<0.1076.0>] Minor error in HTTP request: {not_found,missing}
[debug] [<0.1076.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
             {couch_httpd_db,db_doc_req,3},
             {couch_httpd_db,do_db_req,2},
             {couch_httpd,handle_request_int,5},
             {mochiweb_http,headers,5},
             {proc_lib,init_p_do_apply,3}]
[info] [<0.1076.0>] 127.0.0.1 - - 'GET'
/test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
[debug] [<0.1076.0>] httpd 404 error response:
 {"error":"not_found","reason":"missing"}


Could it be some sort of race condition?
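
For comparing the order of requests between freezes, something like the following Python sketch might help. It is an assumption-laden helper, not part of CouchDB: `ACCESS` and `completed_requests` are names invented here, and the pattern is based only on the `[info]` access lines as they appear in the output above (including the line wrap between the method and the path).

```python
import re

# Matches access-log entries such as:
#   [info] [<0.1207.0>] 127.0.0.1 - - 'GET'
#   /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
# \s+ also matches a newline, so wrapped entries are handled.
ACCESS = re.compile(
    r"\[info\] \[<([\d.]+)>\] \S+ - - '(\w+)'\s+(\S+)\s+(\d{3})"
)

def completed_requests(log_text):
    """Return (pid, method, path, status) for each access entry, in log order."""
    return [
        (pid, method, path, int(status))
        for pid, method, path, status in ACCESS.findall(log_text)
    ]
```

Running this over the log from each frozen run and comparing the resulting lists would show whether the GET for foo666 and the POST to _bulk_docs really do complete in a different order.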






-- 
Paul Bonser
http://probablyprogramming.com

Re: replicator test hanging

Posted by Paul Bonser <mi...@gmail.com>.
On Wed, Jun 9, 2010 at 7:33 PM, J Chris Anderson <jc...@apache.org> wrote:

> Devs,
>
> Is anyone else seeing the replicator test hang and never finish?
>
> It never hangs the first few runs, but after running ten or so times, I'll
> end up with the test suite waiting for a replication that never finishes.
> It's the same story on 0.11.0, 0.11.x, and trunk.
>
> Is anyone else able to reproduce this? Am I crazy?
>

It just froze for me on the first try, using 0.12.0a935298, then ran
successfully 3 times, then froze again. The last thing logged the first time
was a _bulk_docs request; the last thing logged this time was a PUT to
/test_suite_db_b/_local%2F6598a76aa55cd39645e4730b4cb28c00

I'm running a Firefox 3.6 nightly build on Linux. For me, it froze the first
time when I did a "run all" and the second time when just directly running
the replication test.

After svn up-ing to the latest in trunk, it froze on the first try, directly
running the replication test.

Here's the debug output for the last _replicate request where it freezes.
It's requesting a document that isn't there.


 [info] [<0.95.0>] starting new replication
"15c25eda4ea6308af6bea9864d5319ef" at <0.848.0>
[debug] [<0.191.0>] 'GET'
/test_suite_rep_docs_db_a/foo2?att_encoding_info=true {1,1}
Headers: [{'Accept',"application/json"},
          {'Accept-Encoding',"gzip"},
          {'Host',"localhost:5985"},
          {'User-Agent',"CouchDB/0.12.0a953193"}]
[debug] [<0.191.0>] OAuth Params: [{"att_encoding_info","true"}]
[info] [<0.191.0>] 127.0.0.1 - - 'GET'
/test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
[debug] [<0.189.0>] 'GET'
/test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
Headers: [{'Accept',"application/json"},
          {'Accept-Encoding',"gzip"},
          {'Host',"localhost:5985"},
          {'User-Agent',"CouchDB/0.12.0a953193"}]
[debug] [<0.189.0>] OAuth Params: [{"att_encoding_info","true"}]
[debug] [<0.189.0>] Minor error in HTTP request: {not_found,missing}
[debug] [<0.189.0>] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
             {couch_httpd_db,db_doc_req,3},
             {couch_httpd_db,do_db_req,2},
             {couch_httpd,handle_request_int,5},
             {mochiweb_http,headers,5},
             {proc_lib,init_p_do_apply,3}]
[info] [<0.189.0>] 127.0.0.1 - - 'GET'
/test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
[debug] [<0.189.0>] httpd 404 error response:
 {"error":"not_found","reason":"missing"}

[debug] [<0.191.0>] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
Headers: [{'Accept',"application/json"},
          {'Accept-Encoding',"gzip"},
          {'Content-Length',"167"},
          {'Host',"localhost:5985"},
          {'User-Agent',"CouchDB/0.12.0a953193"},
          {"X-Couch-Full-Commit","false"}]
[debug] [<0.191.0>] OAuth Params: []
[info] [<0.191.0>] 127.0.0.1 - - 'POST' /test_suite_rep_docs_db_b/_bulk_docs
201




-- 
Paul Bonser
http://probablyprogramming.com