Posted to user@couchdb.apache.org by Ian Danforth <id...@fetchrobotics.com> on 2016/08/08 22:32:34 UTC

14 days and couchdb becomes unresponsive

Hello!

 First post to the list so please forgive any faux-pas. I'm running couchdb
1.5.0 on Ubuntu 14.04 and I am consistently running into a state where,
after 14 days of uptime on the computer, couchdb becomes unresponsive.
Requests to the db start queueing up until all I'm getting from the python
client are 599 relax exceptions.

couch.couch.CouchException: HTTP 599: Unknown

 Asking the service to stop and restart does not recover and
/var/log/couchdb/couch.log doesn't have any errors.

 I have been unable to find reports of similar errors in various Google
searches, so I thought I'd ask here. Additional debugging and logging
suggestions are welcome!

-- 
Ian Danforth
Fetch Robotics
Lead Robotics Engineer
650-391-4467

Re: 14 days and couchdb becomes unresponsive

Posted by Ian Danforth <id...@fetchrobotics.com>.
Bill,

 Thanks for the thoughts. The responses logged by couch are all 200s and
300s, as you would expect, right up until it stops serving requests. I
didn't see any 500s.

 When couch becomes unresponsive, Futon is also inaccessible.

Ian


On Mon, Aug 8, 2016 at 3:42 PM, Bill Stephenson <bi...@ezinvoice.com> wrote:

> I’m not a CouchDB expert but I wonder if you’re logging the responses to
> your requests, and when you say it becomes unresponsive I wonder what
> happens when you login to Futon? Does it not respond as well?
>
> Kindest Regards,
>
> Bill Stephenson
> Tech Support
> www.ezInvoice.com <http://www.ezinvoice.com/>
> 1-417-546-8390

Re: 14 days and couchdb becomes unresponsive

Posted by Bill Stephenson <bi...@ezinvoice.com>.
I’m not a CouchDB expert, but I wonder if you’re logging the responses to your requests. When you say it becomes unresponsive, what happens when you log in to Futon? Does it fail to respond as well?

Kindest Regards,

Bill Stephenson
Tech Support
www.ezInvoice.com <http://www.ezinvoice.com/>
1-417-546-8390

> On Aug 8, 2016, at 5:32 PM, Ian Danforth <id...@fetchrobotics.com> wrote:
> 
> Hello!
> 
> First post to the list so please forgive any faux-pas. I'm running couchdb
> 1.5.0 on Ubuntu 14.04 and I am consistently running into a state where,
> after 14 days of uptime on the computer, couchdb becomes unresponsive.
> Requests to the db start queueing up until all I'm getting from the python
> client are 599 relax exceptions.
> 
> couch.couch.CouchException: HTTP 599: Unknown
> 
> Asking the service to stop and restart does not recover and
> /var/log/couchdb/couch.log doesn't have any errors.
> 
> I have been unable to find reports of similar errors in various Google
> searches, so I thought I'd ask here. Additional debugging and logging
> suggestions are welcome!
> 
> -- 
> Ian Danforth
> Fetch Robotics
> Lead Robotics Engineer
> 650-391-4467


Re: 14 days and couchdb becomes unresponsive

Posted by Robert Samuel Newson <rn...@apache.org>.
hrm, well, this certainly isn't normal, and I've not seen this behaviour at Cloudant either. Not that this helps you much...

I would suggest recording /_stats periodically, but nothing in there seems helpful for this; it doesn't show the current connection count.
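
A minimal sketch of the periodic /_stats recording suggested above, written in Python because the client side of this setup is Python. The URL assumes CouchDB listening on its default port 5984 on localhost; the log file name, the interval, and the function names are hypothetical. Since /_stats does not report the current connection count, this only captures what the endpoint exposes, plus the moment at which it stops answering.

```python
import json
import time
import urllib.request

# Assumption: CouchDB 1.x on its default port on the local machine.
STATS_URL = "http://127.0.0.1:5984/_stats"


def fetch_stats(url=STATS_URL, timeout=5.0):
    """Return the parsed /_stats document, or None if it cannot be fetched."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers timeouts and connection failures,
        # ValueError covers a body that is not valid JSON.
        return None


def snapshot_line(stats, now):
    """Serialize one sample as a single JSON line for an append-only log."""
    return json.dumps({"ts": now, "stats": stats}, sort_keys=True)


def record_forever(path="couch_stats.log", interval=60.0, fetch=fetch_stats):
    """Append one sample per interval; a null 'stats' marks a failed fetch."""
    with open(path, "a") as log:
        while True:
            log.write(snapshot_line(fetch(), time.time()) + "\n")
            log.flush()
            time.sleep(interval)
```

Calling record_forever() from a small script started at boot would leave a timestamped trail, so the last good sample before the hang can be compared against earlier ones.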

Are all requests mediated by Tornado? Is it possible to see if these symptoms manifest without it interceding?

B.


> On 23 Aug 2016, at 21:02, Ian Danforth <id...@fetchrobotics.com> wrote:
> 
> Robert,
> 
> I haven't done this comparison, was hoping to avoid it :) Couch.log has
> nothing but 200's up until the point where it becomes unresponsive.
> 
> Ian


Re: 14 days and couchdb becomes unresponsive

Posted by Ian Danforth <id...@fetchrobotics.com>.
Robert,

 I haven't done this comparison, was hoping to avoid it :) Couch.log has
nothing but 200's up until the point where it becomes unresponsive.

Ian

On Tue, Aug 23, 2016 at 12:56 PM, Robert Samuel Newson <rn...@apache.org>
wrote:

>
> Are you able to compare with couchdb 1.6.1 (1.5.0 is fairly old, though I
> don't recall a fix between 1.5.0 and 1.6.1 that matches your symptoms)?
>
> couch.log has nothing interesting to say leading up to this point?

Re: 14 days and couchdb becomes unresponsive

Posted by Robert Samuel Newson <rn...@apache.org>.
Are you able to compare with couchdb 1.6.1 (1.5.0 is fairly old, though I don't recall a fix between 1.5.0 and 1.6.1 that matches your symptoms)?

couch.log has nothing interesting to say leading up to this point?


> On 23 Aug 2016, at 20:45, Ian Danforth <id...@fetchrobotics.com> wrote:
> 
> Robert,
> 
> Yes. We had encountered ulimit issues previously with our setup because we
> weren't properly closing client connections to couch, so we are well aware
> of that possibility. Thanks again for continuing to think about this!


Re: 14 days and couchdb becomes unresponsive

Posted by Ian Danforth <id...@fetchrobotics.com>.
Robert,

 Yes. We had encountered ulimit issues previously with our setup because we
weren't properly closing client connections to couch, so we are well aware
of that possibility. Thanks again for continuing to think about this!

On Tue, Aug 23, 2016 at 12:42 PM, Robert Samuel Newson <rn...@apache.org>
wrote:

> sorry for not getting back to you.
>
> Do you have enough monitoring here to rule out things like hitting a file
> descriptor ulimit or ephemeral ports?

Re: 14 days and couchdb becomes unresponsive

Posted by Robert Samuel Newson <rn...@apache.org>.
Sorry for not getting back to you.

Do you have enough monitoring here to rule out things like hitting a file descriptor ulimit or ephemeral ports?
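
One way to check the file descriptor side of this question on Linux is sketched below. It is an illustration under assumptions, not part of CouchDB: it reads /proc/<pid>/fd and /proc/<pid>/limits for the CouchDB (beam) process, which requires running as the same user or as root, and the function names are hypothetical. It does not cover ephemeral port exhaustion.

```python
import os


def parse_open_files_limit(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is unlimited or the row is missing.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units.
            soft = line[len(label):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process on Linux."""
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    with open("/proc/%d/limits" % pid) as f:
        limit = parse_open_files_limit(f.read())
    return open_fds, limit
```

Logging fd_usage(pid) at intervals over the two weeks would show whether the open descriptor count climbs toward the limit before the hang.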


> On 9 Aug 2016, at 00:05, Ian Danforth <id...@fetchrobotics.com> wrote:
> 
> Robert,
> 
> Sorry that error code is thrown by tornado-couch (a python library we use
> to make async requests to couch from our Tornado server). That is the error
> of last resort when no response is forthcoming.
> 
> curl (or any) requests to couch endpoints simply do not return.
> 
> Thanks,
> 
> Ian


Re: 14 days and couchdb becomes unresponsive

Posted by Ian Danforth <id...@fetchrobotics.com>.
Robert,

 Sorry, that error code is thrown by tornado-couch (a Python library we use
to make async requests to couch from our Tornado server). It is the error
of last resort when no response is forthcoming.

 curl (or any) requests to couch endpoints simply do not return.
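
The distinction described above, between an HTTP error response and no response at all, can be checked without Tornado in the path. The sketch below uses only the Python standard library; the URL assumes CouchDB on its default port 5984 on localhost, and the function names and result strings are hypothetical. A "timeout" result corresponds to the situation where Tornado's HTTP client reports code 599, which it uses when no HTTP response was received.

```python
import socket
import urllib.error
import urllib.request


def classify(exc):
    """Map an exception raised by urlopen to a short description."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, but with an error status.
        return "http-error %d" % exc.code
    # URLError wraps the underlying socket error in .reason;
    # read timeouts can also surface unwrapped.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, socket.timeout):
        return "timeout"
    if isinstance(reason, ConnectionRefusedError):
        return "refused"
    return "other: %r" % (reason,)


def probe(url="http://127.0.0.1:5984/", timeout=5.0):
    """Issue one GET and report whether any HTTP response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "response %d" % resp.status
    except OSError as exc:
        return classify(exc)
```

A "refused" result would mean nothing is listening on the port, while "timeout" would mean the connection is accepted or pending but never answered, which matches requests that simply do not return.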

Thanks,

Ian

On Mon, Aug 8, 2016 at 3:59 PM, Robert Samuel Newson <rn...@apache.org>
wrote:

> I am pretty sure couchdb does not send 599 status code. can you show a
> full request/response please (a curl -v would do it)?

Re: 14 days and couchdb becomes unresponsive

Posted by Robert Samuel Newson <rn...@apache.org>.
I am pretty sure couchdb does not send a 599 status code. Can you show a full request/response please (a curl -v would do it)?

> On 8 Aug 2016, at 23:32, Ian Danforth <id...@fetchrobotics.com> wrote:
> 
> Hello!
> 
> First post to the list so please forgive any faux-pas. I'm running couchdb
> 1.5.0 on Ubuntu 14.04 and I am consistently running into a state where,
> after 14 days of uptime on the computer, couchdb becomes unresponsive.
> Requests to the db start queueing up until all I'm getting from the python
> client are 599 relax exceptions.
> 
> couch.couch.CouchException: HTTP 599: Unknown
> 
> Asking the service to stop and restart does not recover and
> /var/log/couchdb/couch.log doesn't have any errors.
> 
> I have been unable to find reports of similar errors in various Google
> searches, so I thought I'd ask here. Additional debugging and logging
> suggestions are welcome!
> 
> -- 
> Ian Danforth
> Fetch Robotics
> Lead Robotics Engineer
> 650-391-4467