Posted to user@couchdb.apache.org by Ning Tan <ni...@gmail.com> on 2011/04/15 20:16:02 UTC

stray couchjs processes

A while back there was a post about stray couchjs processes that had
no apparent resolution. A similar situation happened in our
environment that resulted in hundreds of couchjs processes, which
caused out-of-memory problems for the server.

We are investigating the cause and would appreciate any help in
pinpointing the problem. One thing that was curious to me is how many
couchjs processes are needed to support concurrent requests. I
couldn't reproduce a large number of couchjs processes in my local
environment. It seems that all my view/filter requests were handled by
just one couchjs process.

The environment that had problems was using 1.0.1. I've been testing
locally with 1.0.2.  Would that make any difference?

Also, the problematic environment had proxies sitting in front of the
couch boxes, so that's another variable in our analysis. But it's hard
to tell without knowing the relationship/cardinality between an HTTP
connection and a couchjs process. In the original post, connections
not properly closed were hinted as a potential culprit. However, it's
still unclear to me how mishandled HTTP connections can result in
multiple couchjs processes. If I'm not mistaken, couchjs only talks
via stdin/stdout and is not handling a connection directly.
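If it helps, here is how I picture that stdin/stdout relationship, as a
toy Python sketch. This is an illustration only; the command names
("reset", "add_fun", "map_doc") are loosely modeled on the real CouchDB
query-server protocol, and the function below is my guess at the shape
of it, not actual CouchDB code.

```python
import json

def handle_line(line, state):
    """Toy couchjs-style handler: one JSON command in, one JSON reply out.

    A real view server would be driven by a loop that reads a line from
    stdin, calls something like this, and prints the reply to stdout.
    It never touches an HTTP connection itself.
    """
    cmd = json.loads(line)
    if cmd[0] == "reset":
        state.clear()
        return json.dumps(True)
    if cmd[0] == "add_fun":
        # Store the map function source; this sketch never runs the JS.
        state.setdefault("funs", []).append(cmd[1])
        return json.dumps(True)
    if cmd[0] == "map_doc":
        doc = cmd[1]
        # Pretend each registered map function emits [doc id, 1].
        results = [[[doc.get("_id"), 1]] for _ in state.get("funs", [])]
        return json.dumps(results)
    return json.dumps(["error", "unknown_command"])
```

If that picture is right, the process is tied to whatever feeds its
stdin, not to any socket, which is why I don't see how a mishandled
connection could reach a couchjs process directly.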

Sorry if this question doesn't have enough information. We are still
in very early stages of our analysis and don't have a lot of leads
yet.

Thanks!

Re: stray couchjs processes

Posted by Ning Tan <ni...@gmail.com>.
Thanks for the replies!

Other than waiting for the fix Adam talked about to make it to our
environment, what else could we possibly do to alleviate the problem?
We were thinking of having a monitor on the number of couchjs
processes and/or their memory usage and bounce couch when needed. We'd
rather avoid doing so as it may have implications in other layers of
our application. (It may not; we'll have to think about it).
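To make that concrete, the core of such a monitor could be a small check
like this (purely hypothetical on our side; the thresholds and the
`ps -eo pid,rss,comm`-style input format are assumptions, not anything
CouchDB ships):

```python
import re

# Hypothetical couchjs watchdog check. Given ps-style output with
# "PID RSS COMMAND" columns, count couchjs processes and their combined
# resident memory, and flag when either crosses a (made-up) threshold.

MAX_PROCS = 50            # assumed ceiling on couchjs process count
MAX_RSS_KB = 512 * 1024   # assumed ceiling on their combined RSS (KB)

def check_couchjs(ps_output, max_procs=MAX_PROCS, max_rss_kb=MAX_RSS_KB):
    procs = []
    for line in ps_output.splitlines():
        m = re.match(r"\s*(\d+)\s+(\d+)\s+(\S+)", line)
        if m and m.group(3) == "couchjs":
            procs.append((int(m.group(1)), int(m.group(2))))
    total_rss = sum(rss for _, rss in procs)
    over = len(procs) > max_procs or total_rss > max_rss_kb
    return {"count": len(procs), "rss_kb": total_rss, "over_limit": over}
```

The "over_limit" flag is where we would hook in whatever action we
settle on, whether that's bouncing couch or something gentler.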

Would it be a better idea for the monitor to kill only the couchjs
processes instead? Will the couch database runtime tolerate this well?
I would assume killing a couchjs that's currently handling requests is
probably worse than killing one that's truly stray, but it may not be
the case.

Thanks again for helping us with this.

On Sat, Apr 16, 2011 at 10:54 AM, Jan Lehnardt <ja...@apache.org> wrote:
> Hah, thanks Adam,
>
> this is exactly the email I hoped to see by CCing dev@ :)
>
> Cheers
> Jan
> --
>

Re: stray couchjs processes

Posted by Jan Lehnardt <ja...@apache.org>.
Hah, thanks Adam,

this is exactly the email I hoped to see by CCing dev@ :)

Cheers
Jan
-- 

On 16 Apr 2011, at 15:09, Adam Kocoloski wrote:

> I've seen this bug in the wild.  I haven't been able to track down the exact root cause, but the various ets tables in couch_query_servers get out of sync with one another - one table will think there are no available processes and will cause new ones to be spawned but the others will still have some record of the hundreds of spawned couchjs processes.
> 
> I rewrote the gen_server to use a single ets table and refactored a few other things in COUCHDB-901[1].  It's missing a hard limit on the number of processes that we'll spawn, and instead has a soft limit above which it will discard processes after their workload has finished.  I'm overdue to finish that ticket off and get it into trunk.  Regards,
> 
> Adam
> 
> [1]: https://issues.apache.org/jira/browse/COUCHDB-901
> 


Re: stray couchjs processes

Posted by Adam Kocoloski <ko...@apache.org>.
I've seen this bug in the wild.  I haven't been able to track down the exact root cause, but the various ets tables in couch_query_servers get out of sync with one another - one table will think there are no available processes and will cause new ones to be spawned but the others will still have some record of the hundreds of spawned couchjs processes.

I rewrote the gen_server to use a single ets table and refactored a few other things in COUCHDB-901[1].  It's missing a hard limit on the number of processes that we'll spawn, and instead has a soft limit above which it will discard processes after their workload has finished.  I'm overdue to finish that ticket off and get it into trunk.  Regards,

Adam

[1]: https://issues.apache.org/jira/browse/COUCHDB-901
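In rough Python pseudocode, the soft-limit behaviour looks like this
(all names illustrative, not the actual couch_query_servers internals):

```python
# Sketch of a pool with a single table of workers and a soft limit:
# bursts of concurrent work may push the total above the limit, but a
# worker finishing its job is only returned to the idle pool while we
# are under the limit; otherwise it is discarded.

class QueryServerPool:
    def __init__(self, soft_limit=10, spawn=None):
        self.soft_limit = soft_limit
        self.spawn = spawn or (lambda: object())  # stand-in for fork/exec
        self.idle = []
        self.busy = set()

    def checkout(self):
        proc = self.idle.pop() if self.idle else self.spawn()
        self.busy.add(id(proc))
        return proc

    def checkin(self, proc):
        self.busy.discard(id(proc))
        # Soft limit: above it, finished workers are dropped, not pooled.
        if len(self.idle) + len(self.busy) < self.soft_limit:
            self.idle.append(proc)
        # else: drop the reference; a real pool would also kill the process
```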

On Apr 16, 2011, at 8:39 AM, Jan Lehnardt wrote:

> Hi Ning,
> 
> the correlation between couchjs and HTTP requests is that whenever a
> request needs couchjs for anything, it will use one that is around and
> idle. When CouchDB starts, none are idle and it will fork and exec a
> new couchjs process. A couchjs process is not idle when a request is
> using it. So for every concurrent request, you will get a new fork &
> exec of a couchjs process.
> 
> I haven't looked at the current implementation in a while, but we
> should look into implementing some configurable ceiling that can't
> be crossed with more fork & exec. Requests then could either wait
> until a couchjs is idle and eventually time out if none get freed,
> or they could get served a Service Unavailable (503); that behaviour
> should also be configurable.
> 
> CCing dev@ to see if we can get more feedback on this.
> 
> Cheers
> Jan
> -- 
> 
> 


Re: stray couchjs processes

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Ning,

the correlation between couchjs and HTTP requests is that whenever a
request needs couchjs for anything, it will use one that is around and
idle. When CouchDB starts, none are idle and it will fork and exec a
new couchjs process. A couchjs process is not idle when a request is
using it. So for every concurrent request, you will get a new fork &
exec of a couchjs process.

I haven't looked at the current implementation in a while, but we
should look into implementing some configurable ceiling that can't
be crossed with more fork & exec. Requests then could either wait
until a couchjs is idle and eventually time out if none get freed,
or they could get served a Service Unavailable (503); that behaviour
should also be configurable.
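Sketched in Python rather than Erlang, the ceiling could behave roughly
like this (everything here is hypothetical, including representing the
503 as an exception):

```python
import queue

# Sketch of a configurable ceiling on couchjs workers: below the
# ceiling, a request with no idle worker triggers a fork & exec; at the
# ceiling, it either waits (with a timeout) for one to be freed, or is
# answered 503 immediately. The worker is just a placeholder object.

class BoundedPool:
    def __init__(self, ceiling, wait_timeout=None):
        self.ceiling = ceiling
        self.wait_timeout = wait_timeout  # None means fail fast with 503
        self.spawned = 0
        self.idle = queue.Queue()

    def acquire(self):
        if not self.idle.empty():
            return self.idle.get()
        if self.spawned < self.ceiling:
            self.spawned += 1
            return object()  # stand-in for fork & exec of couchjs
        if self.wait_timeout is None:
            raise RuntimeError("503 Service Unavailable")
        try:
            return self.idle.get(timeout=self.wait_timeout)
        except queue.Empty:
            raise RuntimeError("503 Service Unavailable")

    def release(self, proc):
        self.idle.put(proc)
```

The ceiling, the wait timeout, and the wait-vs-503 choice are exactly
the knobs that should be configurable.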

CCing dev@ to see if we can get more feedback on this.

Cheers
Jan
-- 



