Posted to user@couchdb.apache.org by ID...@jw.org on 2012/03/15 19:28:26 UTC

os_process_error in CouchDB

We are seeing a strange error in CouchDB that leaves Chef unusable and unrecoverable. Both the knife command and the chef webui stop responding. /var/log/couch.log shows an os_process_error with exit status 0.

This is the second time this has happened. The first time, it happened to a chef-server that had been running properly for several weeks. On Monday, at about 11 AM EST, the error occurred and our chef-server became unrecoverable. We spent about a day researching the issue and trying to recover the server.

We then rebuilt the chef-server this morning. During the setup/installation, we ran into this issue (http://tickets.opscode.com/browse/CHEF-2346), which we had encountered in the past. We applied the fix by increasing maxFieldLength in the mainIndex section of the chef solr config file.

Very shortly after that, while doing a chef run on a lab node, running a knife command, and trying to access the web UI all at the same time, the os_process_error occurred again and the chef-server became unusable.

Our chef-server is running on a vSphere VM with 2 cores (2 cores in 1 socket), 2GB of RAM. It's running Ubuntu 10.04 LTS, Chef 0.10.8 and CouchDB 0.10. The VM was generated from a pre-existing VM that originally had only 1 core.

Another detail about our environment that may be important is that we use Centrify on our Linux server for Active Directory integration. This is why we were affected by CHEF-2346. Since chef pulls in all authorized users on a node as an automatic attribute, there can be thousands of users in a list that gets gathered by chef.

couch.log is 125000 lines long, so I'll include the beginning (http://pastie.org/3602674) and the end (http://pastie.org/3602677).
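
For a log this long, it can help to pull out only the lines that mention the error, with a little surrounding context. The following is a minimal sketch in JavaScript, not a CouchDB tool: the function name `findWithContext` is hypothetical, and the sample lines are placeholders rather than real CouchDB log output, since the log format is not reproduced in this thread.

```javascript
// Return every line that contains `needle`, together with `context`
// lines before and after it. Line numbers are 1-based.
function findWithContext(lines, needle, context = 2) {
  const hits = [];
  lines.forEach((line, i) => {
    if (line.includes(needle)) {
      const start = Math.max(0, i - context);
      const end = Math.min(lines.length, i + context + 1);
      hits.push({ lineNumber: i + 1, excerpt: lines.slice(start, end) });
    }
  });
  return hits;
}

// Placeholder input, NOT real CouchDB output.
const sample = [
  "alpha",
  "beta",
  "xx os_process_error yy",
  "gamma",
  "delta",
];

const hits = findWithContext(sample, "os_process_error", 1);
console.log(hits);
```

To run it against the real file under Node, the lines could be read with `require("fs").readFileSync("/var/log/couch.log", "utf8").split("\n")` and passed in place of `sample`.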

I should also mention that we have since rebuilt our chef-server on Ubuntu 11.10, which includes CouchDB 1.0.1. We have had no issues since, but we are very interested in getting to the root cause of this problem, because we are still nervous.

Is CouchDB perhaps dying because of the size of the node data that we are asking chef to gather? Has anyone else encountered this error? Many thanks for any help. Let me know if I can provide any more information.

Ian D. Rossi

RE: os_process_error in CouchDB

Posted by ID...@jw.org.
Thanks very much. I will pass this on to the Chef team. It seems that it always crashes at a view, either id_to_name or name_to_id.

Ian D. Rossi

________________________________________
From: CGS [cgsmcmlxxv@gmail.com]
Sent: Thursday, March 15, 2012 3:12 PM
To: user@couchdb.apache.org
Subject: Re: os_process_error in CouchDB

From what I was able to see in your logs, you have an error in a view: it
raises an uncaught error in JavaScript, which in turn crashes CouchDB 0.10.
It seems that one of your documents doesn't contain a required field, so
the map function exits with an error that is not caught correctly by JS (or
not passed correctly to the Erlang part of CouchDB), and that crashes some
Erlang components in CouchDB. If the issue doesn't appear in 1.0.1, it
seems the problem has been fixed in the meantime (the error is now caught
correctly).

That is what I noticed from your logs.

CGS





Re: os_process_error in CouchDB

Posted by CGS <cg...@gmail.com>.
From what I was able to see in your logs, you have an error in a view: it
raises an uncaught error in JavaScript, which in turn crashes CouchDB 0.10.
It seems that one of your documents doesn't contain a required field, so
the map function exits with an error that is not caught correctly by JS (or
not passed correctly to the Erlang part of CouchDB), and that crashes some
Erlang components in CouchDB. If the issue doesn't appear in 1.0.1, it
seems the problem has been fixed in the meantime (the error is now caught
correctly).
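
As an illustration of the failure mode described above, here is a minimal sketch of a map function that checks for a field before using it. The actual Chef view source is not shown in this thread, so the name `mapIdToName` and the assumption that the view reads a `name` field are hypothetical. `emit` is defined locally as a stand-in for the function CouchDB makes available to views, only so that the snippet runs under Node.

```javascript
// Stand-in for the emit() that CouchDB provides to map functions.
// It collects rows in an array so the sketch can run outside CouchDB.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Hypothetical id-to-name style map function (assumed field: `name`).
function mapIdToName(doc) {
  // Skip documents that lack the field the view depends on, rather
  // than letting a later property access throw a TypeError.
  if (!doc || typeof doc.name !== "string") {
    return;
  }
  emit(doc._id, doc.name);
}

// Made-up documents: the second one is missing `name`.
const docs = [
  { _id: "a", name: "node1" },
  { _id: "b" },
];
docs.forEach(mapIdToName);
console.log(rows);
```

With the guard in place, the incomplete document is simply left out of the view instead of raising an error in the view server.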

That is what I noticed from your logs.

CGS



