Posted to user@couchdb.apache.org by Anand Chitipothu <an...@gmail.com> on 2011/09/07 16:44:17 UTC

CouchDB random crashes

Hi,

We are using CouchDB in production at openlibrary.org. We have a couple
of databases with 25M docs and views with about 80M rows.

There was a crash, and after the restart the CouchDB server started
recomputing all views of all the databases. Some time before the crash
I ran _view_cleanup on one of the databases to delete unused view
files, but I'm not sure if that caused it.

We were using CouchDB version 1.0.1. After that crash, I copied the
databases to a new node, restored the views from a backup, and upgraded
CouchDB to 1.0.3, thinking that it would be more stable. Everything
seemed alright for a while.

I tried to compact a view and a database, as compaction had not been
run since the db was created. That increased the load on the machine,
CouchDB crashed, and after the restart it started recomputing the views.

I restored the view from backup again, and it looked alright again for
a while. Since then I've been going through phases of random crashes and
restoring views from backup. I'm not sure what is triggering these
crashes. I tried moving back to 1.0.1, but that didn't help.

After the crash, the last error in couch.log is always the following:

[Wed, 07 Sep 2011 13:22:08 GMT] [error] [<0.77.0>] {error_report,<0.31.0>,
    {<0.77.0>,supervisor_report,
     [{supervisor,{local,couch_server_sup}},
      {errorContext,shutdown},
      {reason,reached_max_restart_intensity},
      {offender,
          [{pid,<0.29139.3>},
           {name,couch_secondary_services},
           {mfa,{couch_server_sup,start_secondary_services,[]}},
           {restart_type,permanent},
           {shutdown,infinity},
           {child_type,supervisor}]}]}}

Here are the last 10K+ lines of couch.log after each crash:

http://www.archive.org/~anand/files/2011-09-07-couchdb-crash-log.txt
http://www.archive.org/~anand/files/2011-09-07-couchdb-crash2-log.txt

And the full log of most recent crash:

http://www.archive.org/~anand/files/2011-09-07-couchdb-crash2.log.gz

Can someone please help me to fix this?

Thanks,
Anand

Re: CouchDB random crashes

Posted by Anand Chitipothu <an...@gmail.com>.
2011/9/9 Randall Leeds <ra...@gmail.com>:
> I see "emfile" in the first of those logs, which indicates your CouchDB
> instance has reached the maximum number of open files.
> If this is an "idle" server, with no clients connected, that may mean there
> is a leak somehow in file descriptors and we should open a ticket to look
> into it.
> If the server is under load from clients it is likely you just need to
> change resource limits.
> See this wiki page:
> https://wiki.apache.org/couchdb/Performance#Resource_Limits

With the help of rnewson on #couchdb, I was able to find that this
problem is due to the number of open files.
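One way to confirm this kind of diagnosis on Linux is to compare how many descriptors a process holds with its soft "Max open files" limit. The sketch below is a minimal example, assuming a Linux host with /proc mounted and permission to read the target process's /proc entries; `count_fds`, `max_open_files` and `fd_usage` are hypothetical helper names, not part of CouchDB.

```shell
# Hypothetical helpers (POSIX sh); assumes Linux procfs:
# /proc/<pid>/fd lists open descriptors, /proc/<pid>/limits lists rlimits.

# Number of file descriptors currently open in process $1.
count_fds() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}

# Soft "Max open files" limit of process $1.
# The line looks like "Max open files  <soft>  <hard>  files",
# so the soft value is the 4th whitespace-separated field.
max_open_files() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Print "open/limit" to show how close the process is to hitting emfile.
fd_usage() {
    echo "$(count_fds "$1")/$(max_open_files "$1")"
}
```

Usage: `fd_usage <pid>`, passing the PID of the CouchDB Erlang VM. A value creeping towards the limit on a lightly loaded server points to a leak rather than a limit that is simply too low.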

It turned out that the CouchDB server was holding a lot of connections
open for a long time, even after the remote end had closed them. I could
reproduce it by sending multiple curl requests with a timeout to the
CouchDB server:

for i in $(seq 1000); do
    curl -m1 'http://localhost:5984/foo/_changes?feed=continuous'
    echo $i
done
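To check whether a loop like this really leaves descriptors behind, one can sample the server's descriptor count while it runs. This is a minimal sketch, assuming Linux procfs; `watch_fds` is a hypothetical helper, and the PID passed to it is the CouchDB Erlang VM process, which you would look up yourself.

```shell
# Hypothetical helper (POSIX sh); assumes Linux procfs.
# Print "HH:MM:SS <open fd count>" for process $1, $3 times, $2 seconds apart.
watch_fds() {
    pid="$1"; interval="$2"; samples="$3"
    i=0
    while [ "$i" -lt "$samples" ]; do
        n=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
        echo "$(date +%H:%M:%S) $n"
        i=$((i + 1))
        # Skip the sleep after the last sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Run it from a second terminal, e.g. `watch_fds <pid> 1 60`, while the curl loop is going. If the count keeps climbing and stays high after the curl clients have timed out, that matches the connection build-up described above.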

I also found that my CouchDB server was creating too many couchpy
processes, but I haven't been able to reproduce that yet. I'll spend some
time exploring it in the next couple of days and send an update.
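A simple way to track the number of view-server processes over time is to count processes whose command line mentions couchpy (the Python view server from couchdb-python). This is a minimal sketch, assuming Linux procfs and that the string `couchpy` appears in the view server's command line; `count_procs` is a hypothetical helper.

```shell
# Hypothetical helper (POSIX sh); assumes Linux procfs.
# Count processes whose command line contains the fixed string $1.
count_procs() {
    n=0
    for f in /proc/[0-9]*/cmdline; do
        # cmdline is NUL-separated; turn NULs into spaces before matching.
        # 2>/dev/null comes first so a process that exits mid-scan is silently skipped.
        if tr '\0' ' ' 2>/dev/null < "$f" | grep -qF -- "$1"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, `count_procs couchpy` run every few minutes would show whether the number of view servers keeps growing. Note that any shell whose own command line contains the pattern is counted as well.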

Thanks for your support.
Anand

Re: CouchDB random crashes

Posted by Randall Leeds <ra...@gmail.com>.
I see "emfile" in the first of those logs, which indicates your CouchDB
instance has reached the maximum number of open files.
If this is an "idle" server with no clients connected, that may mean there
is a file-descriptor leak somewhere, and we should open a ticket to look
into it.
If the server is under load from clients, it is likely you just need to
change resource limits.
See this wiki page:
https://wiki.apache.org/couchdb/Performance#Resource_Limits
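Two quick checks follow from this advice: whether the log really contains emfile errors (emfile is the Erlang error atom for "too many open files"), and which open-file limits the shell that launches CouchDB passes on. This is a minimal sketch; `count_emfile` and `show_fd_limits` are hypothetical helper names, and the log path is supplied by the caller (e.g. your couch.log).

```shell
# Hypothetical helpers (POSIX sh).

# Count lines mentioning emfile in the log file given as $1.
# grep -c prints 0 (and exits 1) when nothing matches.
count_emfile() {
    grep -c 'emfile' "$1"
}

# Show the soft and hard open-file limits inherited by processes
# started from this shell.
show_fd_limits() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}
```

Raising the soft limit with `ulimit -n <value>` only affects processes started afterwards from that same shell, and an unprivileged user cannot go above the hard limit; a persistent change belongs in the system configuration covered by the wiki page.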

(In reply to Anand Chitipothu's message of Wed, Sep 7, 2011 at 07:44.)