Posted to dev@couchdb.apache.org by db <db...@leap.se> on 2014/02/14 20:02:47 UTC

bigcouch restarts when update handler parses a lot of data

Hello.

I'm working on the LEAP project (https://leap.se) and we've been using
cloudant's bigcouch as a backend for one part of our infrastructure.

The latest available bigcouch debian package is built on top of a fork of
couchdb 1.1.1. During development we stumbled upon some bugs that are already
fixed in newer versions of couchdb, and we even patched the bigcouch code to
build a new debian wheezy package (thanks to a lot of help from #cloudant
people).

Despite that, there's one bug we have not been able to solve yet, and people
on #cloudant said it would be good to have the conversation on this list.

The bug description goes like this:

  bigcouch restarts on PUTs to a design document update handler that calls
  JSON.parse(req.body), when the PUT request body is "large".

One strange thing is that "large" differs from server to server. On my local
machine it works fine, i.e. I can use the update handler with PUT bodies up to
the configured max_document_size (which defaults to 64 MB). On our 3 remote
nodes, the body size at which the server starts restarting differs from node
to node but is consistent, i.e. the limit is always the same for a given node
(37, 26 and 8 MB).
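
For anyone trying to reproduce this, a minimal sketch of how a test body of a
given size could be generated is below. The function name and the "content"
field are hypothetical and not taken from our code; the body would then be sent
with a PUT to the update handler URL, which in couchdb has the form
/<db>/_design/<ddoc>/_update/<handler>/<docid>.

```javascript
// Build a JSON request body whose serialized length is exactly targetBytes.
// makeJsonBody and the "content" field are hypothetical names for illustration.
function makeJsonBody(targetBytes) {
  // Length of the JSON framing around an empty payload: {"content":""}
  var overhead = JSON.stringify({ content: "" }).length;
  if (targetBytes < overhead) {
    throw new Error("targetBytes must be at least " + overhead);
  }
  // ASCII filler, so one character is one byte on the wire.
  return JSON.stringify({ content: "x".repeat(targetBytes - overhead) });
}

var MB = 1024 * 1024;
var body = makeJsonBody(8 * MB);
console.log(body.length / MB + " MB body ready to PUT");
```

Stepping the size up or down from here is one way to find the threshold for a
given node.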

Some info that might help the diagnosis:

* Tracing the request handling, I found that the last function that is called is
  mochijson2:encoder(), and then the server restarts.

* I tried changing resource limits for the bigcouch process, but it made no
  difference.

* These are the first 150 KB of the erl_crash.dump for one of the restarts
  (the whole thing is about 100 MB): http://paste.debian.net/82102/
  It is clearly trying to allocate a lot more memory than it should.

* rnewson told me on #cloudant that this might be related to memory bugs
  in erlang, so I built bigcouch using the latest erlang from wheezy-backports,
  but I got the same behaviour. These are the first 150 KB of erl_crash.dump
  for bigcouch built with R16B03: http://paste.debian.net/82103/

* This is the design doc update handler that calls JSON.parse(req.body) we are
  using: http://tinyurl.com/mqf9bn2
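
For readers who cannot follow the link, the general shape of such a handler is
sketched below. This is a hypothetical minimal example, not the handler linked
above: the "content" field and the response body are made up for illustration.
In a design document the function would be stored as a string under "updates";
it is given a name here only so it can be called directly.

```javascript
// Hypothetical update handler: parses the whole request body as JSON and
// stores it on the document. NOT the handler from the link above.
function update(doc, req) {
  // The entire body is parsed in one go; nothing is streamed.
  var data = JSON.parse(req.body);
  if (!doc) {
    // No existing document: create one with the id from the URL,
    // falling back to the uuid couchdb supplies with the request.
    doc = { _id: req.id || req.uuid };
  }
  doc.content = data; // "content" is an assumed field name
  // An update function returns [document to save, response].
  return [doc, JSON.stringify({ ok: true, id: doc._id })];
}
```

Note that the body reaches the function as a single string in req.body, so the
whole request has to pass from the Erlang side to the JavaScript query server
before JSON.parse is ever called.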

Kind people on IRC have been helping with this and other issues, and asked to
continue the conversation on this list. I know that cloudant's bigcouch is
being merged into couchdb code, and hope that this problem may either (1) be
already resolved, or (2) be some stupid thing I'm missing, or (3) still exist
and we might help solve it.

If there's any other info I can provide, please let me know.

Thanks a lot for any help!
db.

-- 

                                        ----  __o
                                       ---- _`\<,_
                                      ---- (*)/ (*)

Re: bigcouch restarts when update handler parses a lot of data

Posted by db <db...@leap.se>.
Hello. This message is just to say that we found out that the issue is not a
memory or application bug, but rather our lack of knowledge about how strings
are handled in Erlang and the large amount of memory needed to represent them.
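
To give a rough sense of scale: the Erlang efficiency guide puts a string
stored as a list at 2 words per character, which is 16 bytes per character on
a 64-bit VM. The sketch below only does that arithmetic for the body sizes
mentioned in this thread. It assumes one-byte characters, a 64-bit VM, and that
the whole body is held as a list at some point; which representation is
actually used at each step of the request is an assumption, and any
intermediate copies would add to these figures.

```javascript
// Rough estimate of the memory taken by an Erlang list-string.
var WORD_BYTES_64 = 8;   // word size on a 64-bit VM (assumed)
var WORDS_PER_CHAR = 2;  // one cons cell per character

function listStringBytes(chars, wordBytes) {
  return chars * WORDS_PER_CHAR * wordBytes;
}

var MB = 1024 * 1024;
[8, 26, 37, 64].forEach(function (mb) {
  var bytes = listStringBytes(mb * MB, WORD_BYTES_64);
  console.log(mb + " MB body -> about " + (bytes / MB) + " MB as a list");
});
// 8 MB body -> about 128 MB as a list
// 26 MB body -> about 416 MB as a list
// 37 MB body -> about 592 MB as a list
// 64 MB body -> about 1024 MB as a list
```

Under those assumptions the body alone grows 16 times, so how much memory a
node has available would be one plausible reason for the different limits.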

Thanks anyway!
db.
