Posted to dev@couchdb.apache.org by Robert Kowalski <ro...@kowalski.gd> on 2015/09/07 11:48:01 UTC

Re: CouchDB got 100% slower in the past 3 weeks

Hi folks,

the performance issues tracked in COUCHDB-2796 seem to be fixed.

Here is what the CouchDB internals look like for a document update after the patch:

https://dl.dropboxusercontent.com/u/1809262/flame-couch-epi-update-doc-fixed.svg

The heavy bar for couch_epi_functions_gen:providers/4 in the middle
has disappeared completely.

You may wonder why all the other bars in the SVGs are now wider. A
flamegraph always covers one complete operation and splits its calls
up by percentage of CPU time, so this is the expected behaviour for a
performance improvement.

When we shrink a performance-heavy call in the graph from 30% to,
let's say, 0.5%, the others take up a larger share of the next graph.
So in this case the growth of all bars other than the ones from
couch-epi is a good thing.
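
To make the arithmetic concrete, here is a small Python sketch. The
numbers and the names other_a and other_b are made up for illustration,
they are not taken from the actual flamegraphs. It assumes the other
calls cost exactly the same before and after the patch, and only the
couch_epi cost drops:

```python
def shares(costs):
    """Convert absolute costs into percentages of the whole operation."""
    total = sum(costs.values())
    return {name: 100.0 * cost / total for name, cost in costs.items()}

# Hypothetical CPU cost per call, in arbitrary time units.
before = {"couch_epi": 30.0, "other_a": 50.0, "other_b": 20.0}
# Same operation after the patch: only couch_epi got cheaper.
after = {"couch_epi": 0.35, "other_a": 50.0, "other_b": 20.0}

before_shares = shares(before)
after_shares = shares(after)

for name in before:
    print(f"{name}: {before_shares[name]:.1f}% -> {after_shares[name]:.1f}%")
```

other_a and other_b did not get slower, but their bars grow because
the whole operation now takes about 70 units instead of 100.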

The same holds for the couch_epi calls in the flamegraph for a simple
GET of a document, which previously added up to 33% of CPU time for
the given operation (reading and returning a doc):

https://dl.dropboxusercontent.com/u/1809262/flame-epi-fixed.svg


Here are the original flamegraphs from Sunday, in case you missed them:

Update doc:  https://dl.dropboxusercontent.com/u/1809262/flame-couch-epi-update-doc.svg
Simple GET: https://dl.dropboxusercontent.com/u/1809262/flame-epi.svg

The PouchDB project also confirmed that the performance regression is
fixed, but it seems some attachment-related tests are now failing.

Big thanks to everyone involved in fixing this! :)

Best,
Robert

PS: in case you are interested in an intro to flamegraphs and
profiling, this war story [1] is quite interesting. It is not about
Erlang, but the concept is language-agnostic and therefore easy to
apply to Erlang.

[1] http://techblog.netflix.com/2014/11/nodejs-in-flames.html

On Sun, Aug 30, 2015 at 11:15 PM, Robert Kowalski <ro...@kowalski.gd> wrote:
> Hi list,
>
> I got pinged because our friends from PouchDB noticed that their test
> suite with CouchDB 2 as a backend suddenly takes 100% longer (20 min
> instead of 10). [1] Because the difference was so significant I got
> really curious and worried about it.
>
> This test suite run took just 10 min:
>
> https://travis-ci.org/pouchdb/pouchdb/jobs/74585676 (22 days old)
>
> vs
>
> https://travis-ci.org/pouchdb/pouchdb/jobs/77865796 (yesterday)
>
> which, like the other runs these days, took 20 min.
>
> I wasn't really sure, maybe the travis VMs changed. Or Pouch.
>
> Based on the report I created a few flamegraphs to poke around at what
> changed in CouchDB's internals. They look quite different to the ones I
> created in the past weeks:
>
> https://dl.dropboxusercontent.com/u/1809262/flame-couch-epi-update-doc.svg
>
> This flamegraph shows the update of a doc. couch-epi takes 33% of the
> time and blocks.
>
> https://dl.dropboxusercontent.com/u/1809262/flame-epi.svg
>
> In this flamegraph I retrieve a document. I have 3 blocking calls to
> couch_epi, adding up to 21% of the request time.
>
> The report from Nolan (perf decrease of 100% within a timeframe of 3
> weeks) fits the timeframe in which we added couch_epi. As almost
> every module uses couch_epi, the performance decrease of almost all
> APIs also fits the picture. And I think the flamegraphs show that
> the additional time is spent in couch_epi.
>
>
> - I see that couch_epi uses the code server internally. Would it be
> possible to use faster ETS tables?
> - Anything else we could do?
>
> Best,
> Robert
>
> [1] https://github.com/pouchdb/pouchdb/issues/4209#issuecomment-135964232