Posted to user@couchdb.apache.org by Greg Tarsa <gt...@axialproject.com> on 2016/03/07 21:27:02 UTC

CouchDB crash during compaction with no log messages

We have a set of couchdb databases that we use to collect user information for various purposes.  I am inheriting this configuration from a predecessor and am relatively new to couchdb.

Whenever we attempt to compact the databases, the server crashes without any messages either in the couchdb log or the system logs.  This is running in an AWS instance with an EBS volume.

Experiments have shown that if the instance is configured with instance storage (ephemeral storage that disappears when the instance disappears) then this operation works properly.   But we would like to use larger volumes and have persistence.

When the instance is configured with an external EBS volume, then we see the server crash described above.

I have searched the web for “couchdb compaction crash no log” and not found anything helpful.

It seems like compacting while running should not fail at all, much less silently, so I am looking for insight into the problem, or solutions if any exist.

Configuration and log info is below.

Any help would be appreciated.

Thanks,
Greg


---------------------------------------------------------

CouchDB version: 1.6.1
OS: RHEL 6.6

---------------------------------------------------------

Here is a directory listing of the databases at the time of the crash:

cat bad.couch.dbinfo.txt 
total 15400740
     12 -rw-r--r--. 1 couchdb couchdb       8297 Jan 20 16:31 _users.couch
     16 -rw-r--r--. 1 couchdb couchdb      12393 Jan 20 16:33 _replicator.couch
  21060 -rw-r--r--. 1 couchdb couchdb   21557368 Mar  7 11:57 biometrics.couch
 781136 -rw-r--r--. 1 couchdb couchdb  799875192 Mar  7 12:00 fitness.couch
 954244 -rw-r--r--. 1 couchdb couchdb  977137784 Mar  7 12:05 nutrition.couch
8419624 -rw-r--r--. 1 couchdb couchdb 8621678721 Mar  7 12:06 routine.couch
 390796 -rw-r--r--. 1 couchdb couchdb  400167032 Mar  7 12:06 sleep.couch
 217932 -rw-r--r--. 1 couchdb couchdb  223154296 Mar  7 12:06 weight.couch
4614884 -rw-r--r--. 1 couchdb couchdb 4725629060 Mar  7 12:06 trackers.couch
      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 fitness.couch.compact
      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 nutrition.couch.compact
      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 routine.couch.compact
     64 -rw-r--r--. 1 couchdb couchdb      61551 Mar  7 12:41 diabetes.couch
      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 sleep.couch.compact
     12 -rw-r--r--. 1 couchdb couchdb       8300 Mar  7 12:41 tobacco_cessation.couch
      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 users.couch
      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 weight.couch.compact
    152 -rw-r--r--. 1 couchdb couchdb     151797 Mar  7 12:42 trackers.couch.compact
    784 -rw-r--r--. 1 couchdb couchdb     801865 Mar  7 12:42 biometrics.couch.compact

---------------------------------------------------------

Here are the contents of the log at the time of the crash:

[Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
[Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:33 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:35 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:40 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:25:45 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.114.0>] 10.1.1.12 - - GET /users/_changes?feed=continuous&style=all_docs&since=0&heartbeat=10000 200
[Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:25:55 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
[Mon, 07 Mar 2016 17:26:01 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
[Mon, 07 Mar 2016 17:26:05 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
... [numerous GET /users/ 200 messages removed for brevity] ...
[Mon, 07 Mar 2016 17:41:51 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.152.0>] 127.0.0.1 - - GET /_all_dbs 200
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1157.0>] Starting compaction for db "biometrics"
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.151.0>] 127.0.0.1 - - POST /biometrics/_compact 202
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.150.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1175.0>] Starting compaction for db "diabetes"
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.198.0>] 127.0.0.1 - - POST /diabetes/_compact 202
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.197.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1193.0>] Starting compaction for db "fitness"
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.118.0>] 127.0.0.1 - - POST /fitness/_compact 202
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.119.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1211.0>] Starting compaction for db "nutrition"
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.120.0>] 127.0.0.1 - - POST /nutrition/_compact 202
[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.121.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1229.0>] Starting compaction for db "routine"
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.122.0>] 127.0.0.1 - - POST /routine/_compact 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.115.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1175.0>] Compaction for db "diabetes" completed.
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1254.0>] Starting compaction for db "sleep"
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.116.0>] 127.0.0.1 - - POST /sleep/_compact 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.117.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Starting compaction for db "tobacco_cessation"
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.184.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.183.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1290.0>] Starting compaction for db "trackers"
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.182.0>] 127.0.0.1 - - POST /trackers/_compact 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Compaction for db "tobacco_cessation" completed.
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1151.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Starting compaction for db "users"
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1152.0>] 127.0.0.1 - - POST /users/_compact 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1168.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Compaction for db "users" completed.
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1329.0>] Starting compaction for db "weight"
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1170.0>] 127.0.0.1 - - POST /weight/_compact 202
[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1187.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
[Mon, 07 Mar 2016 17:41:56 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:42:01 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:42:06 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
[Mon, 07 Mar 2016 17:42:11 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
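
The log above shows every _compact request being accepted within about a second, so all the compactions run concurrently. Below is a minimal sketch of issuing them one at a time instead, using POST /{db}/_compact and polling the compact_running field of GET /{db}. BASE_URL, the function names, and the polling interval are assumptions for illustration. Nothing in this thread establishes that serialising the requests avoids the crash.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: a local CouchDB 1.x node; add credentials to the URL if required.
BASE_URL = "http://127.0.0.1:5984"


def http_json(method, url, body=None):
    """Send one request with a JSON content type and return the decoded reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def compact_sequentially(db_names, request=http_json, poll_seconds=5.0,
                         sleep=time.sleep):
    """Compact each database in turn, waiting for one to finish before the next."""
    finished = []
    for name in db_names:
        db_url = f"{BASE_URL}/{urllib.parse.quote(name, safe='')}"
        request("POST", f"{db_url}/_compact")
        # GET /{db} reports compact_running as true while compaction is active.
        while request("GET", db_url).get("compact_running", False):
            sleep(poll_seconds)
        finished.append(name)
    return finished
```

A call such as compact_sequentially(["biometrics", "fitness", "nutrition"]) would then start the next compaction only after the previous one reports that it has finished. The request and sleep parameters exist so the loop can be exercised without a running server.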

--------------------------------------------------


Re: CouchDB crash during compaction with no log messages

Posted by Sharath <sh...@gmail.com>.
This may not help, but here is my experience.

Over the last year and a bit of using CouchDB I've gleaned a few things:

My production database has multiple databases each with documents numbering
over 13,000,000. Compaction happens weekly for all the databases in
parallel.

I used to use OS X 10.10 as a developer instance (running the same
production database). After upgrading to 10.11, the CouchDB process
started dying randomly.

A separate dedicated machine was created using Ubuntu 15.10.

Migrating the database went well with no issues. However, upgrading
Ubuntu with the latest patches brought back the random death syndrome.

The current database is live on stock Ubuntu 15.10 and CouchDB for the past
~60 days without a problem.

I do not think CouchDB has been validated with all the patches. However, for
my needs it is doing a great job.

The only parameters I had to tune were:

checkpoint_after = 524288000
doc_buffer_size = 52428800

ref: http://qnalist.com/questions/5836043/couchdb-database-size
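
For readers unfamiliar with these settings: both values are byte counts. The sketch below converts them to MiB and prints them as an ini fragment. The section name [database_compaction] is an assumption based on the CouchDB 1.x configuration layout (the message does not name it), and the helper names are illustrative only. Whether values this large suit a memory-constrained host is not addressed in this thread.

```python
MIB = 1024 * 1024

# Values as quoted in the message above, in bytes.
SETTINGS = {
    "checkpoint_after": 524288000,
    "doc_buffer_size": 52428800,
}


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


def render_ini(settings, section="database_compaction"):
    """Render the settings as an ini fragment with a size comment above each key."""
    lines = [f"[{section}]"]
    for key, value in settings.items():
        lines.append(f"; {to_mib(value):g} MiB")
        lines.append(f"{key} = {value}")
    return "\n".join(lines)


print(render_ini(SETTINGS))
```

So checkpoint_after is 500 MiB and doc_buffer_size is 50 MiB.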


-Sharath

On Sun, Mar 13, 2016 at 10:08 AM, Greg Tarsa <gt...@axialproject.com>
wrote:

> Current Summary: It was not clear how to proceed with determining how much
> memory was needed for our application and the abrupt failures we are seeing
> were not giving us any data for how to move forward.
> For the record, we worked around the problem by using a Raid-1 volume that
> combines the instance storage with an EBS volume.  This seems to mitigate
> the issue and gives us a persistent store that will outlast the life of the
> AWS instance. This is not ideal, but it works for now.  Long-term we will
> likely move off CouchDB and move to using jsonb with Postgresql.  A
> database that crashes on memory errors without leaving a log trace is not a
> good production solution for us.
>
> Thanks,
> Greg
>
>
>
> > On Mar 8, 2016, at 3:33 PM, Greg Tarsa <gt...@axialproject.com> wrote:
> >
> > All the compaction requests are made at the same time.  So I assume they
> are running in parallel.
> >
> > Does the out of memory indicate a configuration problem?  Since only the
> interactive session ends with the message and it is not in any log and the
> system did not kill the process for memory reasons, I am thinking there is
> a couchdb malfunction involved here.  Also, it works fine with an instance
> volume and initial results from experiments we are running here with a
> Raid-1 volume that is a hybrid instance/EBS volume appear to be working.
> >
> > If I need more memory, is there documentation or discussion somewhere
> that would guide me as to how much I would need?
> >
> > Thanks,
> > Greg
> >
> >
> >> On Mar 8, 2016, at 1:41 PM, Jan Lehnardt <ja...@apache.org> wrote:
> >>
> >>>
> >>> On 08 Mar 2016, at 18:07, Greg Tarsa <gt...@axialproject.com> wrote:
> >>>
> >>> Hi Jan,
> >>>
> >>> Thanks for your quick reply to my question.  I have some answers to
> your questions and some new information that I got from running couchdb
> interactively.
> >>>
> >>>
> >>>> Are there any other things going on on the VM, when you do this?
> >>> The VM also hosts a MySQL server, but I see no evidence that this is a
> contributing cause for the couch issue.
> >>>
> >>>>
> >>>> Can you reliably reproduce this behavior?
> >>> I can reliably reproduce it.
> >>>
> >>>>
> >>>> Are there other correlating factors (like does this always happen at
> the same time / due to a cronjob, etc)?
> >>> It can be repeated by re-starting couchdb and re-requesting the
> compaction on all databases.  It is not time-dependent.
> >>
> >> Are you running compaction on the databases in parallel or sequentially?
> >>
> >>
> >>
> >>>
> >>>>
> >>>> Can you set your CouchDB log level to debug and see if that gets you
> more info? (curl -X PUT http://[user:pass@]
> 127.0.0.1:5984/_config/log/level -d '"debug"').
> >>> (see below)
> >>
> >> The paste ends with an allocation error which points to you running out
> of memory.
> >>
> >> Best
> >> Jan
> >> --
> >>
> >>
> >>>
> >>>>
> >>>> Is it possible for you to share these database files (publicly or in
> private)?
> >>> The databases contain health data and I am unable to share them.
> >>>
> >>>>
> >>>> What are your disk usage levels before/during compaction?
> >>> Plenty of disk in this case.  The compacted data is in the 2G range.
> The problem does not seem to be storage-size related.  We are able to
> compact during regular operation when using a 40G instance volume.  Unable
> to compact when using a 120G EBS volume.
> >>>
> >>>>
> >>>> Are you getting anything in the system log(s)?
> >>> That is what is odd.  There is nothing in the system logs or the
> couchdb logs.
> >>>
> >>> Bonus data:
> >>>
> >>> I ran the debug experiment with couchdb running interactively.  The
> session text is below, but note that I also got the following error message
> and an erlang core dump:
> >>>
> >>>  Crash dump was written to: erl_crash.dump
> >>>  eheap_alloc: Cannot allocate 156725600 bytes of memory (of type
> "old_heap").
> >>>  Aborted (core dumped)
> >>>
> >>> The dump is ~500MB.
> >>>
> >>> Here is the session text:
> >>>
> >>>
> >>> [gtarsa@prod-db01 ~]$  sudo /usr/local/bin/couchdb
> >>>
> >>> Apache CouchDB 1.6.1 (LogLevel=info) is starting.
> >>> Apache CouchDB has started. Time to relax.
> >>>
> >>> [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
> >>> [info] [<0.120.0>] 127.0.0.1 - - GET /_config/level 200
> >>> [info] [<0.678.0>] 127.0.0.1 - - GET /_config/log/level 200
> >>> [error] [<0.803.0>] attempted upload of invalid JSON (set log_level to
> debug to log it)
> >>> [info] [<0.803.0>] 127.0.0.1 - - PUT /_config/log/level 400
> >>>
> >>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:22 ===
> >>>   Supervisor: {local,couch_primary_services}
> >>>   Context:    child_terminated
> >>>   Reason:     normal
> >>>   Offender:   [{pid,<0.92.0>},
> >>>                {name,couch_log},
> >>>                {mfargs,{couch_log,start_link,[]}},
> >>>                {restart_type,permanent},
> >>>                {shutdown,brutal_kill},
> >>>                {child_type,worker}]
> >>>
> >>> [debug] [<0.117.0>] 'PUT' /_config/log/level {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Length',"7"},
> >>>        {'Content-Type',"application/x-www-form-urlencoded"},
> >>>        {'Host',"127.0.0.1:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.117.0>] OAuth Params: []
> >>>
> >>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:25 ===
> >>>   Supervisor: {local,couch_primary_services}
> >>>   Context:    child_terminated
> >>>   Reason:     normal
> >>>   Offender:   [{pid,<0.828.0>},
> >>>                {name,couch_log},
> >>>                {mfargs,{couch_log,start_link,[]}},
> >>>                {restart_type,permanent},
> >>>                {shutdown,brutal_kill},
> >>>                {child_type,worker}]
> >>>
> >>> [debug] [<0.826.0>] 'GET' /_all_dbs {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.826.0>] OAuth Params: []
> >>> [info] [<0.826.0>] 127.0.0.1 - - GET /_all_dbs 200
> >>> [debug] [<0.833.0>] 'POST' /biometrics/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.833.0>] OAuth Params: []
> >>> [info] [<0.872.0>] Starting compaction for db "biometrics"
> >>> [debug] [<0.877.0>] Compaction process spawned for db "biometrics"
> >>> [info] [<0.833.0>] 127.0.0.1 - - POST /biometrics/_compact 202
> >>> [debug] [<0.115.0>] 'POST' /biometrics/_view_cleanup {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.115.0>] OAuth Params: []
> >>> [info] [<0.115.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
> >>> [debug] [<0.114.0>] 'POST' /diabetes/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.114.0>] OAuth Params: []
> >>> [info] [<0.890.0>] Starting compaction for db "diabetes"
> >>> [debug] [<0.895.0>] Compaction process spawned for db "diabetes"
> >>> [info] [<0.114.0>] 127.0.0.1 - - POST /diabetes/_compact 202
> >>> [debug] [<0.113.0>] 'POST' /diabetes/_view_cleanup {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.113.0>] OAuth Params: []
> >>> [info] [<0.113.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
> >>> [debug] [<0.112.0>] 'POST' /fitness/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.112.0>] OAuth Params: []
> >>> [info] [<0.909.0>] Starting compaction for db "fitness"
> >>> [debug] [<0.914.0>] Compaction process spawned for db "fitness"
> >>> [info] [<0.112.0>] 127.0.0.1 - - POST /fitness/_compact 202
> >>> [debug] [<0.111.0>] 'POST' /fitness/_view_cleanup {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.111.0>] OAuth Params: []
> >>> [info] [<0.111.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
> >>> [debug] [<0.110.0>] 'POST' /nutrition/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.110.0>] OAuth Params: []
> >>> [info] [<0.927.0>] Starting compaction for db "nutrition"
> >>> [debug] [<0.932.0>] Compaction process spawned for db "nutrition"
> >>> [info] [<0.110.0>] 127.0.0.1 - - POST /nutrition/_compact 202
> >>> [debug] [<0.109.0>] 'POST' /nutrition/_view_cleanup {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.109.0>] OAuth Params: []
> >>> [info] [<0.109.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
> >>> [debug] [<0.108.0>] 'POST' /routine/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.108.0>] OAuth Params: []
> >>> [info] [<0.945.0>] Starting compaction for db "routine"
> >>> [debug] [<0.950.0>] Compaction process spawned for db "routine"
> >>> [info] [<0.108.0>] 127.0.0.1 - - POST /routine/_compact 202
> >>> [debug] [<0.123.0>] 'POST' /routine/_view_cleanup {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.123.0>] OAuth Params: []
> >>> [info] [<0.123.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
> >>> [debug] [<0.122.0>] 'POST' /sleep/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.122.0>] OAuth Params: []
> >>> [info] [<0.963.0>] Starting compaction for db "sleep"
> >>> [debug] [<0.87.0>] New task status for <0.895.0>: [{changes_done,113},
> >>>   {database,<<"diabetes">>},
> >>>   {progress,100},
> >>>   {started_on,1457453394},
> >>>   {total_changes,113},
> >>>   {type,database_compaction},
> >>>   {updated_on,1457453395}]
> >>> [debug] [<0.968.0>] Compaction process spawned for db "sleep"
> >>> [info] [<0.122.0>] 127.0.0.1 - - POST /sleep/_compact 202
> >>> [debug] [<0.385.0>] 'POST' /sleep/_view_cleanup {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.385.0>] OAuth Params: []
> >>> [debug] [<0.890.0>] CouchDB swapping files
> /usr/local/var/lib/couchdb/diabetes.couch and
> /usr/local/var/lib/couchdb/diabetes.couch.compact.
> >>> [info] [<0.385.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
> >>> [info] [<0.890.0>] Compaction for db "diabetes" completed.
> >>> [debug] [<0.635.0>] 'POST' /tobacco_cessation/_compact {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.635.0>] OAuth Params: []
> >>> [info] [<0.987.0>] Starting compaction for db "tobacco_cessation"
> >>> [debug] [<0.992.0>] Compaction process spawned for db
> "tobacco_cessation"
> >>> [info] [<0.635.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
> >>> [debug] [<0.87.0>] New task status for <0.992.0>: [{changes_done,1},
> >>>   {database,<<"tobacco_cessation">>},
> >>>   {progress,100},
> >>>   {started_on,1457453395},
> >>>   {total_changes,1},
> >>>   {type,database_compaction},
> >>>   {updated_on,1457453395}]
> >>> [debug] [<0.640.0>] 'POST' /tobacco_cessation/_view_cleanup {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.640.0>] OAuth Params: []
> >>> [info] [<0.640.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup
> 202
> >>> [debug] [<0.987.0>] CouchDB swapping files
> /usr/local/var/lib/couchdb/tobacco_cessation.couch and
> /usr/local/var/lib/couchdb/tobacco_cessation.couch.compact.
> >>> [info] [<0.987.0>] Compaction for db "tobacco_cessation" completed.
> >>> [debug] [<0.815.0>] 'POST' /trackers/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.815.0>] OAuth Params: []
> >>> [info] [<0.1011.0>] Starting compaction for db "trackers"
> >>> [debug] [<0.1016.0>] Compaction process spawned for db "trackers"
> >>> [info] [<0.815.0>] 127.0.0.1 - - POST /trackers/_compact 202
> >>> [debug] [<0.866.0>] 'POST' /trackers/_view_cleanup {1,1} from
> "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.866.0>] OAuth Params: []
> >>> [info] [<0.866.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
> >>> [debug] [<0.867.0>] 'POST' /users/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.867.0>] OAuth Params: []
> >>> [info] [<0.1029.0>] Starting compaction for db "users"
> >>> [debug] [<0.1034.0>] Compaction process spawned for db "users"
> >>> [info] [<0.867.0>] 127.0.0.1 - - POST /users/_compact 202
> >>> [debug] [<0.87.0>] New task status for <0.1034.0>: [{changes_done,0},
> >>>   {database,<<"users">>},
> >>>   {progress,0},
> >>>   {started_on,1457453395},
> >>>   {total_changes,0},
> >>>   {type,database_compaction},
> >>>   {updated_on,1457453395}]
> >>> [debug] [<0.884.0>] 'POST' /users/_view_cleanup {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.884.0>] OAuth Params: []
> >>> [info] [<0.884.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
> >>> [debug] [<0.1029.0>] CouchDB swapping files
> /usr/local/var/lib/couchdb/users.couch and
> /usr/local/var/lib/couchdb/users.couch.compact.
> >>> [info] [<0.1029.0>] Compaction for db "users" completed.
> >>> [debug] [<0.885.0>] 'POST' /weight/_compact {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.885.0>] OAuth Params: []
> >>> [info] [<0.1053.0>] Starting compaction for db "weight"
> >>> [debug] [<0.1058.0>] Compaction process spawned for db "weight"
> >>> [info] [<0.885.0>] 127.0.0.1 - - POST /weight/_compact 202
> >>> [debug] [<0.902.0>] 'POST' /weight/_view_cleanup {1,1} from "127.0.0.1"
> >>> Headers: [{'Accept',"*/*"},
> >>>        {'Content-Type',"application/json"},
> >>>        {'Host',"localhost:5984"},
> >>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu)
> libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
> >>> [debug] [<0.902.0>] OAuth Params: []
> >>> [info] [<0.902.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
> >>>
> >>> Crash dump was written to: erl_crash.dump
> >>> eheap_alloc: Cannot allocate 156725600 bytes of memory (of type
> "old_heap").
> >>>
> >>> [gtarsa@prod-db01 ~]$
> >>>
> >>>
> >>>
> >>>> On Mar 7, 2016, at 4:09 PM, Jan Lehnardt <ja...@apache.org> wrote:
> >>>>
> >>>> Heya Greg,
> >>>>
> >>>> this should definitely not happen at all, regardless of AWS storage
> type.
> >>>>
> >>>> Are there any other things going on on the VM, when you do this?
> >>>>
> >>>> Can you reliably reproduce this behaviour?
> >>>>
> >>>> Are there other correlating factors (like does this always happen at
> the same time / due to a cronjob, etc)?
> >>>>
> >>>> Can you set your CouchDB log level to debug and see if that gets you
> more info? (curl -X PUT http://[user:pass@]
> 127.0.0.1:5984/_config/log/level -d '"debug"').
> >>>>
> >>>> Is it possible for you to share these database files (publicly or in
> private)?
> >>>>
> >>>> What are your disk usage levels before/during compaction?
> >>>>
> >>>> Are you getting anything in the system log(s)?
> >>>>
> >>>> Best
> >>>> Jan
> >>>> --
> >>>> Professional Support for Apache CouchDB:
> >>>> https://neighbourhood.ie/couchdb-support/
> >>>>
> >>>>
> >> --
> >> Professional Support for Apache CouchDB:
> >> https://neighbourhood.ie/couchdb-support/
> >
>
>

Re: CouchDB crash during compaction with no log messages

Posted by Greg Tarsa <gt...@axialproject.com>.
Current summary: it was not clear how to determine how much memory our application needs, and the abrupt failures we are seeing gave us no data on how to move forward.
For the record, we worked around the problem by using a RAID-1 volume that combines the instance storage with an EBS volume.  This seems to mitigate the issue and gives us a persistent store that will outlast the life of the AWS instance.  This is not ideal, but it works for now.  Long-term we will likely move off CouchDB and use jsonb with PostgreSQL.  A database that crashes on memory errors without leaving a log trace is not a good production solution for us.
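
Since the question of parallel versus sequential compaction came up in this thread, here is a minimal sketch of running the compactions one at a time instead of all at once. It is untested against this setup, and nothing in the thread confirms that serialising the requests avoids the allocation failure. The base URL, the lack of authentication, and the function names are assumptions for illustration. It relies only on `POST /{db}/_compact` (which wants a JSON Content-Type) and on the `compact_running` field that `GET /{db}` returns.

```python
import json
import time
import urllib.request

# Assumption: CouchDB listens locally without authentication.
BASE_URL = "http://127.0.0.1:5984"


def start_compaction(db):
    """Ask CouchDB to compact one database (POST /{db}/_compact)."""
    req = urllib.request.Request(
        f"{BASE_URL}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # accepted requests are logged as 202


def is_compacting(db):
    """Read the compact_running flag from the database info (GET /{db})."""
    with urllib.request.urlopen(f"{BASE_URL}/{db}") as resp:
        return bool(json.load(resp).get("compact_running", False))


def compact_sequentially(dbs, start=start_compaction, running=is_compacting,
                         sleep=time.sleep, poll_seconds=5):
    """Compact databases one at a time, waiting for each to finish.

    The start/running/sleep callables are injectable so the loop can be
    exercised without a live server.
    """
    finished = []
    for db in dbs:
        start(db)
        # Poll until CouchDB reports that this compaction is no longer running.
        while running(db):
            sleep(poll_seconds)
        finished.append(db)
    return finished
```

A call such as `compact_sequentially(["biometrics", "weight", "sleep", "fitness", "nutrition", "trackers", "routine"])` would work through the databases from the directory listing, smallest first, so that only one compaction process holds memory at any time. Database names containing characters such as `/` would need URL-encoding, which this sketch does not do.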

Thanks,
Greg



> On Mar 8, 2016, at 3:33 PM, Greg Tarsa <gt...@axialproject.com> wrote:
> 
> All the compaction requests are made at the same time.  So I assume they are running in parallel.
> 
> Does the out-of-memory error indicate a configuration problem?  Since only the interactive session ends with the message, it does not appear in any log, and the system did not kill the process for memory reasons, I suspect a couchdb malfunction is involved here.  Also, compaction works fine with an instance volume, and initial results from experiments we are running here with a RAID-1 volume that is a hybrid instance/EBS volume look promising.
> 
> If I need more memory, is there documentation or discussion somewhere that would guide me as to how much I would need?
> 
> Thanks,
> Greg
> 
> 
>> On Mar 8, 2016, at 1:41 PM, Jan Lehnardt <ja...@apache.org> wrote:
>> 
>>> 
>>> On 08 Mar 2016, at 18:07, Greg Tarsa <gt...@axialproject.com> wrote:
>>> 
>>> Hi Jan,
>>> 
>>> Thanks for your quick reply to my question.  I have some answers to your questions and some new information that I got from running couchdb interactively.
>>> 
>>> 
>>>> Are there any other things going on on the VM, when you do this?
>>> The VM also hosts a MySQL server, but I see no evidence that this is a contributing cause for the couch issue.
>>> 
>>>> 
>>>> Can you reliably reproduce this behavior?
>>> I can reliably reproduce it.
>>> 
>>>> 
>>>> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
>>> It can be repeated by restarting couchdb and re-requesting the compaction on all databases.  It is not time-dependent.
>> 
>> Are you running compaction on the databases in parallel or sequentially?
>> 
>> 
>> 
>>> 
>>>> 
>>>> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
>>> (see below)
>> 
>> The paste ends with an allocation error which points to you running out of memory.
>> 
>> Best
>> Jan
>> --
>> 
>> 
>>> 
>>>> 
>>>> Is it possible for you to share these database files (publicly or in private)?
>>> The databases contain health data and I am unable to share them.
>>> 
>>>> 
>>>> What are your disk usage levels before/during compaction?
>>> Plenty of disk in this case.  The compacted data is in the 2G range.  The problem does not seem to be storage-size related.  We are able to compact during regular operation when using a 40G instance volume.  Unable to compact when using a 120G EBS volume.
>>> 
>>>> 
>>>> Are you getting anything in the system log(s)?
>>> That is what is odd.  There is nothing in the system logs or the couchdb logs.
>>> 
>>> Bonus data:
>>> 
>>> I ran the debug experiment with couchdb running interactively.  The session text is below, but note that I also got the following error message and an erlang core dump:
>>> 
>>>  Crash dump was written to: erl_crash.dump
>>>  eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").
>>>  Aborted (core dumped)
>>> 
>>> The dump is ~500MB.
>>> 
>>> Here is the session text:
>>> 
>>> 
>>> [gtarsa@prod-db01 ~]$  sudo /usr/local/bin/couchdb
>>> 
>>> Apache CouchDB 1.6.1 (LogLevel=info) is starting.
>>> Apache CouchDB has started. Time to relax.
>>> 
>>> [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>> [info] [<0.120.0>] 127.0.0.1 - - GET /_config/level 200
>>> [info] [<0.678.0>] 127.0.0.1 - - GET /_config/log/level 200
>>> [error] [<0.803.0>] attempted upload of invalid JSON (set log_level to debug to log it)
>>> [info] [<0.803.0>] 127.0.0.1 - - PUT /_config/log/level 400
>>> 
>>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:22 ===
>>>   Supervisor: {local,couch_primary_services}
>>>   Context:    child_terminated
>>>   Reason:     normal
>>>   Offender:   [{pid,<0.92.0>},
>>>                {name,couch_log},
>>>                {mfargs,{couch_log,start_link,[]}},
>>>                {restart_type,permanent},
>>>                {shutdown,brutal_kill},
>>>                {child_type,worker}]
>>> 
>>> [debug] [<0.117.0>] 'PUT' /_config/log/level {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Length',"7"},
>>>        {'Content-Type',"application/x-www-form-urlencoded"},
>>>        {'Host',"127.0.0.1:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.117.0>] OAuth Params: []
>>> 
>>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:25 ===
>>>   Supervisor: {local,couch_primary_services}
>>>   Context:    child_terminated
>>>   Reason:     normal
>>>   Offender:   [{pid,<0.828.0>},
>>>                {name,couch_log},
>>>                {mfargs,{couch_log,start_link,[]}},
>>>                {restart_type,permanent},
>>>                {shutdown,brutal_kill},
>>>                {child_type,worker}]
>>> 
>>> [debug] [<0.826.0>] 'GET' /_all_dbs {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.826.0>] OAuth Params: []
>>> [info] [<0.826.0>] 127.0.0.1 - - GET /_all_dbs 200
>>> [debug] [<0.833.0>] 'POST' /biometrics/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.833.0>] OAuth Params: []
>>> [info] [<0.872.0>] Starting compaction for db "biometrics"
>>> [debug] [<0.877.0>] Compaction process spawned for db "biometrics"
>>> [info] [<0.833.0>] 127.0.0.1 - - POST /biometrics/_compact 202
>>> [debug] [<0.115.0>] 'POST' /biometrics/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.115.0>] OAuth Params: []
>>> [info] [<0.115.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
>>> [debug] [<0.114.0>] 'POST' /diabetes/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.114.0>] OAuth Params: []
>>> [info] [<0.890.0>] Starting compaction for db "diabetes"
>>> [debug] [<0.895.0>] Compaction process spawned for db "diabetes"
>>> [info] [<0.114.0>] 127.0.0.1 - - POST /diabetes/_compact 202
>>> [debug] [<0.113.0>] 'POST' /diabetes/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.113.0>] OAuth Params: []
>>> [info] [<0.113.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
>>> [debug] [<0.112.0>] 'POST' /fitness/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.112.0>] OAuth Params: []
>>> [info] [<0.909.0>] Starting compaction for db "fitness"
>>> [debug] [<0.914.0>] Compaction process spawned for db "fitness"
>>> [info] [<0.112.0>] 127.0.0.1 - - POST /fitness/_compact 202
>>> [debug] [<0.111.0>] 'POST' /fitness/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.111.0>] OAuth Params: []
>>> [info] [<0.111.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
>>> [debug] [<0.110.0>] 'POST' /nutrition/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.110.0>] OAuth Params: []
>>> [info] [<0.927.0>] Starting compaction for db "nutrition"
>>> [debug] [<0.932.0>] Compaction process spawned for db "nutrition"
>>> [info] [<0.110.0>] 127.0.0.1 - - POST /nutrition/_compact 202
>>> [debug] [<0.109.0>] 'POST' /nutrition/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.109.0>] OAuth Params: []
>>> [info] [<0.109.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
>>> [debug] [<0.108.0>] 'POST' /routine/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.108.0>] OAuth Params: []
>>> [info] [<0.945.0>] Starting compaction for db "routine"
>>> [debug] [<0.950.0>] Compaction process spawned for db "routine"
>>> [info] [<0.108.0>] 127.0.0.1 - - POST /routine/_compact 202
>>> [debug] [<0.123.0>] 'POST' /routine/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.123.0>] OAuth Params: []
>>> [info] [<0.123.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
>>> [debug] [<0.122.0>] 'POST' /sleep/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.122.0>] OAuth Params: []
>>> [info] [<0.963.0>] Starting compaction for db "sleep"
>>> [debug] [<0.87.0>] New task status for <0.895.0>: [{changes_done,113},
>>>                                                 {database,<<"diabetes">>},
>>>                                                 {progress,100},
>>>                                                 {started_on,1457453394},
>>>                                                 {total_changes,113},
>>>                                                 {type,database_compaction},
>>>                                                 {updated_on,1457453395}]
>>> [debug] [<0.968.0>] Compaction process spawned for db "sleep"
>>> [info] [<0.122.0>] 127.0.0.1 - - POST /sleep/_compact 202
>>> [debug] [<0.385.0>] 'POST' /sleep/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.385.0>] OAuth Params: []
>>> [debug] [<0.890.0>] CouchDB swapping files /usr/local/var/lib/couchdb/diabetes.couch and /usr/local/var/lib/couchdb/diabetes.couch.compact.
>>> [info] [<0.385.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
>>> [info] [<0.890.0>] Compaction for db "diabetes" completed.
>>> [debug] [<0.635.0>] 'POST' /tobacco_cessation/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.635.0>] OAuth Params: []
>>> [info] [<0.987.0>] Starting compaction for db "tobacco_cessation"
>>> [debug] [<0.992.0>] Compaction process spawned for db "tobacco_cessation"
>>> [info] [<0.635.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
>>> [debug] [<0.87.0>] New task status for <0.992.0>: [{changes_done,1},
>>>                                                 {database,
>>>                                                  <<"tobacco_cessation">>},
>>>                                                 {progress,100},
>>>                                                 {started_on,1457453395},
>>>                                                 {total_changes,1},
>>>                                                 {type,database_compaction},
>>>                                                 {updated_on,1457453395}]
>>> [debug] [<0.640.0>] 'POST' /tobacco_cessation/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.640.0>] OAuth Params: []
>>> [info] [<0.640.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
>>> [debug] [<0.987.0>] CouchDB swapping files /usr/local/var/lib/couchdb/tobacco_cessation.couch and /usr/local/var/lib/couchdb/tobacco_cessation.couch.compact.
>>> [info] [<0.987.0>] Compaction for db "tobacco_cessation" completed.
>>> [debug] [<0.815.0>] 'POST' /trackers/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.815.0>] OAuth Params: []
>>> [info] [<0.1011.0>] Starting compaction for db "trackers"
>>> [debug] [<0.1016.0>] Compaction process spawned for db "trackers"
>>> [info] [<0.815.0>] 127.0.0.1 - - POST /trackers/_compact 202
>>> [debug] [<0.866.0>] 'POST' /trackers/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.866.0>] OAuth Params: []
>>> [info] [<0.866.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
>>> [debug] [<0.867.0>] 'POST' /users/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.867.0>] OAuth Params: []
>>> [info] [<0.1029.0>] Starting compaction for db "users"
>>> [debug] [<0.1034.0>] Compaction process spawned for db "users"
>>> [info] [<0.867.0>] 127.0.0.1 - - POST /users/_compact 202
>>> [debug] [<0.87.0>] New task status for <0.1034.0>: [{changes_done,0},
>>>                                                  {database,<<"users">>},
>>>                                                  {progress,0},
>>>                                                  {started_on,1457453395},
>>>                                                  {total_changes,0},
>>>                                                  {type,database_compaction},
>>>                                                  {updated_on,1457453395}]
>>> [debug] [<0.884.0>] 'POST' /users/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.884.0>] OAuth Params: []
>>> [info] [<0.884.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
>>> [debug] [<0.1029.0>] CouchDB swapping files /usr/local/var/lib/couchdb/users.couch and /usr/local/var/lib/couchdb/users.couch.compact.
>>> [info] [<0.1029.0>] Compaction for db "users" completed.
>>> [debug] [<0.885.0>] 'POST' /weight/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.885.0>] OAuth Params: []
>>> [info] [<0.1053.0>] Starting compaction for db "weight"
>>> [debug] [<0.1058.0>] Compaction process spawned for db "weight"
>>> [info] [<0.885.0>] 127.0.0.1 - - POST /weight/_compact 202
>>> [debug] [<0.902.0>] 'POST' /weight/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.902.0>] OAuth Params: []
>>> [info] [<0.902.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
>>> 
>>> Crash dump was written to: erl_crash.dump
>>> eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").
>>> 
>>> [gtarsa@prod-db01 ~]$ 
>>> 
>>> 
>>> 
>>>> On Mar 7, 2016, at 4:09 PM, Jan Lehnardt <ja...@apache.org> wrote:
>>>> 
>>>> Heya Greg,
>>>> 
>>>> this should definitely not happen at all, regardless of AWS storage type.
>>>> 
>>>> Are there any other things going on on the VM, when you do this?
>>>> 
>>>> Can you reliably reproduce this behaviour?
>>>> 
>>>> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
>>>> 
>>>> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
>>>> 
>>>> Is it possible for you to share these database files (publicly or in private)?
>>>> 
>>>> What are your disk usage levels before/during compaction?
>>>> 
>>>> Are you getting anything in the system log(s)?
>>>> 
>>>> Best
>>>> Jan
>>>> -- 
>>>> Professional Support for Apache CouchDB:
>>>> https://neighbourhood.ie/couchdb-support/
>>>> 
>>>> 
>>>>> On 07 Mar 2016, at 21:27, Greg Tarsa <gt...@axialproject.com> wrote:
>>>>> 
>>>>> We have a set of couchdb databases that we use to collect user information for various purposes.  I am inheriting this configuration from a predecessor and am relatively new to couchdb.
>>>>> 
>>>>> Whenever we attempt to compact the databases, the server crashes without any messages either in the couchdb log or the system logs.  This is running in an AWS instance with an EBS volume.
>>>>> 
>>>>> Experiments have shown that if the instance is configured with instance storage (ephemeral storage that disappears when the instance disappears) then this operation works properly.   But we would like to use larger volumes and have persistence.
>>>>> 
>>>>> When the instance is configured with an external EBS volume, then we see the server crash described above.
>>>>> 
>>>>> I have searched the web for “couchdb compaction crash no log” and not found anything helpful.
>>>>> 
>>>>> It seems like compacting while running should not be failing at all, much less silently, so I am looking for insights to the problem, or solutions if such exist.
>>>>> 
>>>>> Configuration and log info is below.
>>>>> 
>>>>> Any help would be appreciated.
>>>>> 
>>>>> Thanks,
>>>>> Greg
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------
>>>>> 
>>>>> CouchDB version: 1.6.1
>>>>> OS: RHEL 6.6
>>>>> 
>>>>> ---------------------------------------------------------
>>>>> 
>>>>> Here is a directory of the databases as the time of the crash:
>>>>> 
>>>>> cat bad.couch.dbinfo.txt 
>>>>> total 15400740
>>>>> 12 -rw-r--r--. 1 couchdb couchdb       8297 Jan 20 16:31 _users.couch
>>>>> 16 -rw-r--r--. 1 couchdb couchdb      12393 Jan 20 16:33 _replicator.couch
>>>>> 21060 -rw-r--r--. 1 couchdb couchdb   21557368 Mar  7 11:57 biometrics.couch
>>>>> 781136 -rw-r--r--. 1 couchdb couchdb  799875192 Mar  7 12:00 fitness.couch
>>>>> 954244 -rw-r--r--. 1 couchdb couchdb  977137784 Mar  7 12:05 nutrition.couch
>>>>> 8419624 -rw-r--r--. 1 couchdb couchdb 8621678721 Mar  7 12:06 routine.couch
>>>>> 390796 -rw-r--r--. 1 couchdb couchdb  400167032 Mar  7 12:06 sleep.couch
>>>>> 217932 -rw-r--r--. 1 couchdb couchdb  223154296 Mar  7 12:06 weight.couch
>>>>> 4614884 -rw-r--r--. 1 couchdb couchdb 4725629060 Mar  7 12:06 trackers.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 fitness.couch.compact
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 nutrition.couch.compact
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 routine.couch.compact
>>>>> 64 -rw-r--r--. 1 couchdb couchdb      61551 Mar  7 12:41 diabetes.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 sleep.couch.compact
>>>>> 12 -rw-r--r--. 1 couchdb couchdb       8300 Mar  7 12:41 tobacco_cessation.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 users.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 weight.couch.compact
>>>>> 152 -rw-r--r--. 1 couchdb couchdb     151797 Mar  7 12:42 trackers.couch.compact
>>>>> 784 -rw-r--r--. 1 couchdb couchdb     801865 Mar  7 12:42 biometrics.couch.compact
>>>>> 
>>>>> ---------------------------------------------------------
>>>>> 
>>>>> Here is the contents of the log at the time of the crash:
>>>>> 
>>>>> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>>>> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:33 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:35 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:40 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:45 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.114.0>] 10.1.1.12 - - GET /users/_changes?feed=continuous&style=all_docs&since=0&heartbeat=10000 200
>>>>> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:55 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:26:01 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:26:05 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> ... [numerous GET /users/ 200 messages removed for brevity] ...
>>>>> [Mon, 07 Mar 2016 17:41:51 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.152.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1157.0>] Starting compaction for db "biometrics"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.151.0>] 127.0.0.1 - - POST /biometrics/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.150.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1175.0>] Starting compaction for db "diabetes"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.198.0>] 127.0.0.1 - - POST /diabetes/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.197.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1193.0>] Starting compaction for db "fitness"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.118.0>] 127.0.0.1 - - POST /fitness/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.119.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1211.0>] Starting compaction for db "nutrition"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.120.0>] 127.0.0.1 - - POST /nutrition/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.121.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1229.0>] Starting compaction for db "routine"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.122.0>] 127.0.0.1 - - POST /routine/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.115.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1175.0>] Compaction for db "diabetes" completed.
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1254.0>] Starting compaction for db "sleep"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.116.0>] 127.0.0.1 - - POST /sleep/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.117.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Starting compaction for db "tobacco_cessation"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.184.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.183.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1290.0>] Starting compaction for db "trackers"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.182.0>] 127.0.0.1 - - POST /trackers/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Compaction for db "tobacco_cessation" completed.
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1151.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Starting compaction for db "users"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1152.0>] 127.0.0.1 - - POST /users/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1168.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Compaction for db "users" completed.
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1329.0>] Starting compaction for db "weight"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1170.0>] 127.0.0.1 - - POST /weight/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1187.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:56 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:42:01 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:42:06 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:42:11 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> 
>>>>> --------------------------------------------------
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> -- 
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
> 


Re: CouchDB crash during compaction with no log messages

Posted by Greg Tarsa <gt...@axialproject.com>.
All the compaction requests are made at the same time, so I assume they are running in parallel.
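
For comparison, a sequential run could look roughly like the sketch below. It is only an illustration, not something from this setup: the base URL, the absence of authentication, the polling interval, and the helper names (`start_compaction`, `is_compacting`, `compact_sequentially`) are all assumptions. It relies on `POST /{db}/_compact` returning 202 immediately and on `GET /{db}` reporting `compact_running` while the compaction is still in progress.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: CouchDB listens locally and needs no credentials.
BASE_URL = "http://127.0.0.1:5984"


def _request(method, path):
    """Send a request to CouchDB and return the decoded JSON body."""
    req = urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def start_compaction(db):
    """POST /{db}/_compact; the server replies 202 and compacts in the background."""
    return _request("POST", "/" + urllib.parse.quote(db, safe="") + "/_compact")


def is_compacting(db):
    """GET /{db} includes compact_running=true while a compaction is active."""
    info = _request("GET", "/" + urllib.parse.quote(db, safe=""))
    return bool(info.get("compact_running"))


def compact_sequentially(dbs, start=start_compaction, running=is_compacting,
                         poll_seconds=5.0, sleep=time.sleep):
    """Compact one database at a time, waiting for each to finish."""
    finished = []
    for db in dbs:
        start(db)
        # Block here until this database is done before touching the next one.
        while running(db):
            sleep(poll_seconds)
        finished.append(db)
    return finished
```

Calling `compact_sequentially(["biometrics", "fitness", "routine"])` would then keep at most one compaction running at any moment, which makes it easier to see whether a single large database or the combined load triggers the crash.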

Does the out-of-memory error indicate a configuration problem?  Only the interactive session ends with the message; it is not in any log, and the system did not kill the process for memory reasons, so I am thinking there is a couchdb malfunction involved here.  Also, compaction works fine with an instance volume, and initial results from experiments we are running here with a RAID-1 volume that is a hybrid instance/EBS volume suggest that it works there too.
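
To put the number in the abort message into perspective, here is a small sketch that pulls the failed request size out of the line and converts it to MiB. The function name `parse_alloc_failure` and the regular expression are my own assumptions, written against the single message quoted in this thread rather than any documented format.

```python
import re

# Pattern written against the one message seen in this thread (an assumption,
# not a documented format).
ALLOC_RE = re.compile(
    r'(\w+): Cannot allocate (\d+) bytes of memory \(of type "([^"]+)"\)'
)


def parse_alloc_failure(line):
    """Return (allocator, bytes_requested, block_type), or None if no match."""
    m = ALLOC_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


msg = 'eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").'
allocator, size, kind = parse_alloc_failure(msg)
print(allocator, kind, round(size / (1024 * 1024), 2), "MiB")
```

For the message above this works out to roughly 149 MiB for one allocation. That is the size of the single request that failed, not the total memory the VM was using at the time.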

If I need more memory, is there documentation or discussion somewhere that would guide me as to how much I would need?

Thanks,
Greg


> On Mar 8, 2016, at 1:41 PM, Jan Lehnardt <ja...@apache.org> wrote:
> 
>> 
>> On 08 Mar 2016, at 18:07, Greg Tarsa <gt...@axialproject.com> wrote:
>> 
>> Hi Jan,
>> 
>> Thanks for your quick reply to my question.  I have some answers to your questions and some new information that I got from running couchdb interactively.
>> 
>> 
>>> Are there any other things going on on the VM, when you do this?
>> The VM also hosts a MySQL server, but I see no evidence that this is a contributing cause for the couch issue.
>> 
>>> 
>>> Can you reliably reproduce this behavior?
>> I can reliably reproduce it.
>> 
>>> 
>>> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
>> It can be repeated by restarting couchdb and re-requesting the compaction on all databases.  It is not time-dependent.
> 
> Are you running compaction on the databases in parallel or sequentially?
> 
> 
> 
>> 
>>> 
>>> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
>> (see below)
> 
> The paste ends with an allocation error which points to you running out of memory.
> 
> Best
> Jan
> --
> 
> 
>> 
>>> 
>>> Is it possible for you to share these database files (publicly or in private)?
>> The databases contain health data and I am unable to share them.
>> 
>>> 
>>> What are your disk usage levels before/during compaction?
>> Plenty of disk in this case.  The compacted data is in the 2G range.  The problem does not seem to be storage-size related.  We are able to compact during regular operation when using a 40G instance volume.  Unable to compact when using a 120G EBS volume.
>> 
>>> 
>>> Are you getting anything in the system log(s)?
>> That is what is odd.  There is nothing in the system logs or the couchdb logs.
>> 
>> Bonus data:
>> 
>> I ran the debug experiment with couchdb running interactively.  The session text is below, but note that I also got the following error message and an erlang core dump:
>> 
>>   Crash dump was written to: erl_crash.dump
>>   eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").
>>   Aborted (core dumped)
>> 
>> The dump is ~500MB.
>> 
>> Here is the session text:
>> 
>> 
>> [gtarsa@prod-db01 ~]$  sudo /usr/local/bin/couchdb
>> 
>> Apache CouchDB 1.6.1 (LogLevel=info) is starting.
>> Apache CouchDB has started. Time to relax.
>> 
>> [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>> [info] [<0.120.0>] 127.0.0.1 - - GET /_config/level 200
>> [info] [<0.678.0>] 127.0.0.1 - - GET /_config/log/level 200
>> [error] [<0.803.0>] attempted upload of invalid JSON (set log_level to debug to log it)
>> [info] [<0.803.0>] 127.0.0.1 - - PUT /_config/log/level 400
>> 
>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:22 ===
>>    Supervisor: {local,couch_primary_services}
>>    Context:    child_terminated
>>    Reason:     normal
>>    Offender:   [{pid,<0.92.0>},
>>                 {name,couch_log},
>>                 {mfargs,{couch_log,start_link,[]}},
>>                 {restart_type,permanent},
>>                 {shutdown,brutal_kill},
>>                 {child_type,worker}]
>> 
>> [debug] [<0.117.0>] 'PUT' /_config/log/level {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Length',"7"},
>>         {'Content-Type',"application/x-www-form-urlencoded"},
>>         {'Host',"127.0.0.1:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.117.0>] OAuth Params: []
>> 
>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:25 ===
>>    Supervisor: {local,couch_primary_services}
>>    Context:    child_terminated
>>    Reason:     normal
>>    Offender:   [{pid,<0.828.0>},
>>                 {name,couch_log},
>>                 {mfargs,{couch_log,start_link,[]}},
>>                 {restart_type,permanent},
>>                 {shutdown,brutal_kill},
>>                 {child_type,worker}]
>> 
>> [debug] [<0.826.0>] 'GET' /_all_dbs {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.826.0>] OAuth Params: []
>> [info] [<0.826.0>] 127.0.0.1 - - GET /_all_dbs 200
>> [debug] [<0.833.0>] 'POST' /biometrics/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.833.0>] OAuth Params: []
>> [info] [<0.872.0>] Starting compaction for db "biometrics"
>> [debug] [<0.877.0>] Compaction process spawned for db "biometrics"
>> [info] [<0.833.0>] 127.0.0.1 - - POST /biometrics/_compact 202
>> [debug] [<0.115.0>] 'POST' /biometrics/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.115.0>] OAuth Params: []
>> [info] [<0.115.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
>> [debug] [<0.114.0>] 'POST' /diabetes/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.114.0>] OAuth Params: []
>> [info] [<0.890.0>] Starting compaction for db "diabetes"
>> [debug] [<0.895.0>] Compaction process spawned for db "diabetes"
>> [info] [<0.114.0>] 127.0.0.1 - - POST /diabetes/_compact 202
>> [debug] [<0.113.0>] 'POST' /diabetes/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.113.0>] OAuth Params: []
>> [info] [<0.113.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
>> [debug] [<0.112.0>] 'POST' /fitness/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.112.0>] OAuth Params: []
>> [info] [<0.909.0>] Starting compaction for db "fitness"
>> [debug] [<0.914.0>] Compaction process spawned for db "fitness"
>> [info] [<0.112.0>] 127.0.0.1 - - POST /fitness/_compact 202
>> [debug] [<0.111.0>] 'POST' /fitness/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.111.0>] OAuth Params: []
>> [info] [<0.111.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
>> [debug] [<0.110.0>] 'POST' /nutrition/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.110.0>] OAuth Params: []
>> [info] [<0.927.0>] Starting compaction for db "nutrition"
>> [debug] [<0.932.0>] Compaction process spawned for db "nutrition"
>> [info] [<0.110.0>] 127.0.0.1 - - POST /nutrition/_compact 202
>> [debug] [<0.109.0>] 'POST' /nutrition/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.109.0>] OAuth Params: []
>> [info] [<0.109.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
>> [debug] [<0.108.0>] 'POST' /routine/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.108.0>] OAuth Params: []
>> [info] [<0.945.0>] Starting compaction for db "routine"
>> [debug] [<0.950.0>] Compaction process spawned for db "routine"
>> [info] [<0.108.0>] 127.0.0.1 - - POST /routine/_compact 202
>> [debug] [<0.123.0>] 'POST' /routine/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.123.0>] OAuth Params: []
>> [info] [<0.123.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
>> [debug] [<0.122.0>] 'POST' /sleep/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.122.0>] OAuth Params: []
>> [info] [<0.963.0>] Starting compaction for db "sleep"
>> [debug] [<0.87.0>] New task status for <0.895.0>: [{changes_done,113},
>>                                                  {database,<<"diabetes">>},
>>                                                  {progress,100},
>>                                                  {started_on,1457453394},
>>                                                  {total_changes,113},
>>                                                  {type,database_compaction},
>>                                                  {updated_on,1457453395}]
>> [debug] [<0.968.0>] Compaction process spawned for db "sleep"
>> [info] [<0.122.0>] 127.0.0.1 - - POST /sleep/_compact 202
>> [debug] [<0.385.0>] 'POST' /sleep/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.385.0>] OAuth Params: []
>> [debug] [<0.890.0>] CouchDB swapping files /usr/local/var/lib/couchdb/diabetes.couch and /usr/local/var/lib/couchdb/diabetes.couch.compact.
>> [info] [<0.385.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
>> [info] [<0.890.0>] Compaction for db "diabetes" completed.
>> [debug] [<0.635.0>] 'POST' /tobacco_cessation/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.635.0>] OAuth Params: []
>> [info] [<0.987.0>] Starting compaction for db "tobacco_cessation"
>> [debug] [<0.992.0>] Compaction process spawned for db "tobacco_cessation"
>> [info] [<0.635.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
>> [debug] [<0.87.0>] New task status for <0.992.0>: [{changes_done,1},
>>                                                  {database,
>>                                                   <<"tobacco_cessation">>},
>>                                                  {progress,100},
>>                                                  {started_on,1457453395},
>>                                                  {total_changes,1},
>>                                                  {type,database_compaction},
>>                                                  {updated_on,1457453395}]
>> [debug] [<0.640.0>] 'POST' /tobacco_cessation/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.640.0>] OAuth Params: []
>> [info] [<0.640.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
>> [debug] [<0.987.0>] CouchDB swapping files /usr/local/var/lib/couchdb/tobacco_cessation.couch and /usr/local/var/lib/couchdb/tobacco_cessation.couch.compact.
>> [info] [<0.987.0>] Compaction for db "tobacco_cessation" completed.
>> [debug] [<0.815.0>] 'POST' /trackers/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.815.0>] OAuth Params: []
>> [info] [<0.1011.0>] Starting compaction for db "trackers"
>> [debug] [<0.1016.0>] Compaction process spawned for db "trackers"
>> [info] [<0.815.0>] 127.0.0.1 - - POST /trackers/_compact 202
>> [debug] [<0.866.0>] 'POST' /trackers/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.866.0>] OAuth Params: []
>> [info] [<0.866.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
>> [debug] [<0.867.0>] 'POST' /users/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.867.0>] OAuth Params: []
>> [info] [<0.1029.0>] Starting compaction for db "users"
>> [debug] [<0.1034.0>] Compaction process spawned for db "users"
>> [info] [<0.867.0>] 127.0.0.1 - - POST /users/_compact 202
>> [debug] [<0.87.0>] New task status for <0.1034.0>: [{changes_done,0},
>>                                                   {database,<<"users">>},
>>                                                   {progress,0},
>>                                                   {started_on,1457453395},
>>                                                   {total_changes,0},
>>                                                   {type,database_compaction},
>>                                                   {updated_on,1457453395}]
>> [debug] [<0.884.0>] 'POST' /users/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.884.0>] OAuth Params: []
>> [info] [<0.884.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
>> [debug] [<0.1029.0>] CouchDB swapping files /usr/local/var/lib/couchdb/users.couch and /usr/local/var/lib/couchdb/users.couch.compact.
>> [info] [<0.1029.0>] Compaction for db "users" completed.
>> [debug] [<0.885.0>] 'POST' /weight/_compact {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.885.0>] OAuth Params: []
>> [info] [<0.1053.0>] Starting compaction for db "weight"
>> [debug] [<0.1058.0>] Compaction process spawned for db "weight"
>> [info] [<0.885.0>] 127.0.0.1 - - POST /weight/_compact 202
>> [debug] [<0.902.0>] 'POST' /weight/_view_cleanup {1,1} from "127.0.0.1"
>> Headers: [{'Accept',"*/*"},
>>         {'Content-Type',"application/json"},
>>         {'Host',"localhost:5984"},
>>         {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>> [debug] [<0.902.0>] OAuth Params: []
>> [info] [<0.902.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
>> 
>> Crash dump was written to: erl_crash.dump
>> eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").
>> 
>> [gtarsa@prod-db01 ~]$ 
>> 
>> 
>> 
>>> On Mar 7, 2016, at 4:09 PM, Jan Lehnardt <ja...@apache.org> wrote:
>>> 
>>> Heya Greg,
>>> 
>>> this should definitely not happen at all, regardless of AWS storage type.
>>> 
>>> Are there any other things going on on the VM, when you do this?
>>> 
>>> Can you reliably reproduce this behaviour?
>>> 
>>> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
>>> 
>>> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
>>> 
>>> Is it possible for you to share these database files (publicly or in private)?
>>> 
>>> What are your disk usage levels before/during compaction?
>>> 
>>> Are you getting anything in the system log(s)?
>>> 
>>> Best
>>> Jan
>>> -- 
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>> 
>>> 
>>>> On 07 Mar 2016, at 21:27, Greg Tarsa <gt...@axialproject.com> wrote:
>>>> 
>>>> We have a set of couchdb databases that we use to collect user information for various purposes.  I am inheriting this configuration from a predecessor and am relatively new to couchdb.
>>>> 
>>>> Whenever we attempt to compact the databases, the server crashes without any messages either in the couchdb log or the system logs.  This is running in an AWS instance with an EBS volume.
>>>> 
>>>> Experiments have shown that if the instance is configured with instance storage (ephemeral storage that disappears when the instance disappears) then this operation works properly.   But we would like to use larger volumes and have persistence.
>>>> 
>>>> When the instance is configured with an external EBS volume, then we see the server crash described above.
>>>> 
>>>> I have searched the web for “couchdb compaction crash no log” and not found anything helpful.
>>>> 
>>>> It seems like compacting while running should not be failing at all, much less silently, so I am looking for insights to the problem, or solutions if such exist.
>>>> 
>>>> Configuration and log info is below.
>>>> 
>>>> Any help would be appreciated.
>>>> 
>>>> Thanks,
>>>> Greg
>>>> 
>>>> 
>>>> ---------------------------------------------------------
>>>> 
>>>> CouchDB version: 1.6.1
>>>> OS: RHEL 6.6
>>>> 
>>>> ---------------------------------------------------------
>>>> 
>>>> Here is a directory of the databases as the time of the crash:
>>>> 
>>>> cat bad.couch.dbinfo.txt 
>>>> total 15400740
>>>>  12 -rw-r--r--. 1 couchdb couchdb       8297 Jan 20 16:31 _users.couch
>>>>  16 -rw-r--r--. 1 couchdb couchdb      12393 Jan 20 16:33 _replicator.couch
>>>> 21060 -rw-r--r--. 1 couchdb couchdb   21557368 Mar  7 11:57 biometrics.couch
>>>> 781136 -rw-r--r--. 1 couchdb couchdb  799875192 Mar  7 12:00 fitness.couch
>>>> 954244 -rw-r--r--. 1 couchdb couchdb  977137784 Mar  7 12:05 nutrition.couch
>>>> 8419624 -rw-r--r--. 1 couchdb couchdb 8621678721 Mar  7 12:06 routine.couch
>>>> 390796 -rw-r--r--. 1 couchdb couchdb  400167032 Mar  7 12:06 sleep.couch
>>>> 217932 -rw-r--r--. 1 couchdb couchdb  223154296 Mar  7 12:06 weight.couch
>>>> 4614884 -rw-r--r--. 1 couchdb couchdb 4725629060 Mar  7 12:06 trackers.couch
>>>>   4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 fitness.couch.compact
>>>>   4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 nutrition.couch.compact
>>>>   4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 routine.couch.compact
>>>>  64 -rw-r--r--. 1 couchdb couchdb      61551 Mar  7 12:41 diabetes.couch
>>>>   4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 sleep.couch.compact
>>>>  12 -rw-r--r--. 1 couchdb couchdb       8300 Mar  7 12:41 tobacco_cessation.couch
>>>>   4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 users.couch
>>>>   4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 weight.couch.compact
>>>> 152 -rw-r--r--. 1 couchdb couchdb     151797 Mar  7 12:42 trackers.couch.compact
>>>> 784 -rw-r--r--. 1 couchdb couchdb     801865 Mar  7 12:42 biometrics.couch.compact
>>>> 
>>>> ---------------------------------------------------------
>>>> 
>>>> Here is the contents of the log at the time of the crash:
>>>> 
>>>> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>>> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>> [Mon, 07 Mar 2016 17:25:33 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>> [Mon, 07 Mar 2016 17:25:35 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>> [Mon, 07 Mar 2016 17:25:40 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>> [Mon, 07 Mar 2016 17:25:45 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/


Re: CouchDB crash during compaction with no log messages

Posted by Jan Lehnardt <ja...@apache.org>.
> On 08 Mar 2016, at 18:07, Greg Tarsa <gt...@axialproject.com> wrote:
> 
> Hi Jan,
> 
> Thanks for your quick reply to my question.  I have some answers to your questions and some new information that I got from running couchdb interactively.
> 
> 
>> Are there any other things going on on the VM, when you do this?
> The VM also hosts a MySQL server, but I see no evidence that this is a contributing cause for the couch issue.
> 
>> 
>> Can you reliably reproduce this behavior?
> I can reliably reproduce it.
> 
>> 
>> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
> It can be repeated by restarting couchdb and re-requesting the compaction on all databases.  It is not time-dependent.

Are you running compaction on the databases in parallel or sequentially?
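The log in the original post hints at the answer: every "Starting compaction" entry is stamped 17:41:52 or 17:41:53, so the requests appear to have been issued back to back. A minimal sketch in Python (the function name is hypothetical; the sample lines are copied from that log) that lists which compactions had started but not yet logged completion:

```python
import re

# Patterns match the two compaction messages seen in the CouchDB 1.6.1 log.
START = re.compile(r'Starting compaction for db "([^"]+)"')
DONE = re.compile(r'Compaction for db "([^"]+)" completed')


def unfinished_compactions(log_lines):
    """Return databases whose compaction started but never logged completion.

    Hypothetical helper for illustration; order follows first appearance.
    """
    pending = []
    for line in log_lines:
        started = START.search(line)
        if started:
            if started.group(1) not in pending:
                pending.append(started.group(1))
            continue
        finished = DONE.search(line)
        if finished and finished.group(1) in pending:
            pending.remove(finished.group(1))
    return pending


# A few lines taken verbatim from the log in the original post.
SAMPLE = [
    '[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1157.0>] Starting compaction for db "biometrics"',
    '[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1175.0>] Starting compaction for db "diabetes"',
    '[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1229.0>] Starting compaction for db "routine"',
    '[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1175.0>] Compaction for db "diabetes" completed.',
]

if __name__ == "__main__":
    print(unfinished_compactions(SAMPLE))
```

Run against the full log, this would show the large databases still compacting at the same time when the log stops.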



> 
>> 
>> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
> (see below)

The paste ends with an allocation error (eheap_alloc: Cannot allocate 156725600 bytes of memory), which points to the Erlang VM running out of memory.
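If the compactions are being started all at once, one thing to try is running them one at a time. The following is a rough sketch in Python, not a tested fix: it assumes a local CouchDB 1.6 at http://127.0.0.1:5984 with no admin credentials required (add them to the URL otherwise), and it polls the compact_running field of GET /{db} to wait for each compaction to end. All function names are made up for illustration, and whether serialising the work avoids the allocation failure here is unverified.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:5984"  # assumption: local node, no auth needed


def start_compaction(db, base_url=BASE_URL):
    """POST /{db}/_compact; CouchDB replies 202 once compaction is scheduled."""
    url = f"{base_url}/{urllib.parse.quote(db, safe='')}/_compact"
    req = urllib.request.Request(
        url, data=b"", method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def is_compacting(db, base_url=BASE_URL):
    """Read the compact_running flag from the database info document."""
    url = f"{base_url}/{urllib.parse.quote(db, safe='')}"
    with urllib.request.urlopen(url) as resp:
        return bool(json.load(resp).get("compact_running"))


def compact_sequentially(dbs, start=start_compaction, running=is_compacting,
                         sleep=time.sleep, poll_seconds=5):
    """Compact each database in turn, waiting for one to finish before the next."""
    finished = []
    for db in dbs:
        start(db)
        while running(db):
            sleep(poll_seconds)
        finished.append(db)
    return finished


if __name__ == "__main__":
    # Database names taken from the log in this thread.
    compact_sequentially(["biometrics", "fitness", "nutrition", "routine"])
```

The start, running and sleep parameters are injectable so the loop can be exercised without a live server.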

Best
Jan
--



-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/


Re: CouchDB crash during compaction with no log messages

Posted by Greg Tarsa <gt...@axialproject.com>.
Hi Jan,

Thanks for your quick reply to my question.  I have some answers to your questions and some new information that I got from running couchdb interactively.


> Are there any other things going on on the VM, when you do this?
The VM also hosts a MySQL server, but I see no evidence that this is a contributing cause for the couch issue.

> 
> Can you reliably reproduce this behavior?
I can reliably reproduce it.

> 
> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
It can be repeated by restarting couchdb and re-requesting the compaction on all databases.  It is not time-dependent.

> 
> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
(see below)

> 
> Is it possible for you to share these database files (publicly or in private)?
The databases contain health data and I am unable to share them.

> 
> What are your disk usage levels before/during compaction?
Plenty of disk in this case.  The compacted data is in the 2G range.  The problem does not seem to be storage-size related.  We are able to compact during regular operation when using a 40G instance volume.  Unable to compact when using a 120G EBS volume.

> 
> Are you getting anything in the system log(s)?
That is what is odd.  There is nothing in the system logs or the couchdb logs.

Bonus data:

I ran the debug experiment with couchdb running interactively.  The session text is below, but note that I also got the following error message and an erlang core dump:

    Crash dump was written to: erl_crash.dump
    eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").
    Aborted (core dumped)

The dump is ~500MB.

Here is the session text:


[gtarsa@prod-db01 ~]$  sudo /usr/local/bin/couchdb

Apache CouchDB 1.6.1 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.

[info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
[info] [<0.120.0>] 127.0.0.1 - - GET /_config/level 200
[info] [<0.678.0>] 127.0.0.1 - - GET /_config/log/level 200
[error] [<0.803.0>] attempted upload of invalid JSON (set log_level to debug to log it)
[info] [<0.803.0>] 127.0.0.1 - - PUT /_config/log/level 400

=SUPERVISOR REPORT==== 8-Mar-2016::11:09:22 ===
     Supervisor: {local,couch_primary_services}
     Context:    child_terminated
     Reason:     normal
     Offender:   [{pid,<0.92.0>},
                  {name,couch_log},
                  {mfargs,{couch_log,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,brutal_kill},
                  {child_type,worker}]

[debug] [<0.117.0>] 'PUT' /_config/log/level {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Length',"7"},
          {'Content-Type',"application/x-www-form-urlencoded"},
          {'Host',"127.0.0.1:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.117.0>] OAuth Params: []

=SUPERVISOR REPORT==== 8-Mar-2016::11:09:25 ===
     Supervisor: {local,couch_primary_services}
     Context:    child_terminated
     Reason:     normal
     Offender:   [{pid,<0.828.0>},
                  {name,couch_log},
                  {mfargs,{couch_log,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,brutal_kill},
                  {child_type,worker}]

[debug] [<0.826.0>] 'GET' /_all_dbs {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.826.0>] OAuth Params: []
[info] [<0.826.0>] 127.0.0.1 - - GET /_all_dbs 200
[debug] [<0.833.0>] 'POST' /biometrics/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.833.0>] OAuth Params: []
[info] [<0.872.0>] Starting compaction for db "biometrics"
[debug] [<0.877.0>] Compaction process spawned for db "biometrics"
[info] [<0.833.0>] 127.0.0.1 - - POST /biometrics/_compact 202
[debug] [<0.115.0>] 'POST' /biometrics/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.115.0>] OAuth Params: []
[info] [<0.115.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
[debug] [<0.114.0>] 'POST' /diabetes/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.114.0>] OAuth Params: []
[info] [<0.890.0>] Starting compaction for db "diabetes"
[debug] [<0.895.0>] Compaction process spawned for db "diabetes"
[info] [<0.114.0>] 127.0.0.1 - - POST /diabetes/_compact 202
[debug] [<0.113.0>] 'POST' /diabetes/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.113.0>] OAuth Params: []
[info] [<0.113.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
[debug] [<0.112.0>] 'POST' /fitness/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.112.0>] OAuth Params: []
[info] [<0.909.0>] Starting compaction for db "fitness"
[debug] [<0.914.0>] Compaction process spawned for db "fitness"
[info] [<0.112.0>] 127.0.0.1 - - POST /fitness/_compact 202
[debug] [<0.111.0>] 'POST' /fitness/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.111.0>] OAuth Params: []
[info] [<0.111.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
[debug] [<0.110.0>] 'POST' /nutrition/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.110.0>] OAuth Params: []
[info] [<0.927.0>] Starting compaction for db "nutrition"
[debug] [<0.932.0>] Compaction process spawned for db "nutrition"
[info] [<0.110.0>] 127.0.0.1 - - POST /nutrition/_compact 202
[debug] [<0.109.0>] 'POST' /nutrition/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.109.0>] OAuth Params: []
[info] [<0.109.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
[debug] [<0.108.0>] 'POST' /routine/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.108.0>] OAuth Params: []
[info] [<0.945.0>] Starting compaction for db "routine"
[debug] [<0.950.0>] Compaction process spawned for db "routine"
[info] [<0.108.0>] 127.0.0.1 - - POST /routine/_compact 202
[debug] [<0.123.0>] 'POST' /routine/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.123.0>] OAuth Params: []
[info] [<0.123.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
[debug] [<0.122.0>] 'POST' /sleep/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.122.0>] OAuth Params: []
[info] [<0.963.0>] Starting compaction for db "sleep"
[debug] [<0.87.0>] New task status for <0.895.0>: [{changes_done,113},
                                                   {database,<<"diabetes">>},
                                                   {progress,100},
                                                   {started_on,1457453394},
                                                   {total_changes,113},
                                                   {type,database_compaction},
                                                   {updated_on,1457453395}]
[debug] [<0.968.0>] Compaction process spawned for db "sleep"
[info] [<0.122.0>] 127.0.0.1 - - POST /sleep/_compact 202
[debug] [<0.385.0>] 'POST' /sleep/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.385.0>] OAuth Params: []
[debug] [<0.890.0>] CouchDB swapping files /usr/local/var/lib/couchdb/diabetes.couch and /usr/local/var/lib/couchdb/diabetes.couch.compact.
[info] [<0.385.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
[info] [<0.890.0>] Compaction for db "diabetes" completed.
[debug] [<0.635.0>] 'POST' /tobacco_cessation/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.635.0>] OAuth Params: []
[info] [<0.987.0>] Starting compaction for db "tobacco_cessation"
[debug] [<0.992.0>] Compaction process spawned for db "tobacco_cessation"
[info] [<0.635.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
[debug] [<0.87.0>] New task status for <0.992.0>: [{changes_done,1},
                                                   {database,
                                                    <<"tobacco_cessation">>},
                                                   {progress,100},
                                                   {started_on,1457453395},
                                                   {total_changes,1},
                                                   {type,database_compaction},
                                                   {updated_on,1457453395}]
[debug] [<0.640.0>] 'POST' /tobacco_cessation/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.640.0>] OAuth Params: []
[info] [<0.640.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
[debug] [<0.987.0>] CouchDB swapping files /usr/local/var/lib/couchdb/tobacco_cessation.couch and /usr/local/var/lib/couchdb/tobacco_cessation.couch.compact.
[info] [<0.987.0>] Compaction for db "tobacco_cessation" completed.
[debug] [<0.815.0>] 'POST' /trackers/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.815.0>] OAuth Params: []
[info] [<0.1011.0>] Starting compaction for db "trackers"
[debug] [<0.1016.0>] Compaction process spawned for db "trackers"
[info] [<0.815.0>] 127.0.0.1 - - POST /trackers/_compact 202
[debug] [<0.866.0>] 'POST' /trackers/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.866.0>] OAuth Params: []
[info] [<0.866.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
[debug] [<0.867.0>] 'POST' /users/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.867.0>] OAuth Params: []
[info] [<0.1029.0>] Starting compaction for db "users"
[debug] [<0.1034.0>] Compaction process spawned for db "users"
[info] [<0.867.0>] 127.0.0.1 - - POST /users/_compact 202
[debug] [<0.87.0>] New task status for <0.1034.0>: [{changes_done,0},
                                                    {database,<<"users">>},
                                                    {progress,0},
                                                    {started_on,1457453395},
                                                    {total_changes,0},
                                                    {type,database_compaction},
                                                    {updated_on,1457453395}]
[debug] [<0.884.0>] 'POST' /users/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.884.0>] OAuth Params: []
[info] [<0.884.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
[debug] [<0.1029.0>] CouchDB swapping files /usr/local/var/lib/couchdb/users.couch and /usr/local/var/lib/couchdb/users.couch.compact.
[info] [<0.1029.0>] Compaction for db "users" completed.
[debug] [<0.885.0>] 'POST' /weight/_compact {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.885.0>] OAuth Params: []
[info] [<0.1053.0>] Starting compaction for db "weight"
[debug] [<0.1058.0>] Compaction process spawned for db "weight"
[info] [<0.885.0>] 127.0.0.1 - - POST /weight/_compact 202
[debug] [<0.902.0>] 'POST' /weight/_view_cleanup {1,1} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
          {'Content-Type',"application/json"},
          {'Host',"localhost:5984"},
          {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
[debug] [<0.902.0>] OAuth Params: []
[info] [<0.902.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").

[gtarsa@prod-db01 ~]$ 





Re: CouchDB crash during compaction with no log messages

Posted by Jan Lehnardt <ja...@apache.org>.
Heya Greg,

this should definitely not happen at all, regardless of AWS storage type.

Are there any other things going on on the VM, when you do this?

Can you reliably reproduce this behaviour?

Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?

Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').

Is it possible for you to share these database files (publicly or in private)?

What are your disk usage levels before/during compaction?

Are you getting anything in the system log(s)?

Best
Jan
-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/


> On 07 Mar 2016, at 21:27, Greg Tarsa <gt...@axialproject.com> wrote:
> 
> We have a set of couchdb databases that we use to collect user information for various purposes.  I am inheriting this configuration from a predecessor and am relatively new to couchdb.
> 
> Whenever we attempt to compact the databases, the server crashes without any messages either in the couchdb log or the system logs.  This is running in an AWS instance with an EBS volume.
> 
> Experiments have shown that if the instance is configured with instance storage (ephemeral storage that disappears when the instance disappears) then this operation works properly.   But we would like to use larger volumes and have persistence.
> 
> When the instance is configured with an external EBS volume, then we see the server crash described above.
> 
> I have searched the web for “couchdb compaction crash no log” and not found anything helpful.
> 
> It seems like compacting while running should not be failing at all, much less silently, so I am looking for insights to the problem, or solutions if such exist.
> 
> Configuration and log info is below.
> 
> Any help would be appreciated.
> 
> Thanks,
> Greg
> 
> 
> ---------------------------------------------------------
> 
> CouchDB version: 1.6.1
> OS: RHEL 6.6
> 
> ---------------------------------------------------------
> 
> Here is a directory of the databases as the time of the crash:
> 
> cat bad.couch.dbinfo.txt 
> total 15400740
>     12 -rw-r--r--. 1 couchdb couchdb       8297 Jan 20 16:31 _users.couch
>     16 -rw-r--r--. 1 couchdb couchdb      12393 Jan 20 16:33 _replicator.couch
>  21060 -rw-r--r--. 1 couchdb couchdb   21557368 Mar  7 11:57 biometrics.couch
> 781136 -rw-r--r--. 1 couchdb couchdb  799875192 Mar  7 12:00 fitness.couch
> 954244 -rw-r--r--. 1 couchdb couchdb  977137784 Mar  7 12:05 nutrition.couch
> 8419624 -rw-r--r--. 1 couchdb couchdb 8621678721 Mar  7 12:06 routine.couch
> 390796 -rw-r--r--. 1 couchdb couchdb  400167032 Mar  7 12:06 sleep.couch
> 217932 -rw-r--r--. 1 couchdb couchdb  223154296 Mar  7 12:06 weight.couch
> 4614884 -rw-r--r--. 1 couchdb couchdb 4725629060 Mar  7 12:06 trackers.couch
>      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 fitness.couch.compact
>      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 nutrition.couch.compact
>      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 routine.couch.compact
>     64 -rw-r--r--. 1 couchdb couchdb      61551 Mar  7 12:41 diabetes.couch
>      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 sleep.couch.compact
>     12 -rw-r--r--. 1 couchdb couchdb       8300 Mar  7 12:41 tobacco_cessation.couch
>      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 users.couch
>      4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 weight.couch.compact
>    152 -rw-r--r--. 1 couchdb couchdb     151797 Mar  7 12:42 trackers.couch.compact
>    784 -rw-r--r--. 1 couchdb couchdb     801865 Mar  7 12:42 biometrics.couch.compact
> 
> ---------------------------------------------------------
> 
> Here is the contents of the log at the time of the crash:
> 
> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:33 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:35 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:40 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:25:45 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.114.0>] 10.1.1.12 - - GET /users/_changes?feed=continuous&style=all_docs&since=0&heartbeat=10000 200
> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:25:55 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
> [Mon, 07 Mar 2016 17:26:01 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
> [Mon, 07 Mar 2016 17:26:05 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> ... [numerous GET /users/ 200 messages removed for brevity] ...
> [Mon, 07 Mar 2016 17:41:51 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.152.0>] 127.0.0.1 - - GET /_all_dbs 200
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1157.0>] Starting compaction for db "biometrics"
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.151.0>] 127.0.0.1 - - POST /biometrics/_compact 202
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.150.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1175.0>] Starting compaction for db "diabetes"
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.198.0>] 127.0.0.1 - - POST /diabetes/_compact 202
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.197.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1193.0>] Starting compaction for db "fitness"
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.118.0>] 127.0.0.1 - - POST /fitness/_compact 202
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.119.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1211.0>] Starting compaction for db "nutrition"
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.120.0>] 127.0.0.1 - - POST /nutrition/_compact 202
> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.121.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1229.0>] Starting compaction for db "routine"
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.122.0>] 127.0.0.1 - - POST /routine/_compact 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.115.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1175.0>] Compaction for db "diabetes" completed.
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1254.0>] Starting compaction for db "sleep"
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.116.0>] 127.0.0.1 - - POST /sleep/_compact 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.117.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Starting compaction for db "tobacco_cessation"
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.184.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.183.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1290.0>] Starting compaction for db "trackers"
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.182.0>] 127.0.0.1 - - POST /trackers/_compact 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Compaction for db "tobacco_cessation" completed.
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1151.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Starting compaction for db "users"
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1152.0>] 127.0.0.1 - - POST /users/_compact 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1168.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Compaction for db "users" completed.
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1329.0>] Starting compaction for db "weight"
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1170.0>] 127.0.0.1 - - POST /weight/_compact 202
> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1187.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
> [Mon, 07 Mar 2016 17:41:56 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:42:01 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:42:06 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> [Mon, 07 Mar 2016 17:42:11 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
> 
> --------------------------------------------------
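With this many interleaved requests, it is easy to lose track of which compactions actually finished before the log goes quiet. The following is a minimal sketch, not part of the original report, that scans log lines for "Starting compaction" messages without a matching "completed" message. It assumes the CouchDB 1.6.1 log line format shown in the excerpt above; the function name `unfinished_compactions` and the sample lines are illustrative only.

```python
import re

# Patterns assume the log line format shown in the excerpt above.
START = re.compile(r'Starting compaction for db "([^"]+)"')
DONE = re.compile(r'Compaction for db "([^"]+)" completed')


def unfinished_compactions(lines):
    """Return db names that logged a compaction start but no completion,
    in the order the compactions were started."""
    pending = []
    for line in lines:
        started = START.search(line)
        if started:
            if started.group(1) not in pending:
                pending.append(started.group(1))
            continue
        done = DONE.search(line)
        if done and done.group(1) in pending:
            pending.remove(done.group(1))
    return pending


# A few lines copied from the excerpt, used here as sample input.
sample = [
    '[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1157.0>] Starting compaction for db "biometrics"',
    '[Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1175.0>] Starting compaction for db "diabetes"',
    '[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1175.0>] Compaction for db "diabetes" completed.',
    '[Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1229.0>] Starting compaction for db "routine"',
]
print(unfinished_compactions(sample))
```

Run against the full excerpt above, the same logic would list biometrics, fitness, nutrition, routine, sleep, trackers and weight: only diabetes, tobacco_cessation and users log a completion before the log ends.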
>