Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/08/17 21:40:34 UTC

[GitHub] [couchdb] raulmartinezr opened a new issue #3083: Couchdb stops writing logs and then after some time we have always timeouts

raulmartinezr opened a new issue #3083:
URL: https://github.com/apache/couchdb/issues/3083


   
   ## Description
   We use **dockerized CouchDB** in a local development environment to **test applications**.
   The procedure for each test is:
     - create a database in CouchDB
     - test whatever we need to test inside this database
     - delete it if it is no longer needed
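
As an illustration only, the per-test lifecycle described above could be sketched in Python like this. `PUT /{db}` and `DELETE /{db}` are the standard CouchDB endpoints for creating and deleting a database. The base URL, the `temporary_db` helper, the `test-` name prefix and the `send` callable (expected to attach credentials and actually perform the request) are assumptions, not something taken from the reporter's test suite. The sketch also always deletes the database, whereas the report says deletion happens only when the database is no longer needed.

```python
import urllib.request
import uuid
from contextlib import contextmanager

COUCH_URL = "http://127.0.0.1:5984"  # assumption: local CouchDB on the default port


def db_request(method, db_name, base_url=COUCH_URL, partitioned=False):
    """Build (but do not send) a request that creates or deletes a database."""
    url = f"{base_url}/{db_name}"
    if method == "PUT" and partitioned:
        # Partitioned databases are requested at creation time only.
        url += "?partitioned=true"
    return urllib.request.Request(url, method=method)


@contextmanager
def temporary_db(send, prefix="test", **kwargs):
    """Create a uniquely named database, yield its name, then delete it.

    `send` is a hypothetical callable that performs the request,
    including whatever authentication the server requires.
    """
    name = f"{prefix}-{uuid.uuid4().hex}"
    send(db_request("PUT", name, **kwargs))
    try:
        yield name
    finally:
        send(db_request("DELETE", name, **kwargs))
```

A caller might pass something like `lambda req: urllib.request.urlopen(req, timeout=10)` as `send`, wrapped with the admin credentials the setup needs.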
   
   We have **approximately 70 tests**.
   
   For each database, a few documents are created:
      - usually **fewer than 10 documents** (at most 50 when we test indexes created by a design document)
      - **2 design documents**, one with two Mango indexes and the other with one Mango index
   
   If we run the tests module by module (each module contains between 1 and 10 tests), there is no issue. If we run all of them in one go (sequentially, without parallelism), the tests hang at some point.
   
   **We observed**
   - CouchDB stops writing logs (and its CPU consumption decreases)
   - We still get responses to queries for some time, approximately 1 minute
   - After that, we get no responses from CouchDB at all
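
To pin down when the server stops answering, one option is a small probe that runs alongside the tests. The sketch below polls CouchDB's `/_up` health endpoint with a short timeout and reports the time of the first failed probe. The base URL, the 2-second timeout, the polling interval and the function names are assumptions chosen for illustration, and `/_up` may require credentials if `require_valid_user` is enabled (it is `false` in the configuration shown further down).

```python
import time
import urllib.request


def couch_is_up(base_url="http://127.0.0.1:5984", timeout=2.0,
                opener=urllib.request.urlopen):
    """Return True if GET /_up answers with status 200 within `timeout` seconds."""
    try:
        with opener(f"{base_url}/_up", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError, refused connections and socket timeouts.
        return False


def first_failure(probe, attempts, interval=1.0,
                  sleep=time.sleep, clock=time.time):
    """Call `probe` up to `attempts` times.

    Returns the clock value at the first failed probe, or None if
    every probe succeeded.
    """
    for _ in range(attempts):
        if not probe():
            return clock()
        sleep(interval)
    return None
```

Comparing the timestamp returned by `first_failure(couch_is_up, attempts=600)` with the last log line would show how long the server keeps answering after logging stops.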
   
   **Example** (It's not always the same)
    - Last logs written (while querying `/master/_partition/iam/_find`). After that, the CPU consumption of CouchDB goes down.
   ```bash
   [debug] 2020-08-17T19:30:05.663402Z nonode@nohost <0.20917.0> a13fe062bc no record of user admin
   [debug] 2020-08-17T19:30:05.663457Z nonode@nohost <0.20917.0> a13fe062bc timeout 600
   [debug] 2020-08-17T19:30:05.663495Z nonode@nohost <0.20917.0> a13fe062bc Successful cookie auth as: "admin"
   [notice] 2020-08-17T19:30:05.665187Z nonode@nohost <0.20917.0> a13fe062bc 127.0.0.1:5984 172.28.0.1 admin POST /master/_partition/iam/_find 200 ok 2
   [debug] 2020-08-17T19:30:05.677477Z nonode@nohost <0.20917.0> be296403e4 no record of user admin
   [debug] 2020-08-17T19:30:05.677539Z nonode@nohost <0.20917.0> be296403e4 timeout 600
   [debug] 2020-08-17T19:30:05.677570Z nonode@nohost <0.20917.0> be296403e4 Successful cookie auth as: "admin"
   ```
   - Request captured with tcpdump
   ![image](https://user-images.githubusercontent.com/4292375/90440621-d4bd7d00-e0d7-11ea-9d9a-6710dc422e5f.png)
   
   - After some time (1 min 9 s), requests time out
   ![image](https://user-images.githubusercontent.com/4292375/90440906-4bf31100-e0d8-11ea-8d54-40ebb6035337.png)
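
In the log excerpt above, the last request (`be296403e4`) has its authentication lines but no `[notice]` access-log line. A rough way to spot such requests in a longer log is sketched below. It assumes the line layout seen in the excerpt (level, timestamp, node, pid, request id, message) and assumes that the `[notice]` line marks the completion of a request; both the regular expression and the function names are illustrative.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: [level] timestamp node pid request-id message
LINE_RE = re.compile(
    r"^\[(?P<level>\w+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6}Z) "
    r"(?P<node>\S+) (?P<pid>\S+) (?P<req>\S+) (?P<msg>.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return entry


def unfinished_requests(lines):
    """Map request id -> time of its last log line, for requests with no [notice] line."""
    last_seen = {}
    finished = set()
    for line in lines:
        entry = parse_line(line)
        # "--------" is what the excerpts show when no request id applies.
        if entry is None or entry["req"] == "--------":
            continue
        last_seen[entry["req"]] = entry["ts"]
        if entry["level"] == "notice":
            finished.add(entry["req"])
    return {req: ts for req, ts in last_seen.items() if req not in finished}
```

Run over the seven lines above, this reports only `be296403e4`, the request that was in flight when logging stopped.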
   
   
   
   **Hardware and environment**
   - The host machine runs Ubuntu 20.04, with 8 cores, 16 GB RAM and a 512 GB SSD (250 GB free)
   - The Docker container has no CPU, memory or disk space limits
   - CouchDB is configured as a single node (full configuration below)
   
   This is how the container processes look. Erlang (beam.smp) consumes the most, with peaks of 70% CPU
   ![image](https://user-images.githubusercontent.com/4292375/90441456-4649fb00-e0d9-11ea-9801-a8aae75c6cf4.png)
   
   Threads of beam.smp just when the issue happens:
   I monitored the threads during the whole process. Just before the crash, the scheduler threads seem to become more active.
   ```bash
   top - 20:55:05 up  2:58,  0 users,  load average: 3.93, 2.31, 2.35
   Threads:  46 total,   0 running,  46 sleeping,   0 stopped,   0 zombie
   %Cpu(s): 46.2 us,  4.3 sy,  0.0 ni, 49.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
   MiB Mem :  15687.8 total,   1947.6 free,   7294.3 used,   6445.9 buff/cache
   MiB Swap:    980.0 total,    980.0 free,      0.0 used.   7200.1 avail Mem 
   
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
        54 couchdb   20   0 4568544  62092  10904 S  40.0   0.4   0:07.25 2_scheduler
        53 couchdb   20   0 4568544  62092  10904 S  33.3   0.4   0:20.74 1_scheduler
        55 couchdb   20   0 4568544  62092  10904 S  33.3   0.4   0:06.63 3_scheduler
        56 couchdb   20   0 4568544  62092  10904 S  20.0   0.4   0:05.66 4_scheduler
        37 couchdb   20   0 4568544  62092  10904 S   6.7   0.4   0:00.03 async_2
         6 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.03 beam.smp
        34 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.00 sys_sig_dispatc
        35 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.00 sys_msg_dispatc
        36 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.18 async_1
        38 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.14 async_3
   .....
   
   ```
   
   **Any idea about what could be the cause? Any hint would be appreciated.** 
   
   
   
   
   
   
   
   ## Steps to Reproduce
   There is no fixed trigger for the issue. 
   
   ## Expected Behaviour
   We would expect CouchDB to handle this load, even under Docker. The load is not heavy: during the tests we have at most 15 operations per second.
   
   
   ## Your Environment
   
   ```json
   {"couchdb":"Welcome","version":"3.1.0","git_sha":"ff0feea20","uuid":"b17edbd1de7d1504022d6f359ff9a4f8","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
   ```
   * CouchDB version used: 3.1.0
   * Browser name and version: Not relevant
   * Operating system and version: CouchDB in Docker on Ubuntu 20.04
   
   ## Additional Context
   **Full couchdb configuration**
   ```bash
   Configuration Settings:
     [admins] admin="******"
     [attachments] compressible_types="text/*, application/javascript, application/json, application/xml"
     [attachments] compression_level="8"
     [chttpd] backlog="512"
     [chttpd] bind_address="any"
     [chttpd] max_db_number_for_dbs_info_req="100"
     [chttpd] port="5984"
     [chttpd] prefer_minimal="Cache-Control, Content-Length, Content-Range, Content-Type, ETag, Server, Transfer-Encoding, Vary"
     [chttpd] require_valid_user="false"
     [chttpd] server_options="[{backlog, 512}, {acceptor_pool_size, 64}, {max, 4096}]"
     [chttpd] socket_options="[{sndbuf, 262144}, {nodelay, true}]"
     [cluster] n="3"
     [cluster] q="2"
     [cors] credentials="false"
     [couch_httpd_auth] allow_persistent_cookies="true"
     [couch_httpd_auth] auth_cache_size="50"
     [couch_httpd_auth] authentication_db="_users"
     [couch_httpd_auth] authentication_redirect="/_utils/session.html"
     [couch_httpd_auth] iterations="10"
     [couch_httpd_auth] require_valid_user="false"
     [couch_httpd_auth] secret="00464db7ba8beb6a5915e4f5dbd03a49"
     [couch_httpd_auth] timeout="600"
     [couch_peruser] database_prefix="userdb-"
     [couch_peruser] delete_dbs="false"
     [couch_peruser] enable="false"
     [couchdb] attachment_stream_buffer_size="4096"
     [couchdb] changes_doc_ids_optimization_threshold="100"
     [couchdb] database_dir="./data"
     [couchdb] default_engine="couch"
     [couchdb] default_security="admin_only"
     [couchdb] file_compression="snappy"
     [couchdb] max_dbs_open="10000"
     [couchdb] max_document_size="8000000"
     [couchdb] os_process_timeout="20000"
     [couchdb] single_node="true"
     [couchdb] users_db_security_editable="false"
     [couchdb] uuid="b17edbd1de7d1504022d6f359ff9a4f8"
     [couchdb] view_index_dir="./data"
     [couchdb_engines] couch="couch_bt_engine"
     [csp] enable="true"
     [fabric] request_timeout="infinity"
     [feature_flags] partitioned||*="true"
     [httpd] allow_jsonp="false"
     [httpd] authentication_handlers="{couch_httpd_auth, cookie_authentication_handler}, {couch_httpd_auth, default_authentication_handler}"
     [httpd] bind_address="127.0.0.1"
     [httpd] enable_cors="false"
     [httpd] enable_xframe_options="false"
     [httpd] max_http_request_size="4294967296"
     [httpd] port="5986"
     [httpd] secure_rewrites="true"
     [httpd] socket_options="[{sndbuf, 262144}]"
     [indexers] couch_mrview="true"
     [ioq] concurrency="10"
     [ioq] ratio="0.01"
     [ioq.bypass] compaction="false"
     [ioq.bypass] os_process="true"
     [ioq.bypass] read="true"
     [ioq.bypass] shard_sync="false"
     [ioq.bypass] view_update="true"
     [ioq.bypass] write="true"
     [log] level="debug"
     [log] writer="stderr"
     [query_server_config] os_process_limit="2000"
     [query_server_config] os_process_soft_limit="1000"
     [query_server_config] reduce_limit="true"
     [replicator] connection_timeout="30000"
     [replicator] http_connections="20"
     [replicator] interval="60000"
     [replicator] max_churn="20"
     [replicator] max_jobs="500"
     [replicator] retries_per_request="5"
     [replicator] socket_options="[{keepalive, true}, {nodelay, false}]"
     [replicator] ssl_certificate_max_depth="3"
     [replicator] startup_jitter="5000"
     [replicator] verify_ssl_certificates="false"
     [replicator] worker_batch_size="500"
     [replicator] worker_processes="4"
     [ssl] port="6984"
     [uuids] algorithm="sequential"
     [uuids] max_count="1000"
     [vendor] name="The Apache Software Foundation"
   ```
   
   Some errors were logged at startup. They do not seem relevant to this issue.
   
   - The `_users` database does not exist (but it seems to be created afterwards)
   ```bash
   [error] 2020-08-17T19:18:47.731011Z nonode@nohost emulator -------- Error in process <0.372.0> with exit value:
   {database_does_not_exist,[{mem3_shards,load_shards_from_db,"_users",[{file,"src/mem3_shards.erl"},{line,399}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,374}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,403}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,96}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,198}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,145}]}]}
   ```
   - I suppose this has no effect, since CouchDB is configured as a single node (`[couchdb] single_node="true"`)
   ```bash
   [error] 2020-08-17T19:18:47.799365Z nonode@nohost <0.457.0> -------- Request to create N=3 DB but only 1 node(s)
   [error] 2020-08-17T19:18:47.812920Z nonode@nohost <0.457.0> -------- Request to create N=3 DB but only 1 node(s)
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] janl commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
janl commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-798934096


   Please reopen if there is something CouchDB can do to run better in the Docker environment.





[GitHub] [couchdb] willholley commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
willholley commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-675410714


   @raulmartinezr are you using Docker volumes to mount a CouchDB data volume or using a storage driver to write to the Docker filesystem directly? If the latter, which driver are you using?





[GitHub] [couchdb] raulmartinezr commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
raulmartinezr commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-680719321


   Hello!
   
   That's what I have:
   ```yaml
   Client: Docker Engine - Community
    Version:           19.03.12
    API version:       1.40
    Go version:        go1.13.10
    Git commit:        48a66213fe
    Built:             Mon Jun 22 15:45:44 2020
    OS/Arch:           linux/amd64
    Experimental:      false
   
   Server: Docker Engine - Community
    Engine:
     Version:          19.03.12
     API version:      1.40 (minimum version 1.12)
     Go version:       go1.13.10
     Git commit:       48a66213fe
     Built:            Mon Jun 22 15:44:15 2020
     OS/Arch:          linux/amd64
     Experimental:     true
    containerd:
     Version:          1.2.13
     GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
    runc:
     Version:          1.0.0-rc10
     GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
    docker-init:
     Version:          0.18.0
     GitCommit:        fec3683
   
   
   ```
   





[GitHub] [couchdb] raulmartinezr commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
raulmartinezr commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-675421571


   Hi @willholley !
   
   We are using tmpfs:
   ```yaml
       tmpfs:
         - /opt/couchdb/data
   ```
   Since we do not need to persist data between runs, that option was useful. 
   
   BR,
   Raul





[GitHub] [couchdb] willholley commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
willholley commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-678982470


   Thanks @raulmartinezr  - that's certainly interesting. What version of Docker are you running (output of `docker version` would be helpful)?





[GitHub] [couchdb] janl edited a comment on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
janl edited a comment on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-798934096


   please reopen if there is something CouchDB can do to run better in the docker env





[GitHub] [couchdb] raulmartinezr commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
raulmartinezr commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-675759698


   Hi @willholley ,
   
   I tried to reproduce it with podman, but could not: with podman everything works fine. It seems to be something related to the Docker daemon.
   
   What do you think?
   
   Thanks,
   Raul





[GitHub] [couchdb] willholley commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
willholley commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-675445965


   Thanks @raulmartinezr. Just out of interest, are you able to reproduce the problem using `podman` instead of Docker? AFAIK `podman` will use the same container runtime (`runc`) as Docker, so it would help narrow the problem down to something Docker-specific vs a container/CouchDB issue.





[GitHub] [couchdb] willholley commented on issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
willholley commented on issue #3083:
URL: https://github.com/apache/couchdb/issues/3083#issuecomment-680756563


   @raulmartinezr that looks recent enough. I don't know Docker well enough to dig into this I'm afraid. I can only suggest looking for differences in the `runc` versions between Docker and podman and/or experimenting with different Docker storage drivers. It would be useful to the community to understand where the problem lies but I can't really investigate without a reproducer.
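
As a starting point for the version comparison suggested above, the component versions can be pulled out of the plain-text `docker version` output posted earlier in this thread. The sketch below is a hedged illustration: the parser only understands the indented text layout shown there, and the function names are made up for this example. The podman side is not parsed here, since its output is not shown in this thread; its versions would have to be supplied by hand as a dictionary.

```python
import textwrap


def component_versions(text):
    """Extract {component: version} from plain-text `docker version` output."""
    versions = {}
    current = None
    for raw in textwrap.dedent(text).splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Version" and current is not None:
            # First "Version:" line under a heading belongs to that component.
            versions.setdefault(current, value)
        elif not value or raw[:1] not in (" ", "\t"):
            # Headings are either unindented ("Client: ...") or bare ("runc:").
            current = key
    return versions


def differing(ours, theirs):
    """Return {component: (ours, theirs)} for components present in both but with different versions."""
    return {
        name: (ours[name], theirs[name])
        for name in ours.keys() & theirs.keys()
        if ours[name] != theirs[name]
    }
```

Feeding the posted output to `component_versions` and comparing the result against a hand-written dictionary for the podman host would show at a glance whether `runc` differs between the two setups.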





[GitHub] [couchdb] janl closed issue #3083: Couchdb stops activity and then after some time we have always timeouts

Posted by GitBox <gi...@apache.org>.
janl closed issue #3083:
URL: https://github.com/apache/couchdb/issues/3083


   

