Posted to notifications@couchdb.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/03 23:24:52 UTC

[jira] [Commented] (COUCHDB-3291) Excessively long document IDs prevent replicator from making progress

    [ https://issues.apache.org/jira/browse/COUCHDB-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852314#comment-15852314 ] 

ASF GitHub Bot commented on COUCHDB-3291:
-----------------------------------------

GitHub user nickva opened a pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/54

    Allow configuring maximum document ID length during replication

    Currently, due to a bug in the HTTP parser and the lack of document ID
    length enforcement, overly long document IDs will break replication jobs.
    Long IDs pass through the _changes feed and revs diffs, but then fail
    during the open_revs GET request. The open_revs request keeps retrying
    until it eventually gives up, the replication task crashes, and then it
    restarts and repeats the same pattern. The current effective limit is
    around 8k: the default receive buffer size is 8192 bytes, and if the
    first line of the request is larger than that, the request fails.
    
    (See http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
    for more information about the possible failure mechanism).
    
    Bypassing the parser bug by increasing the recbuf size will allow
    replication to finish, but that simply spreads the abnormal document IDs
    through the rest of the system, which is not always desirable.
    
    Also, once long document IDs have been inserted into the source DB,
    simply deleting them doesn't work, as they would still appear in the
    changes feed. They would have to be purged or somehow skipped during
    replication. This commit helps do the latter.
    
    Operators can configure the maximum length via this setting:
    ```
      replicator.max_document_id_length=0
    ```
    
    The default value is 0, which means no maximum is enforced; this is the
    backwards-compatible behavior.
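    
    For example, to skip any document ID longer than 512 bytes (the 512 here
    is only an illustrative value, not a recommendation), an operator would
    set:
    ```
      replicator.max_document_id_length=512
    ```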
    
    During replication, if a document exceeds the maximum, that document is
    skipped and an error is written to the log:
    
    ```
    Replicator: document id `aaaaaaaaaaaaaaaaaaaaa...` from source db  `http://.../cdyno-0000001/` is too long, ignoring.
    ```
    
    and `"doc_write_failures"` statistic is bumped.
    
    COUCHDB-3291

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3291-limit-doc-id-size-in-replicator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch-replicator/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #54
    
----
commit 3ff2d83893481afd68025a52a6d859a2efaf0bcf
Author: Nick Vatamaniuc <va...@apache.org>
Date:   2017-02-03T23:00:37Z

    Allow configuring maximum document ID length during replication
    
    Currently, due to a bug in the HTTP parser and the lack of document ID
    length enforcement, overly long document IDs will break replication jobs.
    Long IDs pass through the _changes feed and revs diffs, but then fail
    during the open_revs GET request. The open_revs request keeps retrying
    until it eventually gives up, the replication task crashes, and then it
    restarts and repeats the same pattern. The current effective limit is
    around 8k: the default receive buffer size is 8192 bytes, and if the
    first line of the request is larger than that, the request fails.
    
    (See http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
    for more information about the possible failure mechanism).
    
    Bypassing the parser bug by increasing the recbuf size will allow
    replication to finish, but that simply spreads the abnormal document IDs
    through the rest of the system, which is not always desirable.
    
    Also, once long document IDs have been inserted into the source DB,
    simply deleting them doesn't work, as they would still appear in the
    changes feed. They would have to be purged or somehow skipped during
    replication. This commit helps do the latter.
    
    Operators can configure the maximum length via this setting:
    ```
      replicator.max_document_id_length=0
    ```
    
    The default value is 0, which means no maximum is enforced; this is the
    backwards-compatible behavior.
    
    During replication, if a document exceeds the maximum, that document is
    skipped and an error is written to the log:
    
    ```
    Replicator: document id `aaaaaaaaaaaaaaaaaaaaa...` from source db  `http://.../cdyno-0000001/` is too long, ignoring.
    ```
    
    and `"doc_write_failures"` statistic is bumped.
    
    COUCHDB-3291

----


> Excessively long document IDs prevent replicator from making progress
> ---------------------------------------------------------------------
>
>                 Key: COUCHDB-3291
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3291
>             Project: CouchDB
>          Issue Type: Bug
>            Reporter: Nick Vatamaniuc
>
> Currently there is no protection in CouchDB against creating IDs which are too long, so long IDs hit various implicit limits, which usually results in unpredictable failure modes.
> One such implicit limit is hit in the replicator code. The replicator usually handles document IDs in bulk-like calls: it gets them via the changes feed, computes revs_diffs with a POST, or inserts them with bulk_docs. The one exception is when it fetches open_revs, where it uses a single GET request. That request fails because of a bug / limitation in the HTTP parser: the first GET line of the HTTP request has to fit in the receive buffer of the receiving socket.
> Increasing that buffer allows larger HTTP request lines to pass through. In the configuration options it can be set as
> {code}
>  chttpd.server_options="[...,{recbuf, 32768},...]"
> {code}
> Steve Vinoski mentions a possible bug in the HTTP packet parser code as well:
> http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
> Tracing this a bit, I see that a proper mochiweb request is never even created and the request hangs instead, which confirms this further. It seems that in the code here:
> https://github.com/apache/couchdb-mochiweb/blob/bd6ae7cbb371666a1f68115056f7b30d13765782/src/mochiweb_http.erl#L90
> the timeout clause is hit. Adding a catch-all clause, I get a {tcp_error,#Port<0.40682>,emsgsize} message, which we don't handle. That seems like a sane place to return a 413 or similar.
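> A rough sketch of that idea (purely illustrative: the exact integration point in mochiweb_http's receive loop, and whether the socket is still writable after an emsgsize error, are assumptions here, not verified facts):
> {code}
> %% Sketch only: turn the otherwise unhandled emsgsize error into an
> %% explicit 413 and close the connection, instead of hanging until the
> %% receive timeout fires.
> -module(emsgsize_413_sketch).
> -export([reply_413_and_close/1]).
> 
> reply_413_and_close(Socket) ->
>     Resp = ["HTTP/1.1 413 Request Entity Too Large\r\n",
>             "Connection: close\r\n",
>             "Content-Length: 0\r\n\r\n"],
>     _ = gen_tcp:send(Socket, Resp),
>     gen_tcp:close(Socket).
> {code}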
> There are probably multiple ways to address the issue:
>  * Increase the mochiweb listener buffer to fit larger doc IDs. However, that is a separate bug, and using it to control document ID size during replication is not reliable. Moreover, it would let larger IDs propagate through the system during replication, and every future replication source would then have to be configured with the same maximum recbuf value.
>  * Introduce a validation step in {code}couch_doc:validate_docid{code}. Currently that code doesn't read from config files and is in the hot path; adding a config read there might reduce performance. If enabled, it would stop new documents with large IDs from being created, but we would have to decide how to handle already existing IDs which are larger than the limit.
>  * Introduce a validation/bypass in the replicator. Specifically targeting the replicator might help prevent propagation of large IDs during replication. There is already a similar case of skipping writes of large attachments or large documents (which exceed the request size) and bumping {code}doc_write_failures{code}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)