You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by nickva <gi...@git.apache.org> on 2017/02/03 23:24:18 UTC

[GitHub] couchdb-couch-replicator pull request #54: Allow configuring maximum documen...

GitHub user nickva opened a pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/54

    Allow configuring maximum document ID length during replication

    Currently due to a bug in http parser and lack of document ID length
    enforcement, large document IDs will break replication jobs. Large IDs
    will pass through the _change feed, revs diffs,  but then fail
    during open_revs get request. open_revs request will keep retrying until
    it gives up after long enough time, then replication task crashes and
    restart again with the same pattern. The current effective limit is
    around 8k or so. (The buffer size default 8192 and if the first line
    of the request is larger than that, request will fail).
    
    (See http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
    for more information about the possible failure mechanism).
    
    Bypassing the parser bug by increasing recbuf size, will alow replication
    to finish, however that means simply spreading the abnormal document through
    the rest of the system, and might not be desirable always.
    
    Also once long document IDs have been inserted in the source DB. Simply deleting
    them doesn't work as they'd still appear in the change feed. They'd have to
    be purged or somehow skipped during the replication step. This commit helps
    do the later.
    
    Operators can configure maximum length via this setting:
    ```
      replicator.max_document_id_length=0
    ```
    
    The default value is 0 which means there is no maximum enforced, which is
    backwards compatible behavior.
    
    During replication if maximum is hit by a document, that document is skipped,
    an error is written to the log:
    
    ```
    Replicator: document id `aaaaaaaaaaaaaaaaaaaaa...` from source db  `http://.../cdyno-0000001/` is too long, ignoring.
    ```
    
    and `"doc_write_failures"` statistic is bumped.
    
    COUCHDB-3291

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3291-limit-doc-id-size-in-replicator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch-replicator/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #54
    
----
commit 3ff2d83893481afd68025a52a6d859a2efaf0bcf
Author: Nick Vatamaniuc <va...@apache.org>
Date:   2017-02-03T23:00:37Z

    Allow configuring maximum document ID length during replication
    
    Currently due to a bug in http parser and lack of document ID length
    enforcement, large document IDs will break replication jobs. Large IDs
    will pass through the _change feed, revs diffs,  but then fail
    during open_revs get request. open_revs request will keep retrying until
    it gives up after long enough time, then replication task crashes and
    restart again with the same pattern. The current effective limit is
    around 8k or so. (The buffer size default 8192 and if the first line
    of the request is larger than that, request will fail).
    
    (See http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
    for more information about the possible failure mechanism).
    
    Bypassing the parser bug by increasing recbuf size, will alow replication
    to finish, however that means simply spreading the abnormal document through
    the rest of the system, and might not be desirable always.
    
    Also once long document IDs have been inserted in the source DB. Simply deleting
    them doesn't work as they'd still appear in the change feed. They'd have to
    be purged or somehow skipped during the replication step. This commit helps
    do the later.
    
    Operators can configure maximum length via this setting:
    ```
      replicator.max_document_id_length=0
    ```
    
    The default value is 0 which means there is no maximum enforced, which is
    backwards compatible behavior.
    
    During replication if maximum is hit by a document, that document is skipped,
    an error is written to the log:
    
    ```
    Replicator: document id `aaaaaaaaaaaaaaaaaaaaa...` from source db  `http://.../cdyno-0000001/` is too long, ignoring.
    ```
    
    and `"doc_write_failures"` statistic is bumped.
    
    COUCHDB-3291

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator pull request #54: Allow configuring maximum documen...

Posted by jaydoane <gi...@git.apache.org>.
Github user jaydoane commented on a diff in the pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/54#discussion_r99449067
  
    --- Diff: src/couch_replicator_changes_reader.erl ---
    @@ -89,9 +89,20 @@ process_change(#doc_info{id = <<>>} = DocInfo, {_, Db, _, _, _}) ->
             "source database `~s` (_changes sequence ~p)",
             [couch_replicator_api_wrap:db_uri(Db), DocInfo#doc_info.high_seq]);
     
    -process_change(#doc_info{} = DocInfo, {_, _, ChangesQueue, _, _}) ->
    -    ok = couch_work_queue:queue(ChangesQueue, DocInfo),
    -    put(last_seq, DocInfo#doc_info.high_seq);
    +process_change(#doc_info{id = Id} = DocInfo, {Parent, Db, ChangesQueue, _, _}) ->
    +    case is_doc_id_too_long(byte_size(Id)) of
    +        true ->
    +            ShortId = lists:sublist(binary_to_list(Id), 64),
    +            SourceDb = couch_replicator_api_wrap:db_uri(Db),
    +            couch_log:error("Replicator: document id `~s...` from source db "
    +                " `~s` is too long, ignoring.", [ShortId, SourceDb]),
    +            Stats = couch_replicator_stats:new([{doc_write_failures, 1}]),
    +            ok = gen_server:call(Parent, {add_stats, Stats}, infinity),
    +            ok;
    --- End diff --
    
    I believe this final ok is redundant


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator issue #54: Allow configuring maximum document ID le...

Posted by iilyak <gi...@git.apache.org>.
Github user iilyak commented on the issue:

    https://github.com/apache/couchdb-couch-replicator/pull/54
  
    The master has been updated. Rebasing. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator pull request #54: Allow configuring maximum documen...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/couchdb-couch-replicator/pull/54


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator issue #54: Allow configuring maximum document ID le...

Posted by kxepal <gi...@git.apache.org>.
Github user kxepal commented on the issue:

    https://github.com/apache/couchdb-couch-replicator/pull/54
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator pull request #54: Allow configuring maximum documen...

Posted by nickva <gi...@git.apache.org>.
Github user nickva commented on a diff in the pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/54#discussion_r99454566
  
    --- Diff: src/couch_replicator_changes_reader.erl ---
    @@ -89,9 +89,20 @@ process_change(#doc_info{id = <<>>} = DocInfo, {_, Db, _, _, _}) ->
             "source database `~s` (_changes sequence ~p)",
             [couch_replicator_api_wrap:db_uri(Db), DocInfo#doc_info.high_seq]);
     
    -process_change(#doc_info{} = DocInfo, {_, _, ChangesQueue, _, _}) ->
    -    ok = couch_work_queue:queue(ChangesQueue, DocInfo),
    -    put(last_seq, DocInfo#doc_info.high_seq);
    +process_change(#doc_info{id = Id} = DocInfo, {Parent, Db, ChangesQueue, _, _}) ->
    +    case is_doc_id_too_long(byte_size(Id)) of
    +        true ->
    +            ShortId = lists:sublist(binary_to_list(Id), 64),
    +            SourceDb = couch_replicator_api_wrap:db_uri(Db),
    +            couch_log:error("Replicator: document id `~s...` from source db "
    +                " `~s` is too long, ignoring.", [ShortId, SourceDb]),
    +            Stats = couch_replicator_stats:new([{doc_write_failures, 1}]),
    +            ok = gen_server:call(Parent, {add_stats, Stats}, infinity),
    +            ok;
    --- End diff --
    
    You're right. That was silly of me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator issue #54: Allow configuring maximum document ID le...

Posted by jaydoane <gi...@git.apache.org>.
Github user jaydoane commented on the issue:

    https://github.com/apache/couchdb-couch-replicator/pull/54
  
    LGTM, FWIW


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator issue #54: Allow configuring maximum document ID le...

Posted by iilyak <gi...@git.apache.org>.
Github user iilyak commented on the issue:

    https://github.com/apache/couchdb-couch-replicator/pull/54
  
    Nick is away today. However he asked me to help merging it. I am going to merge it on his behalf.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch-replicator issue #54: Allow configuring maximum document ID le...

Posted by nickva <gi...@git.apache.org>.
Github user nickva commented on the issue:

    https://github.com/apache/couchdb-couch-replicator/pull/54
  
    Thanks @iilyak !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---