Posted to notifications@couchdb.apache.org by "Nick Vatamaniuc (JIRA)" <ji...@apache.org> on 2015/10/02 15:17:27 UTC

[jira] [Commented] (COUCHDB-2833) Replicator client doesn't handle un-expectedly closed pipelined connections well

    [ https://issues.apache.org/jira/browse/COUCHDB-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941117#comment-14941117 ] 

Nick Vatamaniuc commented on COUCHDB-2833:
------------------------------------------

TCP session trace from Wireshark that illustrates the sequence of requests in a single pipelined HTTP session. It was produced with a stripped-down version of the test that has only 2 documents and one attachment of size 65537 bytes.

{code}
GET /eunit-test-db-1443762758720227/ HTTP/1.1
User-Agent: CouchDB-Replicator/4ca9e41
Accept: application/json
Host: 127.0.0.1:58985
Content-Length: 0

HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 381
Content-Type: application/json
Date: Fri, 02 Oct 2015 05:13:08 GMT
Server: CouchDB/4ca9e41 (Erlang OTP/17)

{"db_name":"eunit-test-db-1443762758720227","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":4240,"other":{"data_size":0},"data_size":0,"sizes":{"file":4240,"active":0,"external":0},"instance_start_time":"1443762758732464","disk_format_version":6,"committed_update_seq":0,"compacted_seq":0,"uuid":"6543f5e9eb4589be70ac1068a8d0f577"}



GET /eunit-test-db-1443762758720227/ HTTP/1.1
User-Agent: CouchDB-Replicator/4ca9e41
Accept: application/json
Host: 127.0.0.1:58985
Content-Length: 0

HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 381
Content-Type: application/json
Date: Fri, 02 Oct 2015 05:13:08 GMT
Server: CouchDB/4ca9e41 (Erlang OTP/17)

{"db_name":"eunit-test-db-1443762758720227","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":4240,"other":{"data_size":0},"data_size":0,"sizes":{"file":4240,"active":0,"external":0},"instance_start_time":"1443762758732464","disk_format_version":6,"committed_update_seq":0,"compacted_seq":0,"uuid":"6543f5e9eb4589be70ac1068a8d0f577"}




GET /eunit-test-db-1443762758720227/_local/9232f5f1b3ccb9fe9d7b781873a147cf HTTP/1.1
User-Agent: CouchDB-Replicator/4ca9e41
Accept: application/json
Host: 127.0.0.1:58985
Content-Length: 0

HTTP/1.1 404 Object Not Found
Cache-Control: must-revalidate
Content-Length: 41
Content-Type: application/json
Date: Fri, 02 Oct 2015 05:13:08 GMT
Server: CouchDB/4ca9e41 (Erlang OTP/17)

{"error":"not_found","reason":"missing"}




GET /eunit-test-db-1443762758720227/_local/c1b4431ba39c56ce3667d7e4e24deb24 HTTP/1.1
User-Agent: CouchDB-Replicator/4ca9e41
Accept: application/json
Host: 127.0.0.1:58985
Content-Length: 0

HTTP/1.1 404 Object Not Found
Cache-Control: must-revalidate
Content-Length: 41
Content-Type: application/json
Date: Fri, 02 Oct 2015 05:13:08 GMT
Server: CouchDB/4ca9e41 (Erlang OTP/17)

{"error":"not_found","reason":"missing"}




GET /eunit-test-db-1443762758720227/_local/ce13e7d20a5667809215382ecddd4e6a HTTP/1.1
User-Agent: CouchDB-Replicator/4ca9e41
Accept: application/json
Host: 127.0.0.1:58985
Content-Length: 0

HTTP/1.1 404 Object Not Found
Cache-Control: must-revalidate
Content-Length: 41
Content-Type: application/json
Date: Fri, 02 Oct 2015 05:13:08 GMT
Server: CouchDB/4ca9e41 (Erlang OTP/17)

{"error":"not_found","reason":"missing"}




POST /eunit-test-db-1443762758720227/_revs_diff HTTP/1.1
User-Agent: CouchDB-Replicator/4ca9e41
Content-Type: application/json
Accept: application/json
Host: 127.0.0.1:58985
Content-Length: 93

{"doc1":["1-d557112ee4133d7d5142b54c1b5e902d"],"doc2":["1-cdbdf8a3c0e5f5a6f3b38fb16e7ce321"]}

HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 118
Content-Type: application/json
Date: Fri, 02 Oct 2015 05:13:08 GMT
Server: CouchDB/4ca9e41 (Erlang OTP/17)

{"doc1":{"missing":["1-d557112ee4133d7d5142b54c1b5e902d"]},"doc2":{"missing":["1-cdbdf8a3c0e5f5a6f3b38fb16e7ce321"]}}



PUT /eunit-test-db-1443762758720227/doc1?new_edits=false HTTP/1.1
User-Agent: CouchDB-Replicator/4ca9e41
Content-Type: multipart/related; boundary="895bdeab4d30deafd86390e1551e005b"
Content-Length: 131941
Accept: application/json
Host: 127.0.0.1:58985

--895bdeab4d30deafd86390e1551e005b
Content-Type: application/json

{"_id":"doc1","_rev":"1-d557112ee4133d7d5142b54c1b5e902d","_revisions":{"start":1,"ids":["d557112ee4133d7d5142b54c1b5e902d"]},"_attachments":{"att1":{"content_type":"text/plain","revpos":1,"digest":"md5-6mzSBe5/Psu+Gh6WMajMNQ==","length":65536,"follows":true,"encoding":"gzip","encoded_length":65574},"att2":{"content_type":"app/binary","revpos":1,"digest":"md5-yxFL0AWNm6nGy4zlmQx/0g==","length":65537,"follows":true}}}

--895bdeab4d30deafd86390e1551e005b
Content-Disposition: attachment; filename="att1"
Content-Type: text/plain
Content-Length: 65574
Content-Encoding: gzip

*BINARY DATA*
--895bdeab4d30deafd86390e1551e005b--


HTTP/1.1 201 Created
Cache-Control: must-revalidate
Connection: close
Content-Length: 67
Content-Type: application/json
Date: Fri, 02 Oct 2015 05:13:08 GMT
ETag: "1-d557112ee4133d7d5142b54c1b5e902d"
Location: http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc1
Server: CouchDB/4ca9e41 (Erlang OTP/17)

{"ok":true,"id":"doc1","rev":"1-d557112ee4133d7d5142b54c1b5e902d"}

{code}
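In the trace, the server sends "Connection: close" on the 201 Created response to the PUT. A client that keeps connections alive therefore has to check for that header before sending its next request on the same socket. Below is a minimal sketch of that check. It assumes response headers arrive as a list of {Name, Value} strings and applies the persistence rules from RFC 7230, section 6.3. The module and function names are hypothetical and are not part of couch_replicator or ibrowse. The sketch uses string:lexemes/2 and string:lowercase/1, which need OTP 20 or later (newer than the OTP 17 shown in the trace).

{code:erlang}
-module(http_conn_reuse).
-export([can_reuse/2]).

%% Hypothetical helper, not part of couch_replicator or ibrowse.
%% Returns true if another request may be sent on the same connection
%% after a response with the given HTTP version and headers
%% (persistence rules of RFC 7230, section 6.3).
can_reuse(Version, Headers) ->
    Tokens = connection_tokens(Headers),
    Close = lists:member("close", Tokens),
    case Version of
        %% HTTP/1.1 connections persist unless "close" is present.
        {1, 1} -> not Close;
        %% HTTP/1.0 connections persist only with an explicit keep-alive.
        {1, 0} -> lists:member("keep-alive", Tokens) andalso not Close;
        %% Anything else: conservatively treat the connection as not reusable.
        _ -> false
    end.

%% Collects the lowercased, comma-separated tokens of every Connection
%% header. Header names are matched case-insensitively.
connection_tokens(Headers) ->
    lists:append([[string:trim(T) || T <- string:lexemes(string:lowercase(V), ",")]
                  || {K, V} <- Headers,
                     string:lowercase(K) =:= "connection"]).
{code}

For the 201 Created response above, can_reuse({1, 1}, [{"Connection", "close"}]) returns false. The worker would therefore need a new connection for its next PUT instead of reusing this one.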

> Replicator client doesn't handle un-expectedly closed pipelined connections well
> --------------------------------------------------------------------------------
>
>                 Key: COUCHDB-2833
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2833
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: Database Core
>            Reporter: Nick Vatamaniuc
>
> This was found while investigating failures of the replication tests, specifically couch_replicator_large_atts_tests, the {local, remote} sub-case.
> The test sets up push replications from local to remote.
> Replication workers have more than 1 document larger than MAX_BULK_ATT_SIZE (64K). They start pushing them to the target using a keep-alive connection (the default for HTTP/1.1). The first few pipelined requests go through on the same connection. Then the server accepts the first PUT to …/docid?new_edits=false, returns "Connection: close", and closes the connection after the 201 Created response. Workers don't expect that and try to send another PUT on the same connection. They then crash on ibrowse's connection_closing error, which they don't handle, and that causes the whole async replication process to exit.
> There are potentially two issues.
> The couch_replicator_httpc layer needs to handle this case better: on a connection_closing error, it should shut down the socket immediately and then retry. (Retrying without shutting it down means retrying for at least 5 seconds or so, until something cleans up that connection state.)
> Adding this clause to couch_replicator_httpc.erl seems to do the trick:
> {code}
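> %% The server closed the connection (e.g. after replying with "Connection: close"):
> %% stop the ibrowse worker right away so the stale socket is discarded, then retry.
> process_response({error, connection_closing}, Worker, HttpDb, Params, _Cb) ->
>     couch_log:notice("Connection closed by server. Closing the socket and trying again", []),
>     ibrowse_http_client:stop(Worker),
>     throw({retry, HttpDb, Params});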
> {code}
> The other issue is to make the server not close the connection after the first PUT to .../db/docid?new_edits=false when using pipelined connections.
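The retry behaviour described in the first issue can be sketched as a self-contained loop. This is an illustration only and is not the couch_replicator_httpc code. The module name and the open/send/close callbacks are hypothetical stand-ins for opening an ibrowse connection, sending a request on it, and stopping its worker (which the clause above does with ibrowse_http_client:stop/1).

{code:erlang}
-module(retry_on_close).
-export([request/3]).

%% Hypothetical sketch, not couch_replicator_httpc. Funs holds three
%% caller-supplied callbacks:
%%   open()          -> Conn
%%   send(Conn, Req) -> Result
%%   close(Conn)     -> any()
%% Returns {Result, Conn}; the caller owns Conn afterwards.
request(#{open := Open, send := Send, close := Close} = Funs, Req, Retries) ->
    Conn = Open(),
    case Send(Conn, Req) of
        {error, connection_closing} when Retries > 0 ->
            %% The server closed the connection: stop it right away instead
            %% of waiting for a timeout to clean it up, then retry on a new one.
            _ = Close(Conn),
            request(Funs, Req, Retries - 1);
        Result ->
            %% Success, a different error, or retries exhausted.
            {Result, Conn}
    end.
{code}

Bounding the number of retries keeps a server that always answers with "Connection: close" from making the worker loop forever.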



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)