You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2022/02/23 12:00:28 UTC

[GitHub] [couchdb] jcoglan opened a new issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

jcoglan opened a new issue #3939:
URL: https://github.com/apache/couchdb/issues/3939


   We have observed a cluster exhibiting a race condition in how `PUT /db/doc` with
   `Content-Type: multipart/related` transfers attachment data to backend nodes. If
   two such requests happen concurrently, it is possible for the second request to
   hit a `receive` timeout that blocks the request from responding for 5 minutes.
   
   This issue was observed on replication requests with `new_edits=false` and part
   of the code path for this request differentiates between `replicated_changes`
   and `interactive_edit`. However we don't believe this problem is specific to
   replication requests.
   
   We do not believe any data is corrupted by this interaction, but this issue
   highlights some paths that are worth checking for data races.
   
   
   ## Relevant code
   
   The following represents the tree of function calls that are relevant to this
   issue; this trace references CouchDB v3.2.1. All-caps names indicate a process
   boundary; the `HTTPD`, `PARSER` and `FABRIC` processes execute on the
   coordinator node, while `WRITER` executes on the backend nodes holding the DB’s
   shard files.
   
   - (`HTTPD`)
     - [`chttpd_db:db_doc_req(#httpd{method='PUT', user_ctx=Ctx}=Req, Db,
       DocId)`][101]
       - [`couch_doc:doc_from_multi_part_stream`][102]
         - [`couch_httpd_multipart:decode_multipart_stream`][103]
           - `spawn_monitor` (`PARSER`)
             - [`couch_httpd:parse_multipart_request`][104]
             - [`couch_httpd_multipart:mp_parse_doc`][105]
             - [`couch_httpd_multipart:mp_parse_atts`][106]
             - [`couch_httpd_multipart:maybe_send_data`][107]
       - [`chttpd_db:send_updated_doc`][108]
         - [`chttpd_db:update_doc`][109]
           - `spawn_monitor` (`FABRIC`)
             - [`fabric:update_doc`][110]
               - [`fabric:update_docs`][111]
                 - [`fabric_doc_update:go`][112]
                   - [`fabric_doc_update:handle_message`][113]
                   - [`fabric_rpc:update_docs`][114] (`WRITER`; via `rexi:cast`)
                     - [`fabric_rpc:make_att_readers`][115]
                       - [`fabric_rpc:make_att_reader`][116]
                     - [`couch_db:update_docs(Db, Docs0, Options,
                       replicated_changes)`][117]
                       - [`couch_db:doc_flush_atts`][118]
                         - [`couch_att:flush`][119]
                           - [`couch_att:flush_data(Db, Fun, Att) when
                             is_function(Fun)`][120]
                             - [`couch_att:write_streamed_attachment`][121]
                               - [`couch_att:read_next_chunk`][122]
   
   [101]: https://github.com/apache/couchdb/blob/3.2.1/src/chttpd/src/chttpd_db.erl#L1055-L1069
   [102]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_doc.erl#L461-L488
   [103]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L26-L55
   [104]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd.erl#L1058-L1069
   [105]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L58-L77
   [106]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L79-L122
   [107]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L129-L191
   [108]: https://github.com/apache/couchdb/blob/3.2.1/src/chttpd/src/chttpd_db.erl#L1315-L1331
   [109]: https://github.com/apache/couchdb/blob/3.2.1/src/chttpd/src/chttpd_db.erl#L1343-L1385
   [110]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric.erl#L275-L291
   [111]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric.erl#L296-L310
   [112]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_doc_update.erl#L21-L54
   [113]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_doc_update.erl#L56-L102
   [114]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L267-L280
   [115]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L570-L576
   [116]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L578-L596
   [117]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_db.erl#L1186-L1201
   [118]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_db.erl#L1388-L1389
   [119]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L532-L533
   [120]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L540-L573
   [121]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L604-L615
   [122]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L617-L620
   
   
   ## Cluster setup
   
   The cluster in question consists of 6 nodes spread over 2 data centres, with 3
   nodes in each location. The database shard targeted by the `PUT` request is
   replicated on 3 nodes, not all of which are in the same data centre.
   
   Ping latency between nodes in the same datacentre is under 0.2ms, while between
   nodes in different data centres it's around 8ms.
   
   
   ## Observations
   
   By instrumenting the above code paths with more logging, we have observed the
   cluster executing the following sequence of events. These events happen over two
   requests, both are `PUT /db/doc` for the same doc ID and revision and have
   attachments encoded as `multipart/related`.
   
   - Request 1 is received by `HTTPD` on the coordinator node, which we'll call
     node `A`. This hits the [`multipart/related`][201] clause; `W` is 2 and
     `num_mp_writers` is set to 3. The shard targeted by the write is stored on 3
     other nodes which we'll call `B`, `C` and `D`.
   
   - This calls `couch_doc:doc_from_multi_part_stream`, which calls
     `couch_httpd_multipart:decode_multipart_stream`, which [starts the `PARSER`
     process][202]. This process uses a closure as a state machine to invoke
     `mp_parse_doc`, `mp_parse_doc` and `maybe_send_data`. `HTTPD` [sends
     `get_doc_bytes`][203] to `PARSER`, which [replies with `doc_bytes`][204]. This
     causes the attachments to be [converted into `{follows, Parser, Ref}`
     tuples][205].
   
   - Meanwhile, `HTTPD` [starts the `FABRIC` process][206], which calls through to
     `fabric_doc_update:go`, which uses `rexi:cast` to forward the incoming doc to
     [`fabric_rpc:update_docs`][207] in the `WRITER` process. This [converts
     `{follows, Parser, Ref}` into a `fun/0`][208], which is ultimately used in
     [`flush_data`][209] via [`write_streamed_attachment`][210] and
     [`read_next_chunk`][211].
   
   - This `fun/0` [sends `get_bytes` to `PARSER`][212], which can handle it in
     several places: [1][213], [2][214], [3][215]. (There is a lot of code
     duplication between `mp_parse_atts` and `maybe_send_data` that includes
     prioritising `get_bytes` over other messages.) `PARSER` [replies with
     `bytes`][216]. The code around this reply manages a buffer of attachment data;
     once a chunk has been read by all the `WRITER` procs, it is discarded. The
     parser expects 3 `WRITER` processes to request the data because
     `num_mp_writers` is set to 3.
   
   - When `WRITER` on nodes `B`, `C` or `D` receives `bytes` it [sends
     `attachment_chunk_received` to `FABRIC`][217] on node `A`, which [handles
     it][218] in `fabric_doc_update`.
   
   - We observe one of the nodes, say node `B`, doing several rounds of requesting
     data from `PARSER` and acking it with `attachment_chunk_received` before the
     other two nodes `C` and `D` request anything. `PARSER` behaves correctly,
     tracking which writers are waiting for data and how far through the stream
     each writer has got; it does not prematurely drop chunks as we [previously
     observed in the "middleman" attachment handler][229]. It appears `B` is
     consuming the attachment much faster than `C` and `D`.
   
   - While `C` and `D` are still fetching the attachment, we observe an [`ok`
     message][219] from node `B` as it's finished reading the attachment and
     completed the write. So we have had one ack but we need one more to achieve
     quorum.
   
   - At this point, an identical request, Request 2 is received by another `HTTPD`
     process, also running on node `A`. Everything proceeds as above, spawning
     `PARSER` and `FABRIC` processes. The `PARSER` again sees `num_mp_writers` is 3
     and so expects 3 processes to fetch the data before it drops the buffered
     chunks.
   
   - In request 2, the first message that `FABRIC` receives is an `ok` from node
     `B`. We observe `PARSER` receiving `get_bytes` from nodes `C` and `D`, and
     those nodes send `attachment_chunk_received` to `FABRIC`. However, node `B`
     never requests anything from `PARSER`, and no other signal is received by
     `PARSER` that would tell it that one of the writers already has the attachment
     data and doesn't need to fetch it. No writer process is started on `B` so
     there's not even a `DOWN` signal that `PARSER` could detect.
   
   - Because of this, `PARSER` ends up waiting for a `get_bytes` message that will
     never arrive, and ends up hitting an `after 300000` timeout in its `receive`
     blocks in [`mp_parse_atts`][220] or [`maybe_send_data`][221].
   
   - Up in `chttpd_db`, [`send_updated_doc` returns][222] quickly; request 2 still
     achieves quorum from the rapid `ok` from node `B` and an evantual ack from one
     of the other nodes. However we then [call `WaitFun()`][223], which [waits for
     `PARSER` to exit][224]. This blocks request 2 from returning for 5 minutes,
     even though the write has completed correctly.
   
   
   [201]: https://github.com/apache/couchdb/blob/3.2.1/src/chttpd/src/chttpd_db.erl#L1055
   [202]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L29-L37
   [203]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L38
   [204]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L73
   [205]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_doc.erl#L481
   [206]: https://github.com/apache/couchdb/blob/3.2.1/src/chttpd/src/chttpd_db.erl#L1344-L1356
   [207]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L267-L280
   [208]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L578-L596
   [209]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L540-L573
   [210]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L604-L615
   [211]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L617-L620
   [212]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L588
   [213]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L101
   [214]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L130
   [215]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L184
   [216]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L139
   [217]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L590-L592
   [218]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_doc_update.erl#L70
   [219]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_doc_update.erl#L72
   [220]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L119
   [221]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_httpd_multipart.erl#L187
   [222]: https://github.com/apache/couchdb/blob/3.2.1/src/chttpd/src/chttpd_db.erl#L1062
   [223]: https://github.com/apache/couchdb/blob/3.2.1/src/chttpd/src/chttpd_db.erl#L1063
   [224]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_doc.erl#L483-L485
   [225]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L578
   [226]: https://github.com/apache/couchdb/blob/3.2.1/src/fabric/src/fabric_rpc.erl#L597
   [227]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L568-L572
   [228]: https://github.com/apache/couchdb/blob/3.2.1/src/couch/src/couch_att.erl#L544-L567
   [229]: https://github.com/apache/couchdb/pull/3715
   
   
   ## Comments
   
   This situation is made more likely by the layout of the cluster; node `B` is in
   the same data centre as node `A` and so has much lower latency than nodes `C`
   and `D`. So it's likely to complete the write much quicker than the other nodes,
   leaving a window where request 1 has one ack and doesn't have quorum yet when
   request 2 starts.
   
   Request 2 gets into this state because request 1 had partially succeeded; one of
   the nodes completes the write but we don't have quorum yet. It is worth finding
   out what would happen in request 2 if request 1 had had _two_ nodes complete the
   write when request 2 starts; even though this is enough for quorum, `WaitFun()`
   causes the request not to return until all 3 nodes have read the attachment or
   timed out. But, two acks is enough for quorum so it might put the doc in a
   different state such that request 2 doesn't even try to write anything or
   forward the attachment data.
   
   For replication requests, this would not result in data loss, it's more of an
   efficiency concern. Replication requests are allowed to add revs anywhere in the
   tree, so the second request should just be a no-op. For interactive edits, the
   first request will change what the leaves of the rev tree are and might change
   the winning rev, so could cause request 2 to be rejected -- again we should
   examine what effect partial success of request 1 has here.
   
   Finally we note that we currently have at least two methods for proxying
   attachment data to backend nodes: the [`follows` format][225] from
   `multipart/related` request, and the [`fabric_attachment_receiver` format][226]
   with the "middleman" process. There are also two different ways of invoking the
   intermediate funs depending on if the attachment length is [known][227] or
   [unknown][228]. There's a lot of duplicate code for dealing with buffering the
   attachment data to multiple backends and we recently discovered and [fixed a bug
   in the "middleman" implementation][229], also discovered on this cluster. It may
   be worth designing a single implementation for this buffer system.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] kocolosk commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
kocolosk commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1049102965


   I tried adding an integration test for this in e85e34359 but it doesn't do quite what you'd expect. I tried writing a doc with attachments and then saving that same revision again using `new_edits=false`. The client does receive a rapid response to **both** requests, but (I think) the attachment bytes are never consumed off of the wire and so the _next_ request on the connection from a different part of the test suite hangs. The second `new_edits=false` multipart request never shows up in the logs because the test suite times out first and destroys the DB.
   
   Perhaps this is also what you were observing in production?
   
   At any rate, I added the test in part to demonstrate that the data race with the quorum logic might be a red herring. _Anytime_ a `multipart/related` request with `new_edits=false` is made with a revision that already exists in the database the server does not properly clean up the connection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1049135699


   That's a good test. If we run with `n=1` only mode for integration tests, there could be a separate issue revealed where we drop the connection and never close it properly when attachments are ignored. I guess with `n>1` it would happen as well, if all the nodes have the revision already because we would never pull the data bytes off the socket. We had tried before to force-close such connections to prevent them from being recycled [1]. Whether the client notices the close quickly enough is another issue...
   
   In the case where n>1 and some nodes ready the attachment, for `replicated_changes` we perhaps would want to inspect the doc when we ignore it, and if it has any attachment parser processes, explicitly notify them in the case the update is dropped:
   https://github.com/apache/couchdb/blob/e4b8a4624fc7ae09b4649aac9a8d68226208cd8b/src/couch/src/couch_db.erl#L1240-L1241
   
   [1] https://github.com/apache/couchdb/blob/2c23d8e0fc8e44fa3f91228aa93002a9977e7221/src/couch/src/couch_httpd.erl#L988-L999 (I am afraid this never actually made it to the chttpd side!)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] kocolosk commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
kocolosk commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048773500


   Ah, thanks! Then the `ok` response from node B makes more sense, but I can see where we'd still run into the same issue of the `PARSER` waiting around for a receiver that was never spawned in the interactive case as well.
   
   I guess the expedient fix here might for the `FABRIC` process to kill off the `PARSER` when it knows that a response has been received from every worker?
   
   I agree that this attachment buffering system is overly complex and poorly designed (speaking as one of the designers). It'd be a pretty interesting project to design a greenfield replacement application and then figure out how to integrate that into CouchDB.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] kocolosk commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
kocolosk commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1049016840


   It turns out that the backend does reject the write before fetching the attachment bodies, but `fabric:update_doc` will throw the edit conflict, which means the handler in `chttpd_db` skips the `WaitFun()` and instead calls `abort_multipart_stream`, which causes the parser to consume and discard all the remaining attachment data on the wire.
   
   So, that covers the interactive case. I still suspect that the `new_edits=false` case does not require this concurrent data race in order to trigger the timeout. It would be enough to submit the same request twice in a row; as far as I can tell at the moment the second request should hit the `WaitFun()` timeout.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] janl commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
janl commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048750693


   > are you referring to the base revision against which the update is being applied? If so, isn't this bit surprising?
   
   this is a function of the client app, where these two requests can happen concurrently from two different clients hitting the same exact coordinating node. A round-robin request assignment would make this occurring _a lot_ less likely.
   
   > Or were you running with new_edits=false here?
   
   Second para: 
   
   > This issue was observed on replication requests with new_edits=false and part
   of the code path for this request differentiates between replicated_changes
   and interactive_edit. However we don't believe this problem is specific to
   replication requests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] kocolosk commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
kocolosk commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048944382


   Yes, that's correct @jcoglan. The question I've been mulling over for the past few minutes is whether an interactive edit conflict is detected by `PARSER`, or whether we have the same issue there because the attachment writers are never spawned. Should be pretty straightforward to test; in the worst case _any_ `multipart/related` request that results in an edit conflict would hit the timeout, no data race required. But I was hoping to track down that logic in the codebase itself, and as you're well aware that's a challenging exercise for this bit of functionality.
   
   I'm OK with the suggestion for the `FABRIC` coordinator process to inform the `PARSER` that a writer won't be requesting data. I'm still curious to understand what happens in the case of an interactive edit conflict.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] kocolosk commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
kocolosk commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1049172876


   > I guess with n>1 it would happen as well, if all the nodes have the revision already because we would never pull the data bytes off the socket.
   
   Yes, I think so, although I'm now realizing that "pulling bytes off the socket" is not necessarily the issue in either the n=1 or n>1 case (with or without concurrent data races). The parser might well consume all the request data (and for small requests it probably does, even if no worker ever requests it), but the `HTTPD` process will still be stuck in `WaitFun()` _after_ sending the response to the client. The connection stays open, and the client sends a new request on that connection, but `HTTPD` won't act on it until that 5 minute timeout fires.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] jcoglan commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
jcoglan commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048969592


   This probably depends on at what point the backend decides to reject the write, and whether it does this before or after fetching the attachment bodies. If it does it _before_, it needs to signal to the parser that it's not going to send `get_bytes`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
nickva commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1049135699


   That's a good test. If we run with `n=1` only mode for integration tests, there could be a separate issue revealed where we drop the connection and never close it properly when attachments are ignored. I guess with `n>1` it would happen as well, if all the nodes have the revision already because we would never pull the data bytes off the socket. We had tried before to force-close such connections to prevent them from being recycled [1]. Whether the client notices the close quickly enough is another issue...
   
   In the case where n>1 and some nodes, do need the attachment and would read the data, perhaps for `replicated_changes` we'd want to inspect the doc for any attachment, and if they are any parser processes, explicitly notify them in the case the update is dropped:
   https://github.com/apache/couchdb/blob/e4b8a4624fc7ae09b4649aac9a8d68226208cd8b/src/couch/src/couch_db.erl#L1240-L1241
   
   [1] https://github.com/apache/couchdb/blob/2c23d8e0fc8e44fa3f91228aa93002a9977e7221/src/couch/src/couch_httpd.erl#L988-L999 (I am afraid this never actually made it to the chttpd side!)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] jcoglan edited a comment on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
jcoglan edited a comment on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048934037


   @kocolosk This has prompted me to rethink how this would work for normal (`new_edits != false`) writes. For replication writes, both requests can succeed b/c replication writes can add revs anywhere in the tree; if the revs already exist, no problem.
   
   For normal writes, one of these concurrent requests _must_ fail, or you have a causality violation. Let's refer to the writer procs on `B`, `C` and `D` for requests 1 and 2 as `B1`, `C1`, `D1` and `B2`, `C2`, `D2`. In our example above, `B1` succeeds first, before request 2 begins. This means `B2` _must_ fail, because the `rev` param in these identical requests is no longer a leaf in node B's copy of the DB.
   
   The rest of the processes `C1`, `D1`, `C2`, `D2` are still ongoing and there's nothing forcing them to finish in any particular order. It should be the case that if `C1` succeeds, then `C2` fails and vice versa, same for the `D` writers. This in turn should mean that only one request achieves quorum, but it _could_ be request 2, even if this is unlikely in general.
   
   So this still should not result in a consistency violation, but it's worth seeing if there's any way this allows both requests to succeed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] jcoglan commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
jcoglan commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048934037


   @kocolosk This has prompted me to rethink how this would work for normal (`new_edits != false`) writes. For replication writes, both requests can succeed b/c replication writes can add revs anywhere in the tree; if the revs already exist, no problem.
   
   For normal writes, one of these concurrent requests _must_ fail, or you have a causality violation. Let's refer to the writer procs on `B`, `C` and `D` for requests 1 and 2 as `B1`, `C1`, `D1` and `B2`, `C2`, `D2`. In our example above, `B1` succeeds first, before request 2 begins. This means `B2` _must_ fail, because the `rev` for request 1 is no longer a leaf in node B's copy of the DB.
   
   The rest of the processes `C1`, `D1`, `C2`, `D2` are still ongoing and there's nothing forcing them to finish in any particular order. It should be the case that if `C1` succeeds, then `C2` fails and vice versa, same for the `D` writers. This in turn should mean that only one request achieves quorum, but it _could_ be request 2, even if this is unlikely in general.
   
   So this still should not result in a consistency violation, but it's worth seeing if there's any way this allows both requests to succeed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] kocolosk commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
kocolosk commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048747905


   That's quite the report, thanks! When you say
   
   > These events happen over two requests, both are PUT /db/doc for the same doc ID and revision
   
   are you referring to the base revision against which the update is being applied? If so, isn't this bit surprising?
   
   > In request 2, the first message that FABRIC receives is an ok from node B.
   
   Without pulling up all the code modules I would have guessed that node B would reject request 2 as an edit conflict. Or were you running with `new_edits=false` here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] jcoglan commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
jcoglan commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048761683


   > are you referring to the base revision against which the update is being applied?
   
   No, this is a replication request, so I mean that both are for the same document ID, and contain the same `_revisions` value. i.e. both requests are uploading the same version of the document.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] janl commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
janl commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048792212


   @kocolosk I wouldn’t want to introduce a layering violation here, but could another option be for the receiving node to send a `no_bytes_will_ever_be_requested` message to PARSER so it can decrement the workers it waits for?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] kocolosk commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
kocolosk commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048949348


   Oh, and as to the consistency violation -- I strongly suspect we're immune, because the ultimate source of truth for whether the update is committed is in the normal `couch_db_updater` logic, where we've run all manner of tests stressing the correctness of the system under concurrent load. We could have plenty of cases where `C1` and `C2` are both writing attachment data to disk concurrently, but in the end the revision tree for a particular shard copy will only commit the update from one of them in interactive mode.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] jcoglan commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
jcoglan commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048967722


   > The question I've been mulling over for the past few minutes is whether an interactive edit conflict is detected by PARSER, or whether we have the same issue there because the attachment writers are never spawned
   
   The thing to check for here would be that if request 2 gets one failing response from a backend, b/c that backend already accepted request 1, but it goes on to receive `ok` from the other two backends, will it respond quickly? Or will it hit the same timeout because the backend that rejected the write is not requesting the attachment data?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on issue #3939: Race condition causing slow responses to `PUT /db/doc` with `multipart/related` attachments

Posted by GitBox <gi...@apache.org>.
nickva commented on issue #3939:
URL: https://github.com/apache/couchdb/issues/3939#issuecomment-1048870372


   > send a `no_bytes_will_ever_be_requested` message to PARSER so it can decrement the workers it waits for?
   
   @janl Yeah, I wonder if that would work. And we'd explicitly check for `new_edits=false` case only.
   
   @kocolosk what do you think, would that work? In the interactive case I'd think we'd get back a conflict, or would we have the exact same problem.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org