Posted to issues@geode.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/04/16 14:45:00 UTC

[jira] [Commented] (GEODE-9147) Dropped keys in single-hop PUTALL request when one or more servers is unreachable

    [ https://issues.apache.org/jira/browse/GEODE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323871#comment-17323871 ] 

ASF subversion and git services commented on GEODE-9147:
--------------------------------------------------------

Commit 1be2ee31f592ef99cf588753529ec62f5e7cddfd in geode-native's branch refs/heads/develop from Blake Bender
[ https://gitbox.apache.org/repos/asf?p=geode-native.git;h=1be2ee3 ]

GEODE-9147: Revert to multi-hop PUTALL in the face of missing metadata (#784)

- This matches the behavior of the Java client
- Our current code to "tack on" values for which we have no metadata
  will sometimes result in EventIds reaching a server out of order,
  causing them to be dropped and resulting in data loss.  Resorting
  to multi-hop avoids this altogether.
- Add test to repro issue of missing keys on single-hop putAll

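The fallback described in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the geode-native code: `plan_put_all` and `lookup_server` are hypothetical names, and the sketch assumes the whole request falls back to multi-hop as soon as any key has no server in the metadata.

```python
def plan_put_all(entries, lookup_server):
    """Decide how to send a PUTALL (illustrative sketch only).

    entries: dict of key -> value supplied by the application.
    lookup_server: hypothetical callable returning a server name for a key,
        or None when the metadata has no server for that key's bucket.
    """
    per_server = {}
    for key, value in entries.items():
        server = lookup_server(key)
        if server is None:
            # Missing metadata: give up on single-hop and send the whole
            # request as one multi-hop PUTALL (assumed behaviour).
            return ("multi-hop", entries)
        per_server.setdefault(server, {})[key] = value
    # Every key resolved to a server: one list per server, as before.
    return ("single-hop", per_server)
```

For example, with a lookup that only knows keys "a" and "b", a request containing an unknown key "z" comes back as a single multi-hop plan holding all of the entries.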

> Dropped keys in single-hop PUTALL request when one or more servers is unreachable
> ---------------------------------------------------------------------------------
>
>                 Key: GEODE-9147
>                 URL: https://issues.apache.org/jira/browse/GEODE-9147
>             Project: Geode
>          Issue Type: Bug
>          Components: native client
>            Reporter: Blake Bender
>            Priority: Major
>
> For single-hop PUTALL, Geode native breaks up the application's request as follows:
>  i. Each entry's key is hashed to a bucket, the server hosting that bucket is looked up in the metadata, and the entry is added to a list specific to that server.
> ii. Once every entry has been added to a list, Geode native spins up one thread per list and sends a PUTALL to each server.
>  
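The grouping step above might look like this in outline. It is a sketch under stated assumptions: `bucket_for` uses CRC32 purely as a stand-in hash (the real client has its own hashing), `bucket_to_server` is a plain dict standing in for the client metadata, and keys whose bucket has no known server are simply set aside.

```python
import zlib
from collections import defaultdict


def bucket_for(key, num_buckets):
    # Stand-in hash for illustration; not the hash Geode actually uses.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def group_by_server(entries, bucket_to_server, num_buckets):
    """Split entries into one dict per server, plus a dict of leftovers."""
    per_server = defaultdict(dict)
    leftover = {}
    for key, value in entries.items():
        server = bucket_to_server.get(bucket_for(key, num_buckets))
        if server is None:
            # The bucket-to-server lookup failed for this key.
            leftover[key] = value
        else:
            per_server[server][key] = value
    return dict(per_server), leftover
```

Each dict in `per_server` would then be sent as its own PUTALL, on its own thread.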
> When Geode native can't reach a server, that server's entries are removed from the metadata, so the bucket-to-server lookup fails for its buckets.  This situation is handled as follows:
>   i. The size of the "leftover keys" list is divided by the number of servers, then 1 is added to compensate for any fractional piece.
>  ii. That many keys are added to each remaining list going to a server that is still reachable.
> iii. We proceed normally and send one list to each server, each on its own thread.
>  
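The arithmetic in the leftover handling above can be illustrated as follows. This is a sketch, not the client code: `spread_leftovers` is a hypothetical name, and it assumes "the number of servers" means the servers that are still reachable.

```python
def spread_leftovers(per_server, leftover):
    """Append leftover entries to the lists of the reachable servers.

    per_server: dict of server -> dict of entries (modified in place).
    leftover: dict of entries whose bucket-to-server lookup failed.
    Returns the chunk size used.
    """
    servers = sorted(per_server)
    if not servers:
        raise ValueError("no reachable servers to take the leftover keys")
    # Integer division, plus 1 to cover any fractional piece.
    chunk = len(leftover) // len(servers) + 1
    items = list(leftover.items())
    for i, server in enumerate(servers):
        # Slices past the end of the list are simply empty.
        for key, value in items[i * chunk:(i + 1) * chunk]:
            per_server[server][key] = value
    return chunk
```

With 10 leftover keys and 3 reachable servers, the chunk size is 10 // 3 + 1 = 4, so the servers receive 4, 4 and 2 extra keys respectively.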
> _Unfortunately_, this scenario can lead to data loss, because each fractional piece of the list destined for the unreachable server carries an eventId with the same threadId and an incrementing sequenceId.  Thus, if our PUTALL threads send out of order, a piece with an earlier sequenceId arrives after one with a later sequenceId, is treated by the server as already "seen", and is _dropped_.
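The failure mode can be reproduced with a toy model. The sketch below assumes a deliberately simplified server that remembers only the highest sequenceId seen per threadId and ignores anything at or below it; `DedupServer` and `deliver` are hypothetical names, and real Geode event tracking is more involved.

```python
class DedupServer:
    """Toy server that drops events it believes it has already seen."""

    def __init__(self):
        self.highest_seen = {}  # thread_id -> highest sequence_id applied
        self.store = {}

    def put_all(self, thread_id, sequence_id, entries):
        if sequence_id <= self.highest_seen.get(thread_id, -1):
            return False  # treated as a duplicate: entries are dropped
        self.highest_seen[thread_id] = sequence_id
        self.store.update(entries)
        return True


def deliver(server, pieces):
    return [server.put_all(t, s, e) for t, s, e in pieces]


# Three pieces of the leftover list: same threadId, incrementing sequenceId.
pieces = [(7, 0, {"a": 1}), (7, 1, {"b": 2}), (7, 2, {"c": 3})]

in_order = DedupServer()
deliver(in_order, pieces)

out_of_order = DedupServer()
deliver(out_of_order, [pieces[2], pieces[0], pieces[1]])

print(sorted(in_order.store))      # ['a', 'b', 'c']
print(sorted(out_of_order.store))  # ['c']
```

In the out-of-order run the piece with sequenceId 2 arrives first, so the pieces with sequenceIds 0 and 1 are rejected and keys "a" and "b" are lost.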



--
This message was sent by Atlassian Jira
(v8.3.4#803005)