Posted to issues@geode.apache.org by "Darrel Schneider (Jira)" <ji...@apache.org> on 2020/07/10 22:38:00 UTC
[jira] [Resolved] (GEODE-8338) Redis commands may be repeated when server dies

     [ https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Darrel Schneider resolved GEODE-8338.
-------------------------------------
    Fix Version/s: 1.14.0
       Resolution: Fixed

> Redis commands may be repeated when server dies
> -----------------------------------------------
>
>                 Key: GEODE-8338
>                 URL: https://issues.apache.org/jira/browse/GEODE-8338
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>            Reporter: Sarah Abbey
>            Assignee: Darrel Schneider
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Because we keep one redundant copy of the data and modify the data using a function, I think we may have a data corruption issue with non-idempotent operations. For an operation like APPEND, the following sequence can happen:
>  0) the executor is called on a non-primary Redis server,
>  1) the operation modifies the primary (by sending a function execution to it),
>  2) the operation modifies the secondary (by sending a Geode delta to it),
>  3) the primary server fails at this point (before the function executing on it completes),
>  4) the non-primary Redis server sees the function fail and, because the function is marked as HA, retries it. This time it sends the function to the secondary, which is now the new primary. But the operation was already applied on the secondary, so the retry ends up performing the operation twice.
> This may be okay for certain ops (like SADD) that are idempotent (although even they could cause extra key events in the future), but for ops like APPEND we end up appending twice.
> This only happens when a server executing a function dies and our function service retries the function on another server because the function is marked HA. The easy fix is to change our function to not be HA. This is a one-line change.
>  Note that our clients can already see exceptions/errors if the server they are connected to dies. When that happens, the operation they requested may already have been applied, and if they have multiple Geode Redis servers running, the result may have been stored and may still be in memory. So clients need some logic to decide whether they should redo such an operation or not (because it may already be done).
> *Note:* Making the function non-HA just gives clients one more case in which they need to handle a server crash: the crashed server may now be one they were not connected to but that was involved in performing the operation they requested.
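
The failure sequence in steps 0 to 4 can be illustrated with a small, self-contained simulation. This is only a sketch and does not use Geode's API: `HaRetrySketch`, `Server`, `Outcome`, and `PrimaryDiedException` are hypothetical names invented for illustration, and the `ha` flag stands in for whatever the real function reports as its HA setting (in Geode, a function declares this through `Function.isHA()`). It assumes one primary and one secondary copy of a single string value.

```java
/** Toy model of the APPEND scenario; not Geode code. All names are hypothetical. */
public class HaRetrySketch {

    /** Stand-in for a server that holds one copy of the key's value. */
    public static final class Server {
        final StringBuilder value = new StringBuilder();
    }

    /** Models the primary dying before the function call returns. */
    public static final class PrimaryDiedException extends RuntimeException {
        PrimaryDiedException() {
            super("primary died before the function completed");
        }
    }

    /** What the surviving server holds, and whether the caller saw an error. */
    public static final class Outcome {
        public final String survivingValue;
        public final boolean errorSeenByCaller;

        Outcome(String survivingValue, boolean errorSeenByCaller) {
            this.survivingValue = survivingValue;
            this.errorSeenByCaller = errorSeenByCaller;
        }
    }

    /** Steps 1 to 3: both copies are modified, then the primary crashes. */
    static void appendThenCrash(Server primary, Server secondary, String suffix) {
        primary.value.append(suffix);     // step 1: function runs on the primary
        secondary.value.append(suffix);   // step 2: delta reaches the secondary
        throw new PrimaryDiedException(); // step 3: primary dies before replying
    }

    /**
     * Runs one APPEND as seen from the non-primary server (step 0).
     * With ha=true the failed call is retried on the secondary (step 4);
     * with ha=false the failure is reported to the caller instead.
     */
    public static Outcome run(String initial, String suffix, boolean ha) {
        Server primary = new Server();
        Server secondary = new Server();
        primary.value.append(initial);
        secondary.value.append(initial);

        boolean errorSeenByCaller = false;
        try {
            appendThenCrash(primary, secondary, suffix);
        } catch (PrimaryDiedException e) {
            if (ha) {
                // step 4: the retry re-applies APPEND on the new primary
                secondary.value.append(suffix);
            } else {
                // non-HA: no retry, the caller has to decide what to do
                errorSeenByCaller = true;
            }
        }
        return new Outcome(secondary.value.toString(), errorSeenByCaller);
    }

    public static void main(String[] args) {
        Outcome haRun = run("foo", "bar", true);
        Outcome nonHaRun = run("foo", "bar", false);
        System.out.println("HA:     " + haRun.survivingValue
                + " error=" + haRun.errorSeenByCaller);
        System.out.println("non-HA: " + nonHaRun.survivingValue
                + " error=" + nonHaRun.errorSeenByCaller);
    }
}
```

In this model the HA run leaves `foobarbar` on the surviving server without the caller seeing any error, while the non-HA run leaves `foobar` and reports the failure. The second case is the one the note above describes: the operation was applied once, and the client has to decide whether to redo it.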



--
This message was sent by Atlassian Jira
(v8.3.4#803005)