Posted to notifications@couchdb.apache.org by "nickva (via GitHub)" <gi...@apache.org> on 2023/05/27 08:05:00 UTC

[GitHub] [couchdb] nickva opened a new pull request, #4626: Fix purge infos replicating to the wrong shards during shard splitting.

nickva opened a new pull request, #4626:
URL: https://github.com/apache/couchdb/pull/4626

   Previously, the internal replicator (mem3_rep) replicated purge infos to and from all target shards. Instead, it should push or pull a purge info only to or from the shard ranges it belongs to, as determined by the database's hash function.
   
   Users experienced this as a failure when a database containing purges was split twice in a row. For example, if a Q=8 database is split to Q=16, then split again from Q=16 to Q=32, the second split operation might fail with a `split_state:initial_copy ...{{badkey,not_in_range}` error. The misplaced purge infos would be noticed only during the second split, when the initial copy phase would crash because some purge infos do not hash to either of the two target ranges. Moreover, the crash would lead to repeated retries, which generated a huge job history log.
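
To make the failure mode concrete, here is a minimal Python sketch of the range arithmetic. It is not CouchDB code: the helper names (`shard_ranges`, `split_range`, `doc_hash`, `pick_target`, `find_doc_in`) are hypothetical, and CRC32 over the doc id is only an assumed stand-in for the database's hash function. It shows that a purge info copied to the wrong half during the first split has no valid target when that half is split again.

```python
import zlib

RING = 2**32  # assumed size of the hash ring (32-bit hash values)


def shard_ranges(q):
    """Divide the ring into q equal, inclusive [begin, end] ranges."""
    step = RING // q
    return [(i * step, (i + 1) * step - 1) for i in range(q)]


def split_range(rng):
    """Split one inclusive range into two equal halves."""
    begin, end = rng
    mid = begin + (end - begin + 1) // 2
    return [(begin, mid - 1), (mid, end)]


def doc_hash(doc_id):
    """Assumed hash: CRC32 of the doc id bytes."""
    return zlib.crc32(doc_id.encode("utf-8"))


def pick_target(doc_id, targets):
    """Return the target range the doc id hashes into, or raise."""
    h = doc_hash(doc_id)
    for begin, end in targets:
        if begin <= h <= end:
            return (begin, end)
    raise KeyError("not_in_range")


def find_doc_in(rng):
    """Search for an illustrative doc id that hashes into rng."""
    begin, end = rng
    return next(
        f"doc-{i}" for i in range(1_000_000)
        if begin <= doc_hash(f"doc-{i}") <= end
    )


if __name__ == "__main__":
    source = shard_ranges(8)[0]          # one shard of a Q=8 database
    first, second = split_range(source)  # its two Q=16 targets
    doc_id = find_doc_in(second)         # purge info that belongs in `second`

    # Suppose the old behavior also left this purge info in `first`.
    # Splitting `first` again (Q=16 to Q=32) then finds no valid target:
    try:
        pick_target(doc_id, split_range(first))
    except KeyError as err:
        print("second split fails:", err)
```

Splitting every Q=8 range with `split_range` yields exactly `shard_ranges(16)`, which is why a correctly placed purge info always has a target in the next split.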
   
   The fix consists of three improvements:
   
     1) Update the internal replicator to filter purge infos based on the db hash.
   
     2) Account for the fact that some users' dbs might already contain misplaced
        purge infos. Since it's a known bug, we anticipate that error and ignore
        misplaced purge infos during the second shard split operation, with a
        warning emitted in the logs.
   
     3) Make similar range errors fatal, and emit a clear error in the logs and
        job history so any future range errors are immediately obvious.
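
The three improvements above can be sketched roughly as follows. This is an illustrative Python model, not the actual Erlang change in the PR: the function names, the `RangeError` class, the dict shape used for a purge info, and the CRC32 hash are all assumptions made for the example. Treating documents as the "similar" items whose range errors become fatal is likewise only one possible reading of item 3.

```python
import logging
import zlib

log = logging.getLogger("split_sketch")  # hypothetical logger name


class RangeError(Exception):
    """Hypothetical fatal error for an item that fits no target range."""


def doc_hash(doc_id):
    """Assumed hash: CRC32 of the doc id bytes."""
    return zlib.crc32(doc_id.encode("utf-8"))


def in_range(doc_id, rng):
    """True if the doc id hashes into the inclusive [begin, end] range."""
    begin, end = rng
    return begin <= doc_hash(doc_id) <= end


def filter_purge_infos(purge_infos, target_range):
    """Item 1: replicate only purge infos that belong to the target range."""
    return [pi for pi in purge_infos if in_range(pi["id"], target_range)]


def copy_purge_infos(purge_infos, targets):
    """Item 2: route purge infos to targets; skip misplaced ones with a warning."""
    routed = {t: [] for t in targets}
    for pi in purge_infos:
        owners = [t for t in targets if in_range(pi["id"], t)]
        if not owners:
            log.warning("ignoring misplaced purge info for %r", pi["id"])
            continue
        routed[owners[0]].append(pi)
    return routed


def copy_docs(doc_ids, targets):
    """Item 3: any other out-of-range item is fatal, with a clear message."""
    routed = {t: [] for t in targets}
    for doc_id in doc_ids:
        owners = [t for t in targets if in_range(doc_id, t)]
        if not owners:
            raise RangeError(
                f"doc {doc_id!r} (hash {doc_hash(doc_id)}) "
                f"is not in any target range {targets}"
            )
        routed[owners[0]].append(doc_id)
    return routed
```

The design point is the asymmetry: the one known, already-shipped source of misplaced data is tolerated and logged, while every other range mismatch stops the job with an explicit error instead of retrying indefinitely.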
   
   Fixes #4624
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva commented on pull request #4626: Fix purge infos replicating to the wrong shards during shard splitting.

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on PR #4626:
URL: https://github.com/apache/couchdb/pull/4626#issuecomment-1568442663

   Thank you, @janl 




[GitHub] [couchdb] janl commented on pull request #4626: Fix purge infos replicating to the wrong shards during shard splitting.

Posted by "janl (via GitHub)" <gi...@apache.org>.
janl commented on PR #4626:
URL: https://github.com/apache/couchdb/pull/4626#issuecomment-1568954666

   This works in practice: we got the log warnings about out-of-range entries, but the split to 32 shards succeeded.




[GitHub] [couchdb] nickva merged pull request #4626: Fix purge infos replicating to the wrong shards during shard splitting.

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva merged PR #4626:
URL: https://github.com/apache/couchdb/pull/4626




[GitHub] [couchdb] janl commented on pull request #4626: Fix purge infos replicating to the wrong shards during shard splitting.

Posted by "janl (via GitHub)" <gi...@apache.org>.
janl commented on PR #4626:
URL: https://github.com/apache/couchdb/pull/4626#issuecomment-1568431673

   We have deployed this on the cluster where we found the issue and are attempting a new round of shard splitting. We'll report back!

