Posted to user@zookeeper.apache.org by amit jaiswal <am...@yahoo.com> on 2010/10/22 10:12:53 UTC
Recovery from a complete node failure in BookKeeper
Hi,
The BookKeeperTools class has methods for complete recovery of a node.
Recovering a dead bookie involves first updating ZooKeeper with the
replacement bookie and then replicating the necessary ledger entries. So, if
the recovery process or the target bookie dies before the entries have
actually been copied, the data can be left in an inconsistent state.
Copying the data can take time, which widens the window during which a node
can potentially fail. Is this an issue that needs to be addressed?
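
To make the concern concrete, here is a toy sketch in Python. It is not
BookKeeperTools code: the dicts only stand in for the ZooKeeper metadata and
the bookies' storage, and every name in it (recover, missing_replicas,
crash_after_metadata) is hypothetical. It assumes the ordering described
above, metadata first and copy second.

```python
# Toy model only: plain dicts stand in for ZooKeeper metadata and bookie storage.
# None of these names are BookKeeper APIs.

def recover(metadata, storage, dead, replacement, crash_after_metadata=False):
    """Metadata-first recovery: point ledgers at the replacement, then copy entries."""
    affected = [ledger for ledger, bookies in metadata.items() if dead in bookies]
    # Step 1: rewrite the ledger metadata so it names the replacement bookie.
    for ledger in affected:
        metadata[ledger] = [replacement if b == dead else b for b in metadata[ledger]]
    if crash_after_metadata:
        return  # simulated death of the recovery process before any entry is copied
    # Step 2: copy the entries from a surviving replica to the replacement.
    for ledger in affected:
        source = next(b for b in metadata[ledger] if b != replacement)
        storage[replacement][ledger] = list(storage[source][ledger])


def missing_replicas(metadata, storage):
    """Return (ledger, bookie) pairs where the metadata names a bookie holding no entries."""
    return [(ledger, b)
            for ledger, bookies in metadata.items()
            for b in bookies
            if ledger not in storage.get(b, {})]
```

If the simulated crash happens between the two steps, missing_replicas
reports the replacement bookie for every affected ledger, which is the
inconsistency I am worried about.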
Also, this tool has to be triggered manually to recover a node. Are there any
plans for automatic node recovery (similar to Hadoop HDFS), in which, if a
machine goes down, a background process re-replicates data to maintain the
replication factor (quorum)?
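
As a rough illustration of what such a background check might look for, here
is another Python sketch. Again, this is an assumption about how one could
model it, not an existing BookKeeper or HDFS API; under_replicated and its
parameters are hypothetical names.

```python
def under_replicated(metadata, live_bookies, replication_factor):
    """Map each ledger to how many live replicas it is short of the target."""
    shortfall = {}
    for ledger, bookies in metadata.items():
        # Count only replicas placed on bookies that are currently alive.
        live = sum(1 for b in bookies if b in live_bookies)
        if live < replication_factor:
            shortfall[ledger] = replication_factor - live
    return shortfall
```

A periodic process could run a check like this and schedule a copy for each
ledger that comes back with a shortfall.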
-regards
Amit