Posted to dev@bookkeeper.apache.org by "Rakesh R (JIRA)" <ji...@apache.org> on 2012/06/07 07:45:23 UTC
[jira] [Commented] (BOOKKEEPER-272) Introduce chain of bookies for distributing the re-replication

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290811#comment-13290811 ] 

Rakesh R commented on BOOKKEEPER-272:
-------------------------------------

I have uploaded the latest patch, with a few test cases that help in understanding the algorithm.

??How the bookie failure detection algorithm works??
This scheme takes a distributed approach. Apart from the single Auditor bookie that detects bookie failures, all other logic is completely distributed. I think an Auditor makes detection simpler, since we already have the 'available' znode in ZooKeeper for learning about bookie failures.

*Following are the logical steps:*
# ??Generate BookieId:??
Each bookie creates a unique ID under the 'bookieids' path; this is a persistent node in ZooKeeper.
The bookie keeps its IP:PORT info as the data of its 'bookieid' znode.
For example: if 0001 is the bookieId and 10.18.40.13:2181 is the bookie's IP:PORT, the 0001 znode contains 10.18.40.13:2181 as data.
# ??Build per bookie-ledger mappings:??
This makes it quick to find a bookie's contents and avoids parsing all the ledgers again and again to find the failed bookie's ledgers.
# ??How to build per bookie-ledger mappings:??
When creating or re-forming the ensemble, the 'ledgerid' is put under each respective 'bookieid'. During ensemble formation, the metadata gives us the bookies' info. We will use this and add the 'ledgerid' only to the respective 'bookieid' znodes.
For example:
0001 is a bookieId; say it contains ledgers as child znodes, like: 0001/L_001, 0001/L_005, etc.
0002 is another; say it contains ledgers as child znodes, like: 0002/L_005, etc.
# ??Elect an Auditor Bookie for the entire Bookie cluster:?? This keeps things simple and minimizes duplicated effort.
Only one Auditor monitors the available bookies; its responsibility is to watch the bookies' 'available' znode. When it detects a bookie failure through the 'NodeChildrenChanged' watcher, it publishes the failure under the 'failedbookies' path in ZooKeeper. Publishing is done by creating a persistent znode, named after the failed bookie's unique 'bookieid', under the 'failedbookies' path. All the other bookies watch the 'failedbookies' znode.
# ??Start Re-replication:??
Upon receiving the 'failedbookies' notification, all the bookies compete with each other to acquire the lock (a ZooKeeper distributed lock, using an ephemeral znode). Whoever acquires it starts re-replication; all the others watch this lock to learn the re-replication status. When creating the lock, the replica bookie adds its IP:PORT info to it. After finishing the re-replication, it deletes the lock so that the others can take over, and this cycle continues until the end.
For example: 0002/L_005/lock
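
To make the steps above easier to follow, here is a minimal in-memory sketch of the same bookkeeping. It is not the code from the patch: the class name DetectionSketch and all of its methods are hypothetical, and plain Java collections stand in for the znodes. A real implementation would keep this state in ZooKeeper, with persistent znodes for the ids and failed bookies, and an ephemeral znode for the lock.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the znode layout described above.
class DetectionSketch {
    // bookieids/<bookieId> -> "IP:PORT" stored as the znode data
    private final Map<String, String> bookieIds = new HashMap<>();
    // <bookieId> -> ids of the ledgers that have this bookie in their ensemble
    private final Map<String, Set<String>> ledgersByBookie = new HashMap<>();
    // failedbookies/<bookieId>
    private final Set<String> failedBookies = new TreeSet<>();
    // "<bookieId>/<ledgerId>/lock" -> IP:PORT of the bookie holding the lock
    private final Map<String, String> locks = new HashMap<>();
    private int nextId = 1;

    // Step 1: assign a unique id and keep the address as its data.
    String registerBookie(String address) {
        String id = String.format("%04d", nextId++);
        bookieIds.put(id, address);
        ledgersByBookie.put(id, new TreeSet<>());
        return id;
    }

    // Steps 2-3: record the ledger under every bookie of its ensemble.
    void addLedger(String ledgerId, List<String> ensembleBookieIds) {
        for (String bookieId : ensembleBookieIds) {
            ledgersByBookie.computeIfAbsent(bookieId, k -> new TreeSet<>()).add(ledgerId);
        }
    }

    // Step 4: the auditor compares the registered bookies with the currently
    // available ones and publishes every newly missing id as failed.
    Set<String> audit(Set<String> availableBookieIds) {
        Set<String> newlyFailed = new TreeSet<>(bookieIds.keySet());
        newlyFailed.removeAll(availableBookieIds);
        newlyFailed.removeAll(failedBookies);
        failedBookies.addAll(newlyFailed);
        return newlyFailed;
    }

    // Lookup served by the mapping: no scan over all ledgers is needed.
    Set<String> ledgersOf(String bookieId) {
        return ledgersByBookie.getOrDefault(bookieId, new TreeSet<>());
    }

    // Step 5: only the first caller gets the lock for a given ledger.
    boolean tryLock(String failedBookieId, String ledgerId, String replicaAddress) {
        String path = failedBookieId + "/" + ledgerId + "/lock";
        return locks.putIfAbsent(path, replicaAddress) == null;
    }

    // Called after re-replication finishes so that others can take over.
    void unlock(String failedBookieId, String ledgerId) {
        locks.remove(failedBookieId + "/" + ledgerId + "/lock");
    }
}
```

In this sketch, a bookie that loses the tryLock race simply gets false back; in the scheme described above it would instead watch the lock znode and retry once the lock is deleted.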

I have described how re-replication works in a comment on BOOKKEEPER-237. Please go through the link.
https://issues.apache.org/jira/browse/BOOKKEEPER-237?focusedCommentId=13281470&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13281470

Responsible classes:
- ServerConfiguration : for zk path configurations
- LedgerCreateOp,LedgerHandle : forming bookie-ledger mappings
- Bookie : starting bookie chain
- BKLedgerMapper : for updating bookie-ledger mapping

- BookieIdGenerator : generating unique bookieId
- BookieChain : forming the detection chain
- AuditorBookieChain : by default auditor based, later it can have CircularBookieChain etc.
- AuditorElector : performs the auditor election
- Auditor : watches the bookies
- BookieObserver : listens for 'failedbookies' notifications

Tests:
- BookieLedgerMetadataTest
- AuditorBookieTest
- BookieDetectionTest
- BookieIdGenTest


-Rakesh

                
> Introduce chain of bookies for distributing the re-replication
> --------------------------------------------------------------
>
>                 Key: BOOKKEEPER-272
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-272
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-server
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: BOOKKEEPER-272.1.patch, BOOKKEEPER-272.2.patch, BOOKKEEPER-272.patch
>
>
> The idea is to form a logical chain in which each bookie takes care of its neighbour. On any bookie failure, its neighbour will act and initiate re-replication.
> For example:
> Assume we have bookies 1, 2, 3, 4, 5, 6, which form the following chain:
> 01 <- 02 <- 03 <- 04 <- 05 <- 06 <- 01
> Here, each bookie takes care of its immediate predecessor node. The lowest node always takes care of the highest node, which closes the logical chain.
> Reference docs attached in BookKeeper-237.
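
The closed chain in the description above can be sketched as a predecessor lookup over the sorted bookie ids. This is only an illustration under stated assumptions: the class BookieChainSketch and the method predecessorOf are hypothetical names, not part of the attached patches, and ids are assumed to be zero-padded strings so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating the closed chain 01 <- 02 <- ... <- 06 <- 01.
class BookieChainSketch {
    // Returns the bookie that `self` takes care of: its immediate predecessor
    // in sorted order, wrapping around from the lowest id to the highest.
    static String predecessorOf(String self, List<String> bookieIds) {
        List<String> sorted = new ArrayList<>(bookieIds);
        Collections.sort(sorted);
        int i = sorted.indexOf(self);
        if (i < 0) {
            throw new IllegalArgumentException("unknown bookie: " + self);
        }
        // Adding the size before taking the modulus keeps the index non-negative.
        return sorted.get((i - 1 + sorted.size()) % sorted.size());
    }
}
```

With the six bookies from the example, 03 takes care of 02, and 01, being the lowest, takes care of 06.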

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira