Posted to dev@bookkeeper.apache.org by "Sijie Guo (JIRA)" <ji...@apache.org> on 2012/07/15 10:11:34 UTC

[jira] [Comment Edited] (BOOKKEEPER-300) Create Bookie format command

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414586#comment-13414586 ] 

Sijie Guo edited comment on BOOKKEEPER-300 at 7/15/12 8:10 AM:
---------------------------------------------------------------

{quote}
I think format should just take care of metadata. Storage directories will be cleaned anyway if there is no metadata present for that data.
{quote}
{quote}
If we clean only the ZooKeeper nodes, then the bookie itself will not start. A cookie file will be present in each and every storage directory. If that information is present in storage dir and not in storage directory, then the bookie will not start.
{quote}

I think the bookies would still start, because no cookie exists in ZooKeeper. Each bookie would treat it as a new environment and create its cookie node.

But there is a consistency issue if we just clean the metadata: there is a period of time before the GC thread has cleaned all the old data, so that data might be reintroduced into the system in some corner cases.

So if we want to introduce format for BookKeeper, we first need a mechanism to distinguish bookies of the current format from those of previous formats. The initial idea is to generate an INSTANCE-ID (it could be a timestamp or a UUID) for BookKeeper during formatting. All bookies joining this instance of BookKeeper would generate a cookie based on the INSTANCE-ID and record it in their local directories. If a bookie server joins BookKeeper using a different INSTANCE-ID, it should fail to start.

With such a mechanism, we could prevent a bookie server from starting with old data left over from a previous BookKeeper format.
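
A minimal sketch of the INSTANCE-ID check described above. The class, the Cookie stand-in, and the method names are hypothetical and are not BookKeeper's actual Cookie code; the ID is assumed to be a UUID, which is one of the two options mentioned.

```java
import java.util.UUID;

/** Hypothetical sketch of the INSTANCE-ID check; not BookKeeper's real Cookie class. */
class InstanceIdSketch {

    /** Minimal stand-in for a bookie cookie that records the instance it joined. */
    static final class Cookie {
        final String bookieAddress;
        final String instanceId;

        Cookie(String bookieAddress, String instanceId) {
            this.bookieAddress = bookieAddress;
            this.instanceId = instanceId;
        }
    }

    /** Would be called by the metadata format step: a fresh ID per formatted instance. */
    static String newInstanceId() {
        return UUID.randomUUID().toString();
    }

    /**
     * Start-up check. A bookie without a local cookie is treated as new and gets a
     * cookie for the current instance. A bookie whose local cookie names another
     * instance refuses to start.
     */
    static Cookie checkOrCreate(Cookie local, String bookieAddress, String currentInstanceId) {
        if (local == null) {
            return new Cookie(bookieAddress, currentInstanceId);
        }
        if (!local.instanceId.equals(currentInstanceId)) {
            throw new IllegalStateException("Local cookie belongs to instance " + local.instanceId
                    + " but the cluster is instance " + currentInstanceId);
        }
        return local;
    }

    public static void main(String[] args) {
        String oldInstance = newInstanceId();
        Cookie cookie = checkOrCreate(null, "bookie-1:3181", oldInstance);

        // After a metadata format the cluster has a new INSTANCE-ID.
        String newInstance = newInstanceId();
        try {
            checkOrCreate(cookie, "bookie-1:3181", newInstance);
        } catch (IllegalStateException e) {
            System.out.println("Bookie refused to start: " + e.getMessage());
        }
    }
}
```

In this sketch the stale local cookie is what blocks the restart, so old ledger data cannot rejoin a freshly formatted instance until the local directories are cleaned as well.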

So the format process would contain two parts: metadata format and bookie format.

metadata format: can be run on any bookie server, and is run only once. It is responsible for cleaning the metadata. After cleaning the old metadata, it generates a new INSTANCE-ID for the new BookKeeper instance.

bookie format: is run on each bookie server. It is responsible for cleaning the old data.

If a bookie server doesn't run 'bookie-format' after the metadata is formatted, it should fail to start because it provides a different INSTANCE-ID. The admin can then run 'bookie -format' to clean the local bookie data.
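
As an illustration of the local side, here is a rough sketch of what cleaning the journal and ledger directories could look like. The class and method names are hypothetical; it only uses java.nio.file and empties each directory while keeping the directory itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a local bookie format; not the actual Bookie implementation. */
class BookieFormatSketch {

    /** Deletes everything inside dir (files and subdirectories) but keeps dir itself. */
    static void cleanDirectory(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parents, so directories
            // are already empty by the time they are deleted.
            entries = walk.filter(p -> !p.equals(dir))
                    .sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
    }

    /** Cleans all journal directories and all ledger directories of one bookie. */
    static void formatBookie(List<Path> journalDirs, List<Path> ledgerDirs) throws IOException {
        for (Path dir : journalDirs) {
            cleanDirectory(dir);
        }
        for (Path dir : ledgerDirs) {
            cleanDirectory(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path journal = Files.createTempDirectory("journal");
        Path ledgers = Files.createTempDirectory("ledgers");
        Files.createFile(journal.resolve("old.txn"));
        Files.createDirectories(ledgers.resolve("current"));
        Files.createFile(ledgers.resolve("current").resolve("0.log"));

        formatBookie(List.of(journal), List.of(ledgers));
        System.out.println("journal and ledger directories cleaned");
    }
}
```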

{quote}
leaving other nodes ( cookies, LAYOUT ) as it is
{quote}

As Uma mentioned, 'format' means that no old info is carried into the new environment, so the cookies and LAYOUT should also be removed.

You should not remove the ledgers znode directly; the ledger metadata needs to be formatted before LAYOUT is removed. LAYOUT was introduced to support different ledger metadata management schemes. In BOOKKEEPER-203, we added a zk-independent LedgerManager interface to handle metadata management for BookKeeper. So a possible way to handle metadata formatting is to provide a 'format' interface in LedgerManager, with each LedgerManager implementation providing a way to format its own metadata.

So the metadata format process would be:

1) read LAYOUT to instantiate a LedgerManager.
2) call LedgerManager#format to format its metadata layout.
3) remove the LAYOUT znode.
4) remove the cookie znodes.
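
The steps above could be sketched roughly as follows. Everything here is an assumption made for illustration: the znode tree is simulated with an in-memory map, the paths and the "flat" layout name are made up, and FormattableLedgerManager only stands in for the proposed 'format' hook on LedgerManager.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the metadata format sequence; znodes are simulated by a map. */
class MetadataFormatSketch {

    /** Stand-in for the proposed 'format' method on LedgerManager. */
    interface FormattableLedgerManager {
        void format(Map<String, String> znodes);
    }

    /** Toy manager that stores ledger metadata as /ledgers/ledger-<id> entries. */
    static final class FlatManager implements FormattableLedgerManager {
        @Override
        public void format(Map<String, String> znodes) {
            znodes.keySet().removeIf(path -> path.startsWith("/ledgers/ledger-"));
        }
    }

    /** Step 1: pick the manager implementation that matches the stored LAYOUT. */
    static FormattableLedgerManager fromLayout(String layout) {
        if ("flat".equals(layout)) {
            return new FlatManager();
        }
        throw new IllegalArgumentException("Unknown layout: " + layout);
    }

    static void formatMetadata(Map<String, String> znodes) {
        String layout = znodes.get("/ledgers/LAYOUT");
        if (layout != null) {
            FormattableLedgerManager manager = fromLayout(layout); // step 1
            manager.format(znodes);                                // step 2
            znodes.remove("/ledgers/LAYOUT");                      // step 3
        }
        // Step 4: drop all cookie entries. Other znodes are left untouched here.
        znodes.keySet().removeIf(path -> path.startsWith("/ledgers/cookies/"));
    }

    public static void main(String[] args) {
        Map<String, String> znodes = new TreeMap<>();
        znodes.put("/ledgers/LAYOUT", "flat");
        znodes.put("/ledgers/ledger-1", "ledger metadata");
        znodes.put("/ledgers/cookies/bookie-1:3181", "cookie");

        formatMetadata(znodes);
        System.out.println("Remaining znodes: " + znodes.keySet());
    }
}
```

Reading LAYOUT before deleting anything is what lets each manager implementation clean its own metadata, which is why LAYOUT is removed only after the manager's format call.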

{quote}
Should format be implemented as an API on the server side or the client side?
If BKJM also wants to call format, which needs to format only the ledger details in zk, then it would be better to implement it on the client side, i.e. in the BookKeeper class.
{quote}

For metadata format, it might be better to put it in BookKeeperAdmin. For local data format, you could put it in Bookie.

> Create Bookie format command
> ----------------------------
>
>                 Key: BOOKKEEPER-300
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-300
>             Project: Bookkeeper
>          Issue Type: New Feature
>          Components: bookkeeper-server
>    Affects Versions: 4.1.0
>            Reporter: Rakesh R
>         Attachments: BOOKKEEPER-300.patch
>
>
> Provide a bookie format command. Then the admin would just have to run the command on each machine, which will prepare the bookie env
> +Zookeeper paths (znodes):+
> - ledger's root path
> - bookie's available path
> +Directories:+
> - Journal directories
> - Ledger directories
