Posted to common-dev@hadoop.apache.org by "Luca Telloli (JIRA)" <ji...@apache.org> on 2009/03/26 11:26:07 UTC

[jira] Issue Comment Edited: (HADOOP-5189) Integration with BookKeeper logging system

    [ https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689278#action_12689278 ] 

Luca Telloli edited comment on HADOOP-5189 at 3/26/09 3:24 AM:
---------------------------------------------------------------

I'm posting a new patch for the integration of BookKeeper with HDFS. 

In this patch the logSync() method is exactly the same as in the original file-based logging. Additionally, as requested by Konstantin, it does not implement any abstract class for logging apart from the two EditLogInputStream/EditLogOutputStream classes (in the following I'll write just InputStream, but I mean both). 

Here's some detail about the patch:

1. In the current implementation, HDFS does not allow other types of logging; specifically, version 0.19 does not allow any EditLogInputStream other than EditLogFileInputStream: each time an EditLogInputStream is needed, an EditLogFileInputStream is instantiated. In the patch I add configuration values to enable BookKeeper logging, and I allow the user to switch between different logging types through a configuration property in hadoop-site.xml. 
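A minimal sketch of the idea in point 1: one factory reads a logging-type property and decides which input stream implementation to build, instead of always instantiating the file-based one. Everything here is hypothetical: the property names (dfs.edits.logger.type, edits.path, edits.ledger.id), the class names, and the use of a plain Map in place of Hadoop's Configuration object are illustrative assumptions, not the patch's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: pick an edit-log input stream implementation from configuration.
// ASSUMPTIONS: all property keys and class names below are hypothetical;
// a Map stands in for the values a real setup would read from hadoop-site.xml.
public class EditLogSelection {

    /** Minimal stand-in for an abstract edit-log input stream. */
    abstract static class EditLogInputStreamSketch {
        abstract String getName();
    }

    /** File-based variant: the only kind HDFS 0.19 instantiates. */
    static class FileInputStreamSketch extends EditLogInputStreamSketch {
        private final String path;
        FileInputStreamSketch(String path) { this.path = path; }
        @Override String getName() { return "file:" + path; }
    }

    /** BookKeeper-backed variant, identified by a ledger ID (hypothetical). */
    static class BookKeeperInputStreamSketch extends EditLogInputStreamSketch {
        private final long ledgerId;
        BookKeeperInputStreamSketch(long ledgerId) { this.ledgerId = ledgerId; }
        @Override String getName() { return "bookkeeper:" + ledgerId; }
    }

    static final String LOGGER_TYPE_KEY = "dfs.edits.logger.type"; // hypothetical key

    /** Factory: the single place that decides which implementation to build. */
    static EditLogInputStreamSketch open(Map<String, String> conf) {
        String type = conf.getOrDefault(LOGGER_TYPE_KEY, "file");
        switch (type) {
            case "file":
                return new FileInputStreamSketch(conf.getOrDefault("edits.path", "edits"));
            case "bookkeeper":
                return new BookKeeperInputStreamSketch(
                        Long.parseLong(conf.getOrDefault("edits.ledger.id", "0")));
            default:
                throw new IllegalArgumentException("Unknown logger type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(open(conf).getName());   // default: file-based
        conf.put(LOGGER_TYPE_KEY, "bookkeeper");
        conf.put("edits.ledger.id", "42");
        System.out.println(open(conf).getName());   // switched to BookKeeper
    }
}
```

Keeping the decision in one factory means callers such as FSEditLog would only see the abstract type, which is roughly what point 1 is asking for.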


2. As Konstantin suggested some time ago, in the current patch I started by implementing only the two abstract classes above, but in the end I had to modify more classes, in particular FSEditLog.java, SecondaryNamenode.java and FSImage.java. Although the modifications are mainly related to issue 1, there is an additional confusion between the semantics of "open" and "create", since in the case of files the two operations are very similar. This does not hold for BookKeeper in some cases (mostly related to the CreateEditLogFile() method), so my code needs to branch to avoid unwanted creation of new ledgers. 
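A small illustration of why "open" and "create" diverge for a ledger-based log. Re-creating a file at the same path still leaves the caller pointing at one file named "edits", but every create on a ledger store allocates a brand-new ledger, so code that calls create where it means open silently multiplies ledgers. The LedgerRegistry class below is a hypothetical in-memory stand-in, not the BookKeeper client API; a real client would obtain ledger IDs from BookKeeper itself.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of branching between "open existing" and "create new" for ledgers.
// ASSUMPTIONS: LedgerRegistry and its methods are hypothetical; IDs are
// allocated locally here only to make the example self-contained.
public class OpenVsCreate {

    /** In-memory stand-in for a ledger store: each create allocates a NEW ledger. */
    static class LedgerRegistry {
        private final Map<String, Long> ledgerForLog = new HashMap<>();
        private long nextLedgerId = 1;
        private int ledgersCreated = 0;

        /** Always allocates a fresh ledger and records it as the log's current one. */
        long create(String logName) {
            long id = nextLedgerId++;
            ledgersCreated++;
            ledgerForLog.put(logName, id);
            return id;
        }

        /** Reuses the existing ledger if there is one; creates only when missing. */
        long openOrCreate(String logName) {
            Long existing = ledgerForLog.get(logName);
            return existing != null ? existing : create(logName);
        }

        int ledgersCreated() { return ledgersCreated; }
    }

    public static void main(String[] args) {
        LedgerRegistry reg = new LedgerRegistry();
        long first = reg.openOrCreate("edits");
        long again = reg.openOrCreate("edits");   // same ledger, nothing new allocated
        System.out.println(first == again);
        System.out.println(reg.ledgersCreated());
    }
}
```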

Even with this, the patch is not yet complete, due to the following issues: 

3. The current 0.19 implementation does not support multiple concurrent logging systems. This is another implementation problem that should be fixed, though I'm not sure how easily. As Ben said, HDFS is heavily file-based: it uses storage directories to hold both the file system image and the edits. I think this turns into a design problem, because it's not easy to decouple the file system image from the edits file when both live in the same directory. 
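To make "multiple concurrent logging systems" concrete, here is a sketch of an edit-log front end that forwards every record to several back ends, for example one file-based and one BookKeeper-based. The names EditSink, InMemorySink and FanOutEditLog are hypothetical and do not come from HDFS 0.19 or the patch; the in-memory sinks only stand in for real streams.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one edit log, several concurrent back ends.
// ASSUMPTIONS: all type names are hypothetical; failure handling is omitted.
public class FanOutEditLog {

    /** Minimal back-end interface; a real one would also need open/close. */
    interface EditSink {
        void write(String record);
        void flush();
    }

    /** Test double: buffers writes and exposes what has been flushed. */
    static class InMemorySink implements EditSink {
        final List<String> flushed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        @Override public void write(String record) { pending.add(record); }
        @Override public void flush() { flushed.addAll(pending); pending.clear(); }
    }

    private final List<EditSink> sinks = new ArrayList<>();

    void addSink(EditSink sink) { sinks.add(sink); }

    /** Buffers the record in every registered back end. */
    void logEdit(String record) {
        for (EditSink s : sinks) s.write(record);
    }

    /** Similar in spirit to logSync(): flush buffered records everywhere. */
    void sync() {
        for (EditSink s : sinks) s.flush();
    }

    public static void main(String[] args) {
        FanOutEditLog log = new FanOutEditLog();
        InMemorySink fileLike = new InMemorySink();
        InMemorySink ledgerLike = new InMemorySink();
        log.addSink(fileLike);
        log.addSink(ledgerLike);
        log.logEdit("OP_MKDIR /a");
        log.sync();
        System.out.println(fileLike.flushed + " " + ledgerLike.flushed);
    }
}
```

The hard part the comment points at is not the fan-out loop but everything around it: a real design must decide what happens when one back end fails, and where the image lives once edits are no longer tied to a storage directory.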

4. Another drawback, related to 3, is that the NameNode needs files such as edits and edits.new in order to work properly, even if the logging system is not file-based. In the patch, even though I use BookKeeper to store edits, I still need the edits and edits.new files. I currently use them to store a small amount of information about ledger IDs, but this will soon change in favor of ZooKeeper. 
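A sketch of the workaround described in point 4: the edits and edits.new files no longer hold edits, only a pointer to the ledger that does. The comment says only that the files store "some small information about ledger IDs"; the one-line decimal format and the helper names below are hypothetical assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: edits / edits.new used only as pointers to BookKeeper ledgers.
// ASSUMPTIONS: the file format (a single decimal ledger ID) and the method
// names are hypothetical, not taken from the patch.
public class LedgerPointerFile {

    /** Creates or overwrites the pointer file with the given ledger ID. */
    static void writeLedgerId(Path editsFile, long ledgerId) throws IOException {
        Files.write(editsFile, Long.toString(ledgerId).getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the ledger ID back; throws NumberFormatException if malformed. */
    static long readLedgerId(Path editsFile) throws IOException {
        String text = new String(Files.readAllBytes(editsFile), StandardCharsets.UTF_8).trim();
        return Long.parseLong(text);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("name-dir");
        Path edits = dir.resolve("edits");
        Path editsNew = dir.resolve("edits.new");
        writeLedgerId(edits, 17L);     // hypothetical ledger for the current edits
        writeLedgerId(editsNew, 18L);  // hypothetical ledger used while checkpointing
        System.out.println(readLedgerId(edits) + " " + readLedgerId(editsNew));
    }
}
```

Moving this pointer into ZooKeeper, as planned, would store the same small value in a znode instead of a local file, which would remove the dependency on the storage directory.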

I'm not sure how to fix the above issues; in particular, I'm worried that a good solution would require rethinking the storage directories and FSImage as they are currently implemented. 


> Integration with BookKeeper logging system
> ------------------------------------------
>
>                 Key: HADOOP-5189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5189
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: create.png, HADOOP-5189.patch, HADOOP-5189.patch
>
>
> BookKeeper is a system to reliably log streams of records (https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a natural target for such a system, since it is the metadata repository for the entire HDFS file system. 
