Posted to issues@hbase.apache.org by "Jesse Yates (JIRA)" <ji...@apache.org> on 2012/05/10 00:11:48 UTC

[jira] [Created] (HBASE-5977) Usage of modules

Jesse Yates created HBASE-5977:
----------------------------------

             Summary: Usage of modules 
                 Key: HBASE-5977
                 URL: https://issues.apache.org/jira/browse/HBASE-5977
             Project: HBase
          Issue Type: Brainstorming
          Components: build
    Affects Versions: 0.96.0
            Reporter: Jesse Yates


With HBASE-4336, HBase will have the ability to add multiple modules for different aspects of the codebase (less tests, see HBASE-4336 for details). We need to set a policy for when modules should be used versus putting the code into a single existing module or dispersed across modules. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5977) Usage of modules

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286830#comment-13286830 ] 

Jesse Yates commented on HBASE-5977:
------------------------------------

Another thing that would be awesome is an hbase-mapreduce module: pull out all the classes that are map-reduce specific but don't really touch the rest of the codebase.
                
> Usage of modules 
> -----------------
>
>                 Key: HBASE-5977
>                 URL: https://issues.apache.org/jira/browse/HBASE-5977
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: build
>    Affects Versions: 0.96.0
>            Reporter: Jesse Yates
>         Attachments: Potential-HBase-Module-Descriptions-v1.pdf, Potential-HBase-Modules-v1.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With HBASE-4336, HBase will have the ability to add multiple modules for different aspects of the codebase (less tests, see HBASE-4336 for details). We need to set a policy for when modules should be used versus putting the code into a single existing module or dispersed across modules. 


        

[jira] [Updated] (HBASE-5977) Usage of modules

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Corgan updated HBASE-5977:
-------------------------------

    Attachment: Potential-HBase-Modules-v2.pdf

oh yeah - great call Jesse.  replacing Potential-HBase-Modules-v1.pdf with v2
                

        

[jira] [Updated] (HBASE-5977) Usage of modules

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Corgan updated HBASE-5977:
-------------------------------

    Attachment:     (was: Potential-HBase-Modules-v1.pdf)
    

        

[jira] [Updated] (HBASE-5977) Usage of modules

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Corgan updated HBASE-5977:
-------------------------------

    Attachment: Potential-HBase-Module-Descriptions-v1.pdf
                Potential-HBase-Modules-v1.pdf

Jesse, Stack and I have discussed this from a few different angles to try to identify some of the reasons for creating modules. The main benefit of modules is to isolate complex implementations behind simple interfaces. The main drawback is that modules add overhead in the form of more things to open in Eclipse and more jar files in the build.

Pasting from HBASE-5720 some arguments for creating a "codec" module that contains wrapper classes for individual HFile block types:
* make it more testable, like a normal in-memory data structure without having to set up heavyweight testing environments
* separate the encoding concerns from IO concerns. after the checksum happens, encoders/decoders should not even know what an IOException is
* strongly discourage people from modifying anything in the codec packages without knowing what they're getting into
* ensure the main project code only references the interfaces and not any codec internals (see if main project compiles without codecs in classpath)
* make it easier for contributors to develop and profile the codecs without having to become experts in all aspects of hbase
* help to simplify the main project. imagine if the gzip or snappy internals were sprinkled throughout the regionserver code. yikes.
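
To make the interface-versus-internals argument concrete, here is a minimal sketch of what such a boundary could look like. Every name in it (BlockCodec, RunLengthCodec, CodecDemo) is hypothetical and invented for illustration; it is not the HBASE-5720 design, and a toy run-length encoding stands in for a real HFile block encoder. The points it tries to show are that the codec works on plain byte arrays, never mentions IOException, and can be tested like any in-memory data structure.

```java
import java.util.Arrays;

/** Hypothetical codec boundary: byte[] in, byte[] out, no IOException anywhere. */
interface BlockCodec {
    byte[] encode(byte[] raw);
    byte[] decode(byte[] encoded);
}

/** Toy run-length codec, standing in for a real block encoder (an assumption). */
final class RunLengthCodec implements BlockCodec {
    @Override
    public byte[] encode(byte[] raw) {
        byte[] out = new byte[raw.length * 2]; // worst case: no repeated bytes
        int n = 0;
        for (int i = 0; i < raw.length; ) {
            int run = 1;
            while (i + run < raw.length && raw[i + run] == raw[i] && run < 255) {
                run++;
            }
            out[n++] = (byte) run; // run length, 1..255, stored as an unsigned byte
            out[n++] = raw[i];     // the repeated value
            i += run;
        }
        return Arrays.copyOf(out, n);
    }

    @Override
    public byte[] decode(byte[] encoded) {
        int total = 0;
        for (int i = 0; i < encoded.length; i += 2) {
            total += encoded[i] & 0xFF;
        }
        byte[] out = new byte[total];
        int n = 0;
        for (int i = 0; i < encoded.length; i += 2) {
            int run = encoded[i] & 0xFF;
            Arrays.fill(out, n, n + run, encoded[i + 1]);
            n += run;
        }
        return out;
    }
}

class CodecDemo {
    public static void main(String[] args) {
        // "Main project" code holds only the interface type.
        BlockCodec codec = new RunLengthCodec();
        byte[] raw = {7, 7, 7, 7, 1, 2, 2};
        byte[] encoded = codec.encode(raw);
        System.out.println(Arrays.toString(encoded)); // [4, 7, 1, 1, 2, 2]
        System.out.println(Arrays.equals(raw, codec.decode(encoded))); // true
    }
}
```

If the interface lived in one module and the implementation in another, the "compiles without codecs in classpath" check from the list above would amount to building the main module with only BlockCodec visible.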

Attaching Potential-HBase-Modules-v1.pdf and Potential-HBase-Module-Descriptions-v1.pdf to illustrate a possible roadmap for extracting modules. We currently have hbase-server, and are first going to "pull up" some files into hbase-common. Eventually we may "push down" an integration-test module.

Extracting these modules can't really be done all at once, so this is just a roadmap meant to start discussion.  For example, there's probably an opportunity to isolate some of regionserver and master code, but they also share a lot.  This v1 doc shows a push down of master code out of the server module, but we probably need to think through that in more detail.

* Link to dependency chart: https://docs.google.com/presentation/d/16Kf9FAFjtneWwCnpy9Bql4QhXmORf7U9uJLoRobePHQ/edit
* Link to description doc: https://docs.google.com/document/d/1RHrUa9qWGvIR6ZmqVYP17rS7JTPSzCFCPKNjTo-XY38/edit

                

        

[jira] [Commented] (HBASE-5977) Usage of modules

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271879#comment-13271879 ] 

Jesse Yates commented on HBASE-5977:
------------------------------------

I'd like to avoid creating a ton of modules (or the tendency to have lots of them). I see modules more as a rough separation of concerns (like how hadoop has dfs, mr, and common) than as a finer-grained separation of functionality (where hadoop-common has 20+ modules), since each module means a new jar.

In the short to medium term, I would like to see the following modules materialize out of the existing single module:
* hbase-assemble  - necessary for building
* hbase-common - common functionality used between the client and server
* hbase-client - functionality just for the client. A general hbase client would just need hbase-common and hbase-client to run
* hbase-server - all server side functionality, including regionserver and master (this could even be separated, but not necessarily)

Other potential things that came up earlier in the process that seemed useful:
* hbase-security - shouldn't be needed if we roll in security, but still an option
* hbase-it - for a single place for higher level integration tests (all those using the mini-cluster) to avoid the maven test-jar dependency issue discussed in HBASE-4336
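
As a rough way to sanity-check a layout like the one listed above, the sketch below models modules as a small dependency graph and computes what a given module pulls in transitively. The ModuleGraph class and the specific edges (for example, hbase-server depending on hbase-client) are assumptions made for illustration and are not taken from the attached roadmap; in a real build the Maven poms would be the source of truth.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of module dependencies; not part of any HBase build. */
final class ModuleGraph {
    private final Map<String, List<String>> directDeps = new HashMap<>();

    /** Records the direct dependencies of one module. */
    ModuleGraph depends(String module, String... on) {
        directDeps.put(module, Arrays.asList(on));
        return this;
    }

    /** Every module reachable from {@code module}, including the module itself. */
    Set<String> closure(String module) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(module);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) {
                for (String dep : directDeps.getOrDefault(current, Collections.emptyList())) {
                    todo.push(dep);
                }
            }
        }
        return seen;
    }

    /** The module list from the comment above, with assumed dependency edges. */
    static ModuleGraph proposed() {
        return new ModuleGraph()
            .depends("hbase-common")
            .depends("hbase-client", "hbase-common")
            .depends("hbase-server", "hbase-common", "hbase-client") // assumed edge
            .depends("hbase-it", "hbase-server");                    // assumed edge
    }
}

class ModuleGraphDemo {
    public static void main(String[] args) {
        // A client install should need nothing beyond these two modules.
        System.out.println(ModuleGraph.proposed().closure("hbase-client"));
        // [hbase-client, hbase-common]
    }
}
```

A check like this only stays useful while the graph is coarse, which fits the preference for a handful of modules rather than many fine-grained ones.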

Any more granularity than these modules tends to be a bit of a mess and is rarely all that useful. Instead, a lot of the time it's really better to just have a config option to specify the right class and load that from the classpath. The jar approach is much more heavyweight and only useful for wholesale replacements for which there are multiple (possibly competing) implementations. For instance, asynchbase could roll up into an hbase-client.jar and be a drop-in replacement in your install, but you wouldn't have a whole log-cleaner jar just for switching which log cleaner class to use.
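
The "config option plus class loading" alternative might look roughly like the sketch below. It uses java.util.Properties as a stand-in for the real configuration object, and the LogCleaner interface, both implementations, and the key example.logcleaner.class are all hypothetical names invented here; none of them is HBase's actual API.

```java
import java.util.Properties;

/** Hypothetical plugin interface; not HBase's actual log cleaner API. */
interface LogCleaner {
    boolean isDeletable(String logFileName);
}

/** Default behaviour: never delete anything. */
final class KeepEverythingCleaner implements LogCleaner {
    @Override
    public boolean isDeletable(String logFileName) {
        return false;
    }
}

/** Alternative behaviour, selected purely by configuration. */
final class OldLogCleaner implements LogCleaner {
    @Override
    public boolean isDeletable(String logFileName) {
        return logFileName.endsWith(".old");
    }
}

final class PluginLoader {
    /** Hypothetical configuration key. */
    static final String CLEANER_KEY = "example.logcleaner.class";

    /** Instantiates the configured class, falling back to the default. */
    static LogCleaner loadCleaner(Properties conf) {
        String className =
            conf.getProperty(CLEANER_KEY, KeepEverythingCleaner.class.getName());
        try {
            Class<? extends LogCleaner> clazz =
                Class.forName(className).asSubclass(LogCleaner.class);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load cleaner " + className, e);
        }
    }
}

class PluginLoaderDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PluginLoader.CLEANER_KEY, OldLogCleaner.class.getName());
        LogCleaner cleaner = PluginLoader.loadCleaner(conf);
        System.out.println(cleaner.isDeletable("wal.1.old")); // true
    }
}
```

Swapping the implementation here is a one-line configuration change and needs no extra jar, which is the trade-off the paragraph above is pointing at.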

                

        

[jira] [Commented] (HBASE-5977) Usage of modules

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282211#comment-13282211 ] 

Jesse Yates commented on HBASE-5977:
------------------------------------

Results of the hackathon today: Matt is going to start working on pulling things into an hbase-common module for the common/utility classes. When we do a rewrite of the client (probably based on asynchbase), we will get an hbase-client module. Until then, we are going to slowly start pulling out modules as they seem necessary.

Also, I'm going to add the hbase-common module so there is an example of how to add a new module, but let Matt deal with the actual moving of classes (thanks Matt!).
                