Posted to dev@hbase.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2009/12/10 14:10:18 UTC

[jira] Created: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Use Configuration instead of HBaseConfiguration 
------------------------------------------------

                 Key: HBASE-2036
                 URL: https://issues.apache.org/jira/browse/HBASE-2036
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: Enis Soztutar


HBaseConfiguration extends Configuration but adds no real functionality to it. The only thing it adds is hashCode(), which really should be refactored into Hadoop's Configuration.
I think that everywhere (especially on the client side), HBase methods and classes should accept Configuration rather than HBaseConfiguration. Creating the configuration with the right files (hbase-site and hbase-default) should not be encapsulated in a private method, but exposed in a public static one.

The issue arose in our Nutch+HBase patch, in which we include both the Nutch and the HBase configurations. Moreover, people may want to add separate project-specific configuration files to their configurations without having to depend on HBaseConfiguration.
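
A minimal sketch of the proposed signature change, in Java. Conf and TableClient are hypothetical stand-ins (not HBase or Hadoop classes) so the example runs without Hadoop on the classpath, and the property names are only illustrative. The point is that the client asks for the base configuration type, so a Nutch+HBase caller can pass its own merged configuration directly:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration, so the
// sketch compiles without Hadoop on the classpath.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    Conf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    String get(String key) {
        return props.get(key);
    }
}

// Hypothetical client class. Its constructor asks for the base type only,
// so a caller can hand it any configuration that carries the HBase keys,
// for example one that also holds Nutch settings.
class TableClient {
    private final String quorum;

    TableClient(Conf conf) {
        this.quorum = conf.get("hbase.zookeeper.quorum");
    }

    String quorum() {
        return quorum;
    }
}
```

Nothing in TableClient depends on an HBase-specific subclass, which is what lets project-specific resources live in the same object.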

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790468#action_12790468 ] 

stack commented on HBASE-2036:
------------------------------

bq. ...then we should reuse the cached connections & data instead of building up a new set.

Dave, doing the above would be fancy and a new facility, no? What if we didn't add these new smarts, but just left it as dumb as it was? Would that simplify things?

bq. Is it worth that added complexity to avoid recomputing the hash code each time we instantiate an HTable?

I'm with you that it's not worth it.

My basic point is that, at the root, this static map of HCMs is broken, so let's not go out of our way to preserve it. I like your ConfigurationKey idea. It gets the ugly equals and hashCode out of HBC. I think object identity is good enough. The broken HCM map will still work as it did, right?



[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790385#action_12790385 ] 

Dave Latham commented on HBASE-2036:
------------------------------------

Enis: It would definitely be more efficient if we cache the hash code.  Can we assume that no one will modify the Configuration after it's already been used?  It seems the existing code already makes that assumption, so it's probably safe to stick with that.

Stack: I agree, an internal ConfigurationKey would probably be a good idea. I'd rather have it cache the hash code than use the object id, so that separate but identical Configurations map to the same key. For example, if you do a new HTable("tableName"), it automatically instantiates a new HBaseConfiguration.
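
One way to read the ConfigurationKey idea, sketched in Java under the assumption that a configuration can be viewed as a Map<String, String> of its properties. The class below is hypothetical, not HBase code. It copies the properties and computes the hash once in the constructor, so two separate but identical configurations yield equal keys:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical key for the HConnectionManager cache; the name comes from
// this discussion, the implementation is only a sketch.
final class ConfigurationKey {
    private final Map<String, String> snapshot;
    private final int hash;

    ConfigurationKey(Map<String, String> props) {
        // Copy so that later changes to the source cannot alter the key.
        this.snapshot = new TreeMap<>(props);
        // Computed exactly once, when the key is built.
        this.hash = snapshot.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConfigurationKey)) return false;
        ConfigurationKey other = (ConfigurationKey) o;
        // Cheap hash check first, then the full property comparison.
        return hash == other.hash && snapshot.equals(other.snapshot);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```

Snapshotting is a design choice that sidesteps the question of whether a Configuration is modified after use: the key reflects the configuration as it was when the key was built.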


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789446#action_12789446 ] 

stack commented on HBASE-2036:
------------------------------

@Doğacan What is your timeline? Is it OK if we make these changes for HBase 0.21, or do you need them in the 0.20 branch?


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790999#action_12790999 ] 

stack commented on HBASE-2036:
------------------------------

@Dave I see. You are right. I thought the hashCode was a recent addition, but I see it is old. +1 on a ConnectionKey built around Configuration/HBaseConfiguration#hashCode.


[jira] Updated: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-2036:
---------------------------------

    Attachment: hconf_v1.patch

Here is a first shot at the complete patch. All the unit tests pass, except org.apache.hadoop.hbase.util.TestMergeTool. 


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789592#action_12789592 ] 

Enis Soztutar commented on HBASE-2036:
--------------------------------------

Sorry, I intended to write a preliminary patch for this, but I got caught up with something else. We (either Doğacan or I) will hopefully write the patch before Monday. I think the 0.21 branch is OK for this, since nutchbase needs some more time to become mature enough.


[jira] Updated: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-2036:
---------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790364#action_12790364 ] 

stack commented on HBASE-2036:
------------------------------

Enis: This looks great. Let's get it into TRUNK.

Dave: I like the idea of a Key object, separating this wacky keying of HConnectionManagers from Configuration. What about using the Configuration/HBaseConfiguration object id? Your Key object could be an internal class that uses the object id for equals and hashCode. If someone wants a new HCM, then they just create a new Configuration/HBaseConfiguration?
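
The object-id variant suggested here could look roughly like this Java sketch. IdentityKey is a hypothetical name; it compares the wrapped references with == and hashes with System.identityHashCode, so it ignores the contents of the configuration entirely:

```java
// Hypothetical identity-based key: two keys are equal only if they wrap
// the very same configuration object, whatever its contents.
final class IdentityKey {
    private final Object conf;

    IdentityKey(Object conf) {
        this.conf = conf;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IdentityKey && ((IdentityKey) o).conf == conf;
    }

    @Override
    public int hashCode() {
        // Bypasses any hashCode()/equals() override on the wrapped object.
        return System.identityHashCode(conf);
    }
}
```

java.util.IdentityHashMap provides the same semantics without a wrapper class. The trade-off is that two equal configurations created separately never share an entry.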


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790288#action_12790288 ] 

Enis Soztutar commented on HBASE-2036:
--------------------------------------

bq. It relates to HBASE-2027 and some other issues that deal with how HConnections are cached by the client. Right now, they use HBaseConfiguration as a key to a set of cached connections and information. It looks like the patch removes equals() from HBaseConfiguration (but not hashCode()) and then uses the hash code as the key instead.
Yeah, the patch should not remove equals() while keeping hashCode(). Since I don't have enough background on HBASE-2027, I did not want to change any configuration-related logic, so I think keeping the hash code as the key may work for now. But recomputing the whole hashCode() on every call seems like major overkill. Maybe we can cache the hash code in the conf itself so that hashCode() is much faster. Does that make sense?
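
A minimal Java sketch of caching the hash code inside the configuration itself, assuming a simplified configuration class (CachingConf is hypothetical, not Hadoop's Configuration). Every write invalidates the cached value, so hashCode() is only recomputed after a change:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration stand-in that memoizes its hash code.
class CachingConf {
    private final Map<String, String> props = new HashMap<>();
    private int cachedHash;
    private boolean hashValid;

    void set(String key, String value) {
        props.put(key, value);
        hashValid = false; // any write invalidates the cached hash
    }

    String get(String key) {
        return props.get(key);
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            cachedHash = props.hashCode(); // full computation only when stale
            hashValid = true;
        }
        return cachedHash;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachingConf && props.equals(((CachingConf) o).props);
    }
}
```

This is not thread-safe as written, and mutating a configuration that is already a key in a HashMap would still break lookups; the cache only saves repeated computation.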


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790433#action_12790433 ] 

Dave Latham commented on HBASE-2036:
------------------------------------

Yes, we can make a new Configuration instance just as we currently do, but if that Configuration has the same properties as an existing one, then we should reuse the cached connections & data instead of building up a new set. To do that, we need to match against an existing key, which means computing a new hash. So long as we cache the hash, it only needs to be computed once per instantiation.

On the other hand, if someone is using the HTable() constructor and passing in the same Configuration each time, then it would be nice to avoid the hash step.  We could bring back the old WeakHashMap but this time map from Configuration to ConfigurationKey so that we can reuse the ConfigurationKey if it's the same instance of Configuration.  Is it worth that added complexity to avoid recomputing the hash code each time we instantiate an HTable?  I think I would lean toward no.
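
The WeakHashMap variant could be sketched in Java as follows; all class names are hypothetical. It relies on the configuration class not overriding equals()/hashCode(), so WeakHashMap effectively matches the same instance. If the configuration kept a value-based hashCode(), each lookup would recompute it and the memo would save nothing:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.WeakHashMap;

// Hypothetical configuration stand-in with Object's identity equals/hashCode.
class PlainConf {
    final Map<String, String> props = new TreeMap<>();
}

// Hypothetical value key: a snapshot of the properties with a precomputed hash.
final class ConfKey {
    private final Map<String, String> snapshot;
    private final int hash;

    ConfKey(PlainConf conf) {
        snapshot = new TreeMap<>(conf.props); // copy, no reference back to conf
        hash = snapshot.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConfKey && ((ConfKey) o).snapshot.equals(snapshot);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

// Reuses the key object when the same configuration instance is passed again.
// Entries are dropped once a PlainConf is no longer strongly reachable.
final class KeyCache {
    private final Map<PlainConf, ConfKey> keys = new WeakHashMap<>();

    synchronized ConfKey keyFor(PlainConf conf) {
        return keys.computeIfAbsent(conf, ConfKey::new);
    }
}
```

ConfKey copies the properties instead of holding the PlainConf. If a value referenced its own key, the entry would stay strongly reachable and never be cleared.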


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789206#action_12789206 ] 

Doğacan Güney commented on HBASE-2036:
--------------------------------------

If you are going to remove hashCode(), then it can be done just as in Nutch: add a static HBaseConfiguration.create() method that creates a Configuration object, reads the hbase-*.xml files into it as well, and then returns the Configuration object.
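
A rough Java sketch of such a factory. LayeredConf and HBaseConfFactory are hypothetical stand-ins. In real code the two maps would come from hbase-default.xml and hbase-site.xml on the classpath; here they are passed in so the example is self-contained. Resources are applied in order, so site values override defaults:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: resources are applied
// in order and later ones override earlier ones.
class LayeredConf {
    private final Map<String, String> props = new HashMap<>();

    void addResource(Map<String, String> resource) {
        props.putAll(resource);
    }

    String get(String key) {
        return props.get(key);
    }
}

// Hypothetical public static factory replacing a private loading method.
final class HBaseConfFactory {
    private HBaseConfFactory() {}

    static LayeredConf create(Map<String, String> hbaseDefault,
                              Map<String, String> hbaseSite) {
        LayeredConf conf = new LayeredConf();
        conf.addResource(hbaseDefault); // shipped defaults first
        conf.addResource(hbaseSite);    // site-specific values win
        return conf;
    }
}
```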


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790243#action_12790243 ] 

Dave Latham commented on HBASE-2036:
------------------------------------

This looks like a good idea.

It relates to HBASE-2027 and some other issues that deal with how HConnections are cached by the client. Right now, they use HBaseConfiguration as a key to a set of cached connections and information. It looks like the patch removes equals() from HBaseConfiguration (but not hashCode()) and then uses the hash code as the key instead.

If we're planning on removing HBaseConfiguration entirely, what about introducing a separate key object that can implement both equals and hashcode properly based on the relevant information from a Configuration object, then using that as the key.  That would avoid potential collisions, get rid of having an object with just a hashcode method but not equals, and separate this connection caching logic away from the Configuration where it doesn't really belong.

The trick then is identifying the relevant information to base it on. I can see two ways of doing it; maybe someone else can come up with something better. The current code uses all properties in the Configuration. This is complete, but risks creating redundant connections for configurations that differ only in irrelevant settings. Another choice would be to explicitly define the set of HBase config properties that determine the connections. However, this could be fragile as properties are added or changed.

Anyone else have thoughts?
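
The second option (keying on an explicit list of properties) might look like this Java sketch. RelevantPropsKey is hypothetical, and the three property names are an illustrative guess at connection-relevant settings, not a vetted list. Properties outside the list do not affect equality:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical key built only from connection-relevant properties.
final class RelevantPropsKey {
    // Illustrative assumption; a real list would need careful review.
    static final List<String> RELEVANT = Collections.unmodifiableList(Arrays.asList(
        "hbase.zookeeper.quorum",
        "hbase.zookeeper.property.clientPort",
        "zookeeper.znode.parent"));

    private final String[] values;
    private final int hash;

    RelevantPropsKey(Map<String, String> conf) {
        values = new String[RELEVANT.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = conf.get(RELEVANT.get(i)); // null if unset
        }
        hash = Arrays.hashCode(values); // null-safe, computed once
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RelevantPropsKey
            && Arrays.equals(values, ((RelevantPropsKey) o).values);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```

This also shows the fragility mentioned above: if a new connection-relevant property is introduced and not added to RELEVANT, configurations that differ in it would wrongly share connections.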


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790498#action_12790498 ] 

Dave Latham commented on HBASE-2036:
------------------------------------

If we use object identity, the HCM map will not work as it did. With the current code, when you create a new HTable without passing in an HBaseConfiguration object, it instantiates a new one. The HCM finds the existing connections/data via the hash map (by hashing the configuration's data). The current patch does the same thing via HBaseConfiguration.create(). So if we changed to only use object identity, then each time you create a new HTable this way, it would not find the existing TableServers object containing the existing connections & data. I think we should move to an internal ConnectionKey that has the hashCode/equals logic as well as caching of the hash code. This will preserve the existing behavior and clean up the Configuration object.
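
A toy Java model of the difference, assuming each "new HTable" without an explicit configuration builds a fresh but equal property map (standing in for HBaseConfiguration.create()). All names are hypothetical. With a value-keyed java.util.HashMap, five opens share one cached connection; with a java.util.IdentityHashMap, each open creates its own:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the HConnectionManager cache.
final class ConnectionCacheDemo {
    static final class Connection {}

    private ConnectionCacheDemo() {}

    // Stands in for HBaseConfiguration.create(): a new object, same contents.
    static Map<String, String> freshConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.zookeeper.quorum", "localhost");
        return conf;
    }

    // Simulates n table opens and returns how many connections were created.
    // The cache's key semantics (equals vs identity) decide whether a fresh
    // configuration finds an existing entry.
    static int connectionsCreated(Map<Map<String, String>, Connection> cache, int n) {
        int created = 0;
        for (int i = 0; i < n; i++) {
            Map<String, String> conf = freshConf();
            if (!cache.containsKey(conf)) {
                cache.put(conf, new Connection());
                created++;
            }
        }
        return created;
    }
}
```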


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788810#action_12788810 ] 

stack commented on HBASE-2036:
------------------------------

I'm not so sure hashCode in HBC is a good idea. It's up in the air at the moment. It'll probably be removed.

If we used plain Configuration, Enis, how would we ensure that hbase-*.xml had been read into the Configuration? Where would the public static method you allude to above live?

What if we kept HBC and just changed methods so they took a Configuration, not necessarily an HBC?  Would that work?

Thanks.



[jira] Updated: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-2036:
---------------------------------

    Attachment: hconf.patch

I have put together a patch (not covering src/test yet). HBaseConfiguration constructors are deprecated and create() methods are added. I think it illustrates the goal of the issue. For compatibility, the methods are only deprecated. We can remove the deprecated methods in the next release.
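
The compatibility pattern described here, sketched in Java with hypothetical stand-in classes (BaseConfiguration plays the role of Hadoop's Configuration, and the hard-coded default replaces reading the hbase-*.xml files). The old constructor is kept but deprecated, and create() returns the base type so callers never need the subclass:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class BaseConfiguration {
    final Map<String, String> props = new HashMap<>();
}

// Hypothetical subclass showing the deprecate-then-remove path.
class CompatHBaseConfiguration extends BaseConfiguration {
    /** @deprecated use {@link #create()} instead. */
    @Deprecated
    CompatHBaseConfiguration() {
        addHBaseDefaults(this); // old callers keep working unchanged
    }

    // New entry point: returns the base type, not the subclass.
    static BaseConfiguration create() {
        BaseConfiguration conf = new BaseConfiguration();
        addHBaseDefaults(conf);
        return conf;
    }

    // Hypothetical helper; real code would add hbase-default.xml and
    // hbase-site.xml as resources instead of a hard-coded value.
    private static void addHBaseDefaults(BaseConfiguration conf) {
        conf.props.put("hbase.zookeeper.quorum", "localhost");
    }
}
```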


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795502#action_12795502 ] 

stack commented on HBASE-2036:
------------------------------

I've not completed the review yet, but so far so good. Just giving a heads-up that I'm going to commit this patch soon (I will try to figure out the broken test). It's big, so it's especially liable to rot; let's get it in. 

> Use Configuration instead of HBaseConfiguration 
> ------------------------------------------------
>
>                 Key: HBASE-2036
>                 URL: https://issues.apache.org/jira/browse/HBASE-2036
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>         Attachments: hconf.patch, hconf_v1.patch


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790390#action_12790390 ] 

stack commented on HBASE-2036:
------------------------------

Yes, the presumption seems to be that the configuration won't change. Hashing everything in a configuration would be expensive; there is usually a lot of config in there. For a new HTable that is not passed a configuration, can we not make a new instance internal to the HTable constructor, as we currently do?  
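
One way to avoid hashing every property is to key a connection cache on object identity, which is only safe under the presumption above that a configuration is not changed after use. ConnectionCache is a hypothetical sketch, not HBase's connection manager. java.util.IdentityHashMap compares keys with == and System.identityHashCode, so a lookup costs the same no matter how many properties the configuration holds.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical connection cache keyed by configuration identity. IdentityHashMap
// compares keys with == and System.identityHashCode, so a configuration's own
// hashCode()/equals() (which would walk every property) are never called.
final class ConnectionCache<C, V> {
    private final Map<C, V> byConf = new IdentityHashMap<>();
    private final Function<C, V> factory;

    ConnectionCache(Function<C, V> factory) {
        this.factory = factory;
    }

    // Returns the cached connection for this exact configuration object,
    // creating one on first use.
    synchronized V get(C conf) {
        return byConf.computeIfAbsent(conf, factory);
    }

    synchronized int size() {
        return byConf.size();
    }
}

class CacheDemo {
    public static void main(String[] args) {
        ConnectionCache<Map<String, String>, String> cache =
            new ConnectionCache<>(conf -> "connection to " + conf.get("hbase.zookeeper.quorum"));

        Map<String, String> a = new HashMap<>();
        a.put("hbase.zookeeper.quorum", "zk1");
        Map<String, String> b = new HashMap<>(a);  // same contents, different object

        System.out.println(cache.get(a) == cache.get(a));  // true: same object, one connection
        cache.get(b);
        System.out.println(cache.size());                  // 2: b gets its own entry
    }
}
```

The trade-off: two configurations with identical contents still get separate connections, and mutating a configuration after it has been used as a key leaves its cached connection built from stale settings.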

> Use Configuration instead of HBaseConfiguration 
> ------------------------------------------------
>
>                 Key: HBASE-2036
>                 URL: https://issues.apache.org/jira/browse/HBASE-2036
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>         Attachments: hconf.patch


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792415#action_12792415 ] 

Enis Soztutar commented on HBASE-2036:
--------------------------------------

OK, then let me finish the patch first. Then we can open another issue for this. I think you can work out the fix better than I can. Anyway, I will complete the patch as soon as I can. Thanks. 

> Use Configuration instead of HBaseConfiguration 
> ------------------------------------------------
>
>                 Key: HBASE-2036
>                 URL: https://issues.apache.org/jira/browse/HBASE-2036
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>         Attachments: hconf.patch


[jira] Updated: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2036:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Release Note: HBaseConfiguration as an object is now deprecated.  From here on out, use HBaseConfiguration.create() to make a Hadoop Configuration populated with HBase configuration.
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thanks for the patch, Enis.  The test failure was caused by a bug in how Merge was using the Configurable interface.  Let us know if there is anything else we can do to help out with your effort.

> Use Configuration instead of HBaseConfiguration 
> ------------------------------------------------
>
>                 Key: HBASE-2036
>                 URL: https://issues.apache.org/jira/browse/HBASE-2036
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>             Fix For: 0.21.0
>
>         Attachments: hconf.patch, hconf_v1.patch


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791000#action_12791000 ] 

stack commented on HBASE-2036:
------------------------------

Enis, are you up for finishing off the change, or do you want us to take it from here?

> Use Configuration instead of HBaseConfiguration 
> ------------------------------------------------
>
>                 Key: HBASE-2036
>                 URL: https://issues.apache.org/jira/browse/HBASE-2036
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>         Attachments: hconf.patch


[jira] Commented: (HBASE-2036) Use Configuration instead of HBaseConfiguration

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795546#action_12795546 ] 

stack commented on HBASE-2036:
------------------------------

Tracking down why TestMergeTool is failing.  It's a messy test, particularly in how it does its setup.  Here is the failure mode:

{code}
Testcase: testMergeTool took 10.931 sec
  Caused an ERROR
null
java.lang.NullPointerException
  at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:111)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:103)
  at org.apache.hadoop.hbase.util.Merge.run(Merge.java:79)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
  at org.apache.hadoop.hbase.util.TestMergeTool.mergeAndVerify(TestMergeTool.java:174)
  at org.apache.hadoop.hbase.util.TestMergeTool.testMergeTool(TestMergeTool.java:252)
{code}
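
The thread does not show the actual change to Merge, so what follows is only a hypothetical model of the contract involved. Hadoop's ToolRunner.run(conf, tool, args) calls tool.setConf(conf) before tool.run(args). A tool that reads its configuration from anywhere other than getConf() can end up handing a null configuration to FileSystem.get(), which would be consistent with the NullPointerException in FileSystem.getDefaultUri above. MiniConfigurable, MiniTool, MiniConfigured and MiniToolRunner are stand-ins so the sketch compiles without Hadoop, and the field-shadowing bug in ShadowingTool is an illustration, not a claim about Merge's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Hadoop's Configurable and Tool interfaces, so the
// sketch compiles without Hadoop. A Map plays the role of the Configuration.
interface MiniConfigurable {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
}

interface MiniTool extends MiniConfigurable {
    int run(String[] args);
}

// Shaped like Hadoop's Configured base class: it stores what setConf() receives.
class MiniConfigured implements MiniConfigurable {
    private Map<String, String> conf;

    public void setConf(Map<String, String> conf) { this.conf = conf; }

    public Map<String, String> getConf() { return conf; }
}

// Shaped like ToolRunner.run(conf, tool, args): hand over the configuration, then run.
final class MiniToolRunner {
    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        tool.setConf(conf != null ? conf : new HashMap<String, String>());
        return tool.run(args);
    }
}

// Buggy: declares its own field, which the inherited setConf() never assigns.
class ShadowingTool extends MiniConfigured implements MiniTool {
    private Map<String, String> conf;  // stays null

    public int run(String[] args) {
        // Throws NullPointerException, much as FileSystem.get(null) would.
        String fs = conf.getOrDefault("fs.default.name", "file:///");
        return fs.isEmpty() ? 1 : 0;
    }
}

// Fixed: reads the configuration through getConf(), which the runner populated.
class FixedTool extends MiniConfigured implements MiniTool {
    String defaultFs;

    public int run(String[] args) {
        defaultFs = getConf().getOrDefault("fs.default.name", "file:///");
        return 0;
    }
}

class MergeNpeDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.default.name", "hdfs://localhost:9000");

        FixedTool fixed = new FixedTool();
        MiniToolRunner.run(conf, fixed, args);
        System.out.println(fixed.defaultFs);  // hdfs://localhost:9000

        try {
            MiniToolRunner.run(conf, new ShadowingTool(), args);
        } catch (NullPointerException e) {
            System.out.println("NPE: the tool ignored the configuration it was given");
        }
    }
}
```

The general rule the sketch encodes: a Tool should treat getConf() as the single source of its configuration, because that is the value the runner sets.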

> Use Configuration instead of HBaseConfiguration 
> ------------------------------------------------
>
>                 Key: HBASE-2036
>                 URL: https://issues.apache.org/jira/browse/HBASE-2036
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>         Attachments: hconf.patch, hconf_v1.patch