Posted to issues@hbase.apache.org by "Eric Tschetter (JIRA)" <ji...@apache.org> on 2010/11/20 02:31:13 UTC

[jira] Created: (HBASE-3254) Ability to specify the "host" published in zookeeper

Ability to specify the "host" published in zookeeper
----------------------------------------------------

                 Key: HBASE-3254
                 URL: https://issues.apache.org/jira/browse/HBASE-3254
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.89.20100924
            Reporter: Eric Tschetter


We are running HBase on EC2, and I'm trying to get a client outside of EC2 to connect to the cluster.  However, each of the nodes appears to be publishing its IP address into zookeeper.  The problem is that the nodes on EC2 see a private 10.x IP address that is only reachable from inside EC2.

Specifically for EC2, there is a DNS name that resolves properly both externally and internally, so it would be nice if I could tell each of the processes, via a property, which host to publish into zookeeper.  As it stands, I have to use SSH tunnelling and muck with the hosts file to get my client to connect.
 
This problem could occur anywhere that you have a different DNS entry for public vs. private access.  That might only ever happen on EC2, but it might happen elsewhere.  I don't really know :).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3254) Ability to specify the "host" published in zookeeper

Posted by "Eric Tschetter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935585#action_12935585 ] 

Eric Tschetter commented on HBASE-3254:
---------------------------------------

Ok, so I looked into this some more as I ran into *another* DNS issue when I went to deploy our client application in EC2.  It turns out the problem is somewhat confounded.  There are two places that IP addresses are taken from.

#1, when the client tries to locate the Root region, it uses the IP address in zookeeper.  From what I can tell, this is the *only* time the addresses in zookeeper matter.  This IP address is determined by HServerInfo.getAddress().

#2, when the client tries to load data from a specific HRegionServer (i.e., tries to access a user-created table), it looks at the metadata for the table to find the region that holds the key(s) it's looking for.  The HRegionServer address comes from the metadata itself, which is populated by ProcessRegionOpen through a call to HServerInfo.getHostnamePort().  HServerInfo.getHostnamePort() is very close to HServerInfo.getServerName().

So, it looks like the required changes are actually twofold, but they should work:

1) Adjust the address reported to ZK to be getHostnamePort() instead of getAddress()
2) Adjust line 260 of HRegionServer (the constructor) from

{code}
    machineName = DNS.getDefaultHost(
        conf.get("hbase.regionserver.dns.interface","default"),
        conf.get("hbase.regionserver.dns.nameserver","default"));
{code}

to

{code}
    machineName = DNS.getDefaultHost(
        conf.get("hbase.regionserver.dns.interface","default"),
        conf.get("hbase.regionserver.dns.nameserver","default"));
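    // Proposed: let an explicit setting replace the DNS-derived name; the DNS value remains the default.
    machineName = conf.get("hbase.regionserver.machineName.override", machineName);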
{code}

Does this sound reasonable?  If so, I'll try to figure out how our guys did the CDH deployment to EC2, so that I can test a patched version...
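
To make the fallback behaviour of the proposed override concrete, here is a minimal standalone sketch.  It is not the HRegionServer code: java.util.Properties stands in for the HBase configuration object, the class and method names are made up for illustration, and the property name is the one proposed above, not an existing HBase setting.  Unlike the one-line conf.get() version, this sketch also treats a blank value as "not set", which is an assumption on my part.

```java
import java.util.Properties;

// Illustrative only: Properties stands in for the HBase configuration,
// and the class/method names are hypothetical.
final class PublishedHostSketch {
    // Property name as proposed in this issue; not an existing HBase setting.
    static final String OVERRIDE_KEY = "hbase.regionserver.machineName.override";

    /**
     * Returns the configured override when it is present and non-blank,
     * otherwise the name that DNS detection produced.
     */
    static String publishedHost(Properties conf, String detectedHost) {
        String override = conf.getProperty(OVERRIDE_KEY);
        if (override == null || override.trim().isEmpty()) {
            return detectedHost;
        }
        return override.trim();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // No override configured: the detected (private) address is published.
        System.out.println(publishedHost(conf, "10.0.0.12"));
        // Override configured: the externally resolvable name is published.
        conf.setProperty(OVERRIDE_KEY, "ec2-host.example.com");
        System.out.println(publishedHost(conf, "10.0.0.12"));
    }
}
```

The point of keeping the DNS-derived value as the default is that existing deployments that never set the property would behave exactly as they do today.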


[jira] Commented: (HBASE-3254) Ability to specify the "host" published in zookeeper

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935187#action_12935187 ] 

stack commented on HBASE-3254:
------------------------------

A patch whereby servers would optionally register themselves in zk using a suggested hostname seems reasonable.  (The only tricky part is that a RegionServer will use the 'name' the master tells it to use -- see the reportForDuty code in HRegionServer.  On startup, the RegionServer reads its 'address' and volunteers it to the Master, but the Master could change it on the RegionServer.  After the first reportForDuty, the regionserver will always check in using the name the Master told it.  In fact, you might be able to exploit this behavior by patching the Master only?)


[jira] Commented: (HBASE-3254) Ability to specify the "host" published in zookeeper

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935040#action_12935040 ] 

stack commented on HBASE-3254:
------------------------------

Hey Cheddar.  You got a patch that would illustrate how you'd fix this (We should be using hostnames rather than IPs I'd say)?


[jira] Commented: (HBASE-3254) Ability to specify the "host" published in zookeeper

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935600#action_12935600 ] 

stack commented on HBASE-3254:
------------------------------

On #1, would be better if this were the host, yeah.

On #2, yes, getServerName on HSI is getHostnamePort but with addition of startcode (as you've figured).

Yes, your override seems reasonable.

Good on you Eric.
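
The relationship between the two names discussed under #2 can be sketched roughly as below.  This is a standalone illustration, not the HServerInfo source: the class and method names are hypothetical, and the separators (':' for host and port, ',' for the server name) are assumptions about the string formats rather than something stated in this thread.

```java
// Illustrative only: hypothetical helpers that mimic the two name forms.
final class ServerNameSketch {
    /** Assumed "host:port" form, analogous to HServerInfo.getHostnamePort(). */
    static String hostnamePort(String hostname, int port) {
        return hostname + ":" + port;
    }

    /** Assumed "host,port,startcode" form, analogous to HServerInfo.getServerName(). */
    static String serverName(String hostname, int port, long startCode) {
        return hostname + "," + port + "," + startCode;
    }

    public static void main(String[] args) {
        // Same host and port in both; the server name additionally carries the startcode.
        System.out.println(hostnamePort("ec2-host.example.com", 60020));
        System.out.println(serverName("ec2-host.example.com", 60020, 1290216673000L));
    }
}
```

Either way, both forms are built from the hostname, so an overridden hostname would flow into both.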


[jira] Commented: (HBASE-3254) Ability to specify the "host" published in zookeeper

Posted by "Eric Tschetter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935102#action_12935102 ] 

Eric Tschetter commented on HBASE-3254:
---------------------------------------

Not yet.  I worked around it by mucking with my hosts file and setting up an
SSH-based SOCKS proxy for now.  If I get some time, I'll take a stab at a
patch.

I think that it should be reasonable to just have a System/hbase conf
property that the HRegionServer and HMaster look at to get the address that
they publish.  If that is not set, then just do what it does now.

