Posted to common-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2008/08/21 16:06:44 UTC

[jira] Created: (HADOOP-3988) The elephant should remember names, not numbers.

The elephant should remember names, not numbers.
------------------------------------------------

                 Key: HADOOP-3988
                 URL: https://issues.apache.org/jira/browse/HADOOP-3988
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.17.2
            Reporter: Allen Wittenauer


The name node and the data node should not cache the resolution of host names, as doing so prevents the use of DNS CNAMEs for any sort of fail over capability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624412#action_12624412 ] 

Steve Loughran commented on HADOOP-3988:
----------------------------------------

Did I say you could set this on the command line? I was wrong:
http://jira.smartfrog.org/jira/browse/SFOS-764

You edit a properties file in the JVM's lib/security directory, or call

java.security.Security.setProperty("networkaddress.cache.ttl" , "0"); 

It would be possible for server-side nodes to set this property when they start up, but the operation should be wrapped with a catch for any security exception, so running hadoop under a security manager isn't fatal.
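
A minimal sketch of what that startup step could look like. The class name DnsCacheTtl and its method are hypothetical and not part of Hadoop; only Security.setProperty and Security.getProperty are real JDK calls. As far as I know the JVM reads this value when its address cache is first initialised, so this would have to run before the first host name lookup.

```java
import java.security.Security;

public class DnsCacheTtl {
    /**
     * Tries to set the JVM-wide TTL (in seconds) for successful DNS lookups.
     * Returns true if the property was set, false if a security manager refused.
     */
    public static boolean trySetTtl(int seconds) {
        try {
            Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
            return true;
        } catch (SecurityException e) {
            // A security manager forbids the change: report it and carry on,
            // so that startup is not fatal.
            System.err.println("Could not set networkaddress.cache.ttl: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        boolean applied = trySetTtl(0);
        System.out.println("applied=" + applied
                + ", value=" + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Returning a boolean rather than throwing lets the caller decide whether a refused change deserves more than a log line.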

This is separate from the question of where the hostnames should be resolved, which needs to move into every service's offerService loop.

Allen - I believe the Sun JVM DNS cache still ignores the TTL that comes down from above. It's to stop applets and other sandboxed things breaking out of the sandbox and talking to hosts behind the firewall, but it interferes with long-lived server-side code.



[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660867#action_12660867 ] 

Raghu Angadi commented on HADOOP-3988:
--------------------------------------

{quote} Allen - I believe the Sun JVM DNS cache still ignores the TTL that comes down from above. It's to stop applets and other sandboxed things breaking out of the sandbox and talking to hosts behind the firewall, but interferes with long-lived server-side code. {quote}

Oops! A fix for the Sun JVM is a must.



[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624378#action_12624378 ] 

Allen Wittenauer commented on HADOOP-3988:
------------------------------------------

We recently tried a "new kind" of fail over in our environment.  Rather than having a static IP for the name node, we attempted to use DNS CNAMEs to move the name node from one node to another.  We discovered that the data nodes continually attempted to contact the old machine even though DNS pointed to the new machine.

Since we configure a host name in hadoop, I would expect that the data nodes would at some point drop their cache of the IP and re-resolve.  However, this never happened.

I'd like to see either an option, or simply the default behaviour, whereby Hadoop always performs a host name resolution on any name given in a configuration file before connecting.  The operating system should be able to handle the job of caching any addresses that need to be cached, either through a mechanism like nscd or through a full-blooded, local DNS cache.



[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624392#action_12624392 ] 

Steve Loughran commented on HADOOP-3988:
----------------------------------------

1. The JVM does lots and lots of DNS caching of its own. To get it to cache positive and negative DNS entries for less time than forever, you've got to start the processes with a DNS TTL property, networkaddress.cache.ttl.
By default, Java 5 caches forever: http://java.sun.com/j2se/1.5.0/docs/api/java/net/InetAddress.html
Java 6 is a bit smarter, and only caches forever when a security manager is installed:
http://java.sun.com/javase/6/docs/technotes/guides/net/properties.html

You'd need to decide what is a good DNS cache TTL for the datanodes and push it out to the scripts. I'm not 100% sure you can set this property inside the JVM and have it taken up; the command line is the conventional way to do it.

2. The address of the namenode is currently set up in DataNode.startDataNode(). There are a few classes that assume that DataNode.getNameNodeAddr() is never null; they'd need to change their assumptions.

3. DNS is slow and is one of the main delays in test runs right now.

What may work is leaving the current init code as it is, but whenever a connection to the namenode fails, the datanode should re-read the namenode address from the configuration and redo the nslookup; the scripts would need patching to set a low TTL on the live systems. Rereading the address from the configuration would be useful if the configuration was coming from something like an LDAP server; you could change the hostname there and have it picked up without DNS caching interfering. 
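
A rough sketch of that retry idea, under stated assumptions: ReResolvingConnector is a hypothetical class, not DataNode code, and the Supplier stands in for "re-read the namenode address from the configuration". Each new InetSocketAddress(host, port) triggers a lookup, but that lookup still goes through the JVM's own DNS cache, so a low TTL is needed for a changed record to be seen.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.Supplier;

public class ReResolvingConnector {
    private final Supplier<String> hostSource; // stands in for re-reading the configuration
    private final int port;

    public ReResolvingConnector(Supplier<String> hostSource, int port) {
        this.hostSource = hostSource;
        this.port = port;
    }

    /** Tries to connect, re-reading the hostname and redoing the lookup before every attempt. */
    public Socket connect(int timeoutMillis, int attempts) throws IOException {
        IOException last = new IOException("no connection attempts were made");
        for (int i = 0; i < attempts; i++) {
            String host = hostSource.get();
            // The InetSocketAddress(String, int) constructor attempts the name lookup itself.
            InetSocketAddress addr = new InetSocketAddress(host, port);
            if (addr.isUnresolved()) {
                last = new IOException("cannot resolve " + host);
                continue;
            }
            Socket socket = new Socket();
            try {
                socket.connect(addr, timeoutMillis);
                return socket;
            } catch (IOException e) {
                socket.close(); // release the failed socket before the next attempt
                last = e;
            }
        }
        throw last;
    }
}
```

The sketch retries immediately; a real service loop would presumably sleep or back off between attempts.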




[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660865#action_12660865 ] 

Raghu Angadi commented on HADOOP-3988:
--------------------------------------

Thanks Steve. Based on the above comments, we need to do a couple of things:

- If a configuration variable for the TTL is not the default, the DN (and maybe clients) sets an explicit TTL (through "networkaddress.cache.ttl").
- RPC clients re-resolve RPC servers for a new connection if the last resolution was at least "ttl" ago.

That way an admin could set the TTL to 10 minutes and RPC clients would resolve at most once every 10 minutes.
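
The second point could be sketched roughly as below. TtlResolver is a hypothetical class, not existing RPC client code, and the clock value is passed in only to make the behaviour easy to check. The lookup inside new InetSocketAddress still passes through the JVM cache, so this only helps if the JVM-level TTL is no longer than the one used here.

```java
import java.net.InetSocketAddress;

public class TtlResolver {
    private final String host;
    private final int port;
    private final long ttlMillis;
    private InetSocketAddress cached; // result of the last resolution, if any
    private long resolvedAtMillis;    // when that resolution happened

    public TtlResolver(String host, int port, long ttlMillis) {
        this.host = host;
        this.port = port;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached address, re-resolving if it is at least ttlMillis old. */
    public synchronized InetSocketAddress get(long nowMillis) {
        if (cached == null || nowMillis - resolvedAtMillis >= ttlMillis) {
            cached = new InetSocketAddress(host, port); // performs a new lookup
            resolvedAtMillis = nowMillis;
        }
        return cached;
    }

    /** Convenience overload using the wall clock. */
    public InetSocketAddress get() {
        return get(System.currentTimeMillis());
    }
}
```

With a TTL of 600000 ms (the 10 minutes mentioned above), a client opening many connections would call get() each time and trigger at most one lookup per 10-minute window.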



[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624407#action_12624407 ] 

Allen Wittenauer commented on HADOOP-3988:
------------------------------------------

Steve's comment reminded me of an important detail that is bound to come up from our experiment! I failed to mention that we set the TTL for the DNS CNAME to be an extremely low value (5 minutes) in the DNS zone.  We knew (and tested to make sure) that on the DNS side, the address change would be handled properly.



[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661100#action_12661100 ] 

Steve Loughran commented on HADOOP-3988:
----------------------------------------

If you start the JVM with a TTL property on the command line, it gets picked up. If hadoop does its own TTL code underneath that, then you probably get the minimum TTL of hadoop-site.xml and the JVM. I think.

This is going to be painful to test.



[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661262#action_12661262 ] 

Raghu Angadi commented on HADOOP-3988:
--------------------------------------

> If you start the JVM with a TTL property on the command line, it gets picked up.

Do you mean this method would work on the Sun JVM? Then it is probably good enough. The documentation for the Hadoop config variable would clarify that, noting that the admin needs to add a JVM argument for it to take effect on the Sun JVM.




[jira] Commented: (HADOOP-3988) The elephant should remember names, not numbers.

Posted by "Yossi Ittach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647215#action_12647215 ] 

Yossi Ittach commented on HADOOP-3988:
--------------------------------------

We are trying to do something similar, but instead of changing the DNS scheme we're using a floating IP, so it should sidestep all the caching. I'll update when I have more results.
