Posted to common-dev@hadoop.apache.org by "James Todd (JIRA)" <ji...@apache.org> on 2006/11/07 02:04:37 UTC

[jira] Created: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

DataNode appears to require DNS name resolution as opposed to direct ip mapping
-------------------------------------------------------------------------------

                 Key: HADOOP-685
                 URL: http://issues.apache.org/jira/browse/HADOOP-685
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
         Environment: osx, ubuntu 6.10b
            Reporter: James Todd
            Priority: Minor


DataNode appears to require DNS resolution of nodes via the class org.apache.hadoop.net.DNS, as opposed to being able to use a specified ip.

as an example, i was not able to set up more than one instance of dfs datanodes on one box using loopback w/ varying ports, since DataNode
resolved the ip 127.0.0.1 to "foo.bar", which was then mapped to the dhcp-allocated ip 192.168.0.***, which was not addressable by the
rest of the dfs cluster (namely the namenode).

while this example is trivial, one should be able to use the very same process, change only the ips of the nodes, and have things work as
expected.

it would be nice to not always require DNS resolution.
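
The round trip described above (configured ip, then host name, then a possibly different ip) can be reproduced with the JDK alone. The sketch below is only an illustration under that assumption: it uses java.net.InetAddress directly, not org.apache.hadoop.net.DNS, and the class and method names (LoopbackRoundTrip, roundTrip) are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: shows how an IP can change
// once it is turned into a host name and resolved again.
final class LoopbackRoundTrip {

    /**
     * Reverse-resolves an IP literal to a host name, then forward-resolves
     * that name. Returns the resulting address as text, or null if the
     * name cannot be resolved.
     */
    static String roundTrip(String ipLiteral) {
        try {
            // For an IP literal, getByName only validates the format.
            InetAddress configured = InetAddress.getByName(ipLiteral);
            // Reverse lookup (hosts file, DNS, ...), falls back to the IP text.
            String name = configured.getCanonicalHostName();
            // Forward lookup of whatever name came back.
            return InetAddress.getByName(name).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : "127.0.0.1";
        String resolved = roundTrip(configured);
        System.out.println("configured ip:         " + configured);
        System.out.println("after name round trip: " + resolved);
        System.out.println("unchanged:             " + configured.equals(resolved));
    }
}
```

If the last line prints false, any component that identifies a node by its resolved name rather than by the configured ip will end up with an address other than the one that was specified.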

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Assigned: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi reassigned HADOOP-685:
-----------------------------------

    Assignee: Raghu Angadi  (was: Sameer Paranjpye)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "Andrew McNabb (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470698 ] 

Andrew McNabb commented on HADOOP-685:
--------------------------------------

We worked around a problem that I think is related.  In /etc/hosts, the default line was "127.0.0.1 localhost.localdomain localhost" because the IP address was assigned by DHCP.  If you ran 'hostname', it gave the correct name for the host, but 'hostname -f' gave localhost.localdomain because it used /etc/hosts instead of the information from DHCP.

In this situation, Hadoop didn't work because each machine, as it reported in, claimed that its hostname was "localhost.localdomain" rather than its actual hostname.  The jobtracker was confused and didn't work.  We ended up having to manually change /etc/hosts on each machine.

If everything worked with IP addresses instead of hostnames, I think the problem would be fixed.  Anyway, I think this is related to the problem that James Todd brought up.
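
A quick way to spot the /etc/hosts situation described above is to check what identity the JVM derives for the local machine. This is a minimal sketch using only java.net.InetAddress; the class and method names (LocalIdentityCheck, isLoopbackIdentity) are hypothetical and this is not how Hadoop itself performs the check.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of Hadoop.
final class LocalIdentityCheck {

    /**
     * Returns true if a (name, ip) pair would identify the machine as
     * loopback, which is useless to other nodes in a cluster.
     * ipLiteral is expected to be an IP literal; anything else would
     * trigger a name lookup inside getByName.
     */
    static boolean isLoopbackIdentity(String canonicalName, String ipLiteral) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return addr.isLoopbackAddress()
                    || canonicalName.equals("localhost")
                    || canonicalName.startsWith("localhost.");
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a usable address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // What the JVM believes this machine is called and where it lives.
        InetAddress local = InetAddress.getLocalHost();
        String name = local.getCanonicalHostName();
        String ip = local.getHostAddress();
        System.out.println("canonical name: " + name);
        System.out.println("address:        " + ip);
        if (isLoopbackIdentity(name, ip)) {
            System.out.println("warning: local identity is loopback; check /etc/hosts");
        }
    }
}
```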



[jira] Commented: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "James Todd (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-685?page=comments#action_12447599 ] 
            
James Todd commented on HADOOP-685:
-----------------------------------

i was thinking about a hack to get around this issue that, once proven, i could offer as a starting point. if someone else sees the way out on this one and has a fix in mind, i do not want to be the one to hold up progress, so by all means ... "make it so".


[jira] Commented: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "Andrew McNabb (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471026 ] 

Andrew McNabb commented on HADOOP-685:
--------------------------------------

I agree with Marco.  When machines have multiple interfaces, automatic resolution of hostnames always seems to be problematic.



[jira] Commented: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "Marco Nicosia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470790 ] 

Marco Nicosia commented on HADOOP-685:
--------------------------------------

I don't think a dataNode should ever try to determine its own hostname.

In situations where dataNodes might have virtual IP addresses configured, or have multiple interfaces on different subnets, determining what the "correct" hostname should be is non-deterministic. You can do some work to find the "administrative hostname" (i.e., the name of the host, not necessarily of any particular interface), but that's only useful for identification purposes, and it requires DNS to get the FQDN.

I know it's not trivial, but I'd prefer that the nameNode record the IP address of a connection. That way there's no DNS involved at any level in the transaction, and we know exactly which interface/IP address is being used. Additionally, there's no worrying about /etc/hosts, or dhcp, or whatnot. It works for the entire time the dataNode is up and making network connections.

In order to support multiple dataNodes per machine, dataNodes need to report their listening port, but I think that's required regardless of how we solve this problem?
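
The idea above (the server records the peer IP of the connection, and the client reports only its listening port) can be sketched with plain sockets. This is an illustration only, not Hadoop's RPC or registration code: the class name, the one-integer wire format, and the port value 50010 are all assumptions made for the example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch, not Hadoop code.
final class PeerAddressRegistration {

    /** Combines the IP observed on the connection with the port the peer reported. */
    static String registrationKey(Socket connection, int reportedPort) {
        // getInetAddress() is the remote end of the socket; no DNS is involved.
        return connection.getInetAddress().getHostAddress() + ":" + reportedPort;
    }

    /** Runs one registration over loopback and returns the key the server would store. */
    static String demo(int listeningPort) {
        try {
            InetAddress loopback = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
            try (ServerSocket nameNode = new ServerSocket(0, 1, loopback);
                 Socket dataNode = new Socket(loopback, nameNode.getLocalPort())) {
                // The "dataNode" reports only the port it listens on.
                DataOutputStream out = new DataOutputStream(dataNode.getOutputStream());
                out.writeInt(listeningPort);
                out.flush();
                // The "nameNode" takes the IP from the connection itself.
                try (Socket accepted = nameNode.accept()) {
                    int reported = new DataInputStream(accepted.getInputStream()).readInt();
                    return registrationKey(accepted, reported);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(50010)); // 50010 is an arbitrary example port
    }
}
```

Because the key is ip:port, two dataNodes on the same machine stay distinct as long as they report different listening ports.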




[jira] Resolved: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi resolved HADOOP-685.
---------------------------------

    Resolution: Duplicate



[jira] Commented: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470685 ] 

Raghu Angadi commented on HADOOP-685:
-------------------------------------


I am a little confused about the underlying issue. Which config is set to "127.0.0.1"? Is it "fs.default.name"? Could you describe the config (or attach hadoop-site.xml) and describe what you would like that to imply?



[jira] Commented: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "James P. White (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-685?page=comments#action_12447641 ] 
            
James P. White commented on HADOOP-685:
---------------------------------------

While it is probably a separate issue, what I've wanted to do is implement Zeroconf so we can use names but not have to mess around with the DNS server configuration.

There is an mDNS implementation in Java which I think will work dandy:

http://jmdns.sourceforge.net/



[jira] Commented: (HADOOP-685) DataNode appears to require DNS name resolution as opposed to direct ip mapping

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470708 ] 

Raghu Angadi commented on HADOOP-685:
-------------------------------------


Have you tried setting "dfs.datanode.dns.interface" (set to something like eth0)? Would that work in the above case?
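
To see which address a given interface name would yield on a particular machine, the JDK's java.net.NetworkInterface can be queried directly. The sketch below is only a diagnostic under that assumption; it does not claim to reproduce what Hadoop does with "dfs.datanode.dns.interface", and the class and method names (InterfaceAddressLookup, firstIpv4Of) are hypothetical.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Optional;

// Hypothetical diagnostic, not part of Hadoop.
final class InterfaceAddressLookup {

    /** Returns the first IPv4 address bound to the named interface, if any. */
    static Optional<String> firstIpv4Of(String interfaceName) {
        try {
            // getByName returns null when no interface has this name.
            NetworkInterface nic = NetworkInterface.getByName(interfaceName);
            if (nic == null) {
                return Optional.empty();
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    return Optional.of(addr.getHostAddress());
                }
            }
            return Optional.empty();
        } catch (SocketException e) {
            // Treat an I/O failure while querying interfaces as "not found".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "eth0";
        System.out.println(name + " -> "
                + firstIpv4Of(name).orElse("(no IPv4 address found)"));
    }
}
```

Running it with the interface name intended for the setting (eth0 in the comment above) shows whether that interface carries an address the rest of the cluster can reach.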


