Posted to dev@whirr.apache.org by "Tibor Kiss (JIRA)" <ji...@apache.org> on 2010/11/05 01:26:41 UTC

[jira] Created: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: WHIRR-128
                 URL: https://issues.apache.org/jira/browse/WHIRR-128
             Project: Whirr
          Issue Type: Bug
    Affects Versions: 0.3.0
         Environment: Running hadoop (apache or CDH distro) in ec2 instances (Ubuntu or CentOS or Fedora).
The same issue with the integration test of whirr.
            Reporter: Tibor Kiss


The problem is related to how reverse DNS lookups are resolved on EC2 instances.
After isolating the problem, I wrote a very simple app that reproduces the cause of the issue.
Pass the public IP address of the EC2 instance as the first argument to the following small piece of code.
    InetAddress namenodePublicAddress = InetAddress.getByName(args[0]);
    System.out.println("getHostAddress: " + namenodePublicAddress.getHostAddress());
    System.out.println("getHostName: " + namenodePublicAddress.getHostName());
    System.out.println("getCanonicalHostName: " + namenodePublicAddress.getCanonicalHostName());

When I run it on my laptop, I get:
getHostAddress: 50.16.71.64
getHostName: ec2-50-16-71-64.compute-1.amazonaws.com
getCanonicalHostName: ec2-50-16-71-64.compute-1.amazonaws.com

When I run it on the EC2 instance, I get:
getHostAddress: 50.16.71.64
getHostName: 50.16.71.64
getCanonicalHostName: 50.16.71.64 

My laptop runs the same CentOS 5.5 as my EC2 instance, and /etc/resolv.conf contains a nameserver entry in both cases.
For some unknown reason, java.net.InetAddress's getHostName() and getCanonicalHostName() do not resolve the reverse DNS names of EC2 public addresses when run on an EC2 instance.
But any other resolver tool resolves that reverse DNS name correctly.

The Whirr codebase contains some getHostName() calls which, because of the symptom described above, cause /etc/hadoop/conf/hadoop-site.xml on the worker nodes to be filled in with IP addresses instead of DNS names. It is important to use the public DNS name of the EC2 instance because Amazon's nameserver can resolve it to either the external or the internal IP address, whichever is better for direct communication. In a Hadoop cluster, the security group in use does not allow nodes to communicate with each other over public IP addresses, so the worker nodes cannot contact the services on the master node. The Hadoop logs clearly show that the workers cannot connect to the master.
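
The failure mode described above could be caught before a raw IP address ends up in a generated configuration file. The following is only a minimal sketch, not code from Whirr: the class name HostNameCheck and both helper methods are hypothetical, and the pattern recognises dotted-quad IPv4 literals only. It relies on the behaviour shown in the output above, where getHostName() hands back the textual address when the reverse lookup yields no name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Whirr.
class HostNameCheck {
  // Dotted-quad IPv4 literal such as 50.16.71.64 (octet ranges are not checked).
  private static final Pattern IPV4_LITERAL =
      Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

  /** Returns true when the supposed host name is really just an IPv4 literal. */
  static boolean isIpLiteral(String hostName) {
    return IPV4_LITERAL.matcher(hostName).matches();
  }

  /** Returns true when the reverse lookup fell back to the textual address. */
  static boolean reverseLookupFailed(InetAddress address) {
    return isIpLiteral(address.getHostName());
  }

  public static void main(String[] args) throws UnknownHostException {
    if (args.length == 0) {
      System.err.println("usage: HostNameCheck <public-ip-address>");
      return;
    }
    InetAddress master = InetAddress.getByName(args[0]);
    if (reverseLookupFailed(master)) {
      System.err.println("No DNS name for " + master.getHostAddress()
          + "; workers would be configured with a raw IP address");
    } else {
      System.out.println("Master DNS name: " + master.getHostName());
    }
  }
}
```

Run with the master's public IP address as the only argument. On a host where the reverse lookup works it prints the DNS name; on a host showing the symptom above it reports the fallback instead.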

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tibor Kiss updated WHIRR-128:
-----------------------------

    Attachment: on-ec2-after-patch.tar.gz
                on-ec2-before-patch.tar.gz
                compare-myhost-with-ec2.txt

I created a patch that replaces the getHostName() calls with a function implemented using dnsjava 2.0.8, which can resolve the reverse DNS names for the public addresses of EC2 instances. The dnsjava API has some further advantages as well (read about it on the net).
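
For readers without the attachment, the idea of doing the reverse lookup explicitly rather than through InetAddress can be sketched as follows. This is not the code from the patch: to stay self-contained it uses the JDK's built-in JNDI DNS provider instead of dnsjava, and the class name ReverseDns and its methods are hypothetical. Whether this provider behaves any differently from InetAddress on EC2 has not been verified here. The sketch handles IPv4 addresses only.

```java
import java.util.Hashtable;
import java.util.function.Function;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper; the actual patch uses dnsjava instead of JNDI.
class ReverseDns {
  /** Builds the PTR query name, e.g. 50.16.71.64 becomes 64.71.16.50.in-addr.arpa. */
  static String ptrName(String ipv4) {
    String[] octets = ipv4.split("\\.");
    if (octets.length != 4) {
      throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
    }
    return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
        + ".in-addr.arpa";
  }

  /** Queries the PTR record with the JDK's JNDI DNS provider; null if none is found. */
  static String queryPtr(String ptrName) {
    Hashtable<String, String> env = new Hashtable<>();
    env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
    try {
      DirContext ctx = new InitialDirContext(env);
      try {
        Attribute ptr = ctx.getAttributes(ptrName, new String[] {"PTR"}).get("PTR");
        return ptr == null ? null : ptr.get().toString();
      } finally {
        ctx.close();
      }
    } catch (NamingException e) {
      return null; // treat any lookup failure as "no name available"
    }
  }

  /** Resolves the name through the given PTR resolver, falling back to the address. */
  static String reverseLookup(String ipv4, Function<String, String> ptrResolver) {
    String name = ptrResolver.apply(ptrName(ipv4));
    if (name == null) {
      return ipv4;
    }
    // PTR answers are usually fully qualified with a trailing dot; strip it.
    return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
  }

  public static void main(String[] args) {
    if (args.length == 0) {
      System.err.println("usage: ReverseDns <ipv4-address>");
      return;
    }
    System.out.println(reverseLookup(args[0], ReverseDns::queryPtr));
  }
}
```

The resolver is passed in as a function so that the fallback logic can be exercised without network access, and so that a dnsjava-based resolver could be substituted for queryPtr.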

I attached 'compare-myhost-with-ec2.txt', into which I copied the console output of the Whirr integration test run on my laptop, on EC2 before the patch, and on EC2 after the patch. You can see where IP addresses appear instead of public DNS names.

I also attached the hadoop-site.xml downloaded from the worker host while running the Whirr integration test, for both cases: before and after the patch.

In "on-ec2-before-patch.tar.gz" I also included the log files from the worker, to show the problem that prevents using Hadoop when it is launched from an EC2 instance.


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated WHIRR-128:
-------------------------------

    Component/s: core


[jira] Commented: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929651#action_12929651 ] 

Tom White commented on WHIRR-128:
---------------------------------

Thanks for the new patch, Tibor. I ran this successfully on Rackspace. Before this goes in, there are a few more things to address, all pretty minor:

* DnsUtil needs the standard ASF header. (You can use mvn apache-rat:check to find files missing license headers.)
* Should DnsUtilTest go in org.apache.whirr.service.hadoop since it is not really an integration test?
* Indentation should be 2 spaces - some files seem to be inconsistent on this.


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tibor Kiss updated WHIRR-128:
-----------------------------

    Attachment:     (was: whirr-trunk.patch)


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tibor Kiss updated WHIRR-128:
-----------------------------

    Attachment: whirr-trunk.patch


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tibor Kiss updated WHIRR-128:
-----------------------------

    Attachment: whirr-trunk.patch


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated WHIRR-128:
-------------------------------

    Fix Version/s: 0.3.0
         Assignee: Tibor Kiss


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tibor Kiss updated WHIRR-128:
-----------------------------

    Attachment: whirr-trunk.patch

I also attached the patch.


[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tibor Kiss updated WHIRR-128:
-----------------------------

    Attachment:     (was: whirr-trunk.patch)

> In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: WHIRR-128
>                 URL: https://issues.apache.org/jira/browse/WHIRR-128
>             Project: Whirr
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.3.0
>         Environment: Running hadoop (apache or CDH distro) in ec2 instances (Ubuntu or CentOS or Fedora).
> The same issue with the integration test of whirr.
>            Reporter: Tibor Kiss
>            Assignee: Tibor Kiss
>             Fix For: 0.3.0
>
>         Attachments: compare-myhost-with-ec2.txt, on-ec2-after-patch.tar.gz, on-ec2-before-patch.tar.gz, whirr-trunk.patch
>
>


[jira] Commented: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928657#action_12928657 ] 

Tom White commented on WHIRR-128:
---------------------------------

Thanks for submitting this Tibor. Overall it looks good. A few comments:

* Could you put the resolving logic (resolveAddress) outside the Service class? It's really an implementation detail and Service is a public interface for users.
* Is there a test we could write for resolveAddress?
* We should test that this works with Rackspace too. (I've got some credentials and can do that.)
* There are some tabs in the patch.




[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated WHIRR-128:
----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Tibor!

(BTW you don't need to delete old attachments. In fact, it's good to leave them so that folks can see the progression of development.)



[jira] Commented: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929616#action_12929616 ] 

Tibor Kiss commented on WHIRR-128:
----------------------------------

I made a new patch in which:
 - I moved resolveAddress out to a DnsUtil class and placed it in the services/hadoop module only, because currently only this service module is affected by the problem.
 - I wrote an integration test for resolveAddress which also runs on hosts with multiple network interfaces. Basically I check all the interfaces, and for each one that has a reverse address I do a cross-check to prove that the obtained reverse name is still valid in the forward direction. This test works on ec2 too, where the same test logic would fail with java.net.InetAddress's getHostName(ip) <-> getByName(reverse).
 - I also removed the tabs.

The Rackspace test remains. Tom, could you please do a check on Rackspace?
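
The cross-check described in the list above (reverse lookup, then forward lookup of the result) could be sketched roughly as follows. This is only an illustration, not the code from the patch: the class name ForwardConfirmedReverse and the two resolver functions are hypothetical, and the resolvers are passed in so that the logic can be exercised without a network. A reverse result equal to the ip literal is treated as "no reverse name", which is the symptom reported for InetAddress on ec2.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper for illustration; not the DnsUtil class from the patch. */
final class ForwardConfirmedReverse {

  private ForwardConfirmedReverse() {
  }

  /**
   * Returns true only if the ip address has a reverse name and that name
   * resolves forward to a set of addresses that contains the same ip.
   *
   * @param ip      dotted-quad address to check
   * @param reverse assumed resolver: ip address -> reverse name, if any
   * @param forward assumed resolver: host name -> ip addresses
   */
  static boolean isConfirmed(
      String ip,
      Function<String, Optional<String>> reverse,
      Function<String, Set<String>> forward) {
    Optional<String> name = reverse.apply(ip);
    if (!name.isPresent()) {
      return false; // no reverse record at all
    }
    if (name.get().equals(ip)) {
      // The resolver fell back to the ip literal, so there is no real name.
      return false;
    }
    // The reverse name must map back to the original address.
    return forward.apply(name.get()).contains(ip);
  }
}
```

In a real test the two functions would wrap whatever resolver is under test, and the check would be repeated for every address of every network interface on the host.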



[jira] Commented: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929714#action_12929714 ] 

Tibor Kiss commented on WHIRR-128:
----------------------------------

A new patch is available again:
    * I added the ASF header to DnsUtil. (mvn apache-rat:check still complains about cli/whirr.log; I also ran apache-rat:rat to see all these problems in detail.)
    * DnsUtilTest has been moved to org.apache.whirr.service.hadoop, since it can be run in any environment.
    * I created formatter settings in my Eclipse, so I was now able to fix the indentation problems in a much more consistent way.

Glad to hear that the test is running on Rackspace too. Thanks!



[jira] Updated: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Posted by "Tibor Kiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tibor Kiss updated WHIRR-128:
-----------------------------

    Status: Patch Available  (was: Open)

This is my patch proposal, which works for me on the apache distribution (and also on CDH).
Eventually somebody may dig into why java.net.InetAddress cannot resolve reverse dns names on ec2 instances. I have just localized the problem, which is very simple to reproduce even without whirr or hadoop.
Regarding the solution I applied, there may be better ideas about whether or not to use the dnsjava api. You decide.
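
For comparison, here is a minimal sketch of a reverse lookup that does not go through java.net.InetAddress. It is not the patch: the patch uses dnsjava, whereas this sketch swaps in the DNS provider for JNDI that ships with the JDK. The class and method names are hypothetical, and whether this returns the public dns name on an ec2 instance has not been verified here.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

/** Hypothetical sketch; the actual patch uses the dnsjava api instead. */
final class ReverseLookupSketch {

  private ReverseLookupSketch() {
  }

  /** Builds the in-addr.arpa query name for a dotted-quad IPv4 address. */
  static String ptrName(String ipv4) {
    String[] octets = ipv4.split("\\.");
    if (octets.length != 4) {
      throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
    }
    // PTR names list the octets in reverse order.
    return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
        + ".in-addr.arpa";
  }

  /**
   * Queries the PTR record through the JDK's JNDI DNS provider, using the
   * name servers configured on the host. Returns the host name without the
   * trailing dot, or null if the answer carries no PTR attribute. A missing
   * record is usually reported as a NamingException instead.
   */
  static String reverseLookup(String ipv4) throws NamingException {
    Hashtable<String, String> env = new Hashtable<String, String>();
    env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
    DirContext ctx = new InitialDirContext(env);
    try {
      Attributes attrs = ctx.getAttributes(ptrName(ipv4), new String[] {"PTR"});
      Attribute ptr = attrs.get("PTR");
      if (ptr == null) {
        return null;
      }
      String name = ptr.get().toString();
      return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
    } finally {
      ctx.close();
    }
  }
}
```

Calling ReverseLookupSketch.reverseLookup(args[0]) next to the InetAddress calls from the issue description would show whether the two resolvers disagree on a given host.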
