Posted to common-user@hadoop.apache.org by igor Finkelshteyn <ie...@gmail.com> on 2012/08/23 21:34:44 UTC

Hadoop on EC2 Managing Internal/External IPs

Hi,
I'm currently setting up a Hadoop cluster on EC2, and everything works just fine when accessing the cluster from inside EC2, but as soon as I try to do something like upload a file from an external client, I get timeout errors like:

12/08/23 12:06:16 ERROR hdfs.DFSClient: Failed to close file /user/some_file._COPYING_
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.123.x.x:50010]

What's clearly happening is my NameNode is resolving my DataNode's IPs to their internal EC2 values instead of their external values, and then sending along the internal IP to my external client, which is obviously unable to reach those. I'm thinking this must be a common problem. How do other people deal with it? Is there a way to just force my name node to send along my DataNode's hostname instead of IP, so that the hostname can be resolved properly from whatever box will be sending files?

Eli
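
For reference, what is being asked for here roughly corresponds to the HDFS
settings dfs.client.use.datanode.hostname (clients connect to DataNodes by
hostname) and dfs.datanode.use.datanode.hostname (DataNodes use hostnames for
DataNode-to-DataNode transfer). Whether they exist in the Hadoop build in use
here isn't stated, so the hdfs-site.xml sketch below is an assumption to
verify rather than a confirmed fix:

    <!-- hdfs-site.xml sketch; availability of these properties depends on
         the Hadoop version in use -->
    <!-- on the external client: connect to DataNodes by hostname -->
    <property>
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>
    <!-- on DataNodes: use hostnames for DataNode-to-DataNode connections -->
    <property>
      <name>dfs.datanode.use.datanode.hostname</name>
      <value>true</value>
    </property>

With a setting like this the client performs its own DNS lookup of whatever
hostname the NameNode reports for a DataNode, so if that name resolves
externally (for example an EC2 public DNS name) the connection can succeed
from outside EC2 as well.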

Re: Hadoop on EC2 Managing Internal/External IPs

Posted by Håvard Wahl Kongsgård <ha...@gmail.com>.
Hi, a VPN or simply first uploading the files to an EC2 node is the best option.

But an alternative is to use the external interface/IP instead of the
internal one in the Hadoop config (a rough sketch follows below), though I
assume this will be slower and more costly...

-Håvard

On Fri, Aug 24, 2012 at 4:54 AM, igor Finkelshteyn <ie...@gmail.com> wrote:
> I've seen a bunch of people with this exact same question all over Google with no answers. I know people have successful non-temporary clusters in EC2. Is there really no one that's needed to deal with having EC2 expose external addresses instead of internal addresses before? This seems like it should be a common thing.
>
> On Aug 23, 2012, at 12:34 PM, igor Finkelshteyn wrote:
>
>> Hi,
>> I'm currently setting up a Hadoop cluster on EC2, and everything works just fine when accessing the cluster from inside EC2, but as soon as I try to do something like upload a file from an external client, I get timeout errors like:
>>
>> 12/08/23 12:06:16 ERROR hdfs.DFSClient: Failed to close file /user/some_file._COPYING_
>> java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.123.x.x:50010]
>>
>> What's clearly happening is my NameNode is resolving my DataNode's IPs to their internal EC2 values instead of their external values, and then sending along the internal IP to my external client, which is obviously unable to reach those. I'm thinking this must be a common problem. How do other people deal with it? Is there a way to just force my name node to send along my DataNode's hostname instead of IP, so that the hostname can be resolved properly from whatever box will be sending files?
>>
>> Eli
>



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/
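
A rough sketch of that second option, using EC2 public DNS names rather than
internal addresses in the config. The hostname and port below are
placeholders; from inside EC2 those public names typically resolve to the
internal 10.x address, and from outside to the public one:

    <!-- core-site.xml sketch; replace the hostname with the NameNode's
         actual EC2 public DNS name and the port with the real RPC port -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://ec2-203-0-113-10.compute-1.amazonaws.com:8020</value>
    </property>

The masters/slaves files used by the start scripts could list the same public
DNS names.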

Re: Hadoop on EC2 Managing Internal/External IPs

Posted by igor Finkelshteyn <ie...@gmail.com>.
I've seen a bunch of people with this exact same question all over Google with no answers. I know people have successful non-temporary clusters in EC2. Is there really no one that's needed to deal with having EC2 expose external addresses instead of internal addresses before? This seems like it should be a common thing.

On Aug 23, 2012, at 12:34 PM, igor Finkelshteyn wrote:

> Hi,
> I'm currently setting up a Hadoop cluster on EC2, and everything works just fine when accessing the cluster from inside EC2, but as soon as I try to do something like upload a file from an external client, I get timeout errors like:
> 
> 12/08/23 12:06:16 ERROR hdfs.DFSClient: Failed to close file /user/some_file._COPYING_
> java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.123.x.x:50010]
> 
> What's clearly happening is my NameNode is resolving my DataNode's IPs to their internal EC2 values instead of their external values, and then sending along the internal IP to my external client, which is obviously unable to reach those. I'm thinking this must be a common problem. How do other people deal with it? Is there a way to just force my name node to send along my DataNode's hostname instead of IP, so that the hostname can be resolved properly from whatever box will be sending files?
> 
> Eli


Re: Hadoop on EC2 Managing Internal/External IPs

Posted by Aaron Eng <ae...@maprtech.com>.
Hi Igor,

I don't think there's anything in Hadoop that's going to allow you to have an
internal IP assigned to a machine's network interface and have it
advertise the external IP.  Even if that were in place, you'd then have to
differentiate between requests coming from the other nodes in the cluster
vs. from external networks.  If your nodes were advertising their external
IP to other nodes, then it's likely your network traffic would have to
traverse many more hops to be NAT'd and then come back.  The whole thing is
messy and would require a bunch of network configs that are probably going
to be a hassle to manage.

Also, I don't think you want to have your nodes exposed to the internet
since Hadoop traffic is unencrypted unless you've done something special to
encrypt it.  Even if you restrict your nodes to respond only to a specific
IP, you can't guarantee that someone else can't take over your IP from a
security perspective.  The VPN gateway offers both security and ease of
networking.


On Thu, Aug 23, 2012 at 8:09 PM, igor Finkelshteyn <ie...@gmail.com> wrote:

> That would work, but wouldn't a much simpler solution just be to force the
> machines in the cluster to always pass around their external FQDNs, since
> those will properly resolve to the internal or external IP depending on
> what machine is asking? Is there no way to just do that?
>
>
> On Aug 23, 2012, at 8:02 PM, Aaron Eng wrote:
>
> Hi Igor,
>
> Amazon offers a service where you can have a VPN gateway on your network
> that leads directly back to the network where your instances are.  So
> that 10.123.x.x subnet would be connected via the VPN gateway on your
> network and you'd set up your routers/routing to push traffic for that
> subnet to the gateway.
>
> On Thu, Aug 23, 2012 at 12:34 PM, igor Finkelshteyn <ie...@gmail.com> wrote:
>
>> Hi,
>> I'm currently setting up a Hadoop cluster on EC2, and everything works
>> just fine when accessing the cluster from inside EC2, but as soon as I try
>> to do something like upload a file from an external client, I get timeout
>> errors like:
>>
>> 12/08/23 12:06:16 ERROR hdfs.DFSClient: Failed to close file
>> /user/some_file._COPYING_
>> java.net.SocketTimeoutException: 65000 millis timeout while waiting for
>> channel to be ready for connect. ch :
>> java.nio.channels.SocketChannel[connection-pending remote=/10.123.x.x:50010]
>>
>> What's clearly happening is my NameNode is resolving my DataNode's IPs to
>> their internal EC2 values instead of their external values, and then
>> sending along the internal IP to my external client, which is obviously
>> unable to reach those. I'm thinking this must be a common problem. How do
>> other people deal with it? Is there a way to just force my name node to
>> send along my DataNode's hostname instead of IP, so that the hostname can
>> be resolved properly from whatever box will be sending files?
>>
>> Eli
>
>
>
>

Re: Hadoop on EC2 Managing Internal/External IPs

Posted by igor Finkelshteyn <ie...@gmail.com>.
That would work, but wouldn't a much simpler solution just be to force the machines in the cluster to always pass around their external FQDNs, since those will properly resolve to the internal or external IP depending on what machine is asking? Is there no way to just do that?


On Aug 23, 2012, at 8:02 PM, Aaron Eng wrote:

> Hi Igor,
> 
> Amazon offers a service where you can have a VPN gateway on your network that leads directly back to the network where your instances are.  So that 10.123.x.x subnet would be connected via the VPN gateway on your network and you'd set up your routers/routing to push traffic for that subnet to the gateway.
> 
> On Thu, Aug 23, 2012 at 12:34 PM, igor Finkelshteyn <ie...@gmail.com> wrote:
> Hi,
> I'm currently setting up a Hadoop cluster on EC2, and everything works just fine when accessing the cluster from inside EC2, but as soon as I try to do something like upload a file from an external client, I get timeout errors like:
> 
> 12/08/23 12:06:16 ERROR hdfs.DFSClient: Failed to close file /user/some_file._COPYING_
> java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.123.x.x:50010]
> 
> What's clearly happening is my NameNode is resolving my DataNode's IPs to their internal EC2 values instead of their external values, and then sending along the internal IP to my external client, which is obviously unable to reach those. I'm thinking this must be a common problem. How do other people deal with it? Is there a way to just force my name node to send along my DataNode's hostname instead of IP, so that the hostname can be resolved properly from whatever box will be sending files?
> 
> Eli
> 
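
The split resolution being counted on here can be checked directly: EC2
public DNS names typically resolve to the private 10.x address when queried
from inside EC2 and to the public address from outside. A quick check, with a
placeholder hostname:

    # substitute a real instance's public DNS name
    $ dig +short ec2-203-0-113-10.compute-1.amazonaws.com
    # from an EC2 instance this typically returns the private address (10.123.x.x);
    # from an external host it returns the public address (e.g. 203.0.113.10)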


Re: Hadoop on EC2 Managing Internal/External IPs

Posted by Aaron Eng <ae...@maprtech.com>.
Hi Igor,

Amazon offers a service where you can have a VPN gateway on your network
that leads directly back to the network where your instances are.  So
that 10.123.x.x subnet would be connected via the VPN gateway on your
network and you'd set up your routers/routing to push traffic for that
subnet to the gateway.

On Thu, Aug 23, 2012 at 12:34 PM, igor Finkelshteyn <ie...@gmail.com> wrote:

> Hi,
> I'm currently setting up a Hadoop cluster on EC2, and everything works
> just fine when accessing the cluster from inside EC2, but as soon as I try
> to do something like upload a file from an external client, I get timeout
> errors like:
>
> 12/08/23 12:06:16 ERROR hdfs.DFSClient: Failed to close file
> /user/some_file._COPYING_
> java.net.SocketTimeoutException: 65000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending remote=/10.123.x.x:50010]
>
> What's clearly happening is my NameNode is resolving my DataNode's IPs to
> their internal EC2 values instead of their external values, and then
> sending along the internal IP to my external client, which is obviously
> unable to reach those. I'm thinking this must be a common problem. How do
> other people deal with it? Is there a way to just force my name node to
> send along my DataNode's hostname instead of IP, so that the hostname can
> be resolved properly from whatever box will be sending files?
>
> Eli
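
On the client/office side, the routing piece of that setup is typically a
single route for the instance subnet pointing at the VPN endpoint. A sketch,
assuming the 10.123.x.x subnet from this thread and a placeholder gateway
address:

    # send traffic for the EC2-internal subnet through the local VPN endpoint
    # (192.0.2.1 is a placeholder for that endpoint's address)
    ip route add 10.123.0.0/16 via 192.0.2.1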
