Posted to common-user@hadoop.apache.org by Mauro Cohen <ma...@gmail.com> on 2013/04/10 23:32:13 UTC

Problem With NAT ips

Hello, I have a problem with the new version of Hadoop.

I have a cluster with 2 nodes.
Each one has a private IP and a public IP configured through NAT.
The problem is that the private IPs of the nodes do not belong to the same
network (I have no connectivity between the nodes through those IPs).
I have connectivity between the nodes only through the NAT IPs (ssh, ping, etc.).

With the Hadoop 0.20.x version, when I configured the datanode and namenode
configuration files I always used host names for the properties (e.g. the
fs.default.name property) and never had problems with this.
But with the new version of Hadoop, the way the nodes communicate with each
other must have changed, and at some point they use the private IPs instead
of host names.

I have installed a cluster with 2 nodes:

hadoop-2-00 is the namenode.
In hadoop-2-00 I have this /etc/hosts file and this ifconfig output:

*/etc/hosts:*

172.16.67.68 hadoop-2-00

*ifconfig*:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:4c:06:25
          inet addr:172.16.67.68  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe4c:625/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73475 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:100923399 (100.9 MB)  TX bytes:101169918 (101.1 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:588 (588.0 B)  TX bytes:588 (588.0 B)

The NAT IP for this node is 10.70.5.51

I use the host name (*hadoop-2-00*) in all of the Hadoop configuration files.

The other node is the datanode *hadoop-2-01*, which has this /etc/hosts file
and ifconfig output:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:70:5e:bd
          inet addr:172.16.67.69  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe70:5ebd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27081 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:95842550 (95.8 MB)  TX bytes:4314694 (4.3 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1900 (1.9 KB)  TX bytes:1900 (1.9 KB)

*/etc/hosts*

172.16.67.69 hadoop-2-01

The NAT IP for that host is 10.70.5.57

When I start the namenode there is no problem.

But when I start the datanode there is an error.

This is the stack trace from the datanode log:

2013-04-10 16:01:26,997 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
BP-2054036249-172.16.67.68-1365621320283 (storage id
DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/
10.70.5.51:8020 beginning handshake with NN
2013-04-10 16:01:27,013 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
block pool Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id
DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/
10.70.5.51:8020
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0,
storageID=DS-1556234100-172.16.67.69-50010-1365621786288, infoPort=50075,
ipcPort=50020,
storageInfo=lv=-40;cid=CID-65f42cc4-6c02-4537-9fb8-627a612ec74e;nsid=1995699852;c=0)
        at
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
        at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
        at
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
        at
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)

        at org.apache.hadoop.ipc.Client.call(Client.java:1235)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy10.registerDatanode(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.registerDatanode(Unknown Source)
        at
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
        at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
        at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)
2013-04-10 16:01:27,015 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service
for: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id
DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/
10.70.5.51:8020
2013-04-10 16:01:27,016 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool
BP-2054036249-172.16.67.68-1365621320283 (storage id
DS-1556234100-172.16.67.69-50010-1365621786288)
2013-04-10 16:01:27,016 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed
bpid=BP-2054036249-172.16.67.68-1365621320283 from blockPoolScannerMap
2013-04-10 16:01:27,016 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Removing block pool BP-2054036249-172.16.67.68-1365621320283
2013-04-10 16:01:29,017 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-10 16:01:29,019 INFO org.apache.hadoop.util.ExitUtil: Exiting with
status 0
2013-04-10 16:01:29,021 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop-2-01/172.16.67.69



Do you know if there's a way to solve this?

Any ideas?

Thanks.
Mauro.

Re: Problem With NAT ips

Posted by Daryn Sharp <da...@yahoo-inc.com>.
That's unfortunate.  The NN can't really know its public IP, so all it can do is tell the DN its address, which happens to be private.  My first thought was that the DN already knows the NN's public IP in order to register itself, so the NN shouldn't need to send its own address in the web link.  However, that won't work for federation (multiple NNs).  I suppose the NN could send its nameservice ID and the DN could use it to look up the NN hostname.

I'm not sure there's a good workaround short of code changes.  Hadoop (currently) isn't designed for NAT, multiple NICs, etc.  Your best bet is to file a JIRA and, in the meantime, try to set up your cluster on the same network.

Daryn

On Apr 11, 2013, at 10:10 AM, Mauro Cohen wrote:

Thank you Daryn for your response.

I tried what you told me, and now the datanode is working. But now there is another problem.

When I go to the namenode's live-nodes page, I can see my datanode listed as alive. But when I try to open the datanode page, I get this message as a response:

No Route to Host from hadoop-2-01/172.16.67.69 to 172.16.67.68:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost

It seems that at some point it is still passing the private IP to communicate between the nodes.

When I look at the URL of the link, it passes the private IP of the namenode as the nnaddr parameter:

http://hadoop-2-01:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F&nnaddr=172.16.67.68:8020

If I set that parameter to the namenode's host name, or to the namenode's public IP, it works fine.
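The manual fix can be sketched as a small URL rewrite (a hypothetical helper, just to illustrate what replacing that parameter means; Hadoop itself does nothing like this):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def rewrite_nnaddr(url, nn_host):
    """Replace the IP in the nnaddr query parameter with a resolvable
    hostname, keeping the original port (illustration only)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    # nnaddr looks like "172.16.67.68:8020"; keep the port, swap the host.
    _ip, _, port = query["nnaddr"].partition(":")
    query["nnaddr"] = "%s:%s" % (nn_host, port)
    return urlunsplit(parts._replace(query=urlencode(query)))

url = ("http://hadoop-2-01:50075/browseDirectory.jsp"
       "?namenodeInfoPort=50070&dir=%2F&nnaddr=172.16.67.68:8020")
print(rewrite_nnaddr(url, "hadoop-2-00"))
```

Note that urlencode percent-encodes the colon in the rewritten parameter (nnaddr=hadoop-2-00%3A8020), which browsers accept.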

But when I run any job that reads data from the datanode, it uses the private IP to communicate, so I get the typical "could not obtain block" message.

Any ideas?


Thanks.
Mauro.

2013/4/11 Daryn Sharp <da...@yahoo-inc.com>:
Hi Mauro,

The registration process has changed quite a bit.  I don't think the NN "trusts" the DN's self-identification anymore.  Otherwise it would be trivial to spoof another DN, intentionally or not, which could be a security hazard.

I suspect the NN can't resolve the DN.  Unresolvable hosts are rejected because the allow/deny lists may contain hostnames, and if DNS is temporarily unavailable you don't want a node that is blocked by hostname to slip through.  Try adding the DN's public IP 10.70.5.57 to the NN's /etc/hosts if it isn't resolvable via DNS.
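As a sketch of that suggestion (assuming the DN's hostname should map to its NAT address from the NN's point of view), the entry added on the NN would look like:

```
# /etc/hosts on hadoop-2-00 (the NN): resolve the registering DN
# via its NAT/public IP instead of relying on DNS
10.70.5.57   hadoop-2-01
```

Afterwards, `getent hosts hadoop-2-01` on the NN should return 10.70.5.57, which is what the NN's resolver sees when the DN registers.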

I hope this helps!

Daryn

On Apr 10, 2013, at 4:32 PM, Mauro Cohen wrote:



Hello, i have a problem with the new version of hadoop.

I have cluster with 2 nodes.
Each one has a private ip and a public IP configured through NAT.
The problem is that the private IP of each node doesnt belong to the same net. (I have no conectivity between nodes through that ip)
I have conectvity between nodes thorugh the NAT ip only, (ssh, ping, etc ).

With the hadoop 0.20.x version when i configured datanodes and namenodes configuration files i allways used the host-name for propertys (ex: fs.defaul.name<http://fs.defaul.name/> property)  and never have problems with this.
But with the new version of hadoop, theres has to be change the way that nodes comunicates itself, and they use the private ip in some point instead of host-names.

I have installed a cluster with 2 nodes:

hadoop-2-00 is the namenode.
In hadoop-2-00 i have this /etc/hosts file and this ifconfig output:

etc/hosts:

172.16.67.68 hadoop-2-00

ifconfig:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:4c:06:25
          inet addr:172.16.67.68  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe4c:625/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73475 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:100923399 (100.9 MB)  TX bytes:101169918 (101.1 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:588 (588.0 B)  TX bytes:588 (588.0 B)

The NAT ip for this node is 10.70.5.51

I use the host-name(hadoop-2-00) in all the configuration files of hadoop.

The other node is the datanode hadoop-2-01 and has this etc/hosts and ifconfig output:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:70:5e:bd
          inet addr:172.16.67.69  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe70:5ebd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27081 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:95842550 (95.8 MB)  TX bytes:4314694 (4.3 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1900 (1.9 KB)  TX bytes:1900 (1.9 KB)

/etc/hosts

172.16.67.69 hadoop-2-01

The nat ip for that host is 10.70.5.57

When i start the namenode there  is no problem.

But when i start the datanode i theres is an error.

This is the stacktrace of the datanode log:

2013-04-10 16:01:26,997 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020<http://10.70.5.51:8020/> beginning handshake with NN
2013-04-10 16:01:27,013 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020<http://10.70.5.51:8020/>
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1556234100-172.16.67.69-50010-1365621786288, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-65f42cc4-6c02-4537-9fb8-627a612ec74e;nsid=1995699852;c=0)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)

        at org.apache.hadoop.ipc.Client.call(Client.java:1235)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy10.registerDatanode(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.registerDatanode(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)
2013-04-10 16:01:27,015 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020<http://10.70.5.51:8020/>
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288)
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed bpid=BP-2054036249-172.16.67.68-1365621320283 from blockPoolScannerMap
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-2054036249-172.16.67.68-1365621320283
2013-04-10 16:01:29,017 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-10 16:01:29,019 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-04-10 16:01:29,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop-2-01/172.16.67.69<http://172.16.67.69/>



Do you know if theres a way to solve this?

Any ideas?

Thanks.
Mauro.








Re: Problem With NAT ips

Posted by Daryn Sharp <da...@yahoo-inc.com>.
That's unfortunate.  The NN can't really know its public ip so all it can do is tell the DN its address which happens to be private.  My first thought would be the DN knows the NN's public ip to register itself, so the NN doesn't need to send its address in the web link.  However that won't work for federation (multiple NNs).  I suppose the NN could send its nameservice id and the DN can use it to lookup the NN hostname.

I'm not sure there's a good workaround short of code changes.  Hadoop (currently) isn't designed for NAT, multiple NICs, etc.  Your best bet is to file a jira and in the meantime try to set up your cluster on the same network.

Daryn

On Apr 11, 2013, at 10:10 AM, Mauro Cohen wrote:

Thank you Daryn for your response.

I try what you tell me, and now the datanode is working. But now there is another problem.

When you get to the name node live nodes page i can see mi data node as alive. But when i try to enter to the datanode page i have this message as a responde:

No Route to Host from hadoop-2-01/172.16.67.69<http://172.16.67.69/> to 172.16.67.68:8020<http://172.16.67.68:8020/> failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
It seems that in some point it still passing the private ip to comunicate between the nodes.

When i look into the url of the link it pass the private ip of the namenode as the nnaddress param:

http://hadoop-2-01:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F&nnaddr=172.16.67.68:8020<http://172.16.67.68:8020/>

If i put that param with the namenode hostname or with the public ip of the namenode it works fine.

But when i run any job that looks for information in the datanode, it is using the private ip to comunicate , so i get the typical msg of "could not obtain block".

Any ideas?.


Thanks.
Mauro.

2013/4/11 Daryn Sharp <da...@yahoo-inc.com>>
Hi Mauro,

The registration process has changed quite a bit.  I don't think the NN "trusts" the DN's self-identification anymore.  Otherwise it makes it trivial to spoof another DN, intentionally or not, which can be a security hazard.

I suspect the NN can't resolve the DN.  Unresolvable hosts are rejected because the allow/deny lists may contain hostnames.  If dns is temporarily unavailable, you don't want a node blocked by hostname to slip through.    Try adding the DN's public ip 10.70.5.57 to the NN's /etc/hosts if it's not resolvable via dns.

I hope this helps!

Daryn

On Apr 10, 2013, at 4:32 PM, Mauro Cohen wrote:



Hello, i have a problem with the new version of hadoop.

I have cluster with 2 nodes.
Each one has a private ip and a public IP configured through NAT.
The problem is that the private IP of each node doesnt belong to the same net. (I have no conectivity between nodes through that ip)
I have conectvity between nodes thorugh the NAT ip only, (ssh, ping, etc ).

With the hadoop 0.20.x version when i configured datanodes and namenodes configuration files i allways used the host-name for propertys (ex: fs.defaul.name<http://fs.defaul.name/> property)  and never have problems with this.
But with the new version of hadoop, theres has to be change the way that nodes comunicates itself, and they use the private ip in some point instead of host-names.

I have installed a cluster with 2 nodes:

hadoop-2-00 is the namenode.
In hadoop-2-00 i have this /etc/hosts file and this ifconfig output:

etc/hosts:

172.16.67.68 hadoop-2-00

ifconfig:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:4c:06:25
          inet addr:172.16.67.68  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe4c:625/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73475 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:100923399 (100.9 MB)  TX bytes:101169918 (101.1 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:588 (588.0 B)  TX bytes:588 (588.0 B)

The NAT ip for this node is 10.70.5.51

I use the host-name(hadoop-2-00) in all the configuration files of hadoop.

The other node is the datanode hadoop-2-01 and has this etc/hosts and ifconfig output:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:70:5e:bd
          inet addr:172.16.67.69  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe70:5ebd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27081 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:95842550 (95.8 MB)  TX bytes:4314694 (4.3 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1900 (1.9 KB)  TX bytes:1900 (1.9 KB)

/etc/hosts

172.16.67.69 hadoop-2-01

The nat ip for that host is 10.70.5.57

When i start the namenode there  is no problem.

But when i start the datanode i theres is an error.

This is the stacktrace of the datanode log:

2013-04-10 16:01:26,997 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020<http://10.70.5.51:8020/> beginning handshake with NN
2013-04-10 16:01:27,013 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020<http://10.70.5.51:8020/>
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1556234100-172.16.67.69-50010-1365621786288, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-65f42cc4-6c02-4537-9fb8-627a612ec74e;nsid=1995699852;c=0)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)

        at org.apache.hadoop.ipc.Client.call(Client.java:1235)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy10.registerDatanode(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.registerDatanode(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)
2013-04-10 16:01:27,015 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020<http://10.70.5.51:8020/>
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288)
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed bpid=BP-2054036249-172.16.67.68-1365621320283 from blockPoolScannerMap
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-2054036249-172.16.67.68-1365621320283
2013-04-10 16:01:29,017 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-10 16:01:29,019 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-04-10 16:01:29,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop-2-01/172.16.67.69<http://172.16.67.69/>



Do you know if theres a way to solve this?

Any ideas?

Thanks.
Mauro.








Re: Problem With NAT ips

Posted by Daryn Sharp <da...@yahoo-inc.com>.
That's unfortunate.  The NN can't really know its public ip so all it can do is tell the DN its address which happens to be private.  My first thought would be the DN knows the NN's public ip to register itself, so the NN doesn't need to send its address in the web link.  However that won't work for federation (multiple NNs).  I suppose the NN could send its nameservice id and the DN can use it to lookup the NN hostname.

I'm not sure there's a good workaround short of code changes.  Hadoop (currently) isn't designed for NAT, multiple NICs, etc.  Your best bet is to file a jira and in the meantime try to set up your cluster on the same network.

Daryn

On Apr 11, 2013, at 10:10 AM, Mauro Cohen wrote:

Thank you Daryn for your response.

I try what you tell me, and now the datanode is working. But now there is another problem.

When you get to the name node live nodes page i can see mi data node as alive. But when i try to enter to the datanode page i have this message as a responde:

No Route to Host from hadoop-2-01/172.16.67.69<http://172.16.67.69/> to 172.16.67.68:8020<http://172.16.67.68:8020/> failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
It seems that in some point it still passing the private ip to comunicate between the nodes.

When i look into the url of the link it pass the private ip of the namenode as the nnaddress param:

http://hadoop-2-01:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F&nnaddr=172.16.67.68:8020<http://172.16.67.68:8020/>

If i put that param with the namenode hostname or with the public ip of the namenode it works fine.

But when i run any job that looks for information in the datanode, it is using the private ip to comunicate , so i get the typical msg of "could not obtain block".

Any ideas?.


Thanks.
Mauro.

2013/4/11 Daryn Sharp <da...@yahoo-inc.com>>
Hi Mauro,

The registration process has changed quite a bit.  I don't think the NN "trusts" the DN's self-identification anymore.  Otherwise it makes it trivial to spoof another DN, intentionally or not, which can be a security hazard.

I suspect the NN can't resolve the DN.  Unresolvable hosts are rejected because the allow/deny lists may contain hostnames.  If dns is temporarily unavailable, you don't want a node blocked by hostname to slip through.    Try adding the DN's public ip 10.70.5.57 to the NN's /etc/hosts if it's not resolvable via dns.

I hope this helps!

Daryn

On Apr 10, 2013, at 4:32 PM, Mauro Cohen wrote:



Hello, I have a problem with the new version of hadoop.

I have a cluster with 2 nodes.
Each one has a private ip and a public IP configured through NAT.
The problem is that the private ips of the two nodes don't belong to the same network (I have no connectivity between the nodes through those ips).
I have connectivity between the nodes through the NAT ips only (ssh, ping, etc.).

With hadoop 0.20.x, when I configured the datanode and namenode configuration files I always used the hostname for properties (e.g. the fs.default.name property) and never had problems with this.
But in the new version of hadoop the way the nodes communicate must have changed, because at some point they use private ips instead of hostnames.

I have installed a cluster with 2 nodes:

hadoop-2-00 is the namenode.
In hadoop-2-00 I have this /etc/hosts file and this ifconfig output:

/etc/hosts:

172.16.67.68 hadoop-2-00

ifconfig:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:4c:06:25
          inet addr:172.16.67.68  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe4c:625/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73475 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:100923399 (100.9 MB)  TX bytes:101169918 (101.1 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:588 (588.0 B)  TX bytes:588 (588.0 B)

The NAT ip for this node is 10.70.5.51

I use the hostname (hadoop-2-00) in all of hadoop's configuration files.
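For what it's worth, the hostname-only setup described above looks something like this in core-site.xml; fs.defaultFS is the Hadoop 2.x name for the old fs.default.name property, and the 8020 port is taken from the logs below. A quick way to drop it in place (a sketch, written to /tmp here):

```shell
# core-site.xml referencing the namenode by hostname, never by a
# private ip, so each host resolves it through its own /etc/hosts.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-2-00:8020</value>
  </property>
</configuration>
EOF

# Verify the value landed as intended.
grep 'hadoop-2-00' /tmp/core-site.xml
```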

The other node is the datanode, hadoop-2-01, with this ifconfig output and /etc/hosts file:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:70:5e:bd
          inet addr:172.16.67.69  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe70:5ebd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27081 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:95842550 (95.8 MB)  TX bytes:4314694 (4.3 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1900 (1.9 KB)  TX bytes:1900 (1.9 KB)

/etc/hosts

172.16.67.69 hadoop-2-01

The NAT ip for that host is 10.70.5.57

When I start the namenode there is no problem.

But when I start the datanode there is an error.

This is the stack trace from the datanode log:

2013-04-10 16:01:26,997 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020 beginning handshake with NN
2013-04-10 16:01:27,013 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1556234100-172.16.67.69-50010-1365621786288, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-65f42cc4-6c02-4537-9fb8-627a612ec74e;nsid=1995699852;c=0)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)

        at org.apache.hadoop.ipc.Client.call(Client.java:1235)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy10.registerDatanode(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.registerDatanode(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)
2013-04-10 16:01:27,015 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288)
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed bpid=BP-2054036249-172.16.67.68-1365621320283 from blockPoolScannerMap
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-2054036249-172.16.67.68-1365621320283
2013-04-10 16:01:29,017 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-10 16:01:29,019 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-04-10 16:01:29,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop-2-01/172.16.67.69



Do you know if there's a way to solve this?

Any ideas?

Thanks.
Mauro.








Re: Problem With NAT ips

Posted by Daryn Sharp <da...@yahoo-inc.com>.
That's unfortunate.  The NN can't really know its public ip, so all it can do is tell the DN its own address, which happens to be private.  My first thought is that the DN already knows the NN's public ip (it used it to register), so the NN shouldn't need to send its address in the web link.  However, that won't work for federation (multiple NNs).  I suppose the NN could send its nameservice id and the DN could use it to look up the NN hostname.

I'm not sure there's a good workaround short of code changes.  Hadoop (currently) isn't designed for NAT, multiple NICs, etc.  Your best bet is to file a jira and in the meantime try to set up your cluster on the same network.

Daryn

On Apr 11, 2013, at 10:10 AM, Mauro Cohen wrote:









> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
> Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0,
> storageID=DS-1556234100-172.16.67.69-50010-1365621786288, infoPort=50075,
> ipcPort=50020,
> storageInfo=lv=-40;cid=CID-65f42cc4-6c02-4537-9fb8-627a612ec74e;nsid=1995699852;c=0)
>         at
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
>         at
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
>         at
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)
>
>          at org.apache.hadoop.ipc.Client.call(Client.java:1235)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at $Proxy10.registerDatanode(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.registerDatanode(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
>         at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
>         at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
>         at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-04-10 16:01:27,015 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service
> for: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id
> DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/
> 10.70.5.51:8020
> 2013-04-10 16:01:27,016 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool
> BP-2054036249-172.16.67.68-1365621320283 (storage id
> DS-1556234100-172.16.67.69-50010-1365621786288)
> 2013-04-10 16:01:27,016 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed
> bpid=BP-2054036249-172.16.67.68-1365621320283 from blockPoolScannerMap
> 2013-04-10 16:01:27,016 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
> Removing block pool BP-2054036249-172.16.67.68-1365621320283
> 2013-04-10 16:01:29,017 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> 2013-04-10 16:01:29,019 INFO org.apache.hadoop.util.ExitUtil: Exiting with
> status 0
> 2013-04-10 16:01:29,021 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at hadoop-2-01/172.16.67.69
>
>
>
>  Do you know if theres a way to solve this?
>
>  Any ideas?
>
>  Thanks.
>  Mauro.
>
>
>
>
>
>

Re: Problem With NAT ips

Posted by Mauro Cohen <ma...@gmail.com>.
Thank you Daryn for your response.

I tried what you suggested, and the datanode is working now. But now there is
another problem.

On the namenode's live-nodes page I can see my datanode listed as alive.
But when I try to open the datanode's page, I get this message as the
response:

No Route to Host from hadoop-2-01/172.16.67.69 to 172.16.67.68:8020 failed
on socket timeout exception: java.net.NoRouteToHostException: No route to
host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost

It seems that at some point the private IP is still being used to communicate
between the nodes.

When I look at the URL of that link, it passes the private IP of the namenode
as the nnaddr param:

http://hadoop-2-01:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F&nnaddr=172.16.67.68:8020

If I set that param to the namenode's hostname, or to the namenode's public
IP, it works fine.

But when I run any job that reads data from the datanode, it uses the private
IP to communicate, so I get the typical "could not obtain block" message.

Any ideas?


Thanks.
Mauro.

2013/4/11 Daryn Sharp <da...@yahoo-inc.com>

>  Hi Mauro,
>
>  The registration process has changed quite a bit.  I don't think the NN
> "trusts" the DN's self-identification anymore.  Otherwise it makes it
> trivial to spoof another DN, intentionally or not, which can be a security
> hazard.
>
>  I suspect the NN can't resolve the DN.  Unresolvable hosts are rejected
> because the allow/deny lists may contain hostnames.  If dns is temporarily
> unavailable, you don't want a node blocked by hostname to slip through.
>    Try adding the DN's public ip 10.70.5.57 to the NN's /etc/hosts if it's
> not resolvable via dns.
>
>  I hope this helps!
>
>  Daryn
>

Re: Problem With NAT ips

Posted by Daryn Sharp <da...@yahoo-inc.com>.
Hi Mauro,

The registration process has changed quite a bit.  I don't think the NN "trusts" the DN's self-identification anymore.  Otherwise it makes it trivial to spoof another DN, intentionally or not, which can be a security hazard.

I suspect the NN can't resolve the DN.  Unresolvable hosts are rejected because the allow/deny lists may contain hostnames.  If DNS is temporarily unavailable, you don't want a node that is blocked by hostname to slip through.  Try adding the DN's public IP 10.70.5.57 to the NN's /etc/hosts if it's not resolvable via DNS.
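The resolvability check described above can be sanity-tested on the NN host with a short script. This is a diagnostic sketch, not part of Hadoop; the hostname and IP in the comments are the ones from this thread.

```python
import socket

def resolvable(host):
    """Return True if `host` resolves to an IP address via DNS or /etc/hosts."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

# Run on the namenode: every datanode name/IP the NN will see must resolve,
# e.g. after adding "10.70.5.57 hadoop-2-01" to the NN's /etc/hosts,
# resolvable("hadoop-2-01") should return True.
```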

I hope this helps!

Daryn

On Apr 10, 2013, at 4:32 PM, Mauro Cohen wrote:



Hello, I have a problem with the new version of Hadoop.

I have a cluster with 2 nodes.
Each one has a private IP and a public IP configured through NAT.
The problem is that the private IPs of the nodes do not belong to the same network (I have no connectivity between the nodes through those IPs).
I have connectivity between the nodes through the NAT IPs only (ssh, ping, etc.).

With the Hadoop 0.20.x versions, when I configured the datanode and namenode configuration files I always used the hostname for properties (e.g. the fs.default.name property) and never had problems with this.
But with the new version of Hadoop, the way the nodes communicate with each other must have changed, and at some point they use the private IP instead of hostnames.
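As a concrete example of the hostname-based configuration mentioned here, a core-site.xml sketch: fs.default.name was the 0.20.x property and is superseded by fs.defaultFS in Hadoop 2.x; the hostname and port 8020 are taken from this thread's logs.

```xml
<!-- core-site.xml: address the namenode by hostname, never by private IP -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-2-00:8020</value>
</property>
```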

I have installed a cluster with 2 nodes:

hadoop-2-00 is the namenode.
In hadoop-2-00 I have this /etc/hosts file and this ifconfig output:

/etc/hosts:

172.16.67.68 hadoop-2-00

ifconfig:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:4c:06:25
          inet addr:172.16.67.68  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe4c:625/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73475 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:100923399 (100.9 MB)  TX bytes:101169918 (101.1 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:588 (588.0 B)  TX bytes:588 (588.0 B)

The NAT IP for this node is 10.70.5.51.

I use the hostname (hadoop-2-00) in all of the Hadoop configuration files.

The other node is the datanode hadoop-2-01, which has this /etc/hosts and ifconfig output:

eth0      Link encap:Ethernet  HWaddr fa:16:3e:70:5e:bd
          inet addr:172.16.67.69  Bcast:172.16.95.255  Mask:255.255.224.0
          inet6 addr: fe80::f816:3eff:fe70:5ebd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27081 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:95842550 (95.8 MB)  TX bytes:4314694 (4.3 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1900 (1.9 KB)  TX bytes:1900 (1.9 KB)

/etc/hosts

172.16.67.69 hadoop-2-01

The NAT IP for that host is 10.70.5.57.

When I start the namenode, there is no problem.

But when I start the datanode, there is an error.

This is the stack trace from the datanode log:

2013-04-10 16:01:26,997 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020 beginning handshake with NN
2013-04-10 16:01:27,013 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1556234100-172.16.67.69-50010-1365621786288, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-65f42cc4-6c02-4537-9fb8-627a612ec74e;nsid=1995699852;c=0)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)

        at org.apache.hadoop.ipc.Client.call(Client.java:1235)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy10.registerDatanode(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.registerDatanode(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)
2013-04-10 16:01:27,015 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288) service to hadoop-2-00/10.70.5.51:8020
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-2054036249-172.16.67.68-1365621320283 (storage id DS-1556234100-172.16.67.69-50010-1365621786288)
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed bpid=BP-2054036249-172.16.67.68-1365621320283 from blockPoolScannerMap
2013-04-10 16:01:27,016 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-2054036249-172.16.67.68-1365621320283
2013-04-10 16:01:29,017 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-10 16:01:29,019 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-04-10 16:01:29,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop-2-01/172.16.67.69



Do you know if there's a way to solve this?

Any ideas?

Thanks.
Mauro.






Re: Problem With NAT ips

Posted by Daryn Sharp <da...@yahoo-inc.com>.
Hi Mauro,

The registration process has changed quite a bit. I don't think the NN "trusts" the DN's self-identification anymore; otherwise it would be trivial to spoof another DN, intentionally or not, which is a security hazard.
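In effect, the NN does a reverse lookup on the RPC caller's source address before accepting the registration. A minimal Python sketch of that idea (illustration only, not the actual Hadoop code; the function name is made up):

```python
import socket

def reverse_resolves(ip):
    """Return the hostname for ip if reverse DNS (or /etc/hosts)
    can resolve it, else None. This mirrors the kind of check that
    fails with DisallowedDatanodeException when the NN cannot map
    the DN's source IP back to a hostname."""
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)
        return host
    except (socket.herror, socket.gaierror):
        return None

# 127.0.0.1 normally reverse-resolves via /etc/hosts:
print(reverse_resolves("127.0.0.1"))
```

In Mauro's setup the NN sees the DN's NAT address (10.70.5.57), which nothing on the NN host can map back to a hostname, so the lookup fails.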

I suspect the NN can't resolve the DN. Unresolvable hosts are rejected because the allow/deny lists may contain hostnames; if DNS is temporarily unavailable, you don't want a node blocked by hostname to slip through. Try adding the DN's public IP 10.70.5.57 to the NN's /etc/hosts if it's not resolvable via DNS.
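Concretely, the entry on hadoop-2-00's /etc/hosts would look something like this (assuming hadoop-2-01 is the name the DN registers under):

```
10.70.5.57   hadoop-2-01
```

After adding it, `getent hosts hadoop-2-01` on the NN should print the NAT address, confirming the mapping is picked up.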

I hope this helps!

Daryn

On Apr 10, 2013, at 4:32 PM, Mauro Cohen wrote:


