Posted to user@hadoop.apache.org by Daniel Watrous <dw...@gmail.com> on 2015/09/24 19:51:04 UTC

Datanodes not connecting to the cluster

I have a multi-node cluster with two datanodes. After running start-dfs.sh,
I see the following processes running:

hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
hadoop-master: 10933 DataNode
hadoop-master: 10759 NameNode
hadoop-master: 11145 SecondaryNameNode
hadoop-master: 11567 Jps
hadoop-data1: 5186 Jps
hadoop-data1: 5059 DataNode
hadoop-data2: 5180 Jps
hadoop-data2: 5053 DataNode


However, the other two DataNodes aren't visible in the NameNode web UI.
http://screencast.com/t/icsLnXXDk

Where can I look for clues?
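The usual first place to look is the daemon logs, which in a stock Hadoop 2.x
install land under $HADOOP_HOME/logs as hadoop-<user>-<daemon>-<hostname>.log
(that layout is an assumption about this setup, not something stated in the
thread). A minimal sketch of where to grep, plus pulling the registering IP out
of a typical denied-registration line; the sample line is the one that surfaces
later in this thread:

```shell
# Assumed default log layout (stock Hadoop 2.x):
#   master:  $HADOOP_HOME/logs/hadoop-<user>-namenode-<host>.log
#   workers: $HADOOP_HOME/logs/hadoop-<user>-datanode-<host>.log
# e.g. grep -iE 'denied|refused|retrying' "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log

# Given a denied-registration line, extract the IP the datanode registered from:
sample='Datanode denied communication with namenode because hostname cannot be resolved (ip=192.168.51.1, hostname=192.168.51.1)'
printf '%s\n' "$sample" | sed -n 's/.*ip=\([0-9.]*\),.*/\1/p'
# prints: 192.168.51.1
```

Knowing which address the datanode actually registered from is usually the
fastest way to spot a hosts-file or NAT problem.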

Re: Datanodes not connecting to the cluster

Posted by Daniel Watrous <dw...@gmail.com>.
Phew, I finally made progress by adding the property below to yarn-site.xml:

  <property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>

I now see the datanodes, but not at the same time. Under datanodes in
operation I see the master and either one of the other datanodes. Is that
typical behavior? Perhaps it's switching between them for redundancy?
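For the record, HDFS does not rotate datanodes in the UI for redundancy; all
live datanodes are normally listed at once. When only one worker shows at a
time, it is often because both registered under the same identity, e.g. both
resolving their own hostname to 127.0.1.1 or arriving through the same NAT
address. A sketch of /etc/hosts entries that avoid this on every node; the
master IP comes from this thread, while the worker IPs are assumptions for
illustration:

```
# /etc/hosts on hadoop-master, hadoop-data1 and hadoop-data2 (sketch)
# 192.168.51.4 is from this thread; the data-node IPs below are assumed.
127.0.0.1     localhost
192.168.51.4  hadoop-master
192.168.51.5  hadoop-data1
192.168.51.6  hadoop-data2
# Also remove any Ubuntu-style "127.0.1.1  <hostname>" line, which makes a
# node resolve its own name to loopback and register as 127.0.1.1.
```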

Daniel

On Thu, Sep 24, 2015 at 2:49 PM, Daniel Watrous <dw...@gmail.com>
wrote:

> I'm making a little progress here.
>
> I added the following properties to hdfs-site.xml
>
>   <property>
>     <name>dfs.namenode.rpc-bind-host</name>
>     <value>0.0.0.0</value>
>   </property>
>   <property>
>     <name>dfs.namenode.servicerpc-bind-host</name>
>     <value>0.0.0.0</value>
>   </property>
>
> I can now connect to hadoop-master:
>
> hadoop@hadoop-data1:~$ telnet hadoop-master 54310
> Trying 192.168.51.4...
> Connected to hadoop-master.
> Escape character is '^]'.
>
> BUT I'm now getting the error below. I'm confused that the datanode appears
> to be coming from 192.168.51.1, because that's not even a valid IP in my
> installation.
>
> 2015-09-24 19:40:08,821 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
> Block pool BP-852923283-127.0.1.1-1443119668806 (Datanode Uuid null)
> service to hadoop-master/192.168.51.4:54310 Datanode denied communication
> with namenode because hostname cannot be resolved (ip=192.168.51.1,
> hostname=192.168.51.1): DatanodeRegistration(0.0.0.0:50010,
> datanodeUuid=cd8107ac-f1dd-4c33-a054-2dee95cc4a16, infoPort=50075,
> infoSecurePort=0, ipcPort=50020,
> storageInfo=lv=-56;cid=CID-8e9d0e10-5744-448b-a45a-87644364e714;nsid=1441606918;c=0)
>
> Any idea what's happening here?
>
>
> On Thu, Sep 24, 2015 at 2:26 PM, Daniel Watrous <dw...@gmail.com>
> wrote:
>
>> In a further test, I tried connecting to the NameNode from hadoop-master
>> (where it's running) using both the hostname and the IP address.
>>
>> vagrant@hadoop-master:~/src$ telnet 192.168.51.4 54310
>> Trying 192.168.51.4...
>> telnet: Unable to connect to remote host: Connection refused
>> vagrant@hadoop-master:~/src$ telnet localhost 54310
>> Trying 127.0.0.1...
>> telnet: Unable to connect to remote host: Connection refused
>> vagrant@hadoop-master:~/src$ telnet hadoop-master 54310
>> Trying 127.0.1.1...
>> Connected to hadoop-master.
>> Escape character is '^]'.
>>
>>
>> As you can see, the connections to the IP address and to localhost are
>> refused, but the hostname connection succeeds. Is there some way to
>> configure the namenode to accept connections on all of its addresses?
>>
>> On Thu, Sep 24, 2015 at 2:14 PM, Daniel Watrous <dw...@gmail.com>
>> wrote:
>>
>>> On one of the datanodes I found the following warning:
>>>
>>> 2015-09-24 18:40:17,639 WARN
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to
>>> server: hadoop-master/192.168.51.4:54310
>>>
>>> On my master node I see that the process is running and has bound that
>>> port
>>>
>>> vagrant@hadoop-master:~/src$ sudo lsof -i :54310
>>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>>> java    7480 hadoop  202u  IPv4  26931      0t0  TCP hadoop-master:54310
>>> (LISTEN)
>>> java    7480 hadoop  212u  IPv4  28758      0t0  TCP
>>> hadoop-master:54310->localhost:47226 (ESTABLISHED)
>>> java    7651 hadoop  238u  IPv4  28247      0t0  TCP
>>> localhost:47226->hadoop-master:54310 (ESTABLISHED)
>>> hadoop@hadoop-master:~$ jps
>>> 7856 SecondaryNameNode
>>> 7651 DataNode
>>> 7480 NameNode
>>> 8106 Jps
>>>
>>> I don't appear to have any firewall rules interfering with traffic
>>> vagrant@hadoop-master:~/src$ sudo iptables --list
>>> Chain INPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain FORWARD (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> The iptables --list output is identical on hadoop-data1. I also see a
>>> process attempting to connect to hadoop-master:
>>> vagrant@hadoop-data1:~$ sudo lsof -i :54310
>>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>>> java    3823 hadoop  238u  IPv4  19304      0t0  TCP
>>> hadoop-data1:45600->hadoop-master:54310 (SYN_SENT)
>>>
>>> I am confused by the notation of hostname/IP:port.
>>> All help appreciated.
>>>
>>> Daniel
>>>
>>>
>>>
>>> On Thu, Sep 24, 2015 at 12:51 PM, Daniel Watrous <dw...@gmail.com>
>>> wrote:
>>>
>>>> I have a multi-node cluster with two datanodes. After running
>>>> start-dfs.sh, I show the following processes running
>>>>
>>>> hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
>>>> hadoop-master: 10933 DataNode
>>>> hadoop-master: 10759 NameNode
>>>> hadoop-master: 11145 SecondaryNameNode
>>>> hadoop-master: 11567 Jps
>>>> hadoop-data1: 5186 Jps
>>>> hadoop-data1: 5059 DataNode
>>>> hadoop-data2: 5180 Jps
>>>> hadoop-data2: 5053 DataNode
>>>>
>>>>
>>>> However, the other two DataNodes aren't visible.
>>>> http://screencast.com/t/icsLnXXDk
>>>>
>>>> Where can I look for clues?
>>>>
>>>
>>>
>>
>

Re: Datanodes not connecting to the cluster

Posted by Daniel Watrous <dw...@gmail.com>.
I'm making a little progress here.

I added the following properties to hdfs-site.xml

  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
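For context, the bind-host properties only control which interface the
NameNode listens on; datanodes and clients still dial the address configured
in fs.defaultFS in core-site.xml. In this setup that would look roughly like
the sketch below (the port matches the 54310 used throughout this thread; the
exact value in this cluster's core-site.xml is an assumption):

```xml
<!-- core-site.xml (sketch): the routable name datanodes should dial,
     while the bind-host properties above make the server listen on 0.0.0.0 -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-master:54310</value>
</property>
```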

I can now connect to hadoop-master:

hadoop@hadoop-data1:~$ telnet hadoop-master 54310
Trying 192.168.51.4...
Connected to hadoop-master.
Escape character is '^]'.

BUT I'm now getting the error below. I'm confused that the datanode appears
to be coming from 192.168.51.1, because that's not even a valid IP in my
installation.

2015-09-24 19:40:08,821 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
Block pool BP-852923283-127.0.1.1-1443119668806 (Datanode Uuid null)
service to hadoop-master/192.168.51.4:54310 Datanode denied communication
with namenode because hostname cannot be resolved (ip=192.168.51.1,
hostname=192.168.51.1): DatanodeRegistration(0.0.0.0:50010,
datanodeUuid=cd8107ac-f1dd-4c33-a054-2dee95cc4a16, infoPort=50075,
infoSecurePort=0, ipcPort=50020,
storageInfo=lv=-56;cid=CID-8e9d0e10-5744-448b-a45a-87644364e714;nsid=1441606918;c=0)

Any idea what's happening here?
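One note on this error: 192.168.51.1 is typically the host side of a
VirtualBox host-only network, which suggests the datanode's traffic is being
routed through the host rather than arriving directly from the VM's address
(an inference from the IP, not something the logs confirm). The NameNode
rejects any registration whose source IP does not reverse-resolve to a
hostname. The clean fix is correct /etc/hosts entries on the NameNode; if
that is not possible, HDFS has a real property (default true) that relaxes
the check. A workaround sketch, not a recommendation for production:

```xml
<!-- hdfs-site.xml on the NameNode: skips the requirement that a
     registering datanode's IP reverse-resolves to a hostname -->
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
```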


On Thu, Sep 24, 2015 at 2:26 PM, Daniel Watrous <dw...@gmail.com>
wrote:

> In a further test, I tried connecting to the NameNode from hadoop-master
> (where it's running) using both the hostname and the IP address.
>
> vagrant@hadoop-master:~/src$ telnet 192.168.51.4 54310
> Trying 192.168.51.4...
> telnet: Unable to connect to remote host: Connection refused
> vagrant@hadoop-master:~/src$ telnet localhost 54310
> Trying 127.0.0.1...
> telnet: Unable to connect to remote host: Connection refused
> vagrant@hadoop-master:~/src$ telnet hadoop-master 54310
> Trying 127.0.1.1...
> Connected to hadoop-master.
> Escape character is '^]'.
>
>
> As you can see, the connections to the IP address and to localhost are
> refused, but the hostname connection succeeds. Is there some way to
> configure the namenode to accept connections on all of its addresses?
>
> On Thu, Sep 24, 2015 at 2:14 PM, Daniel Watrous <dw...@gmail.com>
> wrote:
>
>> On one of the datanodes I found the following warning:
>>
>> 2015-09-24 18:40:17,639 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to
>> server: hadoop-master/192.168.51.4:54310
>>
>> On my master node I see that the process is running and has bound that
>> port
>>
>> vagrant@hadoop-master:~/src$ sudo lsof -i :54310
>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>> java    7480 hadoop  202u  IPv4  26931      0t0  TCP hadoop-master:54310
>> (LISTEN)
>> java    7480 hadoop  212u  IPv4  28758      0t0  TCP
>> hadoop-master:54310->localhost:47226 (ESTABLISHED)
>> java    7651 hadoop  238u  IPv4  28247      0t0  TCP
>> localhost:47226->hadoop-master:54310 (ESTABLISHED)
>> hadoop@hadoop-master:~$ jps
>> 7856 SecondaryNameNode
>> 7651 DataNode
>> 7480 NameNode
>> 8106 Jps
>>
>> I don't appear to have any firewall rules interfering with traffic
>> vagrant@hadoop-master:~/src$ sudo iptables --list
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> The iptables --list output is identical on hadoop-data1. I also see a
>> process attempting to connect to hadoop-master:
>> vagrant@hadoop-data1:~$ sudo lsof -i :54310
>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>> java    3823 hadoop  238u  IPv4  19304      0t0  TCP
>> hadoop-data1:45600->hadoop-master:54310 (SYN_SENT)
>>
>> I am confused by the notation of hostname/IP:port.
>> All help appreciated.
>>
>> Daniel
>>
>>
>>
>> On Thu, Sep 24, 2015 at 12:51 PM, Daniel Watrous <dw...@gmail.com>
>> wrote:
>>
>>> I have a multi-node cluster with two datanodes. After running
>>> start-dfs.sh, I show the following processes running
>>>
>>> hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
>>> hadoop-master: 10933 DataNode
>>> hadoop-master: 10759 NameNode
>>> hadoop-master: 11145 SecondaryNameNode
>>> hadoop-master: 11567 Jps
>>> hadoop-data1: 5186 Jps
>>> hadoop-data1: 5059 DataNode
>>> hadoop-data2: 5180 Jps
>>> hadoop-data2: 5053 DataNode
>>>
>>>
>>> However, the other two DataNodes aren't visible.
>>> http://screencast.com/t/icsLnXXDk
>>>
>>> Where can I look for clues?
>>>
>>
>>
>

Re: Datanodes not connecting to the cluster

Posted by Daniel Watrous <dw...@gmail.com>.
I'm making a little progress here.

I added the following properties to hdfs-site.xml

  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>

I can now connect to hadoop-master:

hadoop@hadoop-data1:~$ telnet hadoop-master 54310
Trying 192.168.51.4...
Connected to hadoop-master.
Escape character is '^]'.

BUT I'm now getting the error below. I'm confused that it's trying to
connect to 192.168.51.1 because that's not even a valid IP in my
installation.

2015-09-24 19:40:08,821 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
Block pool BP-852923283-127.0.1.1-1443119668806 (Datanode Uuid null)
service to hadoop-master/192.168.51.4:54310 Datanode denied communication
with namenode because hostname cannot be resolved (ip=192.168.51.1,
hostname=192.168.51.1): DatanodeRegistration(0.0.0.0:50010,
datanodeUuid=cd8107ac-f1dd-4c33-a054-2dee95cc4a16, infoPort=50075,
infoSecurePort=0, ipcPort=50020,
storageInfo=lv=-56;cid=CID-8e9d0e10-5744-448b-a45a-87644364e714;nsid=1441606918;c=0)

Any idea what's happening here?


On Thu, Sep 24, 2015 at 2:26 PM, Daniel Watrous <dw...@gmail.com>
wrote:

> In a further test, I tried connecting to the NameNode from hadoop-master
> (where it's running) using both the hostname and the IP address.
>
> vagrant@hadoop-master:~/src$ telnet 192.168.51.4 54310
> Trying 192.168.51.4...
> telnet: Unable to connect to remote host: Connection refused
> vagrant@hadoop-master:~/src$ telnet localhost 54310
> Trying 127.0.0.1...
> telnet: Unable to connect to remote host: Connection refused
> vagrant@hadoop-master:~/src$ telnet hadoop-master 54310
> Trying 127.0.1.1...
> Connected to hadoop-master.
> Escape character is '^]'.
>
>
> As you can see the IP address or localhost connection is refused, but the
> hostname connection succeeds. Is there some way to configure the namenode
> to accept connections from all hosts?
>
> On Thu, Sep 24, 2015 at 2:14 PM, Daniel Watrous <dw...@gmail.com>
> wrote:
>
>> On one of the namenodes I have found the following warning:
>>
>> 2015-09-24 18:40:17,639 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to
>> server: hadoop-master/192.168.51.4:54310
>>
>> On my master node I see that the process is running and has bound that
>> port
>>
>> vagrant@hadoop-master:~/src$ sudo lsof -i :54310
>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>> java    7480 hadoop  202u  IPv4  26931      0t0  TCP hadoop-master:54310
>> (LISTEN)
>> java    7480 hadoop  212u  IPv4  28758      0t0  TCP
>> hadoop-master:54310->localhost:47226 (ESTABLISHED)
>> java    7651 hadoop  238u  IPv4  28247      0t0  TCP
>> localhost:47226->hadoop-master:54310 (ESTABLISHED)
>> hadoop@hadoop-master:~$ jps
>> 7856 SecondaryNameNode
>> 7651 DataNode
>> 7480 NameNode
>> 8106 Jps
>>
>> I don't appear to have any firewall rules interfering with traffic
>> vagrant@hadoop-master:~/src$ sudo iptables --list
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> The iptables --list output is identical on hadoop-data1. I also show a
>> process attempting to connect to hadoop-master
>> vagrant@hadoop-data1:~$ sudo lsof -i :54310
>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>> java    3823 hadoop  238u  IPv4  19304      0t0  TCP
>> hadoop-data1:45600->hadoop-master:54310 (SYN_SENT)
>>
>> I am confused by the notation of hostname/IP:port.
>> All help appreciated.
>>
>> Daniel
>>
>>
>>
>> On Thu, Sep 24, 2015 at 12:51 PM, Daniel Watrous <dw...@gmail.com>
>> wrote:
>>
>>> I have a multi-node cluster with two datanodes. After running
>>> start-dfs.sh, I show the following processes running
>>>
>>> hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
>>> hadoop-master: 10933 DataNode
>>> hadoop-master: 10759 NameNode
>>> hadoop-master: 11145 SecondaryNameNode
>>> hadoop-master: 11567 Jps
>>> hadoop-data1: 5186 Jps
>>> hadoop-data1: 5059 DataNode
>>> hadoop-data2: 5180 Jps
>>> hadoop-data2: 5053 DataNode
>>>
>>>
>>> However, the other two DataNodes aren't visible.
>>> http://screencast.com/t/icsLnXXDk
>>>
>>> Where can I look for clues?
>>>
>>
>>
>

Re: Datanodes not connecting to the cluster

Posted by Daniel Watrous <dw...@gmail.com>.
I'm making a little progress here.

I added the following properties to hdfs-site.xml

  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>

I can now connect to hadoop-master:

hadoop@hadoop-data1:~$ telnet hadoop-master 54310
Trying 192.168.51.4...
Connected to hadoop-master.
Escape character is '^]'.

BUT I'm now getting the error below. I'm confused that it's trying to
connect to 192.168.51.1 because that's not even a valid IP in my
installation.

2015-09-24 19:40:08,821 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
Block pool BP-852923283-127.0.1.1-1443119668806 (Datanode Uuid null)
service to hadoop-master/192.168.51.4:54310 Datanode denied communication
with namenode because hostname cannot be resolved (ip=192.168.51.1,
hostname=192.168.51.1): DatanodeRegistration(0.0.0.0:50010,
datanodeUuid=cd8107ac-f1dd-4c33-a054-2dee95cc4a16, infoPort=50075,
infoSecurePort=0, ipcPort=50020,
storageInfo=lv=-56;cid=CID-8e9d0e10-5744-448b-a45a-87644364e714;nsid=1441606918;c=0)

Any idea what's happening here?


On Thu, Sep 24, 2015 at 2:26 PM, Daniel Watrous <dw...@gmail.com>
wrote:

> In a further test, I tried connecting to the NameNode from hadoop-master
> (where it's running) using both the hostname and the IP address.
>
> vagrant@hadoop-master:~/src$ telnet 192.168.51.4 54310
> Trying 192.168.51.4...
> telnet: Unable to connect to remote host: Connection refused
> vagrant@hadoop-master:~/src$ telnet localhost 54310
> Trying 127.0.0.1...
> telnet: Unable to connect to remote host: Connection refused
> vagrant@hadoop-master:~/src$ telnet hadoop-master 54310
> Trying 127.0.1.1...
> Connected to hadoop-master.
> Escape character is '^]'.
>
>
> As you can see the IP address or localhost connection is refused, but the
> hostname connection succeeds. Is there some way to configure the namenode
> to accept connections from all hosts?
>
> On Thu, Sep 24, 2015 at 2:14 PM, Daniel Watrous <dw...@gmail.com>
> wrote:
>
>> On one of the datanodes I have found the following warning:
>>
>> 2015-09-24 18:40:17,639 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to
>> server: hadoop-master/192.168.51.4:54310
>>
>> On my master node I see that the process is running and has bound that
>> port
>>
>> vagrant@hadoop-master:~/src$ sudo lsof -i :54310
>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>> java    7480 hadoop  202u  IPv4  26931      0t0  TCP hadoop-master:54310
>> (LISTEN)
>> java    7480 hadoop  212u  IPv4  28758      0t0  TCP
>> hadoop-master:54310->localhost:47226 (ESTABLISHED)
>> java    7651 hadoop  238u  IPv4  28247      0t0  TCP
>> localhost:47226->hadoop-master:54310 (ESTABLISHED)
>> hadoop@hadoop-master:~$ jps
>> 7856 SecondaryNameNode
>> 7651 DataNode
>> 7480 NameNode
>> 8106 Jps
>>
>> I don't appear to have any firewall rules interfering with traffic
>> vagrant@hadoop-master:~/src$ sudo iptables --list
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> The iptables --list output is identical on hadoop-data1. I also show a
>> process attempting to connect to hadoop-master
>> vagrant@hadoop-data1:~$ sudo lsof -i :54310
>> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>> java    3823 hadoop  238u  IPv4  19304      0t0  TCP
>> hadoop-data1:45600->hadoop-master:54310 (SYN_SENT)
>>
>> I am confused by the notation of hostname/IP:port.
>> All help appreciated.
>>
>> Daniel
>>
>>
>>
>> On Thu, Sep 24, 2015 at 12:51 PM, Daniel Watrous <dw...@gmail.com>
>> wrote:
>>
>>> I have a multi-node cluster with two datanodes. After running
>>> start-dfs.sh, I show the following processes running
>>>
>>> hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
>>> hadoop-master: 10933 DataNode
>>> hadoop-master: 10759 NameNode
>>> hadoop-master: 11145 SecondaryNameNode
>>> hadoop-master: 11567 Jps
>>> hadoop-data1: 5186 Jps
>>> hadoop-data1: 5059 DataNode
>>> hadoop-data2: 5180 Jps
>>> hadoop-data2: 5053 DataNode
>>>
>>>
>>> However, the other two DataNodes aren't visible.
>>> http://screencast.com/t/icsLnXXDk
>>>
>>> Where can I look for clues?
>>>
>>
>>
>

Re: Datanodes not connecting to the cluster

Posted by Daniel Watrous <dw...@gmail.com>.
In a further test, I tried connecting to the NameNode from hadoop-master
(where it's running) using both the hostname and the IP address.

vagrant@hadoop-master:~/src$ telnet 192.168.51.4 54310
Trying 192.168.51.4...
telnet: Unable to connect to remote host: Connection refused
vagrant@hadoop-master:~/src$ telnet localhost 54310
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
vagrant@hadoop-master:~/src$ telnet hadoop-master 54310
Trying 127.0.1.1...
Connected to hadoop-master.
Escape character is '^]'.


As you can see the IP address or localhost connection is refused, but the
hostname connection succeeds. Is there some way to configure the namenode
to accept connections from all hosts?
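One way to see why the bind address matters (a rough sketch with plain sockets, not Hadoop itself): a listener bound to a single address only answers on that address, while a wildcard (0.0.0.0) bind answers on every local interface, which is what dfs.namenode.rpc-bind-host controls for the NameNode RPC port.

```python
import socket
import threading

# A listener bound to the wildcard address accepts connections arriving on
# any local interface; this is the behavior dfs.namenode.rpc-bind-host=0.0.0.0
# gives the NameNode RPC port. (Plain-socket sketch, not Hadoop code.)
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 0))        # wildcard bind on an ephemeral port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=server.accept, daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))  # reachable via loopback as well
print("connected via 127.0.0.1 on port", port)
client.close()
server.close()
```

A NameNode that bound only to the 127.0.1.1 alias behaves like the opposite case: connections to 192.168.51.4 or 127.0.0.1 are refused, which matches the telnet output above.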

On Thu, Sep 24, 2015 at 2:14 PM, Daniel Watrous <dw...@gmail.com>
wrote:

> On one of the datanodes I have found the following warning:
>
> 2015-09-24 18:40:17,639 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to
> server: hadoop-master/192.168.51.4:54310
>
> On my master node I see that the process is running and has bound that port
>
> vagrant@hadoop-master:~/src$ sudo lsof -i :54310
> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> java    7480 hadoop  202u  IPv4  26931      0t0  TCP hadoop-master:54310
> (LISTEN)
> java    7480 hadoop  212u  IPv4  28758      0t0  TCP
> hadoop-master:54310->localhost:47226 (ESTABLISHED)
> java    7651 hadoop  238u  IPv4  28247      0t0  TCP
> localhost:47226->hadoop-master:54310 (ESTABLISHED)
> hadoop@hadoop-master:~$ jps
> 7856 SecondaryNameNode
> 7651 DataNode
> 7480 NameNode
> 8106 Jps
>
> I don't appear to have any firewall rules interfering with traffic
> vagrant@hadoop-master:~/src$ sudo iptables --list
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
> The iptables --list output is identical on hadoop-data1. I also show a
> process attempting to connect to hadoop-master
> vagrant@hadoop-data1:~$ sudo lsof -i :54310
> COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> java    3823 hadoop  238u  IPv4  19304      0t0  TCP
> hadoop-data1:45600->hadoop-master:54310 (SYN_SENT)
>
> I am confused by the notation of hostname/IP:port.
> All help appreciated.
>
> Daniel
>
>
>
> On Thu, Sep 24, 2015 at 12:51 PM, Daniel Watrous <dw...@gmail.com>
> wrote:
>
>> I have a multi-node cluster with two datanodes. After running
>> start-dfs.sh, I show the following processes running
>>
>> hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
>> hadoop-master: 10933 DataNode
>> hadoop-master: 10759 NameNode
>> hadoop-master: 11145 SecondaryNameNode
>> hadoop-master: 11567 Jps
>> hadoop-data1: 5186 Jps
>> hadoop-data1: 5059 DataNode
>> hadoop-data2: 5180 Jps
>> hadoop-data2: 5053 DataNode
>>
>>
>> However, the other two DataNodes aren't visible.
>> http://screencast.com/t/icsLnXXDk
>>
>> Where can I look for clues?
>>
>
>

Re: Datanodes not connecting to the cluster

Posted by Daniel Watrous <dw...@gmail.com>.
On one of the datanodes I have found the following warning:

2015-09-24 18:40:17,639 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to
server: hadoop-master/192.168.51.4:54310

On my master node I see that the process is running and has bound that port

vagrant@hadoop-master:~/src$ sudo lsof -i :54310
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    7480 hadoop  202u  IPv4  26931      0t0  TCP hadoop-master:54310
(LISTEN)
java    7480 hadoop  212u  IPv4  28758      0t0  TCP
hadoop-master:54310->localhost:47226 (ESTABLISHED)
java    7651 hadoop  238u  IPv4  28247      0t0  TCP
localhost:47226->hadoop-master:54310 (ESTABLISHED)
hadoop@hadoop-master:~$ jps
7856 SecondaryNameNode
7651 DataNode
7480 NameNode
8106 Jps

I don't appear to have any firewall rules interfering with traffic
vagrant@hadoop-master:~/src$ sudo iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

The iptables --list output is identical on hadoop-data1. I also show a
process attempting to connect to hadoop-master
vagrant@hadoop-data1:~$ sudo lsof -i :54310
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    3823 hadoop  238u  IPv4  19304      0t0  TCP
hadoop-data1:45600->hadoop-master:54310 (SYN_SENT)

I am confused by the notation of hostname/IP:port.
All help appreciated.
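On the hostname/IP:port notation: that is just how Java prints an InetSocketAddress (the configured hostname, a slash, the IP it resolved to, then the port), so hadoop-master/192.168.51.4:54310 means "hadoop-master, which resolved to 192.168.51.4, port 54310". A rough Python equivalent, for illustration only:

```python
import socket

def java_style_address(hostname, port):
    """Mimic Java's InetSocketAddress.toString(): 'hostname/ip:port'."""
    ip = socket.gethostbyname(hostname)  # forward-resolve like the JVM does
    return "%s/%s:%d" % (hostname, ip, port)

# 'localhost' resolves to a loopback address on essentially every system.
print(java_style_address("localhost", 54310))
```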

Daniel



On Thu, Sep 24, 2015 at 12:51 PM, Daniel Watrous <dw...@gmail.com>
wrote:

> I have a multi-node cluster with two datanodes. After running
> start-dfs.sh, I show the following processes running
>
> hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
> hadoop-master: 10933 DataNode
> hadoop-master: 10759 NameNode
> hadoop-master: 11145 SecondaryNameNode
> hadoop-master: 11567 Jps
> hadoop-data1: 5186 Jps
> hadoop-data1: 5059 DataNode
> hadoop-data2: 5180 Jps
> hadoop-data2: 5053 DataNode
>
>
> However, the other two DataNodes aren't visible.
> http://screencast.com/t/icsLnXXDk
>
> Where can I look for clues?
>
