Posted to user@hadoop.apache.org by Michael Garcia <Mi...@alcatelonetouch.com> on 2016/12/06 16:53:36 UTC

ZooKeeper failing to start and I don't know why....

All,
I'm having a problem with ZooKeeper starting up. It looks like a hostname resolution problem.
I have /etc/hosts configured correctly, and passwordless SSH is working.

I have Hadoop 2.7.3, Java 1.8u111, and ZooKeeper 3.4.6.

I have 6 systems set up: tc1, tc2, tc3, tc4, tc5, tc6.
tc1 is the active NameNode.
tc2 is the standby NameNode.
tc3 through tc6 are DataNodes.
tc1, tc2 and tc3 are the JournalNodes.

Here is the tail end of the log when I try to start ZooKeeper on tc1:

2016-12-06 00:59:27,214 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/data/tools/repository/hadoop-2.7.3/lib/native
2016-12-06 00:59:27,214 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2016-12-06 00:59:27,214 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2016-12-06 00:59:27,215 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2016-12-06 00:59:27,215 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
2016-12-06 00:59:27,215 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=4.8.11-1.el7.elrepo.x86_64
2016-12-06 00:59:27,215 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=iot-user
2016-12-06 00:59:27,215 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/iot-user
2016-12-06 00:59:27,215 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/data/tools/repository/hadoop-2.7.3
2016-12-06 00:59:27,216 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString= tc1:2181,tc2:2181,tc3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@223d2c72
2016-12-06 00:59:27,230 FATAL org.apache.hadoop.hdfs.tools.DFSZKFailoverController: Got a fatal error, exiting now
java.net.UnknownHostException:  tc1: Name or service not known
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
        at java.net.InetAddress.getAllByName(InetAddress.java:1192)
        at java.net.InetAddress.getAllByName(InetAddress.java:1126)
        at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:631)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:775)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:229)
        at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:351)
        at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:191)

It looks like hostname resolution is working:

[iot-user@tc1 ~]$ hostname
tc1
[iot-user@tc1 ~]$ ssh tc2
Last login: Mon Dec  5 22:31:21 2016 from 172.31.61.165

[iot-user@tc2 ~]$ exit
logout
Connection to tc2 closed.
[iot-user@tc1 ~]$ ssh tc3
Last login: Mon Dec  5 23:06:06 2016 from 172.31.61.165

[iot-user@tc3 ~]$ exit
logout
Connection to tc3 closed.
[iot-user@tc1 ~]$ arp tc1
tc1 (172.31.61.165) -- no entry

[iot-user@tc1 repository]$ jps
2096 NameNode
10548 Jps
2342 JournalNode
[iot-user@tc1 repository]$ hdfs haadmin -getServiceState tc1
active
[iot-user@tc1 repository]$ hdfs haadmin -getServiceState tc2
standby
[iot-user@tc1 repository]$
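
The ssh and arp checks above go through the shell, while the stack trace fails inside InetAddress.getAllByName, which hands the exact string it was given to the system resolver. A minimal sketch for testing that path outside Hadoop is below. It is written in Python rather than Java for brevity, the helper name resolve is made up for illustration, and socket.getaddrinfo is the standard-library call that asks the same system resolver:

```python
import socket


def resolve(host):
    """Return the sorted, de-duplicated addresses for `host`, or [] if lookup fails.

    `resolve` is a hypothetical helper, not part of Hadoop or ZooKeeper.
    The host string is passed through unchanged, so stray whitespace is
    tested exactly as typed.
    """
    try:
        infos = socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        # gaierror covers "Name or service not known" style failures.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address as text.
    return sorted({info[4][0] for info in infos})
```

Running resolve("tc1") on tc1 should return the address from /etc/hosts if the entry is there. Comparing it with resolve(" tc1") (note the leading space) shows whether the resolver on this system accepts the name as it is printed in the exception message.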

Has anyone seen this error?  What am I doing wrong?

Michael Garcia
Cloud Operations Engineer | North America

Mobile +1 949 664 1431
7310 Miramar Road, Suite 440, San Diego, CA 92126

Re: ZooKeeper failing to start and I don't know why....

Posted by Gagan Brahmi <ga...@gmail.com>.
It looks like the /etc/hosts file on tc1 isn't configured correctly. Make sure
it also contains an entry for tc1 itself.
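
One more thing that may be worth checking: the log prints the connect string as "connectString= tc1:2181,tc2:2181,tc3:2181", with a space before tc1, and the exception message also shows two spaces before tc1. If that space comes from the configured value (ha.zookeeper.quorum is the property the failover controller reads; the actual config is not shown in this thread, so this is only a guess) and the client does not trim it, the name being looked up would be " tc1" rather than "tc1". A rough Python sketch of that check, where both function names are made up for illustration:

```python
def split_connect_string(connect_string):
    """Split 'host:port,host:port' on commas and the last colon, without trimming.

    Hypothetical helper; it only mimics a naive parser and is not the
    ZooKeeper client's actual code.
    """
    pairs = []
    for entry in connect_string.split(","):
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No colon in this entry: treat the whole entry as the host.
            host, port = entry, ""
        pairs.append((host, port))
    return pairs


def suspicious_hosts(connect_string):
    """Return the host strings that carry leading or trailing whitespace."""
    return [
        host
        for host, _port in split_connect_string(connect_string)
        if host != host.strip()
    ]
```

For the string in the log, suspicious_hosts(" tc1:2181,tc2:2181,tc3:2181") flags only the first host. If the value in core-site.xml does have whitespace around it, removing that and restarting the failover controller would be a cheap thing to try.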


Regards,
Gagan Brahmi
