Posted to user@hadoop.apache.org by Michael Chen <yi...@u.northwestern.edu> on 2017/08/16 19:37:42 UTC

Error connecting to ZooKeeper server

Hi,

I've run into a ZooKeeper connection error during the execution of a 
Nutch Hadoop job. The tasks stall on a connection error to the ZooKeeper 
server. Here's what I know:

1. The ZK connection error is the only known problem; other logs report no issues.

2. The error message from the YARN NodeManager on one of the slaves is:

2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

The connection keeps failing until it hits the 10-minute limit and the 
task fails.

3. The ZooKeeper Server is deployed only on the master.

4. The cluster is managed by Cloudera Manager 5.12.

Could a configuration on the Nutch side or the Cloudera Manager side be 
missing? There are no ZK servers on the slaves, and the NodeManager should 
be connecting to the ZK server on the master instead of localhost:2181.

Any suggestion or help is greatly appreciated!

Thank you,

Michael


Re: Error connecting to ZooKeeper server

Posted by Michael Chen <yi...@u.northwestern.edu>.
Thanks for the reply! The firewall is disabled and ZK is running fine according to Cloudera Manager.

I might have fixed the problem by including the relevant properties (ZK quorum, distributed mode) in hbase-site.xml and nutch-site.xml. Should I also include any other properties or settings?

Thanks,
Michael
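
For anyone landing on this thread later, here is a minimal sketch of how one might sanity-check the kind of fix described above. The thread does not list the exact property names, so the sample assumes the standard HBase client properties hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort and hbase.cluster.distributed. The hostname master.example.internal and the helper names read_properties and quorum_problems are hypothetical, for illustration only. The script parses a Hadoop-style site file and flags a quorum that is missing or points at the local machine, which would match the localhost:2181 attempts in the log.

```python
import xml.etree.ElementTree as ET

# Hypothetical hbase-site.xml content; the hostname is a placeholder,
# and the property names are an assumption about what the thread means
# by "ZK quorum" and "distributed mode".
SAMPLE_HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master.example.internal</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
"""

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def read_properties(xml_text):
    """Return a {name: value} dict from a Hadoop-style *-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def quorum_problems(props):
    """List reasons a client using these properties may end up on localhost."""
    problems = []
    quorum = props.get("hbase.zookeeper.quorum", "")
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    if not hosts:
        problems.append("hbase.zookeeper.quorum is not set")
    for host in hosts:
        # Drop an optional ":port" suffix, leaving bare IPv6 such as "::1" alone.
        name = host.rsplit(":", 1)[0] if host.count(":") == 1 else host
        if name in LOCAL_HOSTS:
            problems.append("hbase.zookeeper.quorum points at " + host)
    if props.get("hbase.cluster.distributed", "").lower() != "true":
        problems.append("hbase.cluster.distributed is not 'true'")
    return problems
```

To use it against a real file, read the file's text and pass it to read_properties, then print whatever quorum_problems returns. An empty list only means the file looks sane. It does not prove that the file is actually on the classpath of the tasks running on the slaves, which is the other thing worth checking.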

> On Aug 16, 2017, at 17:54, Dan Benediktson <db...@twitter.com.INVALID> wrote:
> 
> Given that it's trying to connect to localhost:2181, and that it's expected
> to connect to a remote machine, and that the error is "Connection refused"
> (meaning almost certainly either a firewall rejected or there was no
> process listening on that TCP port, but given that it's localhost, pretty
> much has to be the latter), that there must be some simple configuration
> problem on the side of whatever is talking to Zookeeper. Not to say you
> won't have firewall problems after you resolve that, but first things
> first: configure it so it's actually talking to the ZK ensemble.
> 
>> On Wed, Aug 16, 2017 at 4:14 PM, Martin Gainty <mg...@hotmail.com> wrote:
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Michael Chen <yi...@u.northwestern.edu>
>> Sent: Wednesday, August 16, 2017 3:47 PM
>> To: user@nutch.apache.org; user@hadoop.apache.org;
>> user@zookeeper.apache.org
>> Subject: Re: Error connecting to ZooKeeper server
>> 
>> Also, the cluster is on AWS. Security group set to allow all inbound and
>> outbound traffic...
>> MG>can you verify ALL inbound ports and ALL outbound ports are enabled and
>> listening with netstat -lpn
>> 
>> Any ideas?...
>> 
>> MG>to eliminate AWS as the culprit what happens when you disable the
>> problematic AWS Security Group?
>> https://groups.google.com/forum/#!topic/chronos-scheduler/ys77mol0aWQ
>> 
>> 
>> 
>> 
>> 
>>> On 08/16/2017 12:37 PM, Michael Chen wrote:
>>> 
>>> Hi,
>>> 
>>> I've run into a ZooKeeper connection error during the execution of a
>>> Nutch hadoop job. The tasks stall on connection error to ZooKeeper
>>> server. Here's what I know:
>>> 
>>> 1. ZK connection error is the only known problem, other logs report no
>>> issue
>>> 
>>> 2. Error message on YARN NodeManager on one of the slaves is:
>>> 
>>> 2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)]
>> org.apache.zookeeper.ClientCnxn: Opening socket connection to server
>> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>> (unknown error)
>>> 2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)]
>> org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected
>> error, closing socket connection and attempting reconnect
>>> java.net.ConnectException: Connection refused
>>> 
>>> The connection keeps failing until it hits the 10min limit and the
>>> task fails.
>>> 
>>> 3. ZooKeeper Server is deployed only on master
>>> 
>>> 4. Cluster managed by CloudEra Manager 5.12.
>>> 
>>> Could a configuration on Nutch side or CloudEra Manager side be
>>> missing? There are no ZK servers on the slaves and the NodeManager
>>> should be connecting to the ZK server on the master, instead of
>>> localhost:2181.
>>> 
>>> Any suggestion or help is greatly appreciated!
>>> 
>>> Thank you,
>>> 
>>> Michael
>>> 
>> 
>> 

Re: Error connecting to ZooKeeper server

Posted by Dan Benediktson <db...@twitter.com.INVALID>.
Given that it's trying to connect to localhost:2181 when it's expected to
connect to a remote machine, and that the error is "Connection refused"
(which almost certainly means either a firewall rejected the connection or
no process was listening on that TCP port; since it's localhost, it pretty
much has to be the latter), there must be some simple configuration
problem on the side of whatever is talking to ZooKeeper. That's not to say
you won't have firewall problems after you resolve that, but first things
first: configure the client so it's actually talking to the ZK ensemble.
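
The distinction drawn above (refused versus filtered versus reachable) can be checked from a worker node with a few lines of Python. This is only a rough sketch using the standard socket module. The function name probe and the 3-second timeout are arbitrary choices, not anything from ZooKeeper or Hadoop.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a single TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"       # something is listening and accepted us
    except ConnectionRefusedError:
        return "refused"        # host reachable, nothing listening on the port
    except socket.timeout:
        return "timeout"        # often a firewall silently dropping packets
    except OSError as exc:
        return "error: %s" % exc  # DNS failure, unreachable network, etc.
```

Calling probe("localhost", 2181) on one of the slaves should return "refused" if no ZooKeeper server runs there, matching the log. Calling it with the master's hostname and port 2181 shows whether the ensemble is reachable once the client is pointed at it. A "timeout" result there would point to a firewall or security group rather than a missing process.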

On Wed, Aug 16, 2017 at 4:14 PM, Martin Gainty <mg...@hotmail.com> wrote:

>
>
>
> ________________________________
> From: Michael Chen <yi...@u.northwestern.edu>
> Sent: Wednesday, August 16, 2017 3:47 PM
> To: user@nutch.apache.org; user@hadoop.apache.org;
> user@zookeeper.apache.org
> Subject: Re: Error connecting to ZooKeeper server
>
> Also, the cluster is on AWS. Security group set to allow all inbound and
> outbound traffic...
> MG>can you verify ALL inbound ports and ALL outbound ports are enabled and
> listening with netstat -lpn
>
> Any ideas?...
>
> MG>to eliminate AWS as the culprit what happens when you disable the
> problematic AWS Security Group?
> https://groups.google.com/forum/#!topic/chronos-scheduler/ys77mol0aWQ
>
>
>
>
>
> On 08/16/2017 12:37 PM, Michael Chen wrote:
> >
> > Hi,
> >
> > I've run into a ZooKeeper connection error during the execution of a
> > Nutch hadoop job. The tasks stall on connection error to ZooKeeper
> > server. Here's what I know:
> >
> > 1. ZK connection error is the only known problem, other logs report no
> > issue
> >
> > 2. Error message on YARN NodeManager on one of the slaves is:
> >
> > 2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)]
> org.apache.zookeeper.ClientCnxn: Opening socket connection to server
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
> (unknown error)
> > 2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)]
> org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected
> error, closing socket connection and attempting reconnect
> > java.net.ConnectException: Connection refused
> >
> > The connection keeps failing until it hits the 10min limit and the
> > task fails.
> >
> > 3. ZooKeeper Server is deployed only on master
> >
> > 4. Cluster managed by CloudEra Manager 5.12.
> >
> > Could a configuration on Nutch side or CloudEra Manager side be
> > missing? There are no ZK servers on the slaves and the NodeManager
> > should be connecting to the ZK server on the master, instead of
> > localhost:2181.
> >
> > Any suggestion or help is greatly appreciated!
> >
> > Thank you,
> >
> > Michael
> >
>
>

Re: Error connecting to ZooKeeper server

Posted by Martin Gainty <mg...@hotmail.com>.


________________________________
From: Michael Chen <yi...@u.northwestern.edu>
Sent: Wednesday, August 16, 2017 3:47 PM
To: user@nutch.apache.org; user@hadoop.apache.org; user@zookeeper.apache.org
Subject: Re: Error connecting to ZooKeeper server

Also, the cluster is on AWS. Security group set to allow all inbound and
outbound traffic...
MG>can you verify ALL inbound ports and ALL outbound ports are enabled and listening with netstat -lpn

Any ideas?...

MG>to eliminate AWS as the culprit what happens when you disable the problematic AWS Security Group?
https://groups.google.com/forum/#!topic/chronos-scheduler/ys77mol0aWQ





On 08/16/2017 12:37 PM, Michael Chen wrote:
>
> Hi,
>
> I've run into a ZooKeeper connection error during the execution of a
> Nutch hadoop job. The tasks stall on connection error to ZooKeeper
> server. Here's what I know:
>
> 1. ZK connection error is the only known problem, other logs report no
> issue
>
> 2. Error message on YARN NodeManager on one of the slaves is:
>
> 2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>
> The connection keeps failing until it hits the 10min limit and the
> task fails.
>
> 3. ZooKeeper Server is deployed only on master
>
> 4. Cluster managed by CloudEra Manager 5.12.
>
> Could a configuration on Nutch side or CloudEra Manager side be
> missing? There are no ZK servers on the slaves and the NodeManager
> should be connecting to the ZK server on the master, instead of
> localhost:2181.
>
> Any suggestion or help is greatly appreciated!
>
> Thank you,
>
> Michael
>


Re: Error connecting to ZooKeeper server

Posted by Michael Chen <yi...@u.northwestern.edu>.
Also, the cluster is on AWS. Security group set to allow all inbound and 
outbound traffic...

Any ideas?...


On 08/16/2017 12:37 PM, Michael Chen wrote:
>
> Hi,
>
> I've run into a ZooKeeper connection error during the execution of a 
> Nutch hadoop job. The tasks stall on connection error to ZooKeeper 
> server. Here's what I know:
>
> 1. ZK connection error is the only known problem, other logs report no 
> issue
>
> 2. Error message on YARN NodeManager on one of the slaves is:
>
> 2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>
> The connection keeps failing until it hits the 10min limit and the 
> task fails.
>
> 3. ZooKeeper Server is deployed only on master
>
> 4. Cluster managed by CloudEra Manager 5.12.
>
> Could a configuration on Nutch side or CloudEra Manager side be 
> missing? There are no ZK servers on the slaves and the NodeManager 
> should be connecting to the ZK server on the master, instead of 
> localhost:2181.
>
> Any suggestion or help is greatly appreciated!
>
> Thank you,
>
> Michael
>

