Posted to solr-user@lucene.apache.org by Adrian Liew <ad...@avanade.com> on 2015/10/01 11:32:22 UTC

Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Hi there,

I have set up an Azure virtual network connecting three Azure VMs that form my ZooKeeper ensemble. The VMs have the internal IPs 10.0.0.4, 10.0.0.5 and 10.0.0.6. I have also set up Solr 5.3.0, which runs in SolrCloud mode and is connected to all three ZooKeeper servers as an external ensemble.

I am able to connect to 10.0.0.4 and 10.0.0.6 via zkCli.cmd after starting the ZooKeeper services. For 10.0.0.5, however, I keep getting the error below even after starting the ZooKeeper service.

[Inline image of the error output; not preserved in this plain-text archive.]

I have restarted the 10.0.0.5 VM several times and am still unable to connect to ZooKeeper via zkCli.cmd. I have checked zoo.cfg (making sure the ports, data directory and log directory are all set correctly) and myid to confirm they are configured correctly.

The command line I used to connect to ZooKeeper is, for example, zkCli.cmd -server 10.0.0.5:2182.
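
One way to narrow down a failure like this, independently of zkCli.cmd, is to check whether anything answers on the client port at all. The sketch below is a minimal, hedged Python example and not part of the original setup: it opens a TCP connection and sends ZooKeeper's "ruok" four-letter command, to which a healthy 3.4.x server replies "imok". The function name four_letter_word and the 5-second timeout are assumptions chosen for illustration.

```python
import socket


def four_letter_word(host, port, command="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the decoded reply.

    Returns None if the connection is refused, reset, or times out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(command.encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the socket after replying
                    break
                chunks.append(data)
            return b"".join(chunks).decode("ascii", errors="replace")
    except OSError:  # covers refused connections and socket timeouts
        return None
```

Calling four_letter_word("10.0.0.5", 2182) from one of the other VMs would then distinguish "nothing is listening or the port is blocked" (None) from "the process is up and answering" (a reply string).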

Any ideas?

Best regards,

Adrian Liew | Consultant Application Developer
Avanade Malaysia Sdn. Bhd. | Consulting Services
Direct: +(603) 2382 5668
+6010-2288030



RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Adrian Liew <ad...@avanade.com>.
Hi Shawn,

To reiterate, this is the exception I get when I am unable to connect to the ZooKeeper service:

E:\solr-5.3.0\server\scripts\cloud-scripts>zkcli.bat -z 10.0.0.4:2181 -cmd list
Exception in thread "main" org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.4:2181 within 30000 ms
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:181)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:105)
        at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:181)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.4:2181 within 30000 ms
        at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:208)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:173)
        ... 3 more

For example, if one of the ZooKeeper services goes down for a few minutes, it may be too late to bring that service back into the ZooKeeper ensemble because of the timeout shown above. In that case, all ZooKeeper services need to be restarted at the same time.

Please clarify whether there is a configuration setting that I missed, whether this is expected behaviour, or whether it is a bug.

Regards,
Adrian

-----Original Message-----
From: Adrian Liew [mailto:adrian.liew@avanade.com] 
Sent: Wednesday, October 7, 2015 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Hi Shawn,

Thanks for the reply. I understand your comments and will revert to the defaults. However, I raised this issue because I noticed that ZooKeeper becomes impatient if it cannot heartbeat its peers in time. For example, if one of the three ZK servers goes down, that server stops pinging the other servers, and zkCli then reports timeouts when it tries to connect to that server's service.

Will revert back with an update.

Regards,
Adrian


RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Adrian Liew <ad...@avanade.com>.
Hi Shawn,

Thanks for the reply. I understand your comments and will revert to the defaults. However, I raised this issue because I noticed that ZooKeeper becomes impatient if it cannot heartbeat its peers in time. For example, if one of the three ZK servers goes down, that server stops pinging the other servers, and zkCli then reports timeouts when it tries to connect to that server's service.
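
For context on the three-server case: a ZooKeeper ensemble keeps serving requests as long as a strict majority of its servers are up and can reach each other, so losing one server out of three should not by itself stop the remaining two. The arithmetic can be sketched as follows (a small illustrative Python example; the function names are hypothetical and not from any ZooKeeper API):

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5):
    print(f"{n} servers: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

If a three-node ensemble stops answering after a single node goes down, that suggests the two surviving nodes cannot reach each other on the quorum and election ports, rather than a limit of the ensemble size itself.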

Will revert back with an update.

Regards,
Adrian

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Tuesday, October 6, 2015 10:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

On 10/6/2015 3:38 AM, Adrian Liew wrote:
> Thanks for the reply. It looks like this was resolved by manually starting the ZooKeeper services on each server promptly, so that the tickTime-based heartbeats to the other peers do not time out too quickly. I therefore increased the tickTime value to about 5 minutes to give a node hosting ZooKeeper some time to restart and autostart its service. The problem seems fixed, but I will double-check to be sure. I am using nssm (the Non-Sucking Service Manager) to autostart ZooKeeper, and I will need to retest with nssm to make sure the ZooKeeper services are up and running.

That sounds like a very bad idea.  A typical tickTime is two *seconds*. Zookeeper is designed around certain things happening very quickly.

I don't think you can increase that to five *minutes* (multiplying it by 150) without the strong possibility of something going very wrong and processes hanging for minutes at a time waiting for a timeout that should happen very quickly.

I am reasonably certain that tickTime is used for zookeeper operation in several ways, so I believe that this much of an increase will cause fundamental problems with zookeeper's normal operation.  I admit that I have not looked at the code, so I could be wrong ... but based on the following information from the Zookeeper docs, I don't think I am wrong:

 tickTime

    the length of a single tick, which is the basic time unit used by ZooKeeper, as measured in milliseconds. It is used to regulate heartbeats, and timeouts. For example, the minimum session timeout will be two ticks.

Thanks,
Shawn


Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/6/2015 3:38 AM, Adrian Liew wrote:
> Thanks for the reply. It looks like this was resolved by manually starting the ZooKeeper services on each server promptly, so that the tickTime-based heartbeats to the other peers do not time out too quickly. I therefore increased the tickTime value to about 5 minutes to give a node hosting ZooKeeper some time to restart and autostart its service. The problem seems fixed, but I will double-check to be sure. I am using nssm (the Non-Sucking Service Manager) to autostart ZooKeeper, and I will need to retest with nssm to make sure the ZooKeeper services are up and running.

That sounds like a very bad idea.  A typical tickTime is two *seconds*.
Zookeeper is designed around certain things happening very quickly.

I don't think you can increase that to five *minutes* (multiplying it by
150) without the strong possibility of something going very wrong and
processes hanging for minutes at a time waiting for a timeout that
should happen very quickly.

I am reasonably certain that tickTime is used for zookeeper operation in
several ways, so I believe that this much of an increase will cause
fundamental problems with zookeeper's normal operation.  I admit that I
have not looked at the code, so I could be wrong ... but based on the
following information from the Zookeeper docs, I don't think I am wrong:

 tickTime

    the length of a single tick, which is the basic time unit used by
ZooKeeper, as measured in milliseconds. It is used to regulate
heartbeats, and timeouts. For example, the minimum session timeout will
be two ticks.
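
To put rough numbers on this: by default ZooKeeper derives its session timeout bounds from tickTime (a minimum of 2 ticks and a maximum of 20 ticks), and initLimit and syncLimit are also counted in ticks. The sketch below is illustrative Python only; the initLimit=10 and syncLimit=5 values are assumed example settings, not values taken from this thread. It compares a 2-second tick with the 5-minute tick discussed above.

```python
def derived_timeouts_ms(tick_time_ms, init_limit=10, sync_limit=5):
    """Timeouts derived from tickTime, all in milliseconds.

    init_limit and sync_limit are assumed example values, not from the thread.
    """
    return {
        "min_session_timeout": 2 * tick_time_ms,   # default minSessionTimeout
        "max_session_timeout": 20 * tick_time_ms,  # default maxSessionTimeout
        "init_limit": init_limit * tick_time_ms,   # followers connecting and syncing to the leader
        "sync_limit": sync_limit * tick_time_ms,   # how far a follower may lag behind the leader
    }


typical = derived_timeouts_ms(2_000)          # tickTime of two seconds
five_minutes = derived_timeouts_ms(300_000)   # tickTime of five minutes

for name in typical:
    print(f"{name}: {typical[name] / 1000:.0f} s vs {five_minutes[name] / 60000:.0f} min")
```

With a five-minute tick, the minimum session timeout alone becomes ten minutes, which is consistent with the concern that failures would take minutes rather than seconds to detect.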

Thanks,
Shawn


RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Adrian Liew <ad...@avanade.com>.
Hi Edwin,

Thanks for the reply. It looks like this was resolved by manually starting the ZooKeeper services on each server promptly, so that the tickTime-based heartbeats to the other peers do not time out too quickly. I therefore increased the tickTime value to about 5 minutes to give a node hosting ZooKeeper some time to restart and autostart its service. The problem seems fixed, but I will double-check to be sure. I am using nssm (the Non-Sucking Service Manager) to autostart ZooKeeper, and I will need to retest with nssm to make sure the ZooKeeper services are up and running.

Regards,
Adrian



-----Original Message-----
From: Zheng Lin Edwin Yeo [mailto:edwinyeozl@gmail.com] 
Sent: Monday, October 5, 2015 10:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Hi Adrian,

It's unlikely to be the firewall settings if it is failing intermittently.
It is more likely a network issue.

The error says it's a connection timeout, and since you say it happens only intermittently, I suspect it could be a network issue.
Have you checked whether the connections to the various servers are always up?

Regards,
Edwin



Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Adrian,

It's unlikely to be the firewall settings if it is failing intermittently.
It is more likely a network issue.

The error says it's a connection timeout, and since you say it happens
only intermittently, I suspect it could be a network issue.
Have you checked whether the connections to the various servers are always up?

Regards,
Edwin


On 3 October 2015 at 00:22, Erick Erickson <er...@gmail.com> wrote:

> Hmmm, there are usually a couple of ports that each ZK instance needs.
> Is it possible that you've got more than one process using one of
> those ports?
>
> By default (I think), zookeeper uses "peer port + 1000" for its leader
> election process, see:
> https://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html
> the "Running Replicated Zookeeper" section.
>
> I'm not quite clear whether the ZK2 port and ZK3 port above are just
> meant to indicate a single Zookeeper instance on a node or not, so I
> thought I'd check.
>
> Firewall blocks should fail every time, not intermittently, so I'm
> puzzled about that....
>
> Best,
> Erick
>

Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, there are usually a couple of ports that each ZK instance needs.
Is it possible that you've got more than one process using one of
those ports?

By default (I think), zookeeper uses "peer port + 1000" for its leader
election process, see:
https://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html
the "Running Replicated Zookeeper" section.
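
For reference, the quorum (peer) port and the leader-election port are both written out explicitly in each server.N line of zoo.cfg, in the form server.N=host:peerPort:electionPort, alongside the separate clientPort that clients such as zkCli connect to. The following sketch is illustrative Python; the function name parse_servers is hypothetical, and the sample text is an assumption built from the addresses quoted further down in this thread (the tickTime and clientPort lines are placeholders). It lists the ports each node would need to have reachable:

```python
import re

# Matches lines such as "server.2=10.0.0.5:2888:3888".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")


def parse_servers(cfg_text):
    """Return {server_id: (host, peer_port, election_port)} from zoo.cfg text."""
    servers = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, peer, election = match.groups()
            servers[int(sid)] = (host, int(peer), int(election))
    return servers


sample = """
tickTime=2000
clientPort=2181
server.1=10.0.0.4:2888:3888
server.2=10.0.0.5:2888:3888
server.3=10.0.0.6:2888:3888
"""

for sid, (host, peer, election) in sorted(parse_servers(sample).items()):
    print(f"server.{sid}: {host} peer={peer} election={election}")
```

Every node's zoo.cfg has to list the same server.N entries, and each listed peer and election port has to be open inbound on the corresponding host for the ensemble to form.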

I'm not quite clear whether the ZK2 port and ZK3 port above are just
meant to indicate a single Zookeeper instance on a node or not, so I
thought I'd check.

Firewall blocks should fail every time, not intermittently, so I'm puzzled about that....

Best,
Erick

On Fri, Oct 2, 2015 at 1:33 AM, Adrian Liew <ad...@avanade.com> wrote:
> Hi Edwin,
>
> I have followed the standards recommended by the Zookeeper article. It seems to be working.
>
> Incidentally, I am facing intermittent issues where I am unable to connect to the ZooKeeper service via Solr's zkCli.bat command, even after setting my ZooKeeper service to start automatically. I have configured nssm (the Non-Sucking Service Manager) to autostart Solr with a dependency on ZooKeeper, so that both services are running on startup for each Solr VM.
>
> Here is an example of what I ran to connect to the ZK service:
>
> E:\solr-5.3.0\server\scripts\cloud-scripts>zkcli.bat -z 10.0.0.6:2183 -cmd list
> Exception in thread "main" org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.6:2183 within 30000 ms
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:181)
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:105)
>         at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:181)
> Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.6:2183 within 30000 ms
>         at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:208)
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:173)
>         ... 3 more
>
>
> Further to this I inspected the output shown in console window by zkServer.cmd:
>
> 2015-10-02 08:24:09,305 [myid:3] - WARN  [WorkerSender[myid=3]:QuorumCnxManager@382] - Cannot open channel to 2 at election address /10.0.0.5:3888
> java.net.SocketTimeoutException: connect timed out
>         at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
>         at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
>         at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
>         at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
>         at java.net.PlainSocketImpl.connect(Unknown Source)
>         at java.net.SocksSocketImpl.connect(Unknown Source)
>         at java.net.Socket.connect(Unknown Source)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
>         at java.lang.Thread.run(Unknown Source)
> 2015-10-02 08:24:09,305 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x700000011 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
>
> I noticed this error message from zkServer.cmd: Cannot open channel to 2 at election address /10.0.0.5:3888
>
> Can firewall settings be the issue here? I feel this may be a network issue between the individual Solr VMs. I am using a Windows Server 2012 R2 64-bit environment to run Zookeeper 3.4.6 and Solr 5.3.0.
>
> I have configured the following for each Azure VM (Phoenix-Solr-0, Phoenix-Solr-1, Phoenix-Solr-2) in the Firewall Advanced Security settings:
>
> For allowed inbound connections:
>
> Solr port 8983
> ZK1 port 2181
> ZK2 port 2888
> ZK3 port 3888
>
> Regards,
> Adrian
>
> -----Original Message-----
> From: Zheng Lin Edwin Yeo [mailto:edwinyeozl@gmail.com]
> Sent: Friday, October 2, 2015 11:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd
>
> Hi Adrian,
>
> What does your system setup look like? By right it shouldn't be an issue if we use different ports.
>
> In fact, if the various ZooKeeper instances are running on a single machine, they have to be on different ports in order for it to work.
>
>
> Regards,
> Edwin
>
>
>
> On 1 October 2015 at 18:19, Adrian Liew <ad...@avanade.com> wrote:
>
>> Hi all,
>>
>> The problem below was resolved by appropriately setting my server ip
>> addresses to have the following for each zoo.cfg:
>>
>> server.1=10.0.0.4:2888:3888
>> server.2=10.0.0.5:2888:3888
>> server.3=10.0.0.6:2888:3888
>>
>> as opposed to the following:
>>
>> server.1=10.0.0.4:2888:3888
>> server.2=10.0.0.5:2889:3889
>> server.3=10.0.0.6:2890:3890
>>
>> I am not sure why the above can be an issue (by right it should not),
>> however I followed the recommendations provided by Zookeeper
>> administration guide under RunningReplicatedZookeeper (
>> https://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html#sc_Runni
>> ngReplicatedZooKeeper
>> )
>>
>> Given that I am testing multiple servers in a mutiserver environment,
>> it will be safe to use 2888:3888 on each server rather than have
>> different ports.
>>
>> Regards,
>> Adrian
>>
>> From: Adrian Liew [mailto:adrian.liew@avanade.com]
>> Sent: Thursday, October 1, 2015 5:32 PM
>> To: solr-user@lucene.apache.org
>> Subject: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd
>>
>> Hi there,
>>
>> Currently, I have setup an azure virtual network to connect my
>> Zookeeper clusters together with three Azure VMs. Each VM has an
>> internal IP of 10.0.0.4, 10.0.0.5 and 10.0.0.6. I have also setup Solr
>> 5.3.0 which runs in Solr Cloud mode connected to all three Zookeepers
>> in an external ensemble manner.
>>
>> I am able to connect to 10.0.0.4 and 10.0.0.6 via the zkCli.cmd after
>> starting the Zookeeper services. However for 10.0.0.5, I keep getting
>> the below error even if I started the zookeeper service.
>>
>> [cid:image001.png@01D0FC6E.BDC2D990]
>>
>> I have restarted 10.0.0.5 VM several times and still am unable to
>> connect to Zookeeper via zkCli.cmd. I have checked zoo.cfg (making
>> sure ports, data and logs are all set correctly) and myid to ensure
>> they have the correct configurations.
>>
>> The simple command line I used to connect to Zookeeper is zkCli.cmd
>> -server 10.0.0.5:2182 for example.
>>
>> Any ideas?
>>
>> Best regards,
>>
>> Adrian Liew |  Consultant Application Developer Avanade Malaysia Sdn.
>> Bhd..| Consulting Services
>> (: Direct: +(603) 2382 5668
>> È: +6010-2288030
>>
>>
>>

RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Adrian Liew <ad...@avanade.com>.
Hi Edwin,

I have followed the configuration recommended in the ZooKeeper documentation, and it seems to be working.

However, I am facing intermittent issues where I cannot connect to the ZooKeeper service via Solr's zkcli.bat command, even after setting the ZooKeeper service to start automatically. I have configured NSSM (the Non-Sucking Service Manager) to auto-start Solr with a dependency on ZooKeeper, so that both services are running on startup on each Solr VM.

Here is an example of what I ran to connect to the ZK service:

E:\solr-5.3.0\server\scripts\cloud-scripts>zkcli.bat -z 10.0.0.6:2183 -cmd list
Exception in thread "main" org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.6:2183 within 30000 ms
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:181)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:105)
        at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:181)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.6:2183 within 30000 ms
        at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:208)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:173)
        ... 3 more
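
A timeout like the one above does not say whether the ZooKeeper process is down or is running but has not joined the quorum. The following is a minimal Python sketch (not part of the original thread) that sends a ZooKeeper four-letter command such as `ruok` or `stat` to the client port. The helper name `four_letter_word` is hypothetical, and the sketch assumes the four-letter commands are enabled, which is the default in 3.4.6 as far as I know.

```python
import socket


def four_letter_word(host, port, word="ruok", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the reply as text.

    Returns None if the connection is refused, fails, or times out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(word.encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the socket after replying
                    break
                chunks.append(data)
            return b"".join(chunks).decode("utf-8", errors="replace")
    except OSError:
        return None
```

For example, `four_letter_word("10.0.0.6", 2183)` should return `imok` if the server process is up and bound to that client port. A reply of `imok` does not mean the server has joined the quorum; sending `stat` instead typically shows either the server's mode (leader or follower) or a message that the instance is not currently serving requests.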


Further to this, I inspected the console output of zkServer.cmd:

2015-10-02 08:24:09,305 [myid:3] - WARN  [WorkerSender[myid=3]:QuorumCnxManager@382] - Cannot open channel to 2 at election address /10.0.0.5:3888
java.net.SocketTimeoutException: connect timed out
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Unknown Source)
2015-10-02 08:24:09,305 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x700000011 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)

The error reported by zkServer.cmd is: Cannot open channel to 2 at election address /10.0.0.5:3888

Could firewall settings be the issue here? I suspect a network issue between the individual Solr VMs. I am running ZooKeeper 3.4.6 and Solr 5.3.0 on Windows Server 2012 R2 (64-bit).

I have configured the following in the Firewall Advanced Security Settings on each Azure VM (Phoenix-Solr-0, Phoenix-Solr-1, Phoenix-Solr-2):

For allowed inbound connections:

Solr port 8983
ZK1 port 2181
ZK2 port 2888
ZK3 port 3888
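
To test whether the firewall rules above actually let traffic through, each port can be probed from every other VM. The following is a minimal Python sketch (not part of the original thread). The hosts and ports come from this thread; the names `can_connect` and `check_ensemble` are hypothetical. Commands elsewhere in the thread also use 2182 and 2183 as client ports, so add those if your zoo.cfg uses them.

```python
import socket

# Hosts and ports as mentioned in this thread; adjust to your own ensemble.
HOSTS = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]
PORTS = {"client": 2181, "quorum": 2888, "election": 3888}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ensemble(hosts=HOSTS, ports=PORTS, timeout=3.0):
    """Map each (host, role) pair to whether its port accepted a connection."""
    return {
        (host, role): can_connect(host, port, timeout)
        for host in hosts
        for role, port in ports.items()
    }
```

Calling `check_ensemble()` on each VM in turn gives a small reachability matrix. Note that, as far as I know, only the current leader listens on the quorum port (2888), so a closed 2888 on a follower is expected. A closed election port (3888) on a running server is the more telling sign, and it matches the "Cannot open channel ... at election address" warning.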

Regards,
Adrian


Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Adrian,

What does your system setup look like? By right, it shouldn't be an issue if
we use different ports.

In fact, if the various ZooKeeper instances are running on a single machine,
they have to be on different ports in order for it to work.


Regards,
Edwin




RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Posted by Adrian Liew <ad...@avanade.com>.
Hi all,

The problem from my original message was resolved by setting the server addresses in each zoo.cfg to the following:

server.1=10.0.0.4:2888:3888
server.2=10.0.0.5:2888:3888
server.3=10.0.0.6:2888:3888

as opposed to the following:

server.1=10.0.0.4:2888:3888
server.2=10.0.0.5:2889:3889
server.3=10.0.0.6:2890:3890

I am not sure why the above would be an issue (by right it should not be); however, I followed the recommendations in the ZooKeeper Getting Started guide under Running Replicated ZooKeeper (https://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html#sc_RunningReplicatedZooKeeper)

Given that I am running multiple servers in a multiserver environment, it is safe to use 2888:3888 on each server rather than different ports.
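
One possible explanation, not confirmed in this thread, is that the firewall rules only opened 2888 and 3888, so the 2889/3889 and 2890/3890 ports in the earlier configuration were blocked. The following is a minimal Python sketch (not part of the original thread) that lists the quorum and election ports a zoo.cfg needs and compares them with an allowed-port list. The helper names `parse_servers` and `ports_not_allowed` are hypothetical.

```python
import re

# Matches lines such as: server.2=10.0.0.5:2888:3888
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")


def parse_servers(cfg_text):
    """Return {server_id: (host, quorum_port, election_port)} from zoo.cfg text."""
    servers = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, quorum, election = match.groups()
            servers[int(sid)] = (host, int(quorum), int(election))
    return servers


def ports_not_allowed(servers, allowed_ports):
    """Return the sorted quorum/election ports that are missing from allowed_ports."""
    needed = {
        port
        for _host, quorum, election in servers.values()
        for port in (quorum, election)
    }
    return sorted(needed - set(allowed_ports))
```

With the second configuration above and an allowed list of 2181, 2888 and 3888, `ports_not_allowed` reports 2889, 2890, 3889 and 3890; with the first configuration it reports nothing. Different ports per server are valid in ZooKeeper as long as every zoo.cfg carries the same server list and each of those ports is reachable.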

Regards,
Adrian
