Posted to solr-user@lucene.apache.org by Danny Shih <ds...@tableau.com> on 2018/08/22 22:51:37 UTC

SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

Hi,

During startup in cloud mode, the SOLR zookeeper connection timeout appears to be hardcoded to 1000ms:
https://github.com/apache/lucene-solr/blob/5eab1c3c688a0d8db650c657567f197fb3dcf181/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L45

And it is not configurable via zkClientTimeout (solr.xml) or SOLR_WAIT_FOR_ZK (solr.in.sh).

Is there a way to configure this, and if not, should I open a bug?

Thanks,
Danny

Re: SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

Posted by AshB <bi...@gmail.com>.
Can this timeout value be changed?

Regards
Ashish



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

Posted by Dominique Bejean <do...@eolya.fr>.
Hi,

We are also experiencing timeout issues from time to time.

I sent this message a month ago to the dev list by mistake.

Why are hardcoded values used in ZkClientClusterStateProvider.java
when parameters already exist for these timeouts?

Regards

Dominique

================================================

We are experiencing an issue related to the ZK timeout.

The stack trace is:

ERROR 19 June 2018 06:24:07,152 - h.concurrent.ConcurrentService:67   - Error while waiting for a thread to finish executing
ERROR 19 June 2018 06:24:07,152 - h.concurrent.ConcurrentService:68   - org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper xxx.xxx.xxx.xxx:2181 within 10000 ms
ERROR 19 June 2018 06:24:07,152 -           api.batch.Lanceur:98   - org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper xxx.xxx.xxx.xxx:2181 within 10000 ms
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper xxx.xxx.xxx.xxx:2181 within 10000 ms
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 ...
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper xxx.xxx.xxx.xxx:2181 within 10000 ms
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:182)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:116)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:106)
 at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:226)
 at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:121)
...


In solr.xml, we have:
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>

In solr.in.sh, we have:
#ZK_CLIENT_TIMEOUT="15000"
or
ZK_CLIENT_TIMEOUT="30000"

So zkClientTimeout should be 30000.
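
To spell out that expectation: the ${name:default} form in solr.xml takes its value from a property of that name when one is set, and otherwise uses the default after the colon. The sketch below only imitates that rule for illustration. It is not Solr's implementation; the class PlaceholderDemo and the method substitute are hypothetical names, and a placeholder with neither a property nor a default is simply replaced by an empty string here.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo class, not part of Solr. */
final class PlaceholderDemo {
    // Matches ${name} or ${name:default}.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Replaces each placeholder with the property value, or its default if the property is unset. */
    static String substitute(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Simplification: a missing default becomes an empty string.
            String fallback = m.group(2) != null ? m.group(2) : "";
            String value = props.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // No property set: the default after the colon is used.
        System.out.println(substitute("${zkClientTimeout:30000}", props)); // prints 30000
        // Property set: it overrides the default.
        props.setProperty("zkClientTimeout", "15000");
        System.out.println(substitute("${zkClientTimeout:30000}", props)); // prints 15000
    }
}
```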

In the source code of ZkClientClusterStateProvider.java, I see that
zkClientTimeout is hardcoded to 10000! Is it expected that the configuration
is not used?

lucene-solr/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java

int zkConnectTimeout = 10000;
int zkClientTimeout = 10000;

...

zk = new ZkStateReader(zkHost, zkClientTimeout, zkConnectTimeout);
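
For illustration only, here is a minimal sketch of what I would have expected: the two values fall back to 10000 only when nothing is configured. This is not the actual Solr code. The class ZkTimeouts, the method resolve, and the choice of reading properties named zkClientTimeout and zkConnectTimeout are all assumptions made for the example.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Solr or SolrJ. */
final class ZkTimeouts {
    /** Same value as the hardcoded fields quoted above. */
    static final int DEFAULT_TIMEOUT_MS = 10000;

    private ZkTimeouts() {}

    /**
     * Returns the positive integer stored under key, or DEFAULT_TIMEOUT_MS
     * when the key is missing, not a number, or not positive.
     */
    static int resolve(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return DEFAULT_TIMEOUT_MS;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_TIMEOUT_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MS;
        }
    }

    public static void main(String[] args) {
        // Property names are assumed for this example,
        // e.g. java -DzkClientTimeout=30000 ZkTimeouts
        Properties props = System.getProperties();
        int zkClientTimeout = resolve(props, "zkClientTimeout");
        int zkConnectTimeout = resolve(props, "zkConnectTimeout");
        System.out.println("zkClientTimeout=" + zkClientTimeout
            + " zkConnectTimeout=" + zkConnectTimeout);
    }
}
```

With something along these lines, the resolved values rather than the constants would be passed to the ZkStateReader constructor shown above.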


Regards.

On Fri, Aug 24, 2018 at 20:15, dshih <ds...@tableau.com> wrote:

> Sorry, yes 10,000 ms.
>
> We have a single test cluster (out of probably hundreds) where one node hits
> this consistently.  I'm not sure what kind of issues (network?) that node is
> having.
>
> Generally though, we ship SOLR as part of our product, and we cannot control
> our customers' hardware and setup besides listing minimum requirements.
> While I think this issue will probably be extremely rare, we would
> definitely prefer to be able to say: "well, if you can't fix your hardware
> issue, try increasing this timeout setting".
>
> Thanks,
> Danny

Re: SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

Posted by dshih <ds...@tableau.com>.
Sorry, yes 10,000 ms.

We have a single test cluster (out of probably hundreds) where one node hits
this consistently.  I'm not sure what kind of issues (network?) that node is
having.

Generally though, we ship SOLR as part of our product, and we cannot control
our customers' hardware and setup besides listing minimum requirements.
While I think this issue will probably be extremely rare, we would
definitely prefer to be able to say: "well, if you can't fix your hardware
issue, try increasing this timeout setting".

Thanks,
Danny




Re: SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

Posted by Erick Erickson <er...@gmail.com>.
That's actually 10,000 ms, a typo in your message?

Do you have a situation where that setting is causing you trouble?
Because 10 seconds for communications with ZK is quite a long time,
I'm curious what the circumstances are that you're seeing.

Best,
Erick

On Wed, Aug 22, 2018 at 3:51 PM, Danny Shih <ds...@tableau.com> wrote:
> Hi,
>
> During startup in cloud mode, the SOLR zookeeper connection timeout appears to be hardcoded to 1000ms:
> https://github.com/apache/lucene-solr/blob/5eab1c3c688a0d8db650c657567f197fb3dcf181/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L45
>
> And it is not configurable via zkClientTimeout (solr.xml) or SOLR_WAIT_FOR_ZK (solr.in.sh).
>
> Is there a way to configure this, and if not, should I open a bug?
>
> Thanks,
> Danny