Posted to dev@lucene.apache.org by "Mikhail Khludnev (JIRA)" <ji...@apache.org> on 2016/10/15 20:54:20 UTC

[jira] [Commented] (SOLR-9647) CollectionsAPIDistributedZkTest got stuck, reproduces failure

    [ https://issues.apache.org/jira/browse/SOLR-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578736#comment-15578736 ] 

Mikhail Khludnev commented on SOLR-9647:
----------------------------------------

Here are excerpts from the failure log tail.
{code}
 2> 90   INFO  (SUITE-CollectionsAPIDistributedZkTest-seed#[355E7B68C1B5A5B6]-worker) [    ] o.a.s.SolrTestCaseJ4 Randomized ssl (true) and clientAuth (false) via: 
...
  2> 263082 INFO  (zkCallback-32-thread-2-processing-n:127.0.0.1:49743_) [n:127.0.0.1:49743_    ] o.a.s.c.Overseer Overseer (id=96767662755807251-127.0.0.1:49743_-n_0000000003) starting
  2> 263083 INFO  (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) [n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
  2> 263087 INFO  (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) [n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] o.a.s.c.ActionThrottle The last leader attempt started 21ms ago.
  2> 263087 INFO  (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) [n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 4978ms
  2> 264298 ERROR (zkCallback-15-thread-2-EventThread) [    ] o.a.s.c.c.ZkStateReader Error reading cluster properties from zookeeper
  2> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /clusterprops.json
  2> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
...
{code}

{code}
268216 WARN  (Thread-1) [    ] o.a.s.c.ZkTestServer Watch limit violations: 
  2> Maximum concurrent create/delete watches above limit:
  2> 
  2> 	12	/solr/aliases.json
  2> 	5	/solr/security.json
  2> 	5	/solr/configs/conf1
  2> 	4	/solr/collections/collection1/state.json
  2> 
  2> Maximum concurrent data watches above limit:
  2> 
  2> 	12	/solr/clusterstate.json
  2> 	12	/solr/clusterprops.json
  2> 
  2> Maximum concurrent children watches above limit:
  2> 
  2> 	109	/solr/overseer/collection-queue-work
  2> 	39	/solr/overseer/queue
  2> 	12	/solr/live_nodes
  2> 	12	/solr/collections
  2> 	11	/solr/overseer/queue-work
  2> 
{code}

I don't know the details, but what is "ActionThrottle Throttling leader attempts - waiting for 4978ms" about? Is the test aware of such throttling? 
Even if the concurrent watch limits mean nothing, isn't there a leak of watches? 
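
To make the throttling question concrete, here is a minimal sketch of what such a throttle might compute. It is not taken from Solr's ActionThrottle source: the class name, the method name, and the 5000 ms minimum interval are all assumptions, the last one only inferred from the log above (21ms since the last attempt plus a 4978ms wait is roughly 5 seconds).

```java
/**
 * Hypothetical sketch of a "minimum interval between attempts" throttle.
 * Not Solr's ActionThrottle; names and the 5000 ms interval are assumptions.
 */
class LeaderAttemptThrottleSketch {

    /**
     * Returns how long to wait before the next attempt, given the minimum
     * interval between attempts and the time elapsed since the last attempt
     * started. Never returns a negative value.
     */
    static long remainingWaitMs(long minIntervalMs, long elapsedSinceLastAttemptMs) {
        return Math.max(0L, minIntervalMs - elapsedSinceLastAttemptMs);
    }

    public static void main(String[] args) {
        long assumedMinIntervalMs = 5000L; // assumption inferred from the log
        long elapsedMs = 21L;              // "The last leader attempt started 21ms ago."

        long waitMs = remainingWaitMs(assumedMinIntervalMs, elapsedMs);
        System.out.println("Throttling leader attempts - waiting for " + waitMs + "ms");

        // Once the interval has already passed, no wait is needed.
        System.out.println("After 6000ms: waiting for "
                + remainingWaitMs(assumedMinIntervalMs, 6000L) + "ms");
    }
}
```

Under these assumptions the computed wait is within a millisecond of the logged 4978ms. If the test expects a leader sooner than that, a wait of this length on every election attempt would be worth accounting for.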

> CollectionsAPIDistributedZkTest got stuck, reproduces failure
> -------------------------------------------------------------
>
>                 Key: SOLR-9647
>                 URL: https://issues.apache.org/jira/browse/SOLR-9647
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mikhail Khludnev
>
>  I had to kill https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1129/ because it "Took 1 day 12 hr on lucene".
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:08:30, stalled for 48990s at: CollectionsAPIDistributedZkTest.test
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:09:30, stalled for 49050s at: CollectionsAPIDistributedZkTest.test
>  It just got stuck. When I run it locally, it passes from Eclipse but fails when I run it from the command line with ant. 


