Posted to dev@lucene.apache.org by "Mikhail Khludnev (JIRA)" <ji...@apache.org> on 2016/10/15 20:54:20 UTC
[jira] [Commented] (SOLR-9647) CollectionsAPIDistributedZkTest got
stuck, reproduces failure
[ https://issues.apache.org/jira/browse/SOLR-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578736#comment-15578736 ]
Mikhail Khludnev commented on SOLR-9647:
----------------------------------------
Here are excerpts from the failure log tail.
{code}
2> 90 INFO (SUITE-CollectionsAPIDistributedZkTest-seed#[355E7B68C1B5A5B6]-worker) [ ] o.a.s.SolrTestCaseJ4 Randomized ssl (true) and clientAuth (false) via:
...
2> 263082 INFO (zkCallback-32-thread-2-processing-n:127.0.0.1:49743_) [n:127.0.0.1:49743_ ] o.a.s.c.Overseer Overseer (id=96767662755807251-127.0.0.1:49743_-n_0000000003) starting
2> 263083 INFO (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) [n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
2> 263087 INFO (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) [n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] o.a.s.c.ActionThrottle The last leader attempt started 21ms ago.
2> 263087 INFO (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) [n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 4978ms
2> 264298 ERROR (zkCallback-15-thread-2-EventThread) [ ] o.a.s.c.c.ZkStateReader Error reading cluster properties from zookeeper
2> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /clusterprops.json
2> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
...
{code}
{code}
268216 WARN (Thread-1) [ ] o.a.s.c.ZkTestServer Watch limit violations:
2> Maximum concurrent create/delete watches above limit:
2>
2> 12 /solr/aliases.json
2> 5 /solr/security.json
2> 5 /solr/configs/conf1
2> 4 /solr/collections/collection1/state.json
2>
2> Maximum concurrent data watches above limit:
2>
2> 12 /solr/clusterstate.json
2> 12 /solr/clusterprops.json
2>
2> Maximum concurrent children watches above limit:
2>
2> 109 /solr/overseer/collection-queue-work
2> 39 /solr/overseer/queue
2> 12 /solr/live_nodes
2> 12 /solr/collections
2> 11 /solr/overseer/queue-work
2>
{code}
I don't know the details, but what is "ActionThrottle Throttling leader attempts - waiting for 4978ms" about? Is the test aware of such throttling?
Even if the concurrent watch limits mean nothing by themselves, isn't there a leak of watches?
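For context on the throttling question, here is a minimal sketch of how a throttle of this kind could work. It is not Solr's ActionThrottle code: the class name, the method names, and the 5000 ms interval are assumptions for illustration only (the interval is guessed from the log, where 21 ms elapsed plus a 4978 ms wait comes to roughly 5 s). Time is passed in explicitly so the arithmetic is easy to check.

```java
/**
 * Hypothetical throttle sketch (NOT Solr's ActionThrottle):
 * enforces a minimum interval between the starts of two attempts.
 */
final class LeaderAttemptThrottle {
    private final long minMsBetweenAttempts; // assumed value, e.g. 5000
    private Long lastAttemptStartedAtMs;     // null until the first attempt

    LeaderAttemptThrottle(long minMsBetweenAttempts) {
        this.minMsBetweenAttempts = minMsBetweenAttempts;
    }

    /** Records that an attempt starts at the given time. */
    void markAttempt(long nowMs) {
        lastAttemptStartedAtMs = nowMs;
    }

    /** Returns how long the caller still has to wait before the next attempt. */
    long remainingWaitMs(long nowMs) {
        if (lastAttemptStartedAtMs == null) {
            return 0L; // nothing to throttle before the first attempt
        }
        long elapsedMs = nowMs - lastAttemptStartedAtMs;
        return Math.max(0L, minMsBetweenAttempts - elapsedMs);
    }

    /** Blocks the calling thread until the minimum interval has passed. */
    void waitIfNeeded() throws InterruptedException {
        long waitMs = remainingWaitMs(System.currentTimeMillis());
        if (waitMs > 0) {
            Thread.sleep(waitMs);
        }
    }
}
```

If the real throttle behaves like this, a test that triggers repeated leader elections would pay close to the full interval on every retry, which may matter for timeouts.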
> CollectionsAPIDistributedZkTest got stuck, reproduces failure
> -------------------------------------------------------------
>
> Key: SOLR-9647
> URL: https://issues.apache.org/jira/browse/SOLR-9647
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Mikhail Khludnev
>
> I had to kill https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1129/ just because it "Took 1 day 12 hr on lucene".
> [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:08:30, stalled for 48990s at: CollectionsAPIDistributedZkTest.test
> [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:09:30, stalled for 49050s at: CollectionsAPIDistributedZkTest.test
> It just got stuck. When I ran it locally, it passed from Eclipse but failed when I ran it from the command line with ant.