Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2014/05/22 22:59:02 UTC

[jira] [Commented] (SOLR-6106) Sometimes all the cores on a SolrCloud node cannot find their config when initializing the ManagedResourceStorage storageIO impl

    [ https://issues.apache.org/jira/browse/SOLR-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006462#comment-14006462 ] 

Timothy Potter commented on SOLR-6106:
--------------------------------------

This looks to be caused by long GC pauses during initialization. Here's the GC log (note the timestamp, 2014-05-22T20:25:22.313+0000, which lines up with the Solr log below):
{Heap before GC invocations=2 (full 0):
garbage-first heap total 12582912K, used 7548927K [0x00000004e0000000, 0x00000007e0000000, 0x00000007e0000000)
region size 4096K, 1843 young (7548928K), 14 survivors (57344K)
compacting perm gen total 262144K, used 27401K [0x00000007e0000000, 0x00000007f0000000, 0x0000000800000000)
the space 262144K, 10% used [0x00000007e0000000, 0x00000007e1ac27f0, 0x00000007e1ac2800, 0x00000007f0000000)
No shared spaces configured.
2014-05-22T20:25:22.313+0000: 21.646: [GC pause (young)
Desired survivor size 484442112 bytes, new threshold 15 (max 15)
age 1: 21958624 bytes, 21958624 total
age 2: 19212720 bytes, 41171344 total
, 37.5640490 secs]
[Parallel Time: 37554.2 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 21646.8, Avg: 21646.9, Max: 21647.1, Diff: 0.4]
[Ext Root Scanning (ms): Min: 1.9, Avg: 2.2, Max: 2.8, Diff: 0.9, Sum: 28.3]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Processed Buffers: Min: 0, Avg: 3.1, Max: 25, Diff: 25, Sum: 40]
[Scan RS (ms): Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 6.2]
[Object Copy (ms): Min: 37550.4, Avg: 37550.7, Max: 37550.9, Diff: 0.5, Sum: 488159.3]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.0]
[GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 2.1]
[GC Worker Total (ms): Min: 37553.3, Avg: 37553.6, Max: 37554.0, Diff: 0.7, Sum: 488197.1]
[GC Worker End (ms): Min: 59200.4, Avg: 59200.6, Max: 59200.8, Diff: 0.4]
[Code Root Fixup: 0.6 ms]
[Clear CT: 1.3 ms]
[Other: 8.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 5.0 ms]
[Ref Enq: 0.1 ms]
[Free CSet: 2.1 ms]
[Eden: 7316.0M(7316.0M)->0.0B(1312.0M) Survivors: 56.0M->924.0M Heap: 7372.0M(12.0G)->1062.0M(12.0G)]
Heap after GC invocations=3 (full 0):
garbage-first heap total 12582912K, used 1087488K [0x00000004e0000000, 0x00000007e0000000, 0x00000007e0000000)
region size 4096K, 231 young (946176K), 231 survivors (946176K)
compacting perm gen total 262144K, used 27401K [0x00000007e0000000, 0x00000007f0000000, 0x0000000800000000)
the space 262144K, 10% used [0x00000007e0000000, 0x00000007e1ac27f0, 0x00000007e1ac2800, 0x00000007f0000000)
No shared spaces configured.
}
[Times: user=0.00 sys=485.51, real=37.56 secs]
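Two things stand out in that entry: it's a young collection (full 0) that nonetheless took 37.5 seconds, and the user=0.00 sys=485.51 split means the 13 GC workers spent essentially all of that time in the kernel, which usually points at the heap being paged in and out rather than at real copying work. For reference, a log in this shape typically comes from HotSpot flags along the following lines (a sketch, not necessarily the exact flags on this node):

java -Xms12g -Xmx12g -XX:+UseG1GC \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution \
     -Xloggc:gc.log -jar start.jar

-XX:+PrintHeapAtGC produces the brace-wrapped before/after heap sections, and -XX:+PrintTenuringDistribution the "Desired survivor size / age N" lines.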
In the Solr log at the time:
2014-05-22 20:25:22,312 [main-EventThread] INFO common.cloud.ZkStateReader - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/small128/state, has occurred - updating...
... ~37 second GC pause ...
2014-05-22 20:25:59,908 [coreLoadExecutor-4-thread-5] INFO solr.update.LoggingInfoStream - [IW][coreLoadExecutor-4-thread-5]: return reader version=15668 reader=StandardDirectoryReader(segments_1r:15668:nrt _5fr(4.8):C3849733 _2d8(4.8):C132181 _49y(4.8):C113312 _5uq(4.8):C569483 _5o1(4.8):C246764 _65f(4.8):C110028 _5x7(4.8):C72968 _694(4.8):C327388 _6ny(4.8):C313130 _6ep(4.8):C86927 _6je(4.8):C62493 _6m9(4.8):C7507 _6n6(4.8):C7995 _6qd(4.8):C5291 _6nx(4.8):C7520 _6ph(4.8):C45366 _6oq(4.8):C15213 _6op(4.8):C6507 _6pt(4.8):C12728/293:delGen=1 _6pn(4.8):C3802 _6pj(4.8):C3837 _6pl(4.8):C3624 _6q3(4.8):C2896/200:delGen=1)
So the ZK session is now expired ... and later in the log we have:
2014-05-22 20:26:00,633 [main-EventThread] INFO common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
2014-05-22 20:26:00,634 [coreLoadExecutor-4-thread-19] ERROR solr.rest.ManagedResourceStorage - Failed to verify znode at /configs/cloud due to: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /configs/cloud
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /configs/cloud
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226)
at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223)
at org.apache.solr.rest.ManagedResourceStorage$ZooKeeperStorageIO.configure(ManagedResourceStorage.java:187)
at org.apache.solr.rest.ManagedResourceStorage.newStorageIO(ManagedResourceStorage.java:114)
at org.apache.solr.core.SolrCore.initRestManager(SolrCore.java:2339)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:641)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:556)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
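
What makes this fatal is that ZkCmdExecutor.retryOperation retries on connection loss but not on session expiry: an expired session cannot be recovered on the same client, so the exists() call fails outright while the ConnectionManager is still in the middle of reconnecting. With a ~37.5 second pause against a zkClientTimeout that is typically 15-30 seconds, expiry is guaranteed. Here's a minimal sketch of the kind of handling needed, written against the raw ZooKeeper client (a hypothetical illustration, not the attached patch):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch, not the SOLR-6106 patch: verify a config znode,
// building a fresh client and retrying once if the session expired while
// the JVM was paused (e.g. by GC).
public class ZnodeVerifier {

  public static boolean znodeExists(String connectString, String path) throws Exception {
    ZooKeeper zk = connect(connectString, 15000);
    try {
      return zk.exists(path, false) != null;
    } catch (KeeperException.SessionExpiredException e) {
      // An expired session is unrecoverable on this client; reconnect and retry once.
      zk.close();
      zk = connect(connectString, 15000);
      return zk.exists(path, false) != null;
    } finally {
      zk.close();
    }
  }

  private static ZooKeeper connect(String connectString, int sessionTimeoutMs) throws Exception {
    final CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(connectString, sessionTimeoutMs, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.SyncConnected) {
          connected.countDown();
        }
      }
    });
    if (!connected.await(30, TimeUnit.SECONDS)) {
      zk.close();
      throw new IllegalStateException("Timed out connecting to " + connectString);
    }
    return zk;
  }
}

In Solr itself the reconnect is driven by the ConnectionManager, so the real fix is presumably to wait for that reconnect (or defer the storageIO init) rather than spin up a new client per call; the sketch just shows why a plain retry loop on the old client can never succeed.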

> Sometimes all the cores on a SolrCloud node cannot find their config when initializing the ManagedResourceStorage storageIO impl
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6106
>                 URL: https://issues.apache.org/jira/browse/SOLR-6106
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Timothy Potter
>            Assignee: Timothy Potter
>            Priority: Minor
>         Attachments: SOLR-6106_prelim.patch
>
>
> One of my many nodes had problems initializing all of its cores due to the following error. It was resolved by restarting the node (hence the Minor priority).
> 2014-05-21 20:39:17,898 [coreLoadExecutor-4-thread-27] ERROR solr.core.CoreContainer  - Unable to create core: small46_shard1_replica1
> org.apache.solr.common.SolrException: Could not find config name for collection:small46
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:858)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:641)
> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:556)
> 	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
> 	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.solr.common.SolrException: Could not find config name for collection:small46
> 	at org.apache.solr.rest.ManagedResourceStorage.newStorageIO(ManagedResourceStorage.java:99)
> 	at org.apache.solr.core.SolrCore.initRestManager(SolrCore.java:2339)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
> 	... 10 more
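
On the original symptom quoted above: the config name that newStorageIO fails to resolve is stored as JSON on the collection's znode (e.g. {"configName":"cloud"}), so a quick way to confirm ZK actually has it is to read /collections/<name> directly. A hypothetical spot-check with the raw client (the connect string and collection name are placeholders):

import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical spot-check, not part of the patch: dump the collection znode
// that is consulted when resolving the config name for a core.
public class ConfigNameCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder connect string; for brevity this does not wait for
    // SyncConnected the way production code should.
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 15000, new Watcher() {
      @Override
      public void process(WatchedEvent event) {}
    });
    try {
      byte[] data = zk.getData("/collections/small46", false, null);
      // Expected shape: {"configName":"cloud"}
      System.out.println(new String(data, StandardCharsets.UTF_8));
    } finally {
      zk.close();
    }
  }
}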


