You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2014/01/03 00:57:50 UTC

[jira] [Commented] (SOLR-5601) NPE in OverseerCollectionProcessor.migrateKey - inconsistent failure from MigrateRouteKeyTest

    [ https://issues.apache.org/jira/browse/SOLR-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860972#comment-13860972 ] 

Hoss Man commented on SOLR-5601:
--------------------------------

The initial start of trouble seened to me an NPE in OverseerCollectionProcessor.migrateKey...

{noformat}
   [junit4]   2> 324502 T1186 oasc.ZkController.checkRecovery I am the leader, no recovery necessary
   [junit4]   2> 324502 T1186 oasc.ZkController.publish publishing core=split_shard2_temp_shard1_shard1_replica1 state=active
   [junit4]   2> 324514 T1228 C220 P42114 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={distrib.from=http://127.0.0.1:47570/tr_/xq/collection1/&version=2&wt=javabin&update.distrib=FROMLEADER} {add=[m/1!116 (1456162963872284672)]} 0 0
   [junit4]   2> 324515 T1189 C221 P47570 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={version=2&wt=javabin} {add=[m/1!116 (1456162963872284672)]} 0 3
   [junit4]   2> 324526 T1186 oascc.ZkStateReader.updateClusterState Updating cloud state from ZooKeeper... 
   [junit4]   2> 324527 T1186 oasc.SolrXMLCoresLocator.doPersist Persisted core descriptions to /home/hossman/lucene/dev/solr/build/solr-core/test/J2/../../../../../../../../../home/hossman/lucene/dev/solr/build/solr-core/test/J2/./org.apache.solr.cloud.MigrateRouteKeyTest-jetty1-1388705195623/solr.xml
   [junit4]   2> 324527 T1186 oass.SolrDispatchFilter.handleAdminRequest [admin] webapp=null path=/admin/cores params={numShards=1&collection.configName=conf1&version=2&wt=javabin&name=split_shard2_temp_shard1_shard1_replica1&shard=shard1&collection=split_shard2_temp_shard1&qt=/admin/cores&action=CREATE} status=0 QTime=1656 
   [junit4]   2> 324528 T1174 oasc.OverseerCollectionProcessor.createCollection Finished create command on all shards for collection: split_shard2_temp_shard1
   [junit4]   2> 324528 T1174 oasc.OverseerCollectionProcessor.migrateKey Asking source leader to wait for: split_shard2_temp_shard1_shard1_replica1 to be alive on: 127.0.0.1:47570_tr_%2Fxq
   [junit4]   2> 324528 T1174 oasc.SolrException.log ERROR Collection migrate of migrate failed:java.lang.NullPointerException
   [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.migrateKey(OverseerCollectionProcessor.java:1208)
   [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.migrate(OverseerCollectionProcessor.java:1102)
   [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:252)
   [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:176)
   [junit4]   2> 		at java.lang.Thread.run(Thread.java:724)
   [junit4]   2> 	
   [junit4]   2> 324551 T1217 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work/qnr-0000000002 state: SyncConnected type NodeDataChanged
   [junit4]   2> 324552 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
   [junit4]   2> 324552 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
   [junit4]   2> 324559 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
   [junit4]   2> 324559 T1174 oasc.OverseerCollectionProcessor.run Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-0000000002 complete, response:{success={null={responseHeader={status=0,QTime=0},core=migrate_multipleshardtest_targetCollection_shard1_replica1,status=BUFFERING},null={responseHeader={status=0,QTime=1656},core=split_shard2_temp_shard1_shard1_replica1}},Operation migrate caused exception:=java.lang.NullPointerException,exception={msg=null,rspCode=-1}}
   [junit4]   2> 324568 T1173 oascc.ZkStateReader.updateClusterState Updating cloud state from ZooKeeper... 
   [junit4]   2> 324569 T1173 oasc.Overseer$ClusterStateUpdater.updateState Update state numShards=1 message={
   [junit4]   2> 	  "operation":"state",
   [junit4]   2> 	  "state":"active",
   [junit4]   2> 	  "base_url":"http://127.0.0.1:47570/tr_/xq",
   [junit4]   2> 	  "core":"split_shard2_temp_shard1_shard1_replica1",
   [junit4]   2> 	  "roles":null,
   [junit4]   2> 	  "node_name":"127.0.0.1:47570_tr_%2Fxq",
   [junit4]   2> 	  "shard":"shard1",
   [junit4]   2> 	  "shard_range":null,
   [junit4]   2> 	  "shard_state":"active",
   [junit4]   2> 	  "shard_parent":null,
   [junit4]   2> 	  "collection":"split_shard2_temp_shard1",
   [junit4]   2> 	  "numShards":"1",
   [junit4]   2> 	  "core_node_name":"core_node1"}
   [junit4]   2> 324570 T1209 oass.SolrDispatchFilter.handleAdminRequest [admin] webapp=null path=/admin/collections params={split.key=a/1!&version=2&wt=javabin&forward.timeout=45&collection=collection1&target.collection=migrate_multipleshardtest_targetCollection&action=MIGRATE} status=500 QTime=2211 
   [junit4]   2> 324571 T1209 oasc.SolrException.log ERROR null:org.apache.solr.common.SolrException
   [junit4]   2> 		at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:204)
   [junit4]   2> 		at org.apache.solr.handler.admin.CollectionsHandler.handleMigrate(CollectionsHandler.java:450)
   [junit4]   2> 		at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:174)
   [junit4]   2> 		at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:705)
   [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
   [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:200)
   [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
   [junit4]   2> 		at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136)
   [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
   [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
   [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
   [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
   [junit4]   2> 		at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
   [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
   [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
   [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
   [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
   [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
   [junit4]   2> 		at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
   [junit4]   2> 		at org.eclipse.jetty.server.Server.handle(Server.java:368)
   [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
   [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
   [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
   [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
   [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
   [junit4]   2> 		at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
   [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
   [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
   [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
   [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
   [junit4]   2> 		at java.lang.Thread.run(Thread.java:724)
   [junit4]   2> 	
   [junit4]   2> 324572 T1249 C222 P46266 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={distrib.from=http://127.0.0.1:52363/tr_/xq/collection1/&version=2&wt=javabin&update.distrib=FROMLEADER} {add=[n/1!117 (1456162963931004928)]} 0 0
   [junit4]   2> 324573 T1207 C223 P52363 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={version=2&wt=javabin} {add=[n/1!117 (1456162963931004928)]} 0 5
   [junit4]   2> 324585 T1146 oasc.ChaosMonkey.monkeyLog monkey: stop shard! 36682
   [junit4]   2> 324585 T1146 oasc.CoreContainer.shutdown Shutting down CoreContainer instance=2033417453
   [junit4]   2> 324586 T1146 oasc.ZkController.publish publishing core=collection1 state=down
   [junit4]   2> 324586 T1146 oasc.ZkController.publish numShards not found on descriptor - reading it from system property
   [junit4]   2> 324617 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
{noformat}

...which then cascaded into an "IllegalStateException: Connection pool shut down"...

{noformat}
   [junit4]   2> 324711 T1281 oascc.ZkStateReader.createClusterStateWatchersAndUpdate Updating cluster state from ZooKeeper... 
   [junit4]   2> 324718 T1197 oascc.SolrZkClient.makePath makePath: /overseer_elect/leader
   [junit4]   2> 324722 T1146 oejsh.ContextHandler.doStop stopped o.e.j.s.ServletContextHandler{/tr_/xq,null}
   [junit4]   2> 324726 T1197 oasc.Overseer.start Overseer (id=91010183481589766-127.0.0.1:47570_tr_%2Fxq-n_0000000001) starting
   [junit4]   2> 324727 T1281 oasc.MigrateRouteKeyTest$Indexer.run ERROR Exception while adding document id: id=o/1!118 org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:361)
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:510)
   [junit4]   2> 	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
   [junit4]   2> 	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
   [junit4]   2> 	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
   [junit4]   2> 	at org.apache.solr.cloud.MigrateRouteKeyTest$Indexer.run(MigrateRouteKeyTest.java:252)
   [junit4]   2> Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:310)
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:358)
   [junit4]   2> 	... 5 more
   [junit4]   2> Caused by: java.lang.IllegalStateException: Connection pool shut down
   [junit4]   2> 	at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:167)
   [junit4]   2> 	at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:199)
   [junit4]   2> 	at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:188)
   [junit4]   2> 	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:448)
   [junit4]   2> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
   [junit4]   2> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
   [junit4]   2> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395)
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
   [junit4]   2> 	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:283)
   [junit4]   2> 	... 6 more
   [junit4]   2> 
{noformat}


...lots more "IllegalStateException: Connection pool shut down" for various doc add attempts, ultimatly leading to...



{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=MigrateRouteKeyTest -Dtests.method=testDistribSearch -Dtests.seed=26F9FF92782251B6 -Dtests.slow=true -Dtests.locale=sr_CS -Dtests.timezone=Europe/Zaporozhye -Dtests.file.encoding=UTF-8
   [junit4] ERROR   28.8s J2 | MigrateRouteKeyTest.testDistribSearch <<<
   [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server Error
   [junit4]    > request: http://127.0.0.1:52363/tr_/xq/admin/collections?action=MIGRATE&collection=collection1&target.collection=migrate_multipleshardtest_targetCollection&split.key=a/1!&forward.timeout=45&wt=javabin&version=2
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([26F9FF92782251B6:A71F718A0F7D318A]:0)
   [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
   [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
   [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.invoke(MigrateRouteKeyTest.java:139)
   [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.invokeMigrateApi(MigrateRouteKeyTest.java:125)
   [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.multipleShardMigrateTest(MigrateRouteKeyTest.java:198)
   [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.doTest(MigrateRouteKeyTest.java:93)
   [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:867)
   [junit4]    > 	at java.lang.Thread.run(Thread.java:724)
{noformat}


...and then...

{noformat}
   [junit4]   2> 50947 T1145 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 4 threads leaked from SUITE scope at org.apache.solr.cloud.MigrateRouteKeyTest: 
   [junit4]   2> 	   1) Thread[id=1298, name=Timer-98, state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest]
   [junit4]   2> 	        at java.lang.Object.wait(Native Method)
   [junit4]   2> 	        at java.util.TimerThread.mainLoop(Timer.java:552)
   [junit4]   2> 	        at java.util.TimerThread.run(Timer.java:505)
   [junit4]   2> 	   2) Thread[id=1299, name=Thread-532-SendThread(localhost:46743), state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest]
   [junit4]   2> 	        at java.lang.Thread.sleep(Native Method)
   [junit4]   2> 	        at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
   [junit4]   2> 	        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
   [junit4]   2> 	        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
   [junit4]   2> 	   3) Thread[id=1300, name=Thread-532-EventThread, state=WAITING, group=TGRP-MigrateRouteKeyTest]
   [junit4]   2> 	        at sun.misc.Unsafe.park(Native Method)
   [junit4]   2> 	        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   [junit4]   2> 	        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
   [junit4]   2> 	        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   [junit4]   2> 	        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
   [junit4]   2> 	   4) Thread[id=1281, name=Thread-532, state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest]
   [junit4]   2> 	        at java.lang.Object.wait(Native Method)
   [junit4]   2> 	        at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:204)
   [junit4]   2> 	        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:139)
   [junit4]   2> 	        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:98)
   [junit4]   2> 	        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:88)
   [junit4]   2> 	        at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:183)
   [junit4]   2> 	        at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:240)
   [junit4]   2> 	        at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:501)
   [junit4]   2> 	        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
   [junit4]   2> 	        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
   [junit4]   2> 	        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
   [junit4]   2> 	        at org.apache.solr.cloud.MigrateRouteKeyTest$Indexer.run(MigrateRouteKeyTest.java:252)
{noformat}




> NPE in OverseerCollectionProcessor.migrateKey - inconsistent failure from MigrateRouteKeyTest
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5601
>                 URL: https://issues.apache.org/jira/browse/SOLR-5601
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>
> Encountered a weird _semi-reproducible_ failure from MigrateRouteKeyTest.... 
> {code}
> ant test  -Dtestcase=MigrateRouteKeyTest -Dtests.method=testDistribSearch -Dtests.seed=26F9FF92782251B6 -Dtests.slow=true -Dtests.locale=sr_CS -Dtests.timezone=Europe/Zaporozhye -Dtests.file.encoding=UTF-8
> {code}
> Attempting to reproduce with that line caused failures the first 2 times i tried, and then succeeded every time after that.
> Details of failure to follow in comment.
> (NOTE: this may be related to some recent jenkins failures due to thread leaks from this test)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org