Posted to solr-user@lucene.apache.org by Alain Rogister <al...@gmail.com> on 2012/12/07 22:07:48 UTC

stress testing Solr 4.x

I am reporting the results of my stress tests against Solr 4.x. As I was
getting many error conditions with 4.0, I switched to the 4.1 trunk in the
hope that some of the issues would already be fixed. Here is my setup:

- Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
this is not representative of a production environment but it's a fine way
to find out what happens under resource-constrained conditions.
- 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
MB of data)
- single shard
- 3 Zookeeper instances
- HAProxy load balancing requests across Solr servers
- JMeter or ApacheBench running the tests: 5 thread pools of 20 threads
each, sending search requests continuously (no updates)
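For reference, the load-balancing piece of the setup can be sketched as an HAProxy section like the one below. The ports and IP mirror the ones that appear in the logs, but the listener port, mode and health-check settings are assumptions, not my actual configuration:

```
# haproxy.cfg excerpt (sketch; listener port and check settings are assumptions)
listen solr
    bind *:8980
    mode http
    balance roundrobin
    server solr1 192.168.0.101:8983 check
    server solr2 192.168.0.101:8984 check
    server solr3 192.168.0.101:8985 check
```

A load run then targets the balancer rather than any one Solr instance, e.g. `ab -n 1000000 -c 100 'http://localhost:8980/solr/adressage/select?q=*:*'` (hypothetical query).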

In nominal conditions, it all works fine, i.e. it can process a million
requests, maxing out the CPUs at all times, without any hard failures.
There are errors in the logs about replication failures, though; they
should be benign in this case, as no updates are taking place, but it's
hard to tell exactly what is going on. Example:

Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr exception talking to http://192.168.0.101:8985/solr/adressage/, failed
org.apache.solr.common.SolrException: Server at http://192.168.0.101:8985/solr/adressage returned non ok status:404, message:Not Found
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

Then I simulated various failure scenarios:

- 1 Solr server stop/start
- 2 Solr servers stop/start
- 3 Solr servers stop/start: it seems that in this case, the Solr servers
*cannot* be restarted. More exactly, the restarted server considers
itself number 1 out of 4 and waits for the other 3 to come up. The only
way out is to stop it again, then stop all ZooKeeper instances *and* clean
up their zkdata directories, start them, then start the Solr servers.
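The full-cluster reset described above can be sketched as a script. All paths are assumptions (adapt them to your installation), and the stop/start steps are left as comments because they depend on how the processes were launched:

```shell
# Sketch of the "wipe zkdata and restart everything" workaround.
# ZK_DATA_DIRS is a hypothetical layout with one data dir per instance.
ZK_DATA_DIRS="zk1/data zk2/data zk3/data"

reset_zk_data() {
  # 1. Stop all Solr servers, then all ZooKeeper instances (not shown).
  # 2. Wipe each ZooKeeper data directory, preserving the myid file
  #    that identifies the instance within the ensemble.
  for d in $ZK_DATA_DIRS; do
    myid=""
    if [ -f "$d/myid" ]; then myid=$(cat "$d/myid"); fi
    rm -rf "$d"
    mkdir -p "$d"
    if [ -n "$myid" ]; then echo "$myid" > "$d/myid"; fi
  done
  # 3. Restart the ZooKeeper instances, then the Solr servers (not shown).
  return 0
}
```

Note the myid files must survive the wipe, otherwise the ensemble members lose their identities and won't rejoin.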

I noticed that the zkdata directories had grown to 200 MB after a while.
What exactly is in there besides the configuration data? Does it stop
growing?

Then I tried this:

- kill 1 Zookeeper process
- kill 2 Zookeeper processes
- stop/start 1 Solr server

When doing this, I experienced (many times) situations where the Solr
servers could not reconnect and threw scary exceptions. The only way out
was to restart the whole cluster.

Q: when, if ever, is one supposed to clean up the zkdata directories?
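(From the ZooKeeper docs, the growth looks like transaction logs and snapshots, which ZooKeeper 3.4+ can purge automatically. A zoo.cfg sketch, with illustrative values:)

```
# zoo.cfg excerpt -- automatic purging of old snapshots and txn logs
# (autopurge.* settings exist in ZooKeeper 3.4+; values are illustrative)
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
```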

Here are the errors I found in the logs. Some of them seem to have been
reported in JIRA already, but 4.1-trunk experiences basically the same
issues as 4.0 in my test scenarios.

Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr couldn't connect to http://192.168.0.101:8984/solr/cachede/, counting as success
Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
SEVERE: Sync request error: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.0.101:8984/solr/cachede
Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
SEVERE: http://192.168.0.101:8983/solr/cachede/: Could not tell a replica to recover:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.0.101:8984/solr
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://192.168.0.101:8984 refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
... 5 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
... 13 more

Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr got a 404 from http://192.168.0.101:8985/solr/adressage/, counting as success
Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at http://192.168.0.101:8985/solr/adressage returned non ok status:404, message:Not Found
Dec 07, 2012 8:04:00 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=formabanque url=http://192.168.0.101:8983/solr got a 404 from http://192.168.0.101:8985/solr/formabanque/, counting as success
Dec 07, 2012 8:04:00 PM org.apache.solr.common.SolrException log
SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at http://192.168.0.101:8985/solr/formabanque returned non ok status:404, message:Not Found

Dec 07, 2012 8:04:32 PM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell if we've missed updates

Dec 07, 2012 8:03:58 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.0.101:8984/solr/adressage
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:182)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:134)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://192.168.0.101:8984 refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
... 6 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
... 14 more

Dec 07, 2012 8:03:58 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... (0) core=adressage

SEVERE: Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:735)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:699)
at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:664)
at org.apache.solr.cloud.ZkController.register(ZkController.java:603)
at org.apache.solr.cloud.ZkController.register(ZkController.java:558)
at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:791)
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:775)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:567)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/adressage/leaders/shard1
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:713)
... 16 more

Dec 07, 2012 4:39:23 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:159)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

Re: stress testing Solr 4.x

Posted by Alain Rogister <al...@gmail.com>.
Hi Mark,

Usually I was stopping them with Ctrl-C, but several times one of the
servers hung and had to be stopped with kill -9.
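Going forward I'll use a gentler stop sequence than kill -9, which can leave a partially written index behind. Something like this sketch (the timeout and the kill -9 fallback are my own choices, not a Solr-provided script):

```shell
# Send SIGTERM, give the process a grace period to shut down cleanly,
# and only fall back to SIGKILL if it is still alive afterwards.
graceful_stop() {
  pid=$1
  timeout=${2:-30}          # grace period in seconds (assumed default)
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  for _ in $(seq "$timeout"); do
    if ! kill -0 "$pid" 2>/dev/null; then return 0; fi
    sleep 1
  done
  kill -KILL "$pid" 2>/dev/null   # last resort only
  return 0
}
```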

Thanks,

Alain

On Mon, Dec 10, 2012 at 5:09 AM, Mark Miller <ma...@gmail.com> wrote:

> Hmmm...EOF on the segments file is odd...
>
> How were you killing the nodes? Just stopping them or kill -9 or what?
>
> - Mark
>
> On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister <al...@gmail.com>
> wrote:
> > Hi,
> >
> > I have re-ran my tests today after I updated Solr 4.1 to apply the patch.
> >
> > First, the good news : it works i.e. if I stop all three Solr servers and
> > then restart one, it will try to find the other two for a while (about 3
> > minutes I think) then give up, become the leader and start processing
> > requests.
> >
> > Now, the not-so-good : I encountered several exceptions that seem to
> > indicate 2 other issues. Here are the relevant bits.
> >
> > 1) The ZK session expiry problem : not sure what caused it but I did a
> few
> > Solr or ZK node restarts while the system was under load.
> >
> > SEVERE: There was a problem finding the leader in
> > zk:org.apache.solr.common.SolrException: Could not get leader props
> > at
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
> > at
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
> > at
> >
> org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
> > at
> >
> org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
> > at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
> > at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
> > at
> >
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
> > at
> >
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> > at
> >
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
> > at
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> > Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> /collections/adressage/leaders/shard1
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> > at
> >
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
> > at
> >
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
> > at
> >
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> > at
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
> > at
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
> > ... 10 more
> > SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for /overseer/queue/qn-
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> > at
> >
> org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
> > at
> >
> org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
> > at
> >
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> > at
> org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
> > at
> org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
> > at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
> > at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
> > at
> >
> org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
> > at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
> > at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
> > at
> >
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
> > at
> >
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> > at
> >
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
> > at
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> >
> > 2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed
> > to start due to the exceptions below and both servers went into a
> seemingly
> > endless loop of exponential retries. The fix was to stop both faulty
> > servers, remove the data directory of this core and restart : replication
> > then took place correctly. As above, not sure what exactly caused this to
> > happen; no updates were taking place, only searches.
> >
> > On server 1 :
> >
> > INFO: Closing
> >
> directory:/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785
> > Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
> > SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index
> fetch
> > failed :
> > at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
> > at
> >
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
> > at
> >
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
> > at
> >
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
> > Caused by: java.io.EOFException: read past EOF:
> >
> NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785/segments_2d")
> > at
> >
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
> > at
> >
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
> > at
> >
> org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
> > at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
> > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
> > at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
> > at
> >
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> > at
> >
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> > at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:647)
> > at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:75)
> > at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
> > at
> >
> org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:173)
> > at
> >
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:155)
> > at
> >
> org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:609)
> > at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:538)
> > at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
> > ... 4 more
> >
> > Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
> > SEVERE: Error while trying to
> recover:org.apache.solr.common.SolrException:
> > Replication for recovery failed.
> > at
> >
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
> > at
> >
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
> >
> > Dec 09, 2012 3:25:30 PM org.apache.solr.common.SolrException log
> > SEVERE: Error rolling back old IndexWriter.
> > core=formabanque:org.apache.lucene.store.AlreadyClosedException: this
> > IndexWriter is closed
> > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:557)
> > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:571)
> > at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:1922)
> > at
> org.apache.solr.update.SolrIndexWriter.rollback(SolrIndexWriter.java:159)
> > at
> >
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:148)
> > at
> >
> org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:609)
> > at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:538)
> > at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
> > at
> >
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
> > at
> >
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
> > at
> >
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
> >
> > INFO: Wait 8.0 seconds before trying to recover again (3)
> > Dec 09, 2012 3:35:27 PM org.apache.solr.common.SolrException log
> > SEVERE: org.apache.solr.common.SolrException: Error handling 'status'
> > action
> > at
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:714)
> > at
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:157)
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:145)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:372)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> > at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> > at
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> > at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> > at org.eclipse.jetty.server.Server.handle(Server.java:351)
> > at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> > at
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> > at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
> > at
> >
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
> > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
> > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
> > at
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> > at
> >
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> > at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> > at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> > at java.lang.Thread.run(Thread.java:722)
> > Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory
> > is closed
> > at org.apache.lucene.store.Directory.ensureOpen(Directory.java:255)
> > at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:239)
> > at
> >
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> > at
> >
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> > at
> >
> org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
> > at
> >
> org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:553)
> > at
> >
> org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:988)
> > at
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:700)
> > ... 29 more
> >
> > And on server 2 :
> >
> > SEVERE: Timeout waiting for all directory ref counts to be released
> > Dec 09, 2012 3:35:12 PM org.apache.solr.core.CoreContainer create
> > SEVERE: Unable to create core: formabanque
> > org.apache.solr.common.SolrException: Error opening new searcher
> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:730)
> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:929)
> > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:566)
> > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > at java.lang.Thread.run(Thread.java:722)
> > Caused by: org.apache.solr.common.SolrException: Error opening new
> searcher
> > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1364)
> > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1476)
> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> > ... 12 more
> > Caused by: java.io.EOFException: read past EOF:
> >
> NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem2/solr/formabanque/data/index.20121209152951621/segments_2q")
> > at
> >
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
> > at
> >
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
> > at
> >
> org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
> > at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
> > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
> > at
> >
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> > at
> >
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> > at
> >
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> > at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
> > at
> >
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
> > at
> >
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
> > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1340)
> > ... 14 more
> >
> > Dec 09, 2012 3:35:12 PM org.apache.solr.common.SolrException log
> > SEVERE: null:org.apache.solr.common.SolrException: Error opening new
> > searcher
> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:730)
> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:929)
> > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:566)
> > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > at java.lang.Thread.run(Thread.java:722)
> > Caused by: org.apache.solr.common.SolrException: Error opening new
> searcher
> > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1364)
> > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1476)
> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> > ... 12 more
> > Caused by: java.io.EOFException: read past EOF:
> >
> NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem2/solr/formabanque/data/index.20121209152951621/segments_2q")
> > at
> >
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
> > at
> >
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
> > at
> >
> org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
> > at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
> > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
> > at
> >
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> > at
> >
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> > at
> >
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> > at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
> > at
> >
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
> > at
> >
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
> > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1340)
> > ... 14 more
> >
> > Thanks,
> >
> > Alain
> >
> > On Sun, Dec 9, 2012 at 1:03 AM, Mark Miller <ma...@gmail.com>
> wrote:
> >
> >> No problem!
> >>
> >> Here is the JIRA issue: https://issues.apache.org/jira/browse/SOLR-4158
> >>
> >> - Mark
> >>
> >> On Sat, Dec 8, 2012 at 6:03 PM, Alain Rogister <
> alain.rogister@gmail.com>
> >> wrote:
> >> > Great, thanks Mark ! I'll test the fix and post my results.
> >> >
> >> > Alain
> >> >
> >> > On Saturday, December 8, 2012, Mark Miller wrote:
> >> >
> >> >> After some more playing around on 5x I have duplicated the issue.
> I'll
> >> >> file a JIRA issue for you and fix it shortly.
> >> >>
> >> >> - Mark
> >> >>
> >> >> On Dec 8, 2012, at 8:43 AM, Mark Miller <ma...@gmail.com>
> wrote:
> >> >>
> >> >> > Hmm…I've tried to replicate what looked like a bug from your
> report (3
> >> >> Solr servers stop/start ), but on 5x it works no problem for me. It
> >> >> shouldn't be any different on 4x, but I'll try that next.
> >> >> >
> >> >> > In terms of starting up Solr without a working ZooKeeper ensemble
> - it
> >> >> won't work currently. Cores won't be able to register with ZooKeeper
> and
> >> >> will fail loading. It would probably be nicer to come up in search
> only
> >> >> mode and keep trying to reconnect to zookeeper - file a JIRA issue if
> >> you
> >> >> are interested.
> >> >> >
> >> >> > On the zk data dir, see
> >> >>
> >>
> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
> >> >> >
> >> >> > - Mark
> >> >> >
> >> >> > On Dec 7, 2012, at 10:22 PM, Mark Miller <ma...@gmail.com>
> >> wrote:
> >> >> >
> >> >> >> Hey, I'll try and answer this tomorrow.
> >> >> >>
> >> >> >> There is a def an unreported bug in there that needs to be fixed
> for
> >> >> the restarting the all nodes case.
> >> >> >>
> >> >> >> Also, a 404 one is generally when jetty is starting or stopping -
> >> there
> >> >> are points where 404's can be returned. I'm not sure why else you'd
> see
> >> >> one. Generally we do retries when that happens.
> >> >> >>
> >> >> >> - Mark
> >> >> >>
> >

Re: stress testing Solr 4.x

Posted by Mark Miller <ma...@gmail.com>.
Hmmm...EOF on the segments file is odd...

How were you killing the nodes? Just stopping them or kill -9 or what?

- Mark

On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister <al...@gmail.com> wrote:
> Hi,
>
> I have re-ran my tests today after I updated Solr 4.1 to apply the patch.
>
> First, the good news : it works i.e. if I stop all three Solr servers and
> then restart one, it will try to find the other two for a while (about 3
> minutes I think) then give up, become the leader and start processing
> requests.
>
> Now, the not-so-good : I encountered several exceptions that seem to
> indicate 2 other issues. Here are the relevant bits.
>
> 1) The ZK session expiry problem : not sure what caused it but I did a few
> Solr or ZK node restarts while the system was under load.
>
> SEVERE: There was a problem finding the leader in
> zk:org.apache.solr.common.SolrException: Could not get leader props
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
> at
> org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
> at
> org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
> at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
> at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
> at
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
> at
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /collections/adressage/leaders/shard1
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
> ... 10 more
> SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /overseer/queue/qn-
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> at
> org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
> at
> org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
> at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
> at
> org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
> at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
> at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
> at
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
> at
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> 2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed
> to start due to the exceptions below and both servers went into a seemingly
> endless loop of exponential retries. The fix was to stop both faulty
> servers, remove the data directory of this core and restart : replication
> then took place correctly. As above, not sure what exactly caused this to
> happen; no updates were taking place, only searches.
>
> On server 1 :
>
> INFO: Closing
> directory:/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785
> Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
> SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch
> failed :
> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
> Caused by: java.io.EOFException: read past EOF:
> NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785/segments_2d")
> at
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
> at
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
> at
> org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
> at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
> at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:647)
> at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:75)
> at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
> at
> org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:173)
> at
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:155)
> at
> org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:609)
> at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:538)
> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
> ... 4 more
>
> Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException:
> Replication for recovery failed.
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>
> Dec 09, 2012 3:25:30 PM org.apache.solr.common.SolrException log
> SEVERE: Error rolling back old IndexWriter.
> core=formabanque:org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:557)
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:571)
> at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:1922)
> at org.apache.solr.update.SolrIndexWriter.rollback(SolrIndexWriter.java:159)
> at
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:148)
> at
> org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:609)
> at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:538)
> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>
> INFO: Wait 8.0 seconds before trying to recover again (3)
> Dec 09, 2012 3:35:27 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error handling 'status'
> action
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:714)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:157)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:145)
> at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:372)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> at org.eclipse.jetty.server.Server.handle(Server.java:351)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory
> is closed
> at org.apache.lucene.store.Directory.ensureOpen(Directory.java:255)
> at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:239)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> at
> org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:553)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:988)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:700)
> ... 29 more
>
> And on server 2 :
>
> SEVERE: Timeout waiting for all directory ref counts to be released
> Dec 09, 2012 3:35:12 PM org.apache.solr.core.CoreContainer create
> SEVERE: Unable to create core: formabanque
> org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:730)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:929)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:566)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1364)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1476)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> ... 12 more
> Caused by: java.io.EOFException: read past EOF:
> NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem2/solr/formabanque/data/index.20121209152951621/segments_2q")
> at
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
> at
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
> at
> org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
> at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
> at
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> at
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
> at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
> at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1340)
> ... 14 more
>
> Dec 09, 2012 3:35:12 PM org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Error opening new
> searcher
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:730)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:929)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:566)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1364)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1476)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> ... 12 more
> Caused by: java.io.EOFException: read past EOF:
> NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem2/solr/formabanque/data/index.20121209152951621/segments_2q")
> at
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
> at
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
> at
> org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
> at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
> at
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> at
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
> at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
> at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1340)
> ... 14 more
>
> Thanks,
>
> Alain
>



-- 
- Mark

Re: stress testing Solr 4.x

Posted by Alain Rogister <al...@gmail.com>.
Hi,

I re-ran my tests today after updating Solr 4.1 to apply the patch.

First, the good news: it works, i.e. if I stop all three Solr servers and
then restart one, it will try to find the other two for a while (about 3
minutes, I think), then give up, become the leader and start processing
requests.
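
For what it's worth, that wait-then-lead behaviour can be modelled as a
bounded polling loop. This is only an illustrative sketch: the timeout, poll
interval and peer check below are my own stand-ins, not Solr's actual
leader-election code.

```java
import java.util.function.BooleanSupplier;

/**
 * Illustrative model of "wait for peers for a while, then lead alone".
 * NOT Solr's implementation: timeout and peer detection are stand-ins.
 */
public class LeaderWait {

    /**
     * Polls peersVisible until it returns true or timeoutMs elapses.
     * Returns true if we gave up waiting and should become the leader.
     */
    public static boolean waitThenLead(BooleanSupplier peersVisible,
                                       long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (peersVisible.getAsBoolean()) {
                return false; // peers found: normal sync/election proceeds
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // interrupted: don't claim leadership
            }
        }
        return true; // no peers seen within the window: lead alone
    }

    public static void main(String[] args) {
        // With no peers ever visible, we eventually give up and lead.
        System.out.println(waitThenLead(() -> false, 200, 20)); // prints "true"
    }
}
```

In my test the window was around 3 minutes rather than the 200 ms used here.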

Now, the not-so-good: I encountered several exceptions that seem to
indicate two other issues. Here are the relevant bits.

1) The ZK session expiry problem: not sure what caused it, but I did a few
Solr or ZK node restarts while the system was under load.
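
As I understand it, once a session expires the old handle is useless: every
call on it fails (hence the SessionExpiredException in the traces below),
and the client has to open a new session and re-create its ephemeral nodes,
such as the leader registration. Here is a toy model of that pattern; the
Session class is my own stand-in, not the ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of ZooKeeper session expiry. An expired session can never be
 * revived; the client must open a NEW session and re-create its ephemeral
 * nodes. These classes are illustrative stand-ins, not the ZooKeeper API.
 */
public class SessionExpiryModel {

    static class ExpiredException extends Exception {}

    static class Session {
        final long id;
        boolean expired;
        final List<String> ephemeralNodes = new ArrayList<>();

        Session(long id) { this.id = id; }

        void create(String path) throws ExpiredException {
            if (expired) throw new ExpiredException(); // dead handle: every op fails
            ephemeralNodes.add(path);
        }
    }

    static long nextId = 1;

    /** Retrying on an expired handle cannot succeed; reconnect instead. */
    static Session register(Session s, String path) {
        try {
            s.create(path);
            return s; // session still alive, nothing to do
        } catch (ExpiredException e) {
            Session fresh = new Session(nextId++); // new session from the ensemble
            try {
                fresh.create(path); // ephemeral state must be rebuilt by hand
            } catch (ExpiredException ignored) {
                // a fresh session is not expired; unreachable here
            }
            return fresh;
        }
    }

    public static void main(String[] args) {
        Session s = new Session(nextId++);
        s.expired = true; // simulate a session timeout under load
        Session after = register(s, "/collections/adressage/leaders/shard1");
        System.out.println(after != s); // prints "true": a new session was needed
    }
}
```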

SEVERE: There was a problem finding the leader in
zk:org.apache.solr.common.SolrException: Could not get leader props
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
at
org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
at
org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
at
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
at
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/adressage/leaders/shard1
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
... 10 more
SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer/queue/qn-
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at
org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
at
org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
at
org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
at
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
at
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2) Data corruption of one core on 2 out of 3 Solr servers. This core failed
to start due to the exceptions below, and both servers went into a seemingly
endless loop of exponential retries. The fix was to stop both faulty
servers, remove the data directory of this core and restart: replication
then took place correctly. As above, not sure what exactly caused this to
happen; no updates were taking place, only searches.
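
The retry pacing in the logs ("Wait 8.0 seconds before trying to recover
again") looks like a capped doubling backoff. Here is a tiny model of that
pattern; the base delay and cap are guesses for illustration, not the
constants RecoveryStrategy actually uses.

```java
/**
 * Model of a capped exponential backoff like the recovery retry pacing
 * seen in the logs. BASE_SECONDS and CAP_SECONDS are illustrative
 * assumptions, not the constants Solr's RecoveryStrategy actually uses.
 */
public class RecoveryBackoff {

    static final double BASE_SECONDS = 1.0;
    static final double CAP_SECONDS = 600.0;

    /** Delay before retry attempt n (first attempt is n = 1): base * 2^n, capped. */
    public static double waitSeconds(int attempt) {
        return Math.min(BASE_SECONDS * Math.pow(2, attempt), CAP_SECONDS);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            // with these constants, attempt 3 waits 8.0 s
            System.out.println("attempt " + n + ": wait " + waitSeconds(n) + "s");
        }
    }
}
```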

On server 1:

INFO: Closing
directory:/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785
Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch failed :
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785/segments_2d")
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:647)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:75)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:173)
at org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:155)
at org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:609)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:538)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
... 4 more

Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)

Dec 09, 2012 3:25:30 PM org.apache.solr.common.SolrException log
SEVERE: Error rolling back old IndexWriter. core=formabanque:org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:557)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:571)
at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:1922)
at org.apache.solr.update.SolrIndexWriter.rollback(SolrIndexWriter.java:159)
at org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:148)
at org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:609)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:538)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)

INFO: Wait 8.0 seconds before trying to recover again (3)
Dec 09, 2012 3:35:27 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error handling 'status' action
at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:714)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:157)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:145)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:372)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
at org.apache.lucene.store.Directory.ensureOpen(Directory.java:255)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:239)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:553)
at org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:988)
at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:700)
... 29 more
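The retry loop shows up in the "Wait 8.0 seconds before trying to recover again (3)" line above: an 8-second wait at attempt 3 is consistent with a simple doubling backoff. A sketch of that schedule, where the 2-second base and the cap are my guesses for illustration, not Solr's actual constants:

```python
def backoff_seconds(attempt, base=2.0, cap=600.0):
    """Exponential backoff: base * 2**(attempt - 1), capped so the
    retry interval stops growing after a while."""
    return min(base * 2 ** (attempt - 1), cap)

# Attempt 3 -> 2.0 * 2**2 = 8.0 seconds, matching the log line above.
print(backoff_seconds(3))  # 8.0
```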

And on server 2:

SEVERE: Timeout waiting for all directory ref counts to be released
Dec 09, 2012 3:35:12 PM org.apache.solr.core.CoreContainer create
SEVERE: Unable to create core: formabanque
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:730)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:929)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:566)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1364)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1476)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
... 12 more
Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem2/solr/formabanque/data/index.20121209152951621/segments_2q")
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1340)
... 14 more

Dec 09, 2012 3:35:12 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:730)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:929)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:566)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1364)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1476)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
... 12 more
Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem2/solr/formabanque/data/index.20121209152951621/segments_2q")
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:287)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1340)
... 14 more
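Both EOFException traces fail inside SegmentInfos.read() while reading the very first bytes of a segments_N file, which suggests the file on disk was truncated (for example by a replication snapshot that never completed). A rough, hypothetical check for that condition; the 4-byte threshold and the demo path are illustrative only (a real segments_N header is larger):

```python
import os

def looks_truncated(segments_path, min_bytes=4):
    """Heuristic: a segments_N file too short to hold even its format
    header will produce 'read past EOF' when Lucene calls readInt()."""
    try:
        return os.path.getsize(segments_path) < min_bytes
    except OSError:
        return True  # a missing file is at least as bad as a truncated one

# Stand-in file for something like .../data/index.20121209152951621/segments_2q:
demo = "/tmp/segments_demo"
with open(demo, "wb") as f:
    f.write(b"\x00")  # a single byte: shorter than any valid header
print(looks_truncated(demo))  # True
```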

Thanks,

Alain

On Sun, Dec 9, 2012 at 1:03 AM, Mark Miller <ma...@gmail.com> wrote:

> No problem!
>
> Here is the JIRA issue: https://issues.apache.org/jira/browse/SOLR-4158
>
> - Mark
>
> On Sat, Dec 8, 2012 at 6:03 PM, Alain Rogister <al...@gmail.com>
> wrote:
> > Great, thanks Mark ! I'll test the fix and post my results.
> >
> > Alain
> >
> > On Saturday, December 8, 2012, Mark Miller wrote:
> >
> >> After some more playing around on 5x I have duplicated the issue. I'll
> >> file a JIRA issue for you and fix it shortly.
> >>
> >> - Mark
> >>
> >> On Dec 8, 2012, at 8:43 AM, Mark Miller <ma...@gmail.com> wrote:
> >>
> >> > Hmm…I've tried to replicate what looked like a bug from your report (3
> >> Solr servers stop/start ), but on 5x it works no problem for me. It
> >> shouldn't be any different on 4x, but I'll try that next.
> >> >
> >> > In terms of starting up Solr without a working ZooKeeper ensemble - it
> >> won't work currently. Cores won't be able to register with ZooKeeper and
> >> will fail loading. It would probably be nicer to come up in search only
> >> mode and keep trying to reconnect to zookeeper - file a JIRA issue if
> you
> >> are interested.
> >> >
> >> > On the zk data dir, see
> >>
> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
> >> >
> >> > - Mark
> >> >
> >> > On Dec 7, 2012, at 10:22 PM, Mark Miller <ma...@gmail.com>
> wrote:
> >> >
> >> >> Hey, I'll try and answer this tomorrow.
> >> >>
> >> >> There is a def an unreported bug in there that needs to be fixed for
> >> the restarting the all nodes case.
> >> >>
> >> >> Also, a 404 one is generally when jetty is starting or stopping -
> there
> >> are points where 404's can be returned. I'm not sure why else you'd see
> >> one. Generally we do retries when that happens.
> >> >>
> >> >> - Mark

Re: stress testing Solr 4.x

Posted by Mark Miller <ma...@gmail.com>.
No problem!

Here is the JIRA issue: https://issues.apache.org/jira/browse/SOLR-4158

- Mark


Re: stress testing Solr 4.x

Posted by Alain Rogister <al...@gmail.com>.
Great, thanks Mark! I'll test the fix and post my results.

Alain


Re: stress testing Solr 4.x

Posted by Mark Miller <ma...@gmail.com>.
After some more playing around on 5x I have duplicated the issue. I'll file a JIRA issue for you and fix it shortly.

- Mark

>> On Dec 7, 2012, at 1:07 PM, Alain Rogister <al...@gmail.com> wrote:
>> 
>>> 
>>> Then I simulated various failure scenarios :
>>> 
>>> - 1 Solr server stop/start
>>> - 2 Solr servers stop/start
>>> - 3 Solr servers stop/start : it seems that in this case, the Solr servers
>>> *cannot* be restarted : more exactly, the restarted server will consider
>>> that it is number 1 out of 4 and wait for the other 3 to come up. The only
>>> way out is to stop it again, then stop all Zookeeper instances *and* clean
>>> up their zkdata directory, start them, then start the Solr servers.
>>> 
>>> I noticed that these zkdata directory had grown to 200 MB after a while.
>>> What exactly is in there besides the configuration data ? Does it stop
>>> growing ?
>>> 
>>> Then I tried this :
>>> 
>>> - kill 1 Zookeeper process
>>> - kill 2 Zookeeper processes
>>> - stop/start 1 Solr server
>>> 
>>> When doing this, I experienced (many times) situations where the Solr
>>> servers could not reconnect and threw scary exceptions. The only way out
>>> was to restart the whole cluster.
>>> 
>>> Q : when, if ever, is one supposed to clean up the zkdata directories ?
>>> 
>>> Here are the errors I found in the logs. It seems that some of them have
>>> been reported in JIRA but 4.1-trunk seems to experience basically the same
>>> issues as 4.0 in my test scenarios.
>>> 
>>> Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
>>> WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr
>>> couldn't connect to
>>> http://192.168.0.101:8984/solr/cachede/, counting as success
>>> Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log


Re: stress testing Solr 4.x

Posted by Mark Miller <ma...@gmail.com>.
Hmm…I've tried to replicate what looked like a bug from your report (3 Solr servers stop/start), but on 5x it works fine for me. It shouldn't be any different on 4x, but I'll try that next.

In terms of starting up Solr without a working ZooKeeper ensemble - it won't work currently. Cores won't be able to register with ZooKeeper and will fail to load. It would probably be nicer to come up in search-only mode and keep trying to reconnect to ZooKeeper - file a JIRA issue if you are interested.

On the zk data dir, see http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
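For reference, a minimal zoo.cfg fragment enabling the ongoing data directory cleanup described on that page (available in ZooKeeper 3.4+; the retention count and interval below are illustrative, not recommendations for your cluster):

```
# Keep the 3 most recent snapshots and their associated transaction logs
# (3 is the minimum and the default)
autopurge.snapRetainCount=3
# Run the purge task every hour; 0 (the default) disables automatic purging
autopurge.purgeInterval=1
```

Deployments that can't enable autopurge can instead run the bundled PurgeTxnLog utility periodically (e.g. from cron), as the same admin page describes.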

- Mark

On Dec 7, 2012, at 10:22 PM, Mark Miller <ma...@gmail.com> wrote:

> Hey, I'll try and answer this tomorrow.
> 
> There is definitely an unreported bug in there that needs to be fixed for the restart-all-nodes case.
> 
> Also, a 404 is generally seen when Jetty is starting or stopping - there are windows where 404s can be returned. I'm not sure why else you'd see one. Generally we retry when that happens.
> 
> - Mark
> 
> On Dec 7, 2012, at 1:07 PM, Alain Rogister <al...@gmail.com> wrote:


Re: stress testing Solr 4.x

Posted by Mark Miller <ma...@gmail.com>.
Hey, I'll try and answer this tomorrow.

There is definitely an unreported bug in there that needs to be fixed for the restart-all-nodes case.

Also, a 404 is generally seen when Jetty is starting or stopping - there are windows where 404s can be returned. I'm not sure why else you'd see one. Generally we retry when that happens.

- Mark

On Dec 7, 2012, at 1:07 PM, Alain Rogister <al...@gmail.com> wrote:

> I am reporting the results of my stress tests against Solr 4.x. As I was
> getting many error conditions with 4.0, I switched to the 4.1 trunk in the
> hope that some of the issues would be fixed already. Here is my setup :
> 
> - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
> this is not representative of a production environment but it's a fine way
> to find out what happens under resource-constrained conditions.
> - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
> MB of data)
> - single shard
> - 3 Zookeeper instances
> - HAProxy load balancing requests across Solr servers
> - JMeter or ApacheBench running the tests : 5 thread pools of 20 threads
> each, sending search requests continuously (no updates)
> 
> In nominal conditions, it all works fine i.e. it can process a million
> requests, maxing out the CPUs at all time, without experiencing nasty
> failures. There are errors in the logs about replication failures though;
> they should be benigne in this case as no updates are taking place but it's
> hard to tell what is going on exactly. Example :
> 
> Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
> exception talking to
> http://192.168.0.101:8985/solr/adressage/, failed
> org.apache.solr.common.SolrException: Server at
> http://192.168.0.101:8985/solr/adressage returned non ok status:404,
> message:Not Found
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> 
> Then I simulated various failure scenarios :
> 
> - 1 Solr server stop/start
> - 2 Solr servers stop/start
> - 3 Solr servers stop/start : it seems that in this case, the Solr servers
> *cannot* be restarted : more exactly, the restarted server will consider
> that it is number 1 out of 4 and wait for the other 3 to come up. The only
> way out is to stop it again, then stop all Zookeeper instances *and* clean
> up their zkdata directory, start them, then start the Solr servers.
> 
> I noticed that these zkdata directory had grown to 200 MB after a while.
> What exactly is in there besides the configuration data ? Does it stop
> growing ?
> 
> Then I tried this :
> 
> - kill 1 Zookeeper process
> - kill 2 Zookeeper processes
> - stop/start 1 Solr server
> 
> When doing this, I experienced (many times) situations where the Solr
> servers could not reconnect and threw scary exceptions. The only way out
> was to restart the whole cluster.
> 
> Q : when, if ever, is one supposed to clean up the zkdata directories ?
> 
> Here are the errors I found in the logs. It seems that some of them have
> been reported in JIRA but 4.1-trunk seems to experience basically the same
> issues as 4.0 in my test scenarios.
> 
> Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr
> couldn't connect to
> http://192.168.0.101:8984/solr/cachede/, counting as success
> Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
> SEVERE: Sync request error:
> org.apache.solr.client.solrj.SolrServerException: Server refused connection
> at: http://192.168.0.101:8984/solr/cachede
> Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
> SEVERE: http://192.168.0.101:8983/solr/cachede/: Could not tell a replica
> to recover:org.apache.solr.client.solrj.SolrServerException: Server refused
> connection at: http://192.168.0.101:8984/solr
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.http.conn.HttpHostConnectException: Connection to
> http://192.168.0.101:8984 refused
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
> at
> org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
> at
> org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
> at
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
> at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> ... 5 more
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> at java.net.Socket.connect(Socket.java:579)
> at
> org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
> ... 13 more
> 
> Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr  got a
> 404 from http://192.168.0.101:8985/solr/adressage/, counting as success
> Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
> SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at
> http://192.168.0.101:8985/solr/adressage returned non ok status:404,
> message:Not Found
> Dec 07, 2012 8:04:00 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=formabanque url=http://192.168.0.101:8983/solr  got
> a 404 from http://192.168.0.101:8985/solr/formabanque/, counting as success
> Dec 07, 2012 8:04:00 PM org.apache.solr.common.SolrException log
> SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at
> http://192.168.0.101:8985/solr/formabanque returned non ok status:404,
> message:Not Found
> 
> Dec 07, 2012 8:04:32 PM org.apache.solr.update.PeerSync sync
> WARNING: no frame of reference to tell of we've missed updates
> 
> Dec 07, 2012 8:03:58 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to
> recover:org.apache.solr.client.solrj.SolrServerException: Server refused
> connection at: http://192.168.0.101:8984/solr/adressage
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> at
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:182)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:134)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
> Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://192.168.0.101:8984 refused
> at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
> at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
> at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
> at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
> at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> ... 6 more
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> at java.net.Socket.connect(Socket.java:579)
> at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
> at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
> ... 14 more
> 
> Dec 07, 2012 8:03:58 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> SEVERE: Recovery failed - trying again... (0) core=adressage
> 
> SEVERE: Error getting leader from zk
> org.apache.solr.common.SolrException: Could not get leader props
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:735)
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:699)
> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:664)
> at org.apache.solr.cloud.ZkController.register(ZkController.java:603)
> at org.apache.solr.cloud.ZkController.register(ZkController.java:558)
> at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:791)
> at org.apache.solr.core.CoreContainer.register(CoreContainer.java:775)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:567)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /collections/adressage/leaders/shard1
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:713)
> ... 16 more
> 
> Dec 07, 2012 4:39:23 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:159)
> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)