Posted to solr-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2013/10/25 14:43:47 UTC
How to reinitialize a solrcloud replica

I'm running Solr 4.3 in SolrCloud mode and am trying to test index recovery,
but it's failing.
I have one shard, 2 replicas:
Leader: 10.159.8.105
Replica: 10.159.6.73

To test, I stopped the replica, deleted its 'data' directory, and restarted
Solr. Here is the replica's log:

INFO  - 2013-10-25 12:19:40.773; org.apache.solr.cloud.ZkController; We are http://10.159.6.73:8983/solr/collection/ and leader is http://10.159.8.105:8983/solr/collection/
INFO  - 2013-10-25 12:19:40.774; org.apache.solr.cloud.ZkController; No LogReplay needed for core=collection baseURL=http://10.159.6.73:8983/solr
INFO  - 2013-10-25 12:19:40.774; org.apache.solr.cloud.ZkController; Core needs to recover:collection
INFO  - 2013-10-25 12:19:40.774; org.apache.solr.update.DefaultSolrCoreState; Running recovery - first canceling any ongoing recovery
INFO  - 2013-10-25 12:19:40.778; org.apache.solr.cloud.RecoveryStrategy; Starting recovery process.  core=collection recoveringAfterStartup=true
...
ERROR - 2013-10-25 12:20:25.281; org.apache.solr.common.SolrException; Error while trying to recover. core=collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was asked to wait on state recovering for 10.159.6.73:8983_solr but I still do not see the requested state. I see state: down live:true
...
ERROR - 2013-10-25 12:20:25.281; org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again... (5) core=collection
ERROR - 2013-10-25 12:20:25.281; org.apache.solr.common.SolrException; Recovery failed - interrupted. core=collection
ERROR - 2013-10-25 12:20:25.282; org.apache.solr.common.SolrException; Recovery failed - I give up. core=collection
INFO  - 2013-10-25 12:20:25.282; org.apache.solr.cloud.ZkController; publishing core=collection state=recovery_failed

Here is the leader's log:

INFO  - 2013-10-25 12:19:40.883; org.apache.solr.handler.admin.CoreAdminHandler; Going to wait for coreNodeName: 10.159.6.73:8983_solr_collection, state: recovering, checkLive: true, onlyIfLeader: true
INFO  - 2013-10-25 12:19:55.886; org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from ZooKeeper...
ERROR - 2013-10-25 12:20:25.277; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state recovering for 10.159.6.73:8983_solr but I still do not see the requested state. I see state: down live:true
(repeats every minute)
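The leader's log describes a wait-for-state handshake: CoreAdminHandler is asked to wait until the replica's node shows state 'recovering' (with checkLive: true), re-reads cluster state from ZooKeeper, and finally gives up because it only sees 'down live:true'. As a rough illustration, here is a simplified Python model of that kind of poll-until-timeout wait. It is a sketch under assumptions, not Solr's actual code: wait_for_replica_state, read_state, StateWaitTimeout, and the timeout and poll values are all hypothetical.

```python
import time
from typing import Callable, Tuple


class StateWaitTimeout(Exception):
    """Hypothetical error raised when the requested replica state never appears."""


def wait_for_replica_state(
    read_state: Callable[[str], Tuple[str, bool]],
    node_name: str,
    wanted_state: str = "recovering",
    check_live: bool = True,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> None:
    """Poll read_state(node_name) -> (state, live) until it reports wanted_state
    (and live=True when check_live is set), or raise after timeout_s seconds.

    Simplified model only; the real leader-side logic lives in Solr's Java code.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state, live = read_state(node_name)
        if state == wanted_state and (live or not check_live):
            return  # the replica published the state the leader was waiting for
        if time.monotonic() >= deadline:
            raise StateWaitTimeout(
                f"I was asked to wait on state {wanted_state} for {node_name} "
                f"but I still do not see the requested state. "
                f"I see state: {state} live:{str(live).lower()}"
            )
        time.sleep(poll_s)  # wait before re-reading the cluster state


if __name__ == "__main__":
    # A replica whose published state stays "down" (but live) never satisfies the wait.
    try:
        wait_for_replica_state(
            lambda node: ("down", True),
            "10.159.6.73:8983_solr",
            timeout_s=0.05,
            poll_s=0.01,
        )
    except StateWaitTimeout as err:
        print(err)
```

With a state reader that stays at 'down', the model times out with a message of the same shape as the one in the logs above, where the leader kept seeing 'down' instead of 'recovering'.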

Is it valid to simply delete the 'data' directory, or does a znode have to
be modified, too?
What's the right way to reinitialize and resync a core?

Peter