Posted to solr-user@lucene.apache.org by Ranjith Venkatesan <ra...@zohocorp.com> on 2013/08/01 18:43:58 UTC

Solr Split Shard - Document loss and down time

We are using Solr 4.3.0 for our search application, and we will be splitting
shards at run time. I simulated a scenario; let me explain that first.

I am indexing about 20000 docs via SolrJ. At the same time I am triggering the
split shard command, and I am killing the leader node while both of those
operations are in progress.

In this case I am facing *downtime (indexing fails), document loss,* and
also the delete shard step after the split does not complete.

[Image: Solr1.png. Cloud view after split shard gets completed.]
[Image: Solr3.png. When I search *:*, I get numFound as 0 (doc loss).]

<http://lucene.472066.n3.nabble.com/file/n4082002/Solr2.png> 

<http://lucene.472066.n3.nabble.com/file/n4082002/Solr3.png> 

<http://lucene.472066.n3.nabble.com/file/n4082002/Solr4.png> 


An error was also thrown during indexing. Here is the output:

Doc:::11203
Doc:::11204
1 Aug, 2013 9:24:52 PM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 3)
Doc:::11205
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
        at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:331)
        at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:306)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
        at tokyosolrindex.Main.main(Main.java:44)


Is there any way to avoid this?


Thanks in advance


RANJITH VENKATESAN



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Split-Shard-Document-loss-and-down-time-tp4082002.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Split Shard - Document loss and down time

Posted by Ranjith Venkatesan <ra...@zohocorp.com>.
I have tried this on 4.4 too. It shows the same problem.




Re: Solr Split Shard - Document loss and down time

Posted by Ranjith Venkatesan <ra...@zohocorp.com>.
I have explained this in the post above, with screenshots. Indexing fails
when any node is down while a shard split is in progress.
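
For readers hitting the same "No live SolrServers available" exception, one possible client-side mitigation is to retry the failing call with a growing delay, so that the client has a chance to pick up the new cluster state. The following is a minimal sketch under that assumption, not something confirmed in this thread. The class name `Retry` and the method `withBackoff` are hypothetical, and the `Callable` stands in for a SolrJ call such as `add` or `commit`. It only addresses the indexing failure on the client and does nothing about documents lost on the server side.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of SolrJ.
final class Retry {
    private Retry() {}

    /**
     * Runs the action, retrying up to maxAttempts times and doubling the
     * delay after each failure. Rethrows the last failure if all attempts fail.
     */
    static <T> T withBackoff(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stand-in for an indexing call: fails twice, then succeeds.
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("No live SolrServers available");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real indexing loop the lambda would wrap the actual add or commit call, and the attempt count and delay would need tuning to how long a leader election takes in the cluster.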




Re: Solr Split Shard - Document loss and down time

Posted by Anshum Gupta <an...@anshumgupta.net>.
Hi Ranjith,

Here are a few things to note about shard split:
1. The command auto-retries. Also, if there's something that went wrong
during a split, you should wait for it to complete.
2. In case of a failure, the parent shard is supposed to be intact and the
new sub-shards wouldn't replace the parent shard.
3. If you tried using 4.3.*, the commit isn't called and so the documents
wouldn't be visible on the subshards unless you call an explicit commit.
Having said that, I'd strongly recommend not using 4.3 for shard
splitting.
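
To make points 1 and 3 concrete, here is a small sketch of the two HTTP requests involved: the Collections API SPLITSHARD call and an explicit commit through the update handler. The base URL, collection name, and shard name are placeholder assumptions, and the class `SplitShardUrls` is hypothetical. The code only builds the URL strings; it does not contact a Solr server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that only builds request URLs.
final class SplitShardUrls {
    private SplitShardUrls() {}

    // URL-encodes a single query or path value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Collections API request that splits the given shard. */
    static String splitShard(String baseUrl, String collection, String shard) {
        return baseUrl + "/admin/collections?action=SPLITSHARD"
                + "&collection=" + enc(collection)
                + "&shard=" + enc(shard);
    }

    /** Explicit commit so that documents become visible to searches. */
    static String explicitCommit(String baseUrl, String collection) {
        return baseUrl + "/" + enc(collection) + "/update?commit=true";
    }

    public static void main(String[] args) {
        // Assumed values; replace with the real host, collection, and shard.
        String base = "http://localhost:8983/solr";
        System.out.println(splitShard(base, "collection1", "shard1"));
        System.out.println(explicitCommit(base, "collection1"));
    }
}
```

On 4.3.x the second request would be issued after the split finishes, since the split itself does not commit there.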

Can you explain further what you mean by "documents are getting lost"?
AFAIR, the code is supposed to handle failure midway through the shard
split call, including dead leader/overseer.


On Wed, Aug 7, 2013 at 3:07 PM, Ranjith Venkatesan
<ra...@zohocorp.com> wrote:

> Hi Erick,
>
> I have a question. If an error occurs during a shard split, is
> there any way to revert the split? This is seriously
> breaking my head. For me, documents are getting lost when any node
> for that shard is dead while the split is in progress.
>
> Thanks
>
> Ranjith



-- 

Anshum Gupta
http://www.anshumgupta.net

Re: Solr Split Shard - Document loss and down time

Posted by Ranjith Venkatesan <ra...@zohocorp.com>.
Hi Erick,

I have a question. If an error occurs during a shard split, is
there any way to revert the split? This is seriously
breaking my head. For me, documents are getting lost when any node for
that shard is dead while the split is in progress.

Thanks

Ranjith




Re: Solr Split Shard - Document loss and down time

Posted by Erick Erickson <er...@gmail.com>.
From the Wiki page:

 4.3 however due to bugs found after 4.3 release, it is recommended that
you wait for release 4.3.1 before using this feature

So it would be great if you could try it with a newer Solr (4.4 was
recently released) because if it's still a problem we need to know.

Best
Erick

