Posted to solr-user@lucene.apache.org by ku3ia <de...@gmail.com> on 2012/11/14 16:05:47 UTC

SolrCloud: Shard resize

Hi all!

My index is updated dynamically: every day I add new data and remove unused
documents from it. I know approximately how many documents I index per
day.

Today I tested the following situation. Imagine there is one collection
with two shards with two replicas. I indexed some data into it and then
ran this query:

http://10.112.1.2:8080/solr/collection1/select?q=*%3A*&wt=xml&rows=0&shards.info=true

<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">74</int>
  <lst name="params">
   <str name="wt">xml</str>
   <str name="q">*:*</str>
   <str name="rows">0</str>
   <str name="shards.info">true</str>
  </lst>
 </lst>
 <lst name="shards.info">
  <lst name="10.112.1.2:8081/solr/collection1/|10.112.1.2:8083/solr/collection1/">
   <long name="numFound">58647</long>
   <float name="maxScore">1.0</float>
   <long name="time">37</long>
  </lst>
  <lst name="10.112.1.2:8080/solr/collection1/|10.112.1.2:8082/solr/collection1/">
   <long name="numFound">58876</long>
   <float name="maxScore">1.0</float>
   <long name="time">48</long>
  </lst>
 </lst>
 <result name="response" numFound="117523" start="0" maxScore="1.0"/>
</response>
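
The per-shard counts in a shards.info response can be cross-checked against
the overall numFound. The following is a minimal Python sketch, not part of
the original post: the helper name per_shard_counts is hypothetical, and
SAMPLE is a trimmed copy of the response above that keeps only the fields the
check needs.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above; only the fields used here are kept.
SAMPLE = """<response>
 <lst name="shards.info">
  <lst name="10.112.1.2:8081/solr/collection1/|10.112.1.2:8083/solr/collection1/">
   <long name="numFound">58647</long>
  </lst>
  <lst name="10.112.1.2:8080/solr/collection1/|10.112.1.2:8082/solr/collection1/">
   <long name="numFound">58876</long>
  </lst>
 </lst>
 <result name="response" numFound="117523" start="0" maxScore="1.0"/>
</response>"""


def per_shard_counts(xml_text):
    """Return ({shard_name: numFound}, total numFound) for a shards.info response."""
    root = ET.fromstring(xml_text)
    info = root.find("lst[@name='shards.info']")
    counts = {}
    for shard in info.findall("lst"):
        counts[shard.get("name")] = int(shard.find("long[@name='numFound']").text)
    total = int(root.find("result[@name='response']").get("numFound"))
    return counts, total


counts, total = per_shard_counts(SAMPLE)
for name, n in counts.items():
    print(name, n)
print("sum of shards:", sum(counts.values()), "total:", total)
```

If the sum of the per-shard counts ever differs from the total, that is worth
investigating before drawing conclusions from the numbers.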

Let's say that one fine day the number of documents I need to index
doubles. That means I will have more documents in each shard, which is
very bad for me for several reasons.
But I found a solution: I simply added a new shard with a replica to the
collection and indexed some data again:

<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">74</int>
  <lst name="params">
   <str name="wt">xml</str>
   <str name="q">*:*</str>
   <str name="rows">0</str>
   <str name="shards.info">true</str>
  </lst>
 </lst>
 <lst name="shards.info">
  <lst name="10.112.1.2:8081/solr/collection1/|10.112.1.2:8083/solr/collection1/">
   <long name="numFound">98949</long>
   <float name="maxScore">1.0</float>
   <long name="time">37</long>
  </lst>
  <lst name="10.112.1.2:8080/solr/collection1/|10.112.1.2:8082/solr/collection1/">
   <long name="numFound">98898</long>
   <float name="maxScore">1.0</float>
   <long name="time">48</long>
  </lst>
  <lst name="10.112.1.2:8084/solr/collection1/|10.112.1.2:8085/solr/collection1/">
   <long name="numFound">40183</long>
   <float name="maxScore">1.0</float>
   <long name="time">68</long>
  </lst>
 </lst>
 <result name="response" numFound="238030" start="0" maxScore="1.0"/>
</response>

This seems to work, and it suits me, but I remember reading somewhere that
increasing the number of shards in a collection requires reindexing
all of the data.

Please advise: is my solution correct?

Many thanks.



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Shard-resize-tp4020282.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud: Shard resize

Posted by ku3ia <de...@gmail.com>.
Hi Erick! Many thanks for your response. I understand the situation. Thanks.




Re: SolrCloud: Shard resize

Posted by Erick Erickson <er...@gmail.com>.
Currently you have to re-index all of your data. If you don't you'll have a
situation in which the same document (by uniqueKey) exists in two shards
and that document may show up twice in your results list.
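
To illustrate why this happens, here is a rough Python sketch. It assumes a
simplified routing rule (CRC32 of the uniqueKey modulo the number of shards);
this is only a stand-in for illustration, not SolrCloud's actual hashing, and
shard_for is a hypothetical helper. The point is only that once the shard
count changes, many ids map to a different shard than the one already holding
them.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Simplified routing: hash the uniqueKey and take it modulo the shard count.

    Assumption for illustration only; this is not SolrCloud's routing code.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


doc_ids = ["doc-%d" % i for i in range(1000)]

# Index everything into a 2-shard layout (shard 2 does not exist yet).
shards = {0: set(), 1: set(), 2: set()}
for doc_id in doc_ids:
    shards[shard_for(doc_id, 2)].add(doc_id)

# Add a third shard and index the same documents again WITHOUT deleting first.
for doc_id in doc_ids:
    shards[shard_for(doc_id, 3)].add(doc_id)

# Any id now present on more than one shard could be returned twice.
duplicates = [d for d in doc_ids if sum(d in s for s in shards.values()) > 1]
print("documents present on more than one shard:", len(duplicates))
```

In this toy model a document is duplicated exactly when its old and new shard
differ, because the stale copy on the old shard is never removed.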

NOTE: "reindex all your data" means you need to _delete_ all your data first.
If you just add a shard and index more data, SolrCloud will simply try to
re-index each doc in the (new) "proper" shard. The fact that it already
exists on another shard won't be automatically handled.
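
Deleting everything before reindexing can be done with a delete-by-query on
the update handler. A minimal Python sketch, assuming the host, port and
collection name from the query at the top of the thread;
build_delete_all_request is a hypothetical helper, and the request is only
built here, not sent:

```python
import urllib.request

# Assumption: host, port and collection name are taken from the query
# shown earlier in the thread; adjust them for a real installation.
SOLR_UPDATE_URL = "http://10.112.1.2:8080/solr/collection1/update?commit=true"


def build_delete_all_request(update_url=SOLR_UPDATE_URL):
    """Build (but do not send) a delete-by-query request matching every document."""
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_delete_all_request()
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually run it (destructive, removes every document):
# urllib.request.urlopen(req)
```

Only after the delete has been committed should the full data set be indexed
again into the new shard layout.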

There is currently work under consideration to allow shards to be split,
which would solve the reindex-everything problem, but it's not in the code
yet. It's also not an easy problem.

Best
Erick


On Thu, Nov 15, 2012 at 5:14 AM, ku3ia <de...@gmail.com> wrote:

> Any ideas?
> Thanks

Re: SolrCloud: Shard resize

Posted by ku3ia <de...@gmail.com>.
Any ideas?
Thanks


