Posted to solr-user@lucene.apache.org by vsilgalis <vs...@gmail.com> on 2013/04/03 18:31:23 UTC

SolrCloud not distributing documents across shards

So we have 3 servers in a SolrCloud cluster.

<http://lucene.472066.n3.nabble.com/file/n4053506/Cloud1.png> 

We have 2 shards for our collection (classic_bt) with a shard on each of the
first two servers as the picture shows. The third server has replicas of the
first 2 shards just for high availability purposes.

Now if we go into counts we have the following information:
shard1 - Numdocs - 33010
shard2 - Numdocs - 85934

Both shards replicate to the third server with no issues.

For some reason the documents aren't being distributed across the shards.
Nothing in the logs indicates a problem, but I'm not sure what we should be
looking for.

Let me know if you need more information.



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-not-distributing-documents-across-shards-tp4053506.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud not distributing documents across shards

Posted by Michael Della Bitta <mi...@appinions.com>.
Thank you for all your hard work!

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game



Re: SolrCloud not distributing documents across shards

Posted by Mark Miller <ma...@gmail.com>.
On Apr 3, 2013, at 5:53 PM, Michael Della Bitta <mi...@appinions.com> wrote:

> From what I can tell, the Collections API has been hardened
> significantly since 4.2 

I did a lot of work here for 4.2.1; there was a lot to improve. Hopefully there is much less now, but if anyone finds anything, I'll fix any JIRAs.

- Mark

Re: SolrCloud not distributing documents across shards

Posted by Michael Della Bitta <mi...@appinions.com>.
From what I can tell, the Collections API has been hardened
significantly since 4.2 and now will refuse to create a collection if
you give it something ambiguous to do. So if you upgrade to 4.2,
things will become safer.

But overall I'd find a way of using the Collections API that works and
stick with it.

Michael Della Bitta


Re: SolrCloud not distributing documents across shards

Posted by vsilgalis <vs...@gmail.com>.
Michael Della Bitta-2 wrote
> If you can work with a clean state, I'd turn off all your shards,
> clear out the Solr directories in Zookeeper, reset solr.xml for each
> of your shards, upgrade to the latest version of Solr, and turn
> everything back on again. Then upload config, recreate your
> collection, etc.
> 
> I do it like this, but YMMV:
> 
> curl "http://localhost:8080/solr/admin/collections?action=CREATE&name=$name&numShards=$num&collection.configName=$config_name"
> 
> 
> Michael Della Bitta


Looks like that was the problem.  Thanks, much appreciated.

Is there any insight into what specifically I should look at to
prevent this in the future?




Re: SolrCloud not distributing documents across shards

Posted by Michael Della Bitta <mi...@appinions.com>.
If you can work with a clean state, I'd turn off all your shards,
clear out the Solr directories in Zookeeper, reset solr.xml for each
of your shards, upgrade to the latest version of Solr, and turn
everything back on again. Then upload config, recreate your
collection, etc.

I do it like this, but YMMV:

curl "http://localhost:8080/solr/admin/collections?action=CREATE&name=$name&numShards=$num&collection.configName=$config_name"
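
For anyone scripting this, here is a minimal sketch of the same call with the pieces spelled out. The function name build_create_url and the example values (classic_bt with 2 shards, taken from elsewhere in this thread) are only illustrative assumptions; the parameters are the same four used in the command above. The request itself is left commented out so the script can be run without a live Solr node.

```shell
#!/bin/sh
# Build a Collections API CREATE URL from its parts.
# Arguments: base Solr URL, collection name, shard count, config set name.
# (build_create_url is a hypothetical helper, not part of Solr.)
build_create_url() {
    printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&collection.configName=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Example values are assumptions; substitute your own host and names.
url=$(build_create_url "http://localhost:8080/solr" "classic_bt" 2 "classic_bt")
echo "$url"

# Uncomment to actually send the request to a running Solr node:
# curl "$url"
```

Passing the values as arguments avoids relying on shell variable names, which cannot contain a hyphen.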


Michael Della Bitta


Re: SolrCloud not distributing documents across shards

Posted by vsilgalis <vs...@gmail.com>.
Michael Della Bitta-2 wrote
> With earlier versions of Solr Cloud, if there was any error or warning
> when you made a collection, you likely were set up for "implicit"
> routing which means that documents only go to the shard you're talking
> to. What you want is "compositeId" routing, which works how you think
> it should.
> 
> Go into the cloud GUI and look at clusterstate.json in the Tree tab.
> You should see the routing algorithm it's using in that file.
> 
> Michael Della Bitta

That sounds like my huckleberry.

 "router":"implicit"

Is in the collection info in the clusterstate.json

How do I fix this? Just wipe the clusterstate.json?

Thanks for your help.




Re: SolrCloud not distributing documents across shards

Posted by Michael Della Bitta <mi...@appinions.com>.
With earlier versions of Solr Cloud, if there was any error or warning
when you made a collection, you likely were set up for "implicit"
routing which means that documents only go to the shard you're talking
to. What you want is "compositeId" routing, which works how you think
it should.
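
To illustrate the difference, here is a toy sketch of hash-based routing. It is not Solr's implementation: cksum is used purely as a stand-in hash, and shard_for is a hypothetical helper. The point is only that the target shard is derived from the document id, so it does not matter which server receives the update, whereas with "implicit" routing the document stays on whichever shard you sent it to.

```shell
#!/bin/sh
# Toy stand-in for hash-based routing; NOT Solr's real hash function.
# Maps a document id to shard1..shardN using the CRC printed by cksum.
shard_for() {
    id=$1
    num_shards=$2
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$id" | cksum | cut -d ' ' -f 1)
    echo "shard$(( crc % num_shards + 1 ))"
}

# The same id always maps to the same shard, whichever node runs this.
for id in doc-1 doc-2 doc-3 doc-4; do
    echo "$id -> $(shard_for "$id" 2)"
done
```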

Go into the cloud GUI and look at clusterstate.json in the Tree tab.
You should see the routing algorithm it's using in that file.
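
If you would rather check from a terminal, something like the following works on a copy of the file's contents. The JSON here is a heavily trimmed, hypothetical excerpt (a real clusterstate.json also lists shards and replicas), and router_of is just an illustrative helper name.

```shell
#!/bin/sh
# Hypothetical, trimmed excerpt of clusterstate.json for one collection.
state='{"classic_bt":{"router":"implicit"}}'

# Print the router entry found in the given JSON text.
# grep -o prints only the matching part of the line.
router_of() {
    printf '%s\n' "$1" | grep -o '"router" *: *"[A-Za-z]*"'
}

router_of "$state"
```

A value of "implicit" here means documents stay on the shard that received them; "compositeId" is the one that spreads them across shards.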

Michael Della Bitta


Re: SolrCloud not distributing documents across shards

Posted by vsilgalis <vs...@gmail.com>.
Chris Hostetter-3 wrote
> I'm not familiar with the details, but i've seen miller respond to a 
> similar question with reference to the issue of not explicitly specifying 
> numShards when creating your collections...
> 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3C0AA0B422-F1DE-4915-B602-53CB1849204A@gmail.com%3E
> 
> 
> -Hoss

Well, theoretically we are okay there.

The commands we run to create our collection are as follows (note the
numShards being specified):
http://server01/solr/admin/cores?action=CREATE&name=classic_bt&collection=classic_bt&numShards=2&instanceDir=instances/basistech&dataDir=/opt/index/classic_bt&config=solrconfig.xml&schema=schema.xml&collection.configName=classic_bt

http://server02/solr/admin/cores?action=CREATE&name=classic_bt&collection=classic_bt&numShards=2&instanceDir=instances/basistech&dataDir=/opt/index/classic_bt&config=solrconfig.xml&schema=schema.xml&collection.configName=classic_bt

http://server03/solr/admin/cores?action=CREATE&name=classic_bt_shard1&collection=classic_bt&numShards=2&instanceDir=instances/basistech&dataDir=/opt/index/classic_bt_shard1&config=solrconfig.xml&schema=schema.xml&collection.configName=classic_bt&shard=shard1

http://server03/solr/admin/cores?action=CREATE&name=classic_bt_shard2&collection=classic_bt&numShards=2&instanceDir=instances/basistech&dataDir=/opt/index/classic_bt_shard2&config=solrconfig.xml&schema=schema.xml&collection.configName=classic_bt&shard=shard2





Re: SolrCloud not distributing documents across shards

Posted by Chris Hostetter <ho...@fucit.org>.
: So we indexed a set of 33010 documents on server01 which are now in shard1.
: And we kicked off a set of 85934 documents on server02 which are now in
: shard2 (as tests).  In my understanding of how SolrCloud works, the
: documents should be distributed across the shards in the collection.  Now I

I'm not familiar with the details, but I've seen Miller respond to a
similar question with reference to the issue of not explicitly specifying
numShards when creating your collections...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3C0AA0B422-F1DE-4915-B602-53CB1849204A@gmail.com%3E


-Hoss

Re: SolrCloud not distributing documents across shards

Posted by vsilgalis <vs...@gmail.com>.
Michael Della Bitta-2 wrote
> Hello Vytenis,
> 
> What exactly do you mean by "aren't distributing across the shards"?
> Do you mean that POSTs against the server for shard 1 never end up
> resulting in documents saved in shard 2?

So we indexed a set of 33010 documents on server01 which are now in shard1.
And we kicked off a set of 85934 documents on server02 which are now in
shard2 (as tests).  In my understanding of how SolrCloud works, the
documents should be distributed across the shards in the collection.  Now I
have seen this work before in my environment.  Not sure what I need to look
at to ensure this distribution.

Just as an FYI, this is Solr 4.1.




Re: SolrCloud not distributing documents across shards

Posted by Michael Della Bitta <mi...@appinions.com>.
Hello Vytenis,

What exactly do you mean by "aren't distributing across the shards"?
Do you mean that POSTs against the server for shard 1 never end up
resulting in documents saved in shard 2?

Michael Della Bitta
