Posted to solr-user@lucene.apache.org by Will Miller <wi...@ospgroup.com> on 2014/12/15 20:39:25 UTC
All documents indexed into the same shard despite different prefix in
id field
I have a SolrCloud cluster with two servers and I created a collection using two shards with this command:
http://server1:8983/admin/collections?action=CREATE&name=products&numShards=2
When I look at clusterstate.json in the Solr admin page I can see the collection is correctly placed across the two servers:
"products":{
"replicationFactor":"1",
"shards":{
"shard1":{
"range":"80000000-ffffffff",
"state":"active",
"replicas":{"core_node2":{
"core":"products_shard1_replica1",
"base_url":"http://10.0.0.5:8983/solr",
"node_name":"10.0.0.5:8983_solr",
"state":"active",
"leader":"true"}}},
"shard2":{
"range":"0-7fffffff",
"state":"active",
"replicas":{"core_node1":{
"core":"products_shard2_replica1",
"base_url":"http://10.0.0.6:8983/solr",
"node_name":"10.0.0.6:8983_solr",
"state":"active",
"leader":"true"}}}},
"router":{"name":"compositeId"},
"maxShardsPerNode":"1",
"autoAddReplicas":"false"}
However, when I index products with different prefixes in the id field, all of the documents go into the same shard. This can be seen in the shards.info output when querying for *:*:
"shards.info":{
"http://10.0.0.6:8983/solr/products_shard2_replica1/":{
"time":11,
"shardAddress":"http://10.0.0.6:8983/solr/products_shard2_replica1/",
"numFound":0,
"maxScore":"NaN"},
"http://10.0.0.5:8983/solr/products_shard1_replica1/":{
"time":11,
"shardAddress":"http://10.0.0.5:8983/solr/products_shard1_replica1/",
"numFound":230,
"maxScore":"NaN"}}
There were 230 documents in the set I indexed and there were 3 different prefixes (RM!, WW! and BH!) but all were routed into the same shard. Is there anything I can do to debug this further?
Thanks,
Will
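One way to debug routing like this is to reproduce the hash offline and see which shard range each prefix falls into. The sketch below is an illustration, not Solr's code: it assumes the compositeId router takes the top 16 bits from a MurmurHash3 (x86, 32-bit, seed 0) of the prefix and the low 16 bits from the same hash of the rest of the id, and it ignores variants such as "prefix/bits!id" or multi-level keys. The function names and the "example-id" suffix are hypothetical; the shard ranges are copied from the clusterstate.json above.

```python
# Sketch only. Assumption: compositeId routing = top 16 bits of
# murmur3(prefix) combined with low 16 bits of murmur3(rest of id), seed 0.

MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data, seed=0):
    """MurmurHash3 x86 32-bit of a bytes object, returned as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    rounded = n & ~3
    # Body: process the input four bytes at a time (little-endian).
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[rounded:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


# Hash ranges copied from the clusterstate.json in this thread.
SHARD_RANGES = {
    "shard1": (0x80000000, 0xFFFFFFFF),
    "shard2": (0x00000000, 0x7FFFFFFF),
}


def composite_hash(doc_id):
    """Combine prefix and id hashes the way the router is assumed to."""
    if "!" not in doc_id:
        return murmur3_x86_32(doc_id.encode("utf-8"))
    prefix, rest = doc_id.split("!", 1)
    high = murmur3_x86_32(prefix.encode("utf-8")) & 0xFFFF0000
    low = murmur3_x86_32(rest.encode("utf-8")) & 0x0000FFFF
    return high | low


def shard_for(doc_id):
    """Return the name of the shard whose range contains the id's hash."""
    h = composite_hash(doc_id)
    for name, (lo, hi) in SHARD_RANGES.items():
        if lo <= h <= hi:
            return name
    raise ValueError("hash %08x is outside every shard range" % h)


if __name__ == "__main__":
    for prefix in ("RM", "WW", "BH"):
        print(prefix, shard_for(prefix + "!example-id"))
```

Under these assumptions the shard depends only on the prefix when there are two shards, because the ranges split on the top bit of the hash. If the output disagrees with what the cluster actually does, the assumptions about the router are the first thing to recheck.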
Re: All documents indexed into the same shard despite different
prefix in id field
Posted by Will Miller <wi...@ospgroup.com>.
Thanks Chris...
I changed the test and assigned a unique number to each document as the prefix and the documents did index across the two shards. I then increased the data set to include documents from all 6 expected shard keys and I do see them being indexed across both shards. I was just lucky to have started testing with 3 different prefixes that happened to index into the same shard.
-Will
________________________________________
From: Chris Hostetter <ho...@fucit.org>
Sent: Monday, December 15, 2014 6:45 PM
To: solr-user@lucene.apache.org
Subject: Re: All documents indexed into the same shard despite different prefix in id field
: I have a SolrCloud cluster with two servers and I created a collection using two shards with this command:
...
: There were 230 documents in the set I indexed and there were 3 different prefixes (RM!, WW! and BH!) but all were routed into the same shard. Is there anything I can do to debug this further?
I'm not really a math expert but...
If you have N (2) shards, and a single prefix ("RM") there is a 100%
chance that that prefix will hash into 1 of those N=2 shards.
For a 2nd prefix ("WW") there is a 1/N (1/2) chance that it will hash into
the same shard as your first prefix ("RM").
Likewise, there is a 1/N (1/2) chance that any other prefix ("BH") will
hash into the same shard as your first prefix ("RM").
Which means there is a 25% (1/2 * 1/2 = 1/4) chance that 3 randomly
selected prefixes will all hash to the same shard.
(In general, if you have N shards, and P # of unique prefixes, then the
odds that they all wind up in the same shard is going to be:
"(1/N)**(P-1)")
So I suspect you just got unlucky with the 3 prefixes you happened to try in
your small test.
-Hoss
http://www.lucidworks.com/
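The arithmetic above can be checked with a short script. This is a minimal sketch that assumes each prefix lands on a shard uniformly and independently, which is an idealization of what a hash function does; the function names are made up for illustration.

```python
from fractions import Fraction
import random


def same_shard_probability(num_shards, num_prefixes):
    """Exact chance that every prefix lands in one shard: (1/N)**(P-1)."""
    return Fraction(1, num_shards) ** (num_prefixes - 1)


def simulate(num_shards, num_prefixes, trials=100_000, seed=0):
    """Estimate the same probability by assigning prefixes to random shards."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shards = {rng.randrange(num_shards) for _ in range(num_prefixes)}
        if len(shards) == 1:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Two shards, three prefixes: the case from this thread.
    print(same_shard_probability(2, 3), simulate(2, 3))
    # Two shards, six prefixes: the larger data set from the follow-up test.
    print(same_shard_probability(2, 6), simulate(2, 6))
```

With two shards the exact value is 1/4 for three prefixes and 1/32 for six, so a collision across all prefixes becomes unlikely quickly as the number of distinct prefixes grows.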