
Posted to solr-user@lucene.apache.org by Will Miller <wi...@ospgroup.com> on 2014/12/15 20:39:25 UTC

All documents indexed into the same shard despite different prefix in id field

I have a SolrCloud cluster with two servers, and I created a collection with two shards using this command:


http://server1:8983/admin/collections?action=CREATE&name=products&numShards=2


When I look at clusterstate.json in the Solr admin page, I can see the collection is correctly placed across the two servers:


  "products":{
    "replicationFactor":"1",
    "shards":{
      "shard1":{
        "range":"80000000-ffffffff",
        "state":"active",
        "replicas":{"core_node2":{
            "core":"products_shard1_replica1",
            "base_url":"http://10.0.0.5:8983/solr",
            "node_name":"10.0.0.5:8983_solr",
            "state":"active",
            "leader":"true"}}},
      "shard2":{
        "range":"0-7fffffff",
        "state":"active",
        "replicas":{"core_node1":{
            "core":"products_shard2_replica1",
            "base_url":"http://10.0.0.6:8983/solr",
            "node_name":"10.0.0.6:8983_solr",
            "state":"active",
            "leader":"true"}}}},
    "router":{"name":"compositeId"},
    "maxShardsPerNode":"1",
    "autoAddReplicas":"false"}
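The "range" values above are the slices of the 32-bit hash space that each shard owns. As an offline debugging aid, here is a hypothetical Python sketch that predicts which of these two shards an id such as "RM!123" lands on. It assumes (from general knowledge of Solr's compositeId router, not from this thread) that the prefix and the rest of the id are each hashed with MurmurHash3_x86_32 (seed 0) over their UTF-8 bytes, and that the top 16 bits of the routed hash come from the prefix hash. It handles only the one-level "prefix!id" form, and the function names are illustrative, not Solr APIs.

```python
# Hypothetical sketch of compositeId routing for ids like "RM!123".
# ASSUMPTIONS (not from the thread): MurmurHash3_x86_32 with seed 0 over UTF-8
# bytes; the top 16 bits come from the prefix hash, the low 16 from the rest.

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32, returned as an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK32
    n_blocks = len(data) // 4
    for i in range(n_blocks):  # body: 4-byte little-endian blocks
        k1 = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        h1 ^= (k1 * c2) & MASK32
        h1 = (_rotl32(h1, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * n_blocks:]
    k1 = 0
    for j, byte in enumerate(tail):  # up to 3 trailing bytes, little-endian
        k1 |= byte << (8 * j)
    if tail:
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        h1 ^= (k1 * c2) & MASK32
    # Finalization mix (fmix32)
    h1 ^= len(data)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def composite_id_hash(doc_id: str) -> int:
    """Combine prefix and remainder hashes for an id of the form 'prefix!rest'."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:  # assumed: an id without '!' is hashed as a whole
        return murmur3_x86_32(doc_id.encode("utf-8"))
    h_prefix = murmur3_x86_32(prefix.encode("utf-8"))
    h_rest = murmur3_x86_32(rest.encode("utf-8"))
    return (h_prefix & 0xFFFF0000) | (h_rest & 0x0000FFFF)


# Ranges copied from clusterstate.json (hex, inclusive). Solr treats them as
# signed ints; comparing unsigned values works here since neither range wraps.
SHARD_RANGES = {"shard1": "80000000-ffffffff", "shard2": "0-7fffffff"}


def route(doc_id: str, ranges=SHARD_RANGES) -> str:
    """Return the name of the shard whose range contains the id's hash."""
    h = composite_id_hash(doc_id)
    for name, rng in ranges.items():
        lo, hi = (int(part, 16) for part in rng.split("-"))
        if lo <= h <= hi:
            return name
    raise ValueError(f"no shard range covers hash {h:08x}")


if __name__ == "__main__":
    for doc_id in ["RM!1", "WW!1", "BH!1"]:
        print(doc_id, route(doc_id))
```

With these two ranges the top bit of the prefix hash alone picks the shard in this sketch, so every id sharing a prefix routes together, and a handful of prefixes can easily all land on one shard. Cross-check its predictions against the live cluster before relying on them.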


However, when I index products with different prefixes in the id field, all of the documents go into the same shard. This can be seen in the shards.info section when querying for *:*:


"shards.info":{
    "http://10.0.0.6:8983/solr/products_shard2_replica1/":{
      "time":11,
      "shardAddress":"http://10.0.0.6:8983/solr/products_shard2_replica1/",
      "numFound":0,
      "maxScore":"NaN"},
    "http://10.0.0.5:8983/solr/products_shard1_replica1/":{
      "time":11,
      "shardAddress":"http://10.0.0.5:8983/solr/products_shard1_replica1/",
      "numFound":230,
      "maxScore":"NaN"}}


There were 230 documents in the set I indexed and there were 3 different prefixes (RM!, WW! and BH!) but all were routed into the same shard. Is there anything I can do to debug this further?


Thanks,

Will

Re: All documents indexed into the same shard despite different prefix in id field

Posted by Will Miller <wi...@ospgroup.com>.

Thanks Chris...

I changed the test to assign a unique number to each document as the prefix, and the documents were indexed across the two shards. I then increased the data set to include documents from all 6 expected shard keys, and I do see them being indexed across both shards. I was just unlucky to have started testing with 3 prefixes that all happened to hash into the same shard.

-Will

________________________________________
From: Chris Hostetter <ho...@fucit.org>
Sent: Monday, December 15, 2014 6:45 PM
To: solr-user@lucene.apache.org
Subject: Re: All documents indexed into the same shard despite different prefix in id field

: I have a SolrCloud cluster with two servers and I created a collection using two shards with this command:
        ...
: There were 230 documents in the set I indexed and there were 3 different prefixes (RM!, WW! and BH!) but all were routed into the same shard. Is there anything I can do to debug this further?

I'm not really a math expert but...

If you have N (2) shards, and a single prefix ("RM") there is a 100%
chance that that prefix will hash into 1 of those N=2 shards.

For a 2nd prefix ("WW") there is a 1/N (1/2) chance that it will hash into
the same shard as your first prefix ("RM").

Likewise, there is a 1/N (1/2) chance that any other prefix ("BH") will
hash into the same shard as your first prefix ("RM").

Which means there is a 25% (1/2 * 1/2 = 1/4) chance that 3 randomly
selected prefixes will all hash to the same shard.
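The 25% figure can be checked by brute force. This minimal sketch lists every way 3 prefixes could land on 2 shards, assuming (as the argument above does) that the hash behaves like an independent, uniform choice of shard for each prefix, and counts the outcomes where all three coincide.

```python
from fractions import Fraction
from itertools import product

N_SHARDS = 2
PREFIXES = ["RM", "WW", "BH"]

# Each tuple assigns a shard index to each prefix; all tuples are assumed
# equally likely (uniform, independent hashing of the prefixes).
outcomes = list(product(range(N_SHARDS), repeat=len(PREFIXES)))
all_same = [o for o in outcomes if len(set(o)) == 1]

probability = Fraction(len(all_same), len(outcomes))
print(f"{len(all_same)} of {len(outcomes)} outcomes -> {probability}")
```

Only (0, 0, 0) and (1, 1, 1) put every prefix on one shard, so 2 of the 8 outcomes qualify, which is the 1/4 above.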

(In general, if you have N shards, and P # of unique prefixes, then the
odds that they all wind up in the same shard is going to be:
"(1/N)**(P-1)")
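A small sketch of that closed form, with a Monte Carlo check alongside it. Both assume each prefix hashes to an independent, uniformly random shard with equal-sized ranges; the function names are illustrative only. Under that assumption, Will's later test with 6 shard keys on 2 shards has a (1/2)**5 = 1/32 (about 3%) chance of landing entirely on one shard.

```python
import random
from fractions import Fraction


def p_all_same_shard(n_shards: int, n_prefixes: int) -> Fraction:
    """Closed form (1/N)**(P-1): chance that P prefixes all share one shard."""
    return Fraction(1, n_shards) ** (n_prefixes - 1)


def simulate(n_shards: int, n_prefixes: int, trials: int = 50_000, seed: int = 42) -> float:
    """Monte Carlo estimate, modelling each prefix as a uniform random shard pick."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shards = {rng.randrange(n_shards) for _ in range(n_prefixes)}
        hits += len(shards) == 1  # all prefixes landed on the same shard
    return hits / trials


for p in (3, 6):
    print(p, p_all_same_shard(2, p), round(simulate(2, p), 3))
```

The exponent is P-1 rather than P because the first prefix may land anywhere; only the remaining P-1 prefixes must match it.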

So I suspect you just got unlucky with the 3 prefixes you happened to try in
your small test.

-Hoss
http://www.lucidworks.com/
