Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2011/08/18 15:55:09 UTC

[Solr Wiki] Update of "NewSolrCloudDesign" by YonikSeeley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "NewSolrCloudDesign" page has been changed by YonikSeeley:
http://wiki.apache.org/solr/NewSolrCloudDesign?action=diff&rev1=5&rev2=6

Comment:
no need for max_hash_value

  The hash is added as an indexed field in the document and is immutable. It may also be used during an index split.
  
  The hash function is pluggable. It accepts a document and returns a consistent, positive integer hash value. The system provides a default hash function, which calculates hash values from the content of a configured, required and immutable field (by default, the unique_key field).
+ 
+ === Using full hash range ===
+ Alternatively, there need not be any max_hash_value: the full 32 bits of the hash can be used, since each shard will have a range of hash values anyway.
+ Avoiding a configurable max_hash_value makes it easier for clients to place related documents at adjacent hash values.  For example, in an email search application, one could construct a hashcode as follows: {{{
+ (hash(user_id)<<24) | (hash(message_id)>>>8)
+ }}}
+ Because the top 8 bits of the hashcode are derived from the user_id, all of a user's emails are guaranteed to fall in the same 1/256th portion of the cluster.  At search time, this information can be used to query only that portion of the cluster.
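
The composite hashcode above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name CompositeHash, the helper names, and the use of String.hashCode() as the hash function are all hypothetical stand-ins, since the page only says that the hash function is pluggable. The sketch shows how the hashcode is composed and how the 1/256th slice of the hash range belonging to a user could be computed at search time.

```java
final class CompositeHash {
    // Hypothetical stand-in for the pluggable hash function (assumption).
    static int hash(String s) {
        return s.hashCode();
    }

    // Top 8 bits come from the user_id hash, low 24 bits from the message_id hash.
    static int compositeHash(String userId, String messageId) {
        return (hash(userId) << 24) | (hash(messageId) >>> 8);
    }

    // First hashcode (as an unsigned 32-bit value) in the user's slice.
    static int sliceStart(String userId) {
        return hash(userId) << 24;
    }

    // Last hashcode (as an unsigned 32-bit value) in the user's slice.
    static int sliceEnd(String userId) {
        return (hash(userId) << 24) | 0x00FFFFFF;
    }

    // True if a hashcode lies in the user's slice, comparing as unsigned integers.
    static boolean inUserSlice(String userId, int hashcode) {
        return Integer.compareUnsigned(hashcode, sliceStart(userId)) >= 0
            && Integer.compareUnsigned(hashcode, sliceEnd(userId)) <= 0;
    }
}
```

A search restricted to one user would then only need to consult the shards whose hash ranges overlap the interval from sliceStart to sliceEnd. The comparison is done on unsigned values because the full 32 bits are in use, so hashcodes with the top bit set are negative as Java ints.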
  
  == Shard Assignment ==