You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael Garski (JIRA)" <ji...@apache.org> on 2012/05/21 02:45:41 UTC
[jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud

    [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279906#comment-13279906 ] 

Michael Garski commented on SOLR-2592:
--------------------------------------

I've been tinkering with this a bit more and discovered that the real-time get component requires the ability to hash on the unique id of a document to determine the shard to forward the request to, requiring any hashing implementation to support hashing based off the unique id of a document. To account for this any custom hashing based on an arbitrary document property or other value would have to hash to the same value as the unique document id. 

Looking at what representation of the unique id to hash on, by hashing based off of the string value rather than the indexed value, solrj and any other smart client can hash it and submit requests directly to the proper shard.  The method oas.common.util.ByteUtils.UTF16toUTF8 would serve the purpose of generating the bytes for the murmur hash on both client and server.

On both the client and server side updates and deletes by id are simple enough to hash, queries would require an additional parameter to specify a value to be hashed to determine the shard to forward the request to. It would also be a good idea to be able to optionally specify a value to hash on to direct the query to a particular shard rather than broadcast to all shards.

I'm going to rework what I had in the previous patch to account for this.

                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: pluggable_sharding.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org