You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael Garski (JIRA)" <ji...@apache.org> on 2012/05/15 21:31:10 UTC

[jira] [Updated] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud

     [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Garski updated SOLR-2592:
---------------------------------

    Attachment: pluggable_sharding.patch

This patch is intended to be a cocktail napkin sketch to get feedback (as such forwarding queries to the appropriate shards is not yet implemented). I can iterate on this as needed.

The attached patch is a very simple implementation of pluggable sharding which works as follows:

1. Configure a ShardingStrategy in SolrConfig under config/shardingStrategy, if none is configured the default implementation of sharding on the document's unique id will be performed.
{code:xml} 
     <shardingStrategy class="solr.UniqueIdShardingStrategy"/>
{code} 

2. The ShardingStrategy accepts an AddUpdateCommand, DeleteUpdateCommand, or SolrParams to return a BytesRef that is hashed to determine the destination slice.

3. I have only implemented updates at this time, queries are still distributed across all shards in the collection. I have added a param to common.params.ShardParams for a 'shard.keys' parameter that would contain the value(s) which is(are) to be hashed to determine the shard(s) which is(are) to be queried within the the HttpShardHandler.checkDistributed method. if 'shard.keys' does not have a value the query would be distributed across all shards in the collection.

Notes:

There are no unit tests yet however all existing tests pass. 

I am not quite sure about the configuration location within solr config, however as sharding is used by both update and search requests placing it in the udpateHandler and (potentially multiple) requestHandler sections would require a duplication of the same information in the solr config for what I believe is more of a collection-wide configuration.

As hashing currently requires the lucene.util.BytesRef class the solrj client can not currently hash the request to send the request to a specific node without having solrj add a dependency on lucene core - something that is most likely not desired.  Additionally, hashing on a unique id also requires access to the schema as well to determine the field that contains the unique id. Are there any thoughts on how to alter the hashing to remove these dependencies and allow for solrj to be a 'smart' client that submits requests directly to nodes that contain the data?

How would solrj work when multiple updates are included in the request that belong to different shards? Send the request to one of the nodes and let the server distribute them to the proper nodes? Perform concurrent requests to the specific nodes?


                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: pluggable_sharding.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org