Posted to dev@lucene.apache.org by "Paolo Cappuccini (JIRA)" <ji...@apache.org> on 2015/03/16 20:35:38 UTC

[jira] [Comment Edited] (SOLR-7247) sliceHash for compositeIdRouter is not coherent with routing

    [ https://issues.apache.org/jira/browse/SOLR-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363757#comment-14363757 ] 

Paolo Cappuccini edited comment on SOLR-7247 at 3/16/15 7:34 PM:
-----------------------------------------------------------------

Thanks Shalin!
I finally understand the splitting behaviour better.
I investigated further and found the real cause of my problems.

After splitting I obviously have a new distribution of docs across the shards.
The reason I couldn't find documents is in RealTimeGetComponent.java (line 366):

Slice slice = coll.getRouter().getTargetSlice(id, null, params, coll);

In this case nothing takes routeField into account, and it would be impossible to do so: at that point there is no way to get the value of the route field.

Also, the sliceHash function in CompositeIdRouter doesn't consider the "_route_" param. So the document is not found, and passing an explicit "_route_" param doesn't help.
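
To make the mismatch concrete, here is a minimal sketch of it. It is a toy model, not Solr code: String.hashCode() stands in for Solr's hashing, there are just two slices picked by modulo, and the "tenant" route field and the class and method names are made up for illustration. The only point is that the same document hashes differently depending on whether the doc is available to the router.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT Solr's CompositeIdRouter, and String.hashCode()
// is a stand-in for Solr's real hash function.
class RouteMismatchSketch {
    static final int NUM_SLICES = 2;

    // Stand-in for a sliceHash(id, doc, ...) style method: when a route field
    // is configured and a document is supplied, hash the route field value;
    // otherwise the only thing left to hash is the id.
    static int sliceHash(String id, Map<String, String> doc, String routeField) {
        String key = id;
        if (routeField != null && doc != null && doc.get(routeField) != null) {
            key = doc.get(routeField);
        }
        return key.hashCode();
    }

    // Map the hash onto one of the toy slices.
    static int targetSlice(String id, Map<String, String> doc, String routeField) {
        return Math.floorMod(sliceHash(id, doc, routeField), NUM_SLICES);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "a");
        doc.put("tenant", "b"); // hypothetical route field

        // Index time: the document is available, so the route field is hashed.
        int atIndexTime = targetSlice("a", doc, "tenant");
        // Lookup by id: no document is available, so the id is hashed instead.
        int atGetTime = targetSlice("a", null, "tenant");

        // prints "indexed on slice 0, looked up on slice 1"
        System.out.println("indexed on slice " + atIndexTime
                + ", looked up on slice " + atGetTime);
    }
}
```

In this toy setup the document is indexed on one slice and looked up on the other, which is the same symptom I see with real-time get.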

Much the same behaviour occurs in DistributedUpdateProcessor in the "processDelete" case.

The behaviour is so strange that perhaps I am completely wrong!

I think that CompositeIdRouter.sliceHash could have explicit overloads to hash by "doc"/"collection" or to hash by "value" (like in IndexSplitter).

getTargetSlice itself should have the same overloads (currently it has the same ambiguous signature as sliceHash).
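
A rough sketch of the kind of explicit entry points I have in mind follows. This is a hypothetical API shape, not the existing DocRouter interface: the class name, method names and parameters are invented, and String.hashCode() again stands in for the real hash.

```java
import java.util.Map;

// Hypothetical API shape for illustration; not the existing Solr router API.
class ExplicitHashSketch {

    // Hash by value: the caller already knows the routing value
    // (for example, taken from a "_route_" request parameter).
    static int hashByValue(String routeValue) {
        return routeValue.hashCode(); // toy stand-in for the real hash
    }

    // Hash by document: the routing value is read from the document itself.
    // If a route field is configured it must be present; otherwise the id
    // field is used. Failing loudly avoids silently hashing the wrong key.
    static int hashByDoc(Map<String, String> doc, String idField, String routeField) {
        String field = (routeField != null) ? routeField : idField;
        String value = doc.get(field);
        if (value == null) {
            throw new IllegalArgumentException(
                    "No value for field '" + field + "'; unable to identify shard");
        }
        return hashByValue(value);
    }
}
```

With two separate entry points, a caller that only has an id cannot accidentally go through the document path, and a caller with a document cannot forget the route field.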

RealTimeGetComponent can only "think" by id (and not by routeField), so it should consider all active slices if a routeField is specified for the collection; a good optimization for this case would be to use the "_route_" param to route to a specific shard.
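
The selection logic could look roughly like the sketch below. It is only an illustration under simplifying assumptions: slices are plain integers chosen by modulo, String.hashCode() replaces the real hash, and candidateSlices is an invented helper, not a method of RealTimeGetComponent.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: invented helper, toy hash, slices modelled as integers.
class RealTimeGetRoutingSketch {

    static List<Integer> candidateSlices(String id, String routeParam,
                                         String routeField, int numSlices) {
        List<Integer> slices = new ArrayList<>();
        if (routeParam != null) {
            // Explicit "_route_" value given: go straight to one slice.
            slices.add(Math.floorMod(routeParam.hashCode(), numSlices));
        } else if (routeField == null) {
            // No route field configured: hashing the id is correct.
            slices.add(Math.floorMod(id.hashCode(), numSlices));
        } else {
            // Route field configured but its value is unknown at lookup time:
            // the id says nothing about placement, so ask every active slice.
            for (int i = 0; i < numSlices; i++) {
                slices.add(i);
            }
        }
        return slices;
    }
}
```

The fan-out branch costs more per request, which is why passing "_route_" would be worthwhile whenever the client knows the route value.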

As for "processDelete", any solution looks very complicated, but in general, if I'm not wrong, routeField breaks something.




> sliceHash for compositeIdRouter is not coherent with routing
> ------------------------------------------------------------
>
>                 Key: SOLR-7247
>                 URL: https://issues.apache.org/jira/browse/SOLR-7247
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.10.3
>            Reporter: Paolo Cappuccini
>
> In CompositeIdRouter the function sliceHash checks the routeField configured for the collection.
> This makes me guess that the intended behaviour is to support a field other than the id field for hashing documents.
> But the signature of this method is very general (it can take id, doc or params) and it is used in different ways by different functionality.
> In my opinion it should have overloads instead of weak internal logic: one overload with "doc" and "collection", and another with "id", "params" and "collection".
> In any case, if "_route_" is not available in "params", "collection" should be mandatory, and when a routeField is configured, "doc" should be mandatory too.
> This will break SplitIndex but it will preserve the coherence of the data.
> With routeField configured, I noticed that DeleteCommand is broken (it passes only "id" and "params" to sliceHash), as is SolrIndexSplitter (it passes only "id").
> Either specifying a routeField for compositeIdRouter should be forbidden, or the related functionality should be implemented so that documents can be hashed based on routeField.
> In the case of DeleteCommand the workaround is to specify the "_route_" param in the request, but in the case of index splitting no workaround is possible.
> In that case the entire document should be passed during splitting (the "doc" parameter), or params should be built with the proper "_route_" parameter.


