Posted to issues@lucene.apache.org by "David Smiley (Jira)" <ji...@apache.org> on 2020/03/06 22:13:00 UTC

[jira] [Updated] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

     [ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-13749:
--------------------------------
    Priority: Blocker  (was: Major)

I understand your points, but they don't sway my opinion: params can differ based on the method. The whitelist thing is optional (it only applies to multi-cluster setups).

With 8.5 out soon and if nobody has time to develop this further at the moment, I think we have to _do something_ here to prevent a back-compat concern:
 * Option A: document in an obvious way (e.g. a call-out box) that the name & parameters will likely change without back-compat. In the project we throw around the word "experimental" a lot, but here I'm only claiming that the syntax (the way it's invoked) will change; I'm making no quality judgement on what's underneath.
 * Option B: comment it out making it invisible
 * Option C: remove from 8x/8.5; leave in master

Please pick the one that suits you, Gus. They are all fine with me.

BTW that whitelist thing reminds me heavily of the _existing_ "shardsWhitelist" feature (see distributed-requests.adoc).  It's not clear to me if we need a new mechanism here.

> Implement support for joining across collections with multiple shards ( XCJF )
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-13749
>                 URL: https://issues.apache.org/jira/browse/SOLR-13749
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Kevin Watters
>            Assignee: Gus Heck
>            Priority: Blocker
>             Fix For: 8.5
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This ticket includes two query parsers.
> The first is the "Cross-collection join filter" (XCJF) query parser. It calls out to a remote collection to get a set of join keys, which are then used as a filter against the local collection.
> The second is the Hash Range query parser: you specify a field name and a hash range, and only the documents whose field value hashes into that range are returned.
> The XCJF query parser performs an intersection, based on join keys, between two collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you want to use as a filter.
> Each shard participating in the distributed request executes a query against the remote collection. If the local collection is set up with the compositeId router to route on the join key field, a hash range query is applied to the remote collection query, so that it only matches documents that could potentially match the documents in the local shard/core.
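To make the join semantics concrete, here is a minimal in-memory sketch in Python. It is not the XCJF implementation: documents are modeled as plain dicts, the remote query is a Python predicate, and the optional `key_filter` (a hypothetical name) stands in for the per-shard hash range restriction used in routed mode.

```python
from typing import Callable, Iterable, List, Optional, Set

# A document is modeled as a plain field -> value dict (an assumption for this sketch).
Doc = dict


def collect_join_keys(remote_docs: Iterable[Doc],
                      remote_query: Callable[[Doc], bool],
                      from_field: str,
                      key_filter: Optional[Callable[[str], bool]] = None) -> Set[str]:
    """Run the 'remote' query and gather the distinct values of from_field."""
    keys: Set[str] = set()
    for doc in remote_docs:
        if from_field not in doc or not remote_query(doc):
            continue
        key = doc[from_field]
        # In routed mode, each local shard only keeps keys that could live on it.
        if key_filter is None or key_filter(key):
            keys.add(key)
    return keys


def xcjf_filter(local_docs: Iterable[Doc],
                remote_docs: Iterable[Doc],
                remote_query: Callable[[Doc], bool],
                from_field: str,
                to_field: str,
                key_filter: Optional[Callable[[str], bool]] = None) -> List[Doc]:
    """Keep the local docs whose to_field value appears among the remote join keys."""
    keys = collect_join_keys(remote_docs, remote_query, from_field, key_filter)
    return [doc for doc in local_docs if doc.get(to_field) in keys]
```

For example, with local docs keyed by `vin` and a remote collection flagging recalled vehicles, `xcjf_filter(local, remote, lambda d: d["recalled"], "vin", "vin")` keeps only the local docs whose `vin` appears on a recalled remote doc. In a real distributed request each shard would run this independently, which is why the routed restriction pays off.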
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|
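A rough sketch of what a hash range filter does, in Python. Assumptions, not confirmed by this ticket: the field value is hashed with MurmurHash3 (x86, 32-bit, seed 0) over its UTF-8 bytes, and ranges are inclusive signed 32-bit bounds as in Solr's shard ranges (e.g. `80000000-ffffffff` is the lower half). The compositeId `!` prefix handling is deliberately ignored, and `hash_range_filter` is a hypothetical helper.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Reference MurmurHash3 x86_32; returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    length = len(data)
    body_end = length & ~3
    # Body: mix each 4-byte little-endian block into the hash.
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: the remaining 0-3 bytes.
    k = 0
    tail = length & 3
    if tail == 3:
        k ^= data[body_end + 2] << 16
    if tail >= 2:
        k ^= data[body_end + 1] << 8
    if tail >= 1:
        k ^= data[body_end]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization (fmix32).
    h ^= length
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed (Java-style) int."""
    return value - (1 << 32) if value & 0x80000000 else value


def hash_range_filter(docs, field, range_min, range_max):
    """Keep docs whose signed hash of `field` lies in [range_min, range_max]."""
    matched = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        h = to_signed32(murmur3_x86_32(str(value).encode("utf-8")))
        if range_min <= h <= range_max:
            matched.append(doc)
    return matched
```

Splitting the hash space into `[-2**31, -1]` and `[0, 2**31 - 1]` puts every document into exactly one of the two ranges, which is the property a routed XCJF query relies on when each shard only asks the remote collection for keys in its own range.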
>  
>  
> ||Param||Required||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
> |zkHost|Optional|The connection string used to connect to ZooKeeper. zkHost and solrUrl are both optional, and at most one of them should be specified. If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection.|
> |v|See note|The query to be executed against the external Solr collection to retrieve the set of join key values. Note: the query can be passed either after the closing brace of the local params or as the "v" parameter. Using query parameter substitution with the "v" parameter is recommended, to avoid problems with the default query parsers.|
> |routed|Optional|true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard. This improves the performance of the cross-collection join, but it depends on the local collection being routed by the "to" field. If this parameter is not specified, the XCJF query will try to determine the correct value automatically.|
> |ttl|Optional|How long, in seconds, a cached XCJF query is considered valid. Defaults to 3600 (one hour). The XCJF query is not aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results. After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
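The parameter rules in the table can be summarized as a small validation routine. This is a hypothetical Python sketch, not Solr's parser: `XcjfParams` and `parse_xcjf_params` are invented names, it assumes a query written after the closing brace has already been copied into `v`, and it treats "at most one of zkHost and solrUrl" as a hard error.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class XcjfParams:
    collection: str
    from_field: str
    to_field: str
    query: str
    zk_host: Optional[str] = None
    solr_url: Optional[str] = None
    routed: Optional[bool] = None  # None means "let the parser decide automatically"
    ttl_seconds: int = 3600        # documented default: one hour


def parse_xcjf_params(params: Mapping[str, str]) -> XcjfParams:
    # Assumption: a query written after the closing brace has already been copied into "v".
    missing = [name for name in ("collection", "from", "to", "v") if not params.get(name)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))

    zk_host = params.get("zkHost")
    solr_url = params.get("solrUrl")
    if zk_host and solr_url:
        raise ValueError("specify at most one of zkHost and solrUrl")

    routed_raw = params.get("routed")
    if routed_raw is None:
        routed = None
    elif routed_raw.lower() in ("true", "false"):
        routed = routed_raw.lower() == "true"
    else:
        raise ValueError("routed must be true or false")

    return XcjfParams(
        collection=params["collection"],
        from_field=params["from"],
        to_field=params["to"],
        query=params["v"],
        zk_host=zk_host,
        solr_url=solr_url,
        routed=routed,
        ttl_seconds=int(params.get("ttl", "3600")),
    )
```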
>  
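The ttl behaviour described above (serve a cached result until the period expires, with no awareness of remote updates) amounts to a time-based cache. Below is a minimal Python sketch; `TtlCache` is a hypothetical class, not the cache Solr uses, and the injectable clock exists only so expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TtlCache:
    """Serve a cached value until ttl_seconds have elapsed, then recompute it.

    Nothing here watches the remote collection, so a cached entry can be
    stale for up to ttl_seconds, mirroring the caveat in the ttl description.
    """

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: reuse the earlier join result
        value = compute()    # expired or missing: re-run the remote join
        self._entries[key] = (now, value)
        return value
```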
> Example solrconfig.xml changes:
> {code:xml}
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf" class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="routerField">vin</str>
> </queryParser>
>
> <queryParser name="hash_range" class="org.apache.solr.search.join.HashRangeQueryParserPlugin" />
> {code}
>
> Example usage:
> {code}
> {!xcjf collection="otherCollection" from="fromField" to="toField" v="*:*"}
> {code}
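Following the recommendation to pass the remote query through parameter substitution on "v", a request could be assembled as below. This Python sketch only builds the URL path and query string; the collection names, the `jq` parameter name, and `build_xcjf_request` are hypothetical, and placing the XCJF clause in an `fq` is one reasonable choice, not something this ticket prescribes.

```python
from urllib.parse import parse_qs, urlencode


def build_xcjf_request(collection: str, remote_collection: str,
                       from_field: str, to_field: str,
                       remote_query: str, main_query: str = "*:*") -> str:
    params = {
        "q": main_query,
        # The XCJF clause is used as a filter; v=$jq dereferences the jq parameter,
        # so the remote query never has to be escaped inside the local params.
        "fq": "{!xcjf collection=%s from=%s to=%s v=$jq}" % (remote_collection, from_field, to_field),
        "jq": remote_query,  # hypothetical parameter name holding the remote query
    }
    return "/solr/%s/select?%s" % (collection, urlencode(params))


if __name__ == "__main__":
    url = build_xcjf_request("localCollection", "otherCollection",
                             "fromField", "toField", "type:recall AND year:[2015 TO *]")
    print(url)
    print(parse_qs(url.split("?", 1)[1]))  # decoded view of the parameters
```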



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
