Posted to issues@lucene.apache.org by "Gus Heck (Jira)" <ji...@apache.org> on 2020/05/21 20:27:00 UTC

[jira] [Comment Edited] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

    [ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113517#comment-17113517 ] 

Gus Heck edited comment on SOLR-13749 at 5/21/20, 8:26 PM:
-----------------------------------------------------------

Let me clarify the above... some of it is forward looking in the event that the NPE I mentioned above gets changed, or some aspect of when we do/don't encode/decode URL's gets changed, etc... or in the event that there are parameter hacking/hiding/encoding tricks I didn't think of... HTTP is just too ubiquitous, and it initiates the connection with a path string of arbitrary size... the ZK protocol is only relevant to ZK servers and there is no way (that I know of) to make the initial zk connection send a lot of data.


was (Author: gus_heck):
Let me clarify the above... some of it is forward looking in the even that the NPE I mentioned above gets changed, or some aspect of when we do/don't encode/decode URL's gets changed, etc... or in the event that there are parameter hacking/hiding/encoding tricks I didn't think of... HTTP is just too ubiquitous, and it initiates the connection with a path string of arbitrary size... the ZK protocol is only relevant to ZK servers and there is no way (that I know of) to make the initial zk connection send a lot of data.

> Implement support for joining across collections with multiple shards ( XCJF )
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-13749
>                 URL: https://issues.apache.org/jira/browse/SOLR-13749
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Kevin Watters
>            Assignee: Gus Heck
>            Priority: Blocker
>             Fix For: 8.6
>
>         Attachments: 2020-03 Smiley with ASF hat.jpeg
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first is the "Cross-collection join filter" (XCJF) query parser. It calls out to a remote collection to get a set of join keys, which are then used as a filter against the local collection.
> The second is the Hash Range query parser. You specify a field name and a hash range, and only the documents whose value in that field hashes into that range are returned.
> The XCJF query parser performs an intersection based on join keys between 2 collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you want to use as a filter.
> Each shard participating in the distributed request will execute a query against the remote collection.  If the local collection is set up with the compositeId router and routed on the join key field, a hash range query is applied to the remote collection query so that it only matches documents that could join to documents in the local shard/core.  
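The per-shard filtering described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class and method names are hypothetical, and `String.hashCode()` is a stand-in for the MurmurHash3-based hash that Solr's compositeId router uses.

```java
import java.util.ArrayList;
import java.util.List;

final class HashRangeSketch {
    // Stand-in hash (assumption): Solr's compositeId router uses MurmurHash3,
    // not String.hashCode(). Any 32-bit hash illustrates the idea.
    static int hash(String joinKey) {
        return joinKey.hashCode();
    }

    // True if the key's hash falls inside the inclusive range [lower, upper].
    static boolean inRange(String joinKey, int lower, int upper) {
        int h = hash(joinKey);
        return h >= lower && h <= upper;
    }

    // Keeps only the remote join keys that could match documents on a shard
    // owning the hash range [lower, upper].
    static List<String> keysForShard(List<String> remoteKeys, int lower, int upper) {
        List<String> kept = new ArrayList<>();
        for (String key : remoteKeys) {
            if (inRange(key, lower, upper)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("vin-001", "vin-002", "vin-003", "vin-004");
        // Two hypothetical shards that split the signed 32-bit hash space in half.
        List<String> shard1 = keysForShard(keys, Integer.MIN_VALUE, -1);
        List<String> shard2 = keysForShard(keys, 0, Integer.MAX_VALUE);
        System.out.println("shard1=" + shard1 + " shard2=" + shard2);
    }
}
```

Because the two ranges cover the whole hash space without overlapping, every join key lands on exactly one shard, so each shard only has to fetch its own slice of the remote keys.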
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|
>  
>  
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them should be specified.  
> If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection.|
> |v|See Note|The query to be executed against the external Solr collection to retrieve the set of join key values.  
> Note:  The original query can be passed at the end of the string or as the "v" parameter.  
> It's recommended to use query parameter substitution with the "v" parameter 
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false.  If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but 
> it depends on the local collection being routed by the toField.  If this parameter is not specified, 
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered valid, in seconds.  Defaults to 3600 (one hour).  
> The XCJF query will not be aware of changes to the remote collection, so 
> if the remote collection is updated, cached XCJF queries may give inaccurate results.  
> After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
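The ttl behaviour in the table above amounts to a timestamp check on each cached entry. A minimal sketch, assuming a hypothetical `CachedJoin` holder (not a class from the patch) and assuming an entry counts as expired once exactly ttl seconds have elapsed:

```java
final class TtlSketch {
    // Default from the parameter table: 3600 seconds (one hour).
    static final long DEFAULT_TTL_SECONDS = 3600;

    // Hypothetical cache entry recording when the remote join was executed.
    static final class CachedJoin {
        final long createdAtMillis;
        final long ttlSeconds;

        CachedJoin(long createdAtMillis, long ttlSeconds) {
            this.createdAtMillis = createdAtMillis;
            this.ttlSeconds = ttlSeconds;
        }

        // Valid while less than ttlSeconds have elapsed since creation.
        // Treating the exact boundary as expired is an assumption.
        boolean isValid(long nowMillis) {
            return nowMillis - createdAtMillis < ttlSeconds * 1000L;
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        CachedJoin entry = new CachedJoin(now, DEFAULT_TTL_SECONDS);
        System.out.println("valid now: " + entry.isValid(now));
        System.out.println("valid after 2h: " + entry.isValid(now + 2 * 3600 * 1000L));
    }
}
```

Until the entry expires, updates to the remote collection are not reflected in the cached result, which is the staleness trade-off the ttl description warns about.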
>  
> Example solrconfig.xml changes:
>  
>  <cache name="hash_vin"
>         class="solr.LRUCache"
>         size="128"
>         initialSize="0"
>         regenerator="solr.NoOpRegenerator"/>
>   
>  <queryParser name="xcjf" class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>    <str name="routerField">vin</str>
>  </queryParser>
>   
>  <queryParser name="hash_range" class="org.apache.solr.search.join.HashRangeQueryParserPlugin" />
>   
> Example Usage:
> {!xcjf collection="otherCollection" from="fromField" to="toField" v="*:*"}
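The description recommends passing the remote query through parameter substitution on "v". One way a client might assemble such a request is sketched below. The base URL, the `joinQuery` parameter name, the class name, and the choice to send the join as an `fq` filter are all assumptions made for illustration; only the local params themselves come from the example above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class XcjfRequestSketch {
    // URL-encodes a single query-string value as UTF-8 (requires Java 10+).
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds a select URL whose filter query is an XCJF join. The remote query
    // is supplied separately and referenced as v=$joinQuery, so it never has
    // to be quoted or escaped inside the local params.
    static String buildSelectUrl(String collectionUrl, String mainQuery, String joinQuery) {
        String fq = "{!xcjf collection=\"otherCollection\" from=\"fromField\""
                + " to=\"toField\" v=$joinQuery}";
        return collectionUrl + "/select"
                + "?q=" + enc(mainQuery)
                + "&fq=" + enc(fq)
                + "&joinQuery=" + enc(joinQuery);
    }

    public static void main(String[] args) {
        // Hypothetical local collection URL.
        String url = buildSelectUrl(
                "http://localhost:8983/solr/localCollection", "*:*", "color:red");
        System.out.println(url);
    }
}
```

Keeping the remote query in its own request parameter means characters such as quotes or braces in that query cannot break the local params syntax.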

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
