Posted to issues@lucene.apache.org by "Gus Heck (Jira)" <ji...@apache.org> on 2020/05/21 20:27:00 UTC
[jira] [Comment Edited] (SOLR-13749) Implement support for joining
across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113517#comment-17113517 ]
Gus Heck edited comment on SOLR-13749 at 5/21/20, 8:26 PM:
-----------------------------------------------------------
Let me clarify the above... some of it is forward-looking, in the event that the NPE I mentioned above gets changed, or some aspect of when we do/don't encode/decode URLs gets changed, etc... or in the event that there are parameter hacking/hiding/encoding tricks I didn't think of... HTTP is just too ubiquitous, and it initiates the connection with a path string of arbitrary size... the ZK protocol is only relevant to ZK servers, and there is no way (that I know of) to make the initial ZK connection send a lot of data.
> Implement support for joining across collections with multiple shards ( XCJF )
> ------------------------------------------------------------------------------
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Blocker
> Fix For: 8.6
>
> Attachments: 2020-03 Smiley with ASF hat.jpeg
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It can call out to a remote collection to get a set of join keys to be used as a filter against the local collection.
> The second one is the Hash Range query parser. You specify a field name and a hash range, and only the documents that would have hashed to that range are returned.
> The XCJF query parser performs an intersection based on join keys between 2 collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you want to use as a filter.
> Each shard participating in the distributed request will execute a query against the remote collection. If the local collection is set up with the compositeId router to be routed on the join key field, a hash range query is applied to the remote collection query so that it only matches the documents that contain a potential match for the documents in the local shard/core.
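A minimal sketch of the hash-range idea described above, for illustration only. It assumes a 32-bit hash space split evenly across shards and uses CRC32 as a stand-in hash; this is not the hash function or range layout that Solr's compositeId router actually uses, and the names `shard_ranges`, `key_hash`, and `keys_for_shard` are hypothetical.

```python
import zlib

HASH_SPACE = 2**32  # assumed size of the hash space (32-bit)


def shard_ranges(num_shards):
    """Split the hash space into contiguous, inclusive (lower, upper) ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        lower = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        upper = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((lower, upper))
    return ranges


def key_hash(join_key):
    """Stand-in hash (CRC32). NOT the hash Solr uses for routing."""
    return zlib.crc32(join_key.encode("utf-8"))


def keys_for_shard(remote_keys, lower, upper):
    """Keep only the remote join keys whose hash falls in this shard's range."""
    return {k for k in remote_keys if lower <= key_hash(k) <= upper}


# Hypothetical join keys returned by the remote collection.
remote_keys = {"vin-001", "vin-002", "vin-003", "vin-004", "vin-005"}
ranges = shard_ranges(4)
per_shard = [keys_for_shard(remote_keys, lo, hi) for lo, hi in ranges]
```

Because the ranges are disjoint and cover the whole hash space, each join key is fetched by exactly one local shard, which is why the hash range filter reduces the work each shard has to do.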
>
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|
>
>
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper.
> zkHost and solrUrl are both optional parameters, and at most one of them should be specified.
> If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to retrieve the set of join key values.
> Note: The original query can be passed at the end of the string or as the "v" parameter.
> It's recommended to use query parameter substitution with the "v" parameter
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but
> it depends on the local collection being routed by the toField. If this parameter is not specified,
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered valid, in seconds. Defaults to 3600 (one hour).
> The XCJF query will not be aware of changes to the remote collection, so
> if the remote collection is updated, cached XCJF queries may give inaccurate results.
> After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
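The ttl behaviour in the table can be sketched roughly as follows. This is an illustration of time-based expiry only, not the cache implementation in the patch; the class `TtlJoinKeyCache`, the `fetch` callback, and the injected clock are all hypothetical.

```python
import time


class TtlJoinKeyCache:
    """Caches join-key sets per query and re-fetches them once ttl has elapsed."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._entries = {}  # query string -> (fetched_at, join_keys)

    def get(self, query, fetch):
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still valid: changes to the remote collection are not seen
        keys = fetch(query)  # missing or expired: re-execute against the remote collection
        self._entries[query] = (now, keys)
        return keys


# Demonstration with a fake clock and a fake remote fetch.
fake_now = [0.0]
calls = []


def fake_fetch(query):
    calls.append(query)
    return frozenset({"vin-001", "vin-002"})


cache = TtlJoinKeyCache(ttl_seconds=3600, clock=lambda: fake_now[0])
cache.get("color:red", fake_fetch)  # miss: fetches from the "remote collection"
fake_now[0] = 1800.0
cache.get("color:red", fake_fetch)  # within ttl: served from the cache
fake_now[0] = 3600.0
cache.get("color:red", fake_fetch)  # ttl elapsed: fetches again
```

The middle call shows the trade-off noted in the table: while an entry is within its ttl, updates to the remote collection are invisible to the join.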
>
> Example solrconfig.xml changes:
>
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf" class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="routerField">vin</str>
> </queryParser>
>
> <queryParser name="hash_range" class="org.apache.solr.search.join.HashRangeQueryParserPlugin" />
>
> Example Usage:
> {!xcjf collection="otherCollection" from="fromField" to="toField" v="*:*"}
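As a rough sketch of how a client might assemble such a filter, the helper below builds the local-params string and passes the remote query through parameter substitution (`v=$joinQuery`), as the note on the "v" parameter recommends. The function `xcjf_filter` and the request parameter name `joinQuery` are hypothetical, the parser name `xcjf` is the one registered in the example config, and the helper does no escaping of quotes inside values.

```python
from urllib.parse import urlencode


def xcjf_filter(collection, from_field, to_field, query_param="joinQuery",
                zk_host=None, solr_url=None):
    """Build an {!xcjf ...} local-params string that dereferences a request parameter."""
    if zk_host is not None and solr_url is not None:
        # The parameter table says at most one of these should be specified.
        raise ValueError("specify at most one of zkHost and solrUrl")
    parts = [
        f'collection="{collection}"',
        f'from="{from_field}"',
        f'to="{to_field}"',
    ]
    if zk_host is not None:
        parts.append(f'zkHost="{zk_host}"')
    if solr_url is not None:
        parts.append(f'solrUrl="{solr_url}"')
    # v=$name tells Solr to read the remote query from another request parameter.
    parts.append(f"v=${query_param}")
    return "{!xcjf " + " ".join(parts) + "}"


params = {
    "q": "*:*",
    "fq": xcjf_filter("otherCollection", "fromField", "toField"),
    "joinQuery": "*:*",  # the query executed against the remote collection
}
query_string = urlencode(params)  # ready to append to a /select URL
```

Keeping the remote query in its own request parameter avoids having to quote it inside the local-params block.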
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)