Posted to issues@lucene.apache.org by "Kevin Watters (Jira)" <ji...@apache.org> on 2019/12/10 15:53:00 UTC
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992677#comment-16992677 ]
Kevin Watters commented on SOLR-13749:
--------------------------------------
Under the covers, this query parser does use a streaming expression to get back the full set of join keys from the remote collection.
Here's the stream creation: [https://github.com/apache/lucene-solr/pull/976/files#diff-6f5d64d0defefc8535e889677b3a7ed1R233]
> Implement support for joining across collections with multiple shards ( XCJF )
> ------------------------------------------------------------------------------
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
> Issue Type: New Feature
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It calls out to a remote collection to get a set of join keys, which are then used as a filter against the local collection.
> The second one is the Hash Range query parser. Given a field name and a hash range, it returns only the documents whose value in that field would hash into that range.
> The XCJF query parser performs an intersection, based on join keys, between 2 collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you want to use as a filter.
> Each shard participating in the distributed request will execute a query against the remote collection. If the local collection is set up with the compositeId router and routed on the join key field, a hash range query is applied to the remote collection query so that it only matches documents that could join with the documents in the local shard/core.
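The per-shard flow described above can be sketched in Python with in-memory documents. This is a minimal, hypothetical model rather than the plugin's code: `remote_join_keys`, `xcjf_filter` and the `in_shard_range` predicate are illustrative names, documents are plain dicts, and the remote query is an ordinary Python predicate. When `in_shard_range` is supplied it stands in for the hash range restriction used in the routed case.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical model: a document is a plain dict of field name -> string value.
Doc = Dict[str, str]


def remote_join_keys(
    remote_docs: Iterable[Doc],
    remote_query: Callable[[Doc], bool],
    from_field: str,
    in_shard_range: Optional[Callable[[str], bool]] = None,
) -> Set[str]:
    """Collect the 'from' values of remote docs that match the remote query.

    If in_shard_range is given (the routed case), keys that cannot belong
    to the calling local shard are dropped before they are returned.
    """
    keys: Set[str] = set()
    for doc in remote_docs:
        if not remote_query(doc):
            continue
        key = doc.get(from_field)
        if key is None:
            continue
        if in_shard_range is not None and not in_shard_range(key):
            continue
        keys.add(key)
    return keys


def xcjf_filter(local_docs: Iterable[Doc], to_field: str, keys: Set[str]) -> List[Doc]:
    """Keep only local docs whose 'to' value appears in the remote key set."""
    return [doc for doc in local_docs if doc.get(to_field) in keys]


if __name__ == "__main__":
    remote = [{"vin": "a1", "color": "red"}, {"vin": "c3", "color": "red"},
              {"vin": "z9", "color": "red"}, {"vin": "b2", "color": "blue"}]
    # Hypothetical routing: shard1 owns keys starting with a/b, shard2 owns c/d.
    shards = {
        "shard1": ([{"id": "1", "vin": "a1"}, {"id": "2", "vin": "b2"}], lambda k: k[0] in "ab"),
        "shard2": ([{"id": "3", "vin": "c3"}, {"id": "4", "vin": "d4"}], lambda k: k[0] in "cd"),
    }
    is_red = lambda d: d.get("color") == "red"
    for name, (docs, owns) in shards.items():
        keys = remote_join_keys(remote, is_red, "vin", in_shard_range=owns)
        print(name, sorted(keys), [d["id"] for d in xcjf_filter(docs, "vin", keys)])
```

Restricting each shard to the keys it could own gives the same combined result as an unrouted join while moving fewer keys per shard; that only holds when local documents really are routed on the join key, which is why the routed optimization depends on it.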
>
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The lucene query that matches only the documents whose hash code on a field falls within a specified range.|
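As an illustration of the HashRangeQuery idea, the sketch below tests whether a field value hashes into an inclusive range. It assumes the hash is 32-bit MurmurHash3 (x86_32, seed 0) over the value's UTF-8 bytes, read as a signed Java `int`, which is how Solr's compositeId router hashes plain ids without a `!` prefix; whether the hash range parser uses exactly this function is an assumption here, and `hash_in_range` is a hypothetical helper name.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as a signed 32-bit int (like Java's int)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    rounded = n & ~3  # length of the part made of whole 4-byte blocks

    def scramble(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    # Body: 4-byte little-endian blocks.
    for i in range(0, rounded, 4):
        h ^= scramble(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1 to 3 bytes.
    k = 0
    tail = n & 3
    if tail == 3:
        k ^= data[rounded + 2] << 16
    if tail >= 2:
        k ^= data[rounded + 1] << 8
    if tail >= 1:
        k ^= data[rounded]
        h ^= scramble(k)

    # Finalization (fmix32).
    h ^= n & mask
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def hash_in_range(value: str, lower: int, upper: int) -> bool:
    """True if the value's hash falls in the inclusive range [lower, upper]."""
    return lower <= murmur3_x86_32(value.encode("utf-8")) <= upper


if __name__ == "__main__":
    # Split the signed 32-bit hash space into two halves, as for two shards.
    halves = [(-(1 << 31), -1), (0, (1 << 31) - 1)]
    for vin in ["foo", "vin-123", "vin-456"]:
        owners = [i for i, (lo, hi) in enumerate(halves) if hash_in_range(vin, lo, hi)]
        print(vin, murmur3_x86_32(vin.encode("utf-8")), owners)
```

Because the ranges partition the hash space, every value lands in exactly one range, so a per-shard range filter never drops a key that some shard needs.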
>
>
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper.
> zkHost and solrUrl are both optional parameters, and at most one of them should be specified.
> If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection.|
> |v|See Note|The query to be executed against the external Solr collection to retrieve the set of join key values.
> Note: The original query can be passed at the end of the string or as the "v" parameter.
> It's recommended to use query parameter substitution with the "v" parameter
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but
> it depends on the local collection being routed on the "to" field. If this parameter is not specified,
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered valid, in seconds. Defaults to 3600 (one hour).
> The XCJF query will not be aware of changes to the remote collection, so
> if the remote collection is updated, cached XCJF queries may give inaccurate results.
> After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
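The `ttl` behavior can be modeled with a small cache keyed by the join's parameters. This is a hypothetical sketch (`TtlJoinKeyCache` and `get_keys` are illustrative names, not Solr classes) that uses the 3600-second default from the table; it shows why results can be stale, since updates to the remote collection are not seen until an entry expires.

```python
import time
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple


class TtlJoinKeyCache:
    """Hypothetical cache of join-key sets that expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, FrozenSet[str]]] = {}

    def get_keys(self, cache_key: Hashable,
                 fetch: Callable[[], Iterable[str]]) -> FrozenSet[str]:
        """Return cached keys while fresh, otherwise re-run fetch() and store."""
        now = self.clock()
        entry = self._entries.get(cache_key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh: remote changes since caching are not visible
        keys = frozenset(fetch())
        self._entries[cache_key] = (now, keys)
        return keys


if __name__ == "__main__":
    fake_now = [0.0]
    remote_keys = {"a1"}
    cache = TtlJoinKeyCache(clock=lambda: fake_now[0])
    params = ("otherCollection", "fromField", "toField", "*:*")
    print(sorted(cache.get_keys(params, lambda: remote_keys)))  # ['a1'] (fetched)
    remote_keys = {"a1", "b2"}  # the remote collection is updated
    fake_now[0] = 60.0
    print(sorted(cache.get_keys(params, lambda: remote_keys)))  # ['a1'] (stale)
    fake_now[0] = 3600.0
    print(sorted(cache.get_keys(params, lambda: remote_keys)))  # ['a1', 'b2'] (re-fetched)
```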
>
> Example solrconfig.xml changes:
>
> {code:xml}
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf" class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="routerField">vin</str>
> </queryParser>
>
> <queryParser name="hash_range" class="org.apache.solr.search.join.HashRangeQueryParserPlugin"/>
> {code}
>
> Example Usage:
> {!xcjf collection="otherCollection" from="fromField" to="toField" v="*:*"}
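The `v` note in the parameter table recommends query parameter substitution. A minimal sketch of building such a request follows; the helper name `build_xcjf_params` and the parameter name `xcjfq` are hypothetical, the XCJF query is placed in an `fq` so it acts as a filter, and local-param values other than the remote query are assumed to be simple tokens that need no quoting.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_xcjf_params(main_query: str, collection: str, from_field: str,
                      to_field: str, join_query: str,
                      routed: Optional[bool] = None,
                      ttl: Optional[int] = None) -> Dict[str, str]:
    """Build request params that pass the remote query via v=$xcjfq."""
    local_params = [f"collection={collection}", f"from={from_field}",
                    f"to={to_field}", "v=$xcjfq"]
    if routed is not None:
        local_params.append(f"routed={'true' if routed else 'false'}")
    if ttl is not None:
        local_params.append(f"ttl={ttl}")
    return {
        "q": main_query,
        "fq": "{!xcjf " + " ".join(local_params) + "}",
        # The remote query travels as its own parameter, so spaces, quotes or
        # '}' inside it cannot break the local-params syntax.
        "xcjfq": join_query,
    }


if __name__ == "__main__":
    params = build_xcjf_params("*:*", "otherCollection", "fromField", "toField",
                               'color:"dark red"')
    print(params["fq"])  # {!xcjf collection=otherCollection from=fromField to=toField v=$xcjfq}
    print(urlencode(params))
```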
--
This message was sent by Atlassian Jira
(v8.3.4#803005)