Posted to dev@lucene.apache.org by "Ishan Chattopadhyaya (JIRA)" <ji...@apache.org> on 2015/02/09 12:08:34 UTC

[jira] [Updated] (SOLR-7090) Cross collection join

     [ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya updated SOLR-7090:
---------------------------------------
    Attachment: SOLR-7090.patch

Here's an implementation of this using a value source backed by a per-core cache.

Here's how to use it:

Add the following cache definition to the <query> section of solrconfig.xml:

    <cache name="join"
           class="solr.LRUCache"
           size="4096"
           initialSize="1024"
           autowarmCount="1024"
           regenerator="org.apache.solr.util.SolrPluginUtils$IdentityRegenerator"
           />
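
To make the size bound concrete, here is a minimal sketch of the least-recently-used eviction that a cache declared this way is expected to apply, assuming plain LRU semantics. JoinCacheSketch is a hypothetical name, not a Solr class; it only models the size limit (size="4096"), not initialSize beyond the initial capacity, autowarmCount, or the regenerator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the "join" cache above (not Solr's LRUCache):
 * a bounded map that evicts its least recently used entry once it holds
 * more than maxSize entries. Autowarming and regeneration are not modeled.
 */
class JoinCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    JoinCacheSketch(int initialSize, int maxSize) {
        // accessOrder = true: get() moves an entry to the most recently used end.
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each put(); drop the least recently used entry if over the bound.
        return size() > maxSize;
    }
}
```

With the values from the config, this would be constructed as new JoinCacheSketch<>(1024, 4096).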

At query time, the "coljoin" function can be used:
coljoin(fromCollection,fromKey,fromVal,toKey)

fromCollection: the name of the secondary/"from" collection to join from
fromKey: the name of the foreign-key field in the "from" collection to join on
fromVal: the name of the field in the "from" collection whose value is returned
toKey: the name of the key field in the primary collection to join on
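
One way to read these parameters is as a hash join: index the "from" documents by fromKey, then, for each primary document, look up its toKey value and return the matching fromVal. The sketch below models that lookup in memory under a few assumptions that are not taken from the patch: documents are plain field maps, fields are single-valued, and if several "from" documents share a key the last one wins. ColJoinSketch and the field names in the example are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of coljoin(fromCollection, fromKey, fromVal, toKey).
 * The "from" collection is given as a list of field maps and indexed by fromKey.
 */
class ColJoinSketch {
    private final Map<Object, Object> fromKeyToVal = new HashMap<>();

    ColJoinSketch(List<Map<String, Object>> fromDocs, String fromKey, String fromVal) {
        for (Map<String, Object> doc : fromDocs) {
            Object key = doc.get(fromKey);
            if (key != null) {
                // Assumption: on duplicate keys the last "from" doc wins.
                fromKeyToVal.put(key, doc.get(fromVal));
            }
        }
    }

    /** Returns the joined fromVal for a primary doc, or null if there is no match. */
    Object valueFor(Map<String, Object> primaryDoc, String toKey) {
        Object key = primaryDoc.get(toKey);
        return key == null ? null : fromKeyToVal.get(key);
    }
}
```

For example, coljoin(signals,doc_id,boost,id) would correspond to new ColJoinSketch(signalsDocs, "doc_id", "boost").valueFor(primaryDoc, "id"), where "signals", "doc_id" and "boost" are made-up names for illustration.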

Implementation details:
All values from the secondary collection are fetched into each of the primary collection's cores and cached in the LRU "join" cache. An executor thread runs in the background and periodically refreshes the cache by re-fetching the values from the secondary collection; in this patch the refresh interval is fixed at 2000 ms.
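
The background refresh could look roughly like the sketch below. It is not the patch's code: JoinCacheRefresher and the fetcher Supplier are hypothetical, and instead of updating an LRU cache entry by entry it swaps in a freshly fetched snapshot map atomically, so readers never observe a half-refreshed cache. It uses scheduleWithFixedDelay rather than a fixed rate so that a slow fetch from the secondary collection cannot cause refreshes to pile up, and it catches runtime exceptions because an uncaught exception would cancel all later runs of a scheduled task.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: one scheduled thread re-fetches the full key -> value
 * map from the secondary collection and atomically publishes it as a new snapshot.
 */
class JoinCacheRefresher implements AutoCloseable {
    private final Supplier<Map<Object, Object>> fetcher;
    private final AtomicReference<Map<Object, Object>> snapshot;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    JoinCacheRefresher(Supplier<Map<Object, Object>> fetcher) {
        this.fetcher = fetcher;
        this.snapshot = new AtomicReference<>(copy(fetcher.get()));
    }

    /** Starts periodic refreshes, e.g. start(2000) to match the patch's interval. */
    void start(long intervalMs) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot; an uncaught exception would stop the schedule.
            }
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    /** Fetches fresh values and publishes them as the new snapshot. */
    void refresh() {
        snapshot.set(copy(fetcher.get()));
    }

    Object get(Object key) {
        return snapshot.get().get(key);
    }

    private static Map<Object, Object> copy(Map<Object, Object> m) {
        return Collections.unmodifiableMap(new HashMap<>(m));
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```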

> Cross collection join
> ---------------------
>
>                 Key: SOLR-7090
>                 URL: https://issues.apache.org/jira/browse/SOLR-7090
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>             Fix For: 5.1
>
>         Attachments: SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, it has limitations: (i) the secondary collection must be replicated on every node where the primary collection has a replica, and (ii) the secondary collection must have a single shard.
> This issue explores ideas and possibilities for cross-collection joins, even across nodes. This will be helpful for users who wish to maintain boosts or signals in a secondary, more frequently updated collection and join these boosts/signals at query time with results from the primary collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
