Posted to issues@solr.apache.org by "Joel Bernstein (Jira)" <ji...@apache.org> on 2022/06/06 13:58:00 UTC
[jira] [Comment Edited] (SOLR-16239) Add Join query plans to Solr SQL

    [ https://issues.apache.org/jira/browse/SOLR-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550488#comment-17550488 ] 

Joel Bernstein edited comment on SOLR-16239 at 6/6/22 1:57 PM:
---------------------------------------------------------------

One of the first joins worth supporting is an aggregation over one collection that joins to a different collection to return the grouping key. This allows query plans that aggregate first and then join to fetch the group key once the aggregation is done. Here is an example:

{code:sql}
SELECT c.product_name, COUNT(*) AS cnt FROM signals s
LEFT JOIN catalog c ON s.product_id = c.product_id
GROUP BY c.product_name
ORDER BY cnt DESC
LIMIT 25
{code}

This could be rewritten to a very efficient streaming expression:

{code:java}
select(fetch(catalog,
             facet(signals, q="*:*", buckets="product_id",
                   bucketSorts="count(*) desc", bucketSizeLimit=25,
                   count(*)),
             on="product_id=product_id",
             fl="product_name"),
       count(*) as cnt,
       product_name)
{code}
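
To make the aggregate-first plan concrete, here is a minimal in-memory sketch in Python. It is not Solr code: the function names and sample rows are hypothetical, and only the collection and field names (signals, catalog, product_id, product_name) come from the SQL example. It assumes each product_id maps to exactly one product_name and that every product_id appears in the catalog; without that, grouping by product_id and grouping by product_name could produce different buckets.

```python
from collections import Counter

# Hypothetical stand-ins for the two collections in the SQL example.
signals = [
    {"product_id": "p1"},
    {"product_id": "p2"},
    {"product_id": "p1"},
    {"product_id": "p3"},
    {"product_id": "p1"},
    {"product_id": "p2"},
]
catalog = {"p1": "Widget", "p2": "Gadget", "p3": "Gizmo"}


def join_then_aggregate(signals, catalog, limit):
    """Naive plan: look up the name for every signal row, then group by name."""
    counts = Counter(catalog.get(row["product_id"]) for row in signals)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:limit]


def aggregate_then_fetch(signals, catalog, limit):
    """Aggregate-first plan: group by product_id, keep the top buckets,
    then fetch the name only for the buckets that survive the limit."""
    counts = Counter(row["product_id"] for row in signals)
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    return [(catalog.get(product_id), cnt) for product_id, cnt in top]


if __name__ == "__main__":
    print(join_then_aggregate(signals, catalog, 25))
    print(aggregate_then_fetch(signals, catalog, 25))
```

The naive plan performs one catalog lookup per signal row, while the aggregate-first plan performs at most `limit` lookups, which is where the efficiency would come from under the assumptions above.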




> Add Join query plans to Solr SQL
> --------------------------------
>
>                 Key: SOLR-16239
>                 URL: https://issues.apache.org/jira/browse/SOLR-16239
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Parallel SQL
>            Reporter: Joel Bernstein
>            Priority: Major
>              Labels: RobustSQL
>
> This is an umbrella ticket for adding join query plans for Solr SQL.
> Solr 9 adds significant performance improvements to the export handler. These improvements were done in part to support fast distributed joins in Solr SQL. 
> Streaming Expressions already supports hash joins and merge joins and has limited support for nested loop joins (fetch). What remains is to add rules to the Calcite planner that push the joins down to the SQL handler.
> Calcite can also execute joins itself, so part of this work will be to fall back gracefully to Calcite's join engine.
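
For readers less familiar with the join strategies named in the description, the following Python sketch contrasts a hash join with a merge join on small in-memory lists. It is a generic illustration of the two techniques, not Solr's or Calcite's implementation; the function names and sample rows are hypothetical. The merge join assumes both inputs are already sorted ascending on the join key.

```python
def hash_join(left, right, key):
    """Inner join: build a hash table on the right side, then probe it
    once per left row."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def merge_join(left, right, key):
    """Inner join of two inputs that are both sorted ascending on key.
    Advances two cursors in step, so neither side is held in a hash table."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            k = left[i][key]
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == k:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left) and left[i][key] == k:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    counts = [
        {"product_id": "p1", "cnt": 3},
        {"product_id": "p2", "cnt": 2},
        {"product_id": "p4", "cnt": 1},
    ]
    names = [
        {"product_id": "p1", "product_name": "Widget"},
        {"product_id": "p2", "product_name": "Gadget"},
        {"product_id": "p3", "product_name": "Gizmo"},
    ]
    print(hash_join(counts, names, "product_id"))
    print(merge_join(counts, names, "product_id"))
```

The trade-off the sketch shows is the usual one: the hash join needs memory for one whole side but accepts unsorted input, while the merge join needs sorted input but only a bounded amount of state.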


