You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Kelun Chai (Jira)" <ji...@apache.org> on 2023/04/24 08:20:00 UTC

[jira] [Updated] (CALCITE-5673) FilterIntoJoinRule cannot pushdown filter to TableScan

     [ https://issues.apache.org/jira/browse/CALCITE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kelun Chai updated CALCITE-5673:
--------------------------------
    Description: 
The current rule can only push down the join condition from one side to table scan. If the join condition comes from both sides, it cannot push down the filter, but extracts the relevant columns to Join through scan for calculation.

SQL Query:
{code:sql}
SELECT A.name AS cName, B.name AS fName FROM vehicles AS A JOIN dimTable as B ON ST_Contains(B.fence, ST_MakePoint(A.x, A.y));{code}
Query Plan:
{code:bash}
== Abstract Syntax Tree ==
LogicalProject(cName=[$1], fName=[$5])
+- LogicalJoin(condition=[ST_Contains($6, ST_MakePoint($2, $3))], joinType=[inner])
   :- LogicalTableScan(table=[[default_catalog, default_database, vehicles]])
   +- LogicalTableScan(table=[[default_catalog, default_database, dimTable]]){code}
The same query in postgres behaves as follows (w/ & w/o index):
{code:bash}
Nested Loop  (cost=0.00..18071570.38 rows=722 width=64)
   Join Filter: st_contains(b.fence, a.location)
   ->  Seq Scan on vehicles a  (cost=0.00..18.50 rows=850 width=64)
   ->  Materialize  (cost=0.00..22.75 rows=850 width=64)
         ->  Seq Scan on fences b  (cost=0.00..18.50 rows=850 width=64)
# Using GIST Index
Nested Loop  (cost=0.13..84.50 rows=1 width=64)
   ->  Seq Scan on fences b  (cost=0.00..1.03 rows=3 width=64)
   ->  Index Scan using g_idx on vehicles a  (cost=0.13..27.81 rows=1 width=64)
         Index Cond: (location @ b.fence)
         Filter: st_contains(b.fence, location){code}
We created an in-memory based index in TableScan, is there a way to convert the spatial join condition to NestedLoopJoin (LogicalCorrelate) and push down to the TableScan node?

If Calcite does not support such a design, can I ask what are the considerations/concerns?

  was:
The current rule can only push down the join condition from one side to table scan. If the join condition comes from both sides, it cannot push down the filter, but extracts the relevant columns to Join through scan for calculation.

SQL Query:
{code:sql}
SELECT A.name AS cName, B.name AS fName FROM vehicles AS A JOIN dimTable as B ON ST_Contains(B.fence, ST_MakePoint(A.x, A.y));{code}
Query Plan:
{code:bash}
== Abstract Syntax Tree ==
LogicalProject(cName=[$1], fName=[$5])
+- LogicalJoin(condition=[ST_Contains($6, ST_MakePoint($2, $3))], joinType=[inner])
   :- LogicalTableScan(table=[[default_catalog, default_database, vehicles]])
   +- LogicalTableScan(table=[[default_catalog, default_database, dimTable]]){code}
The same query in postgres behaves as follows (w/ & w/o index):
{code:bash}
Nested Loop  (cost=0.00..18071570.38 rows=722 width=64)
   Join Filter: st_contains(b.fence, a.location)
   ->  Seq Scan on vehicles a  (cost=0.00..18.50 rows=850 width=64)
   ->  Materialize  (cost=0.00..22.75 rows=850 width=64)
         ->  Seq Scan on fences b  (cost=0.00..18.50 rows=850 width=64)
# Using GIST Index
Nested Loop  (cost=0.13..84.50 rows=1 width=64)
   ->  Seq Scan on fences b  (cost=0.00..1.03 rows=3 width=64)
   ->  Index Scan using g_idx on vehicles a  (cost=0.13..27.81 rows=1 width=64)
         Index Cond: (location @ b.fence)
         Filter: st_contains(b.fence, location){code}
We created an in-memory based index in Tablescan, is there a way to convert the spatial join condition to NestedLoopJoin (LogicalCorrelate) and push down to the Tablescan node?


> FilterIntoJoinRule cannot pushdown filter to TableScan
> ------------------------------------------------------
>
>                 Key: CALCITE-5673
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5673
>             Project: Calcite
>          Issue Type: Bug
>          Components: core, spatial
>    Affects Versions: 1.26.0
>            Reporter: Kelun Chai
>            Priority: Major
>              Labels: features
>
> The current rule can only push down the join condition from one side to table scan. If the join condition comes from both sides, it cannot push down the filter, but extracts the relevant columns to Join through scan for calculation.
> SQL Query:
> {code:sql}
> SELECT A.name AS cName, B.name AS fName FROM vehicles AS A JOIN dimTable as B ON ST_Contains(B.fence, ST_MakePoint(A.x, A.y));{code}
> Query Plan:
> {code:bash}
> == Abstract Syntax Tree ==
> LogicalProject(cName=[$1], fName=[$5])
> +- LogicalJoin(condition=[ST_Contains($6, ST_MakePoint($2, $3))], joinType=[inner])
>    :- LogicalTableScan(table=[[default_catalog, default_database, vehicles]])
>    +- LogicalTableScan(table=[[default_catalog, default_database, dimTable]]){code}
> The same query in postgres behaves as follows (w/ & w/o index):
> {code:bash}
> Nested Loop  (cost=0.00..18071570.38 rows=722 width=64)
>    Join Filter: st_contains(b.fence, a.location)
>    ->  Seq Scan on vehicles a  (cost=0.00..18.50 rows=850 width=64)
>    ->  Materialize  (cost=0.00..22.75 rows=850 width=64)
>          ->  Seq Scan on fences b  (cost=0.00..18.50 rows=850 width=64)
> # Using GIST Index
> Nested Loop  (cost=0.13..84.50 rows=1 width=64)
>    ->  Seq Scan on fences b  (cost=0.00..1.03 rows=3 width=64)
>    ->  Index Scan using g_idx on vehicles a  (cost=0.13..27.81 rows=1 width=64)
>          Index Cond: (location @ b.fence)
>          Filter: st_contains(b.fence, location){code}
> We created an in-memory based index in TableScan, is there a way to convert the spatial join condition to NestedLoopJoin (LogicalCorrelate) and push down to the TableScan node?
> If Calcite does not support such a design, can I ask what are the considerations/concerns?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)