You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Anton Haidai (Jira)" <ji...@apache.org> on 2019/11/26 20:06:00 UTC

[jira] [Updated] (CALCITE-3538) Materialized view with AggregateFilterTransposeRule and a simple filter may corrupt a physical plan.

     [ https://issues.apache.org/jira/browse/CALCITE-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Haidai updated CALCITE-3538:
----------------------------------
    Description: 
A combination of the following rules may lead to a corrupted physical plan:
* AbstractMaterializedViewRule.INSTANCE_AGGREGATE
* AggregateFilterTransposeRule.INSTANCE (in allColumnsInAggregate=false mode)

While my attempts to reproduce the issue in tests were not successful, I have a detailed description of the issue with plans examples.

h4. Example.
Materialized View definition SQL (MV_TABLE):
{code}
SELECT "product_category", "income_bracket", COUNT(*) AS "volume"
FROM "DATA_TABLE"
GROUP BY "product_category", "income_bracket"
{code}

Logical plan (produced using RelBuilder):
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
  LogicalAggregate(group=[{0}], volume=[COUNT()])
    LogicalProject(income_bracket=[$0])
      LogicalFilter(condition=[=($1, 'Accessories')])
        LogicalProject(income_bracket=[$12], product_category=[$16])
          CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}

It is a simple aggregation with a filter "product_category" = 'Accessories'.

Physical plan (corrupted):
{code}
EnumerableLimit(fetch=[50])
  EnumerableSort(sort0=[$1], dir0=[DESC])
    EnumerableAggregate(group=[{0}], volume=[$SUM0($2)])
      EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t1, $t3)], proj#0..2=[{exprs}], $condition=[$t4])
        EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
          EnumerableTableScan(table=[[catalog, MV_TABLE]]) 
{code}
This plan is corrupted because the the EnumerableCalc condition checks if 'Accessories' ($t3) is equal to "income_bracket" ($t1) instead of "product_category"($t0) field of MV_TABLE leading to an empty result set. 
Swapping the order of fields in a materialized view definition (SELECT  "income_bracket", "product_category" ..., so they match the order of fields in DATA_TABLE, see "LogicalProject" indexes) masks the the issue producing a correct physical plan.

The issue is not reproducible if using a complex filter condition. For example, here are  correct plans if trying to execute the same query but with a complex filter ("product_category" = 'Accessories' OR "product_category" = 'Audio'):

Logical plan:
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
  LogicalAggregate(group=[{0}], volume=[COUNT()])
    LogicalProject(income_bracket=[$0])
      LogicalFilter(condition=[OR(=($1, 'Accessories'), =($1, 'Audio'))])
        LogicalProject(income_bracket=[$12], product_category=[$16])
          CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}

Physical plan (correct):
{code}
EnumerableLimit(fetch=[50])
  EnumerableSort(sort0=[$1], dir0=[DESC])
    EnumerableAggregate(group=[{1}], volume=[$SUM0($2)])
      EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t3, $t0)], expr#5=['Audio'], expr#6=[=($t5, $t0)], expr#7=[OR($t4, $t6)], proj#0..2=[{exprs}], $condition=[$t7])
        EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
          EnumerableTableScan(table=[[catalog, MV_TABLE]])
{code}
This plan is correct: now the filter values are correctly tested against the "product_category" ($t0) field of MV_TABLE.

  was:
A combination of the following rules may lead to a corrupted physical plan:
* AbstractMaterializedViewRule.INSTANCE_AGGREGATE
* AggregateFilterTransposeRule.INSTANCE (in allColumnsInAggregate=false mode)

While my attempts to reproduce the issue in tests were not successful, I have a detailed description of the issue with plans examples.

h4. Example.
Materialized View definition SQL (MV_TABLE):
{code}
SELECT "product_category", "income_bracket", COUNT(*) AS "volume"
FROM "DATA_TABLE"
GROUP BY "product_category", "income_bracket"
{code}

Logical plan (produced using RelBuilder):
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
  LogicalAggregate(group=[{0}], volume=[COUNT()])
    LogicalProject(income_bracket=[$0])
      LogicalFilter(condition=[=($1, 'Accessories')])
        LogicalProject(income_bracket=[$12], product_category=[$16])
          CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}

It is a simple aggregation with a filter "product_category" = 'Accessories'.

Physical plan (corrupted):
{code}
EnumerableLimit(fetch=[50])
  EnumerableSort(sort0=[$1], dir0=[DESC])
    EnumerableAggregate(group=[{0}], volume=[$SUM0($2)])
      EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t1, $t3)], proj#0..2=[{exprs}], $condition=[$t4])
        EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
          EnumerableTableScan(table=[[catalog, MV_TABLE]]) 
{code}
This plan is corrupted because the the EnumerableCalc condition checks if 'Accessories' ($t3) is equal to "income_bracket" ($t1) instead of "product_category"($t0) field of MV_TABLE leading to an empty result set. 
Swapping the order of fields in a materialized view definition (SELECT  "income_bracket", "product_category" ..., so they match the order of fields in DATA_TABLE) masks the the issue producing a correct physical plan.

The issue is not reproducible if using a complex filter condition. For example, here are  correct plans if trying to execute the same query but with a complex filter ("product_category" = 'Accessories' OR "product_category" = 'Audio'):

Logical plan:
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
  LogicalAggregate(group=[{0}], volume=[COUNT()])
    LogicalProject(income_bracket=[$0])
      LogicalFilter(condition=[OR(=($1, 'Accessories'), =($1, 'Audio'))])
        LogicalProject(income_bracket=[$12], product_category=[$16])
          CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}

Physical plan (correct):
{code}
EnumerableLimit(fetch=[50])
  EnumerableSort(sort0=[$1], dir0=[DESC])
    EnumerableAggregate(group=[{1}], volume=[$SUM0($2)])
      EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t3, $t0)], expr#5=['Audio'], expr#6=[=($t5, $t0)], expr#7=[OR($t4, $t6)], proj#0..2=[{exprs}], $condition=[$t7])
        EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
          EnumerableTableScan(table=[[catalog, MV_TABLE]])
{code}
This plan is correct: now the filter values are correctly tested against the "product_category" ($t0) field of MV_TABLE.


> Materialized view with AggregateFilterTransposeRule and a simple filter may corrupt a physical plan.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-3538
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3538
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Anton Haidai
>            Priority: Major
>
> A combination of the following rules may lead to a corrupted physical plan:
> * AbstractMaterializedViewRule.INSTANCE_AGGREGATE
> * AggregateFilterTransposeRule.INSTANCE (in allColumnsInAggregate=false mode)
> While my attempts to reproduce the issue in tests were not successful, I have a detailed description of the issue with plans examples.
> h4. Example.
> Materialized View definition SQL (MV_TABLE):
> {code}
> SELECT "product_category", "income_bracket", COUNT(*) AS "volume"
> FROM "DATA_TABLE"
> GROUP BY "product_category", "income_bracket"
> {code}
> Logical plan (produced using RelBuilder):
> {code}
> LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
>   LogicalAggregate(group=[{0}], volume=[COUNT()])
>     LogicalProject(income_bracket=[$0])
>       LogicalFilter(condition=[=($1, 'Accessories')])
>         LogicalProject(income_bracket=[$12], product_category=[$16])
>           CustomTableScan(table=[[catalog, DATA_TABLE]])
> {code}
> It is a simple aggregation with a filter "product_category" = 'Accessories'.
> Physical plan (corrupted):
> {code}
> EnumerableLimit(fetch=[50])
>   EnumerableSort(sort0=[$1], dir0=[DESC])
>     EnumerableAggregate(group=[{0}], volume=[$SUM0($2)])
>       EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t1, $t3)], proj#0..2=[{exprs}], $condition=[$t4])
>         EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
>           EnumerableTableScan(table=[[catalog, MV_TABLE]]) 
> {code}
> This plan is corrupted because the the EnumerableCalc condition checks if 'Accessories' ($t3) is equal to "income_bracket" ($t1) instead of "product_category"($t0) field of MV_TABLE leading to an empty result set. 
> Swapping the order of fields in a materialized view definition (SELECT  "income_bracket", "product_category" ..., so they match the order of fields in DATA_TABLE, see "LogicalProject" indexes) masks the the issue producing a correct physical plan.
> The issue is not reproducible if using a complex filter condition. For example, here are  correct plans if trying to execute the same query but with a complex filter ("product_category" = 'Accessories' OR "product_category" = 'Audio'):
> Logical plan:
> {code}
> LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
>   LogicalAggregate(group=[{0}], volume=[COUNT()])
>     LogicalProject(income_bracket=[$0])
>       LogicalFilter(condition=[OR(=($1, 'Accessories'), =($1, 'Audio'))])
>         LogicalProject(income_bracket=[$12], product_category=[$16])
>           CustomTableScan(table=[[catalog, DATA_TABLE]])
> {code}
> Physical plan (correct):
> {code}
> EnumerableLimit(fetch=[50])
>   EnumerableSort(sort0=[$1], dir0=[DESC])
>     EnumerableAggregate(group=[{1}], volume=[$SUM0($2)])
>       EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t3, $t0)], expr#5=['Audio'], expr#6=[=($t5, $t0)], expr#7=[OR($t4, $t6)], proj#0..2=[{exprs}], $condition=[$t7])
>         EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
>           EnumerableTableScan(table=[[catalog, MV_TABLE]])
> {code}
> This plan is correct: now the filter values are correctly tested against the "product_category" ($t0) field of MV_TABLE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)