You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Anton Haidai (Jira)" <ji...@apache.org> on 2019/11/26 20:06:00 UTC
[jira] [Updated] (CALCITE-3538) Materialized view with
AggregateFilterTransposeRule and a simple filter may corrupt a physical
plan.
[ https://issues.apache.org/jira/browse/CALCITE-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anton Haidai updated CALCITE-3538:
----------------------------------
Description:
A combination of the following rules may lead to a corrupted physical plan:
* AbstractMaterializedViewRule.INSTANCE_AGGREGATE
* AggregateFilterTransposeRule.INSTANCE (in allColumnsInAggregate=false mode)
While my attempts to reproduce the issue in tests were not successful, I have a detailed description of the issue with plans examples.
h4. Example.
Materialized View definition SQL (MV_TABLE):
{code}
SELECT "product_category", "income_bracket", COUNT(*) AS "volume"
FROM "DATA_TABLE"
GROUP BY "product_category", "income_bracket"
{code}
Logical plan (produced using RelBuilder):
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
LogicalAggregate(group=[{0}], volume=[COUNT()])
LogicalProject(income_bracket=[$0])
LogicalFilter(condition=[=($1, 'Accessories')])
LogicalProject(income_bracket=[$12], product_category=[$16])
CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}
It is a simple aggregation with a filter "product_category" = 'Accessories'.
Physical plan (corrupted):
{code}
EnumerableLimit(fetch=[50])
EnumerableSort(sort0=[$1], dir0=[DESC])
EnumerableAggregate(group=[{0}], volume=[$SUM0($2)])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t1, $t3)], proj#0..2=[{exprs}], $condition=[$t4])
EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
EnumerableTableScan(table=[[catalog, MV_TABLE]])
{code}
This plan is corrupted because the the EnumerableCalc condition checks if 'Accessories' ($t3) is equal to "income_bracket" ($t1) instead of "product_category"($t0) field of MV_TABLE leading to an empty result set.
Swapping the order of fields in a materialized view definition (SELECT "income_bracket", "product_category" ..., so they match the order of fields in DATA_TABLE, see "LogicalProject" indexes) masks the the issue producing a correct physical plan.
The issue is not reproducible if using a complex filter condition. For example, here are correct plans if trying to execute the same query but with a complex filter ("product_category" = 'Accessories' OR "product_category" = 'Audio'):
Logical plan:
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
LogicalAggregate(group=[{0}], volume=[COUNT()])
LogicalProject(income_bracket=[$0])
LogicalFilter(condition=[OR(=($1, 'Accessories'), =($1, 'Audio'))])
LogicalProject(income_bracket=[$12], product_category=[$16])
CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}
Physical plan (correct):
{code}
EnumerableLimit(fetch=[50])
EnumerableSort(sort0=[$1], dir0=[DESC])
EnumerableAggregate(group=[{1}], volume=[$SUM0($2)])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t3, $t0)], expr#5=['Audio'], expr#6=[=($t5, $t0)], expr#7=[OR($t4, $t6)], proj#0..2=[{exprs}], $condition=[$t7])
EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
EnumerableTableScan(table=[[catalog, MV_TABLE]])
{code}
This plan is correct: now the filter values are correctly tested against the "product_category" ($t0) field of MV_TABLE.
was:
A combination of the following rules may lead to a corrupted physical plan:
* AbstractMaterializedViewRule.INSTANCE_AGGREGATE
* AggregateFilterTransposeRule.INSTANCE (in allColumnsInAggregate=false mode)
While my attempts to reproduce the issue in tests were not successful, I have a detailed description of the issue with plans examples.
h4. Example.
Materialized View definition SQL (MV_TABLE):
{code}
SELECT "product_category", "income_bracket", COUNT(*) AS "volume"
FROM "DATA_TABLE"
GROUP BY "product_category", "income_bracket"
{code}
Logical plan (produced using RelBuilder):
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
LogicalAggregate(group=[{0}], volume=[COUNT()])
LogicalProject(income_bracket=[$0])
LogicalFilter(condition=[=($1, 'Accessories')])
LogicalProject(income_bracket=[$12], product_category=[$16])
CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}
It is a simple aggregation with a filter "product_category" = 'Accessories'.
Physical plan (corrupted):
{code}
EnumerableLimit(fetch=[50])
EnumerableSort(sort0=[$1], dir0=[DESC])
EnumerableAggregate(group=[{0}], volume=[$SUM0($2)])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t1, $t3)], proj#0..2=[{exprs}], $condition=[$t4])
EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
EnumerableTableScan(table=[[catalog, MV_TABLE]])
{code}
This plan is corrupted because the the EnumerableCalc condition checks if 'Accessories' ($t3) is equal to "income_bracket" ($t1) instead of "product_category"($t0) field of MV_TABLE leading to an empty result set.
Swapping the order of fields in a materialized view definition (SELECT "income_bracket", "product_category" ..., so they match the order of fields in DATA_TABLE) masks the the issue producing a correct physical plan.
The issue is not reproducible if using a complex filter condition. For example, here are correct plans if trying to execute the same query but with a complex filter ("product_category" = 'Accessories' OR "product_category" = 'Audio'):
Logical plan:
{code}
LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
LogicalAggregate(group=[{0}], volume=[COUNT()])
LogicalProject(income_bracket=[$0])
LogicalFilter(condition=[OR(=($1, 'Accessories'), =($1, 'Audio'))])
LogicalProject(income_bracket=[$12], product_category=[$16])
CustomTableScan(table=[[catalog, DATA_TABLE]])
{code}
Physical plan (correct):
{code}
EnumerableLimit(fetch=[50])
EnumerableSort(sort0=[$1], dir0=[DESC])
EnumerableAggregate(group=[{1}], volume=[$SUM0($2)])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t3, $t0)], expr#5=['Audio'], expr#6=[=($t5, $t0)], expr#7=[OR($t4, $t6)], proj#0..2=[{exprs}], $condition=[$t7])
EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
EnumerableTableScan(table=[[catalog, MV_TABLE]])
{code}
This plan is correct: now the filter values are correctly tested against the "product_category" ($t0) field of MV_TABLE.
> Materialized view with AggregateFilterTransposeRule and a simple filter may corrupt a physical plan.
> ----------------------------------------------------------------------------------------------------
>
> Key: CALCITE-3538
> URL: https://issues.apache.org/jira/browse/CALCITE-3538
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.21.0
> Reporter: Anton Haidai
> Priority: Major
>
> A combination of the following rules may lead to a corrupted physical plan:
> * AbstractMaterializedViewRule.INSTANCE_AGGREGATE
> * AggregateFilterTransposeRule.INSTANCE (in allColumnsInAggregate=false mode)
> While my attempts to reproduce the issue in tests were not successful, I have a detailed description of the issue with plans examples.
> h4. Example.
> Materialized View definition SQL (MV_TABLE):
> {code}
> SELECT "product_category", "income_bracket", COUNT(*) AS "volume"
> FROM "DATA_TABLE"
> GROUP BY "product_category", "income_bracket"
> {code}
> Logical plan (produced using RelBuilder):
> {code}
> LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
> LogicalAggregate(group=[{0}], volume=[COUNT()])
> LogicalProject(income_bracket=[$0])
> LogicalFilter(condition=[=($1, 'Accessories')])
> LogicalProject(income_bracket=[$12], product_category=[$16])
> CustomTableScan(table=[[catalog, DATA_TABLE]])
> {code}
> It is a simple aggregation with a filter "product_category" = 'Accessories'.
> Physical plan (corrupted):
> {code}
> EnumerableLimit(fetch=[50])
> EnumerableSort(sort0=[$1], dir0=[DESC])
> EnumerableAggregate(group=[{0}], volume=[$SUM0($2)])
> EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t1, $t3)], proj#0..2=[{exprs}], $condition=[$t4])
> EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
> EnumerableTableScan(table=[[catalog, MV_TABLE]])
> {code}
> This plan is corrupted because the the EnumerableCalc condition checks if 'Accessories' ($t3) is equal to "income_bracket" ($t1) instead of "product_category"($t0) field of MV_TABLE leading to an empty result set.
> Swapping the order of fields in a materialized view definition (SELECT "income_bracket", "product_category" ..., so they match the order of fields in DATA_TABLE, see "LogicalProject" indexes) masks the the issue producing a correct physical plan.
> The issue is not reproducible if using a complex filter condition. For example, here are correct plans if trying to execute the same query but with a complex filter ("product_category" = 'Accessories' OR "product_category" = 'Audio'):
> Logical plan:
> {code}
> LogicalSort(sort0=[$1], dir0=[DESC], fetch=[50])
> LogicalAggregate(group=[{0}], volume=[COUNT()])
> LogicalProject(income_bracket=[$0])
> LogicalFilter(condition=[OR(=($1, 'Accessories'), =($1, 'Audio'))])
> LogicalProject(income_bracket=[$12], product_category=[$16])
> CustomTableScan(table=[[catalog, DATA_TABLE]])
> {code}
> Physical plan (correct):
> {code}
> EnumerableLimit(fetch=[50])
> EnumerableSort(sort0=[$1], dir0=[DESC])
> EnumerableAggregate(group=[{1}], volume=[$SUM0($2)])
> EnumerableCalc(expr#0..2=[{inputs}], expr#3=['Accessories'], expr#4=[=($t3, $t0)], expr#5=['Audio'], expr#6=[=($t5, $t0)], expr#7=[OR($t4, $t6)], proj#0..2=[{exprs}], $condition=[$t7])
> EnumerableAggregate(group=[{0, 1}], volume=[$SUM0($2)])
> EnumerableTableScan(table=[[catalog, MV_TABLE]])
> {code}
> This plan is correct: now the filter values are correctly tested against the "product_category" ($t0) field of MV_TABLE.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)