Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/04/28 09:07:00 UTC

[jira] [Work logged] (HIVE-27305) Incremental rebuild of materialized view fails when definition has aggregate on decimal column

     [ https://issues.apache.org/jira/browse/HIVE-27305?focusedWorklogId=859585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859585 ]

ASF GitHub Bot logged work on HIVE-27305:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Apr/23 09:06
            Start Date: 28/Apr/23 09:06
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request, #4277:
URL: https://github.com/apache/hive/pull/4277

   
   ### What changes were proposed in this pull request?
   When transforming the logical plan of a materialized view insert-overwrite rebuild into an incremental rebuild, wrap the CASE expression that aggregates the values coming from the current state of the view and the delta in a CAST to the view's original column type.
   
   ```
   HiveProject(a=[$5], _o__c1=[CAST(CASE(IS NULL($1), $6, +($6, $1))):DECIMAL(17, 2)], _o__c2=[CASE(IS NULL($2), $7, +($7, $2))], _o__c3=[CAST(/(CAST(CASE(IS NULL($1), $6, +($6, $1))):DECIMAL(17, 2), CASE(IS NULL($2), $7, +($7, $2)))):DECIMAL(11, 6)], _o__c4=[CASE(IS NULL($3), $8, +($8, $3))])
     HiveFilter(condition=[OR(AND($4, OR(AND(IS NULL($3), =($8, 0)), AND(=(+($8, $3), 0), IS NOT NULL($3)))), AND(IS NULL($4), OR(AND(IS NULL($3), >($8, 0)), AND(>(+($8, $3), 0), IS NOT NULL($3)))), AND($4, OR(AND(IS NULL($3), >($8, 0)), AND(>(+($8, $3), 0), IS NOT NULL($3)))))])
       HiveJoin(condition=[IS NOT DISTINCT FROM($0, $5)], joinType=[right], algorithm=[none], cost=[not available])
         HiveProject(a=[$0], _c1=[$1], _c2=[$2], _c4=[$4], $f4=[true])
           HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
         HiveProject(a=[$0], $f1=[$1], $f2=[$2], $f3=[$3])
           HiveAggregate(group=[{0}], agg#0=[SUM($1)], agg#1=[SUM($2)], agg#2=[SUM($3)])
             HiveProject(a=[$0], $f3=[CASE(OR($2, $5), *(-1, $1), $1)], $f4=[CASE(OR($2, $5), *(-1, CASE(IS NULL($1), 0, 1)), CASE(IS NULL($1), 0, 1))], $f5=[CASE(OR($2, $5), -1, 1)])
               HiveJoin(condition=[AND(=($0, $4), OR($3, $6))], joinType=[inner], algorithm=[none], cost=[not available])
                 HiveProject(a=[$0], b=[$1], ROW__IS__DELETED=[$6], <=[<(3, $5.writeid)])
                   HiveFilter(condition=[IS NOT NULL($0)])
                     HiveTableScan(table=[[default, t1]], table:alias=[t1])
                 HiveProject(a=[$0], ROW__IS__DELETED=[$5], <=[<(3, $4.writeid)])
                   HiveFilter(condition=[IS NOT NULL($0)])
                     HiveTableScan(table=[[default, t2]], table:alias=[t2])
   ```
   In the top project of the plan, a cast is added when aggregating a decimal column:
   ```
   CAST(CASE(IS NULL($1), $6, +($6, $1))):DECIMAL(17, 2)
   ```
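   
   The wrapping itself is straightforward with Calcite's `RexBuilder`. A minimal sketch of the idea (illustrative only, not the exact patch; `wrapInCastIfNeeded` and its signature are hypothetical names for this example):
   ```
   import org.apache.calcite.rel.type.RelDataType;
   import org.apache.calcite.rex.RexBuilder;
   import org.apache.calcite.rex.RexNode;
   
   class CastSketch {
     // Force the aggregated CASE expression back to the view's original column
     // type so the rewritten plan keeps the same row schema as the plan it replaces.
     static RexNode wrapInCastIfNeeded(RexBuilder rexBuilder, RexNode caseExpr, RelDataType origType) {
       if (caseExpr.getType().equals(origType)) {
         return caseExpr; // types already match, no cast needed
       }
       // Produces e.g. CAST(CASE(IS NULL($1), $6, +($6, $1))):DECIMAL(17, 2)
       return rexBuilder.makeCast(origType, caseExpr);
     }
   }
   ```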
   
   
   ### Why are the changes needed?
   When an operation like addition has decimal operands, the result can have a wider decimal type than either operand:
   ```
   Decimal(17,2) + Decimal(17,2) -> Decimal(18,2)
   ```
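   This follows the usual SQL decimal arithmetic rule for addition: the result scale is the larger operand scale, and the precision grows by one digit to absorb a possible carry. A minimal, illustrative sketch of that rule (the method name is hypothetical; real type systems, including Hive's, additionally cap precision at a maximum such as 38):
   ```
   class DecimalTypeSketch {
     // Result type of DECIMAL(p1, s1) + DECIMAL(p2, s2) per standard SQL rules:
     //   scale     = max(s1, s2)
     //   precision = max(p1 - s1, p2 - s2) + scale + 1   (+1 absorbs a carry)
     static int[] decimalPlusType(int p1, int s1, int p2, int s2) {
       int scale = Math.max(s1, s2);
       int precision = Math.max(p1 - s1, p2 - s2) + scale + 1;
       return new int[] {precision, scale}; // {18, 2} for two DECIMAL(17, 2) operands
     }
   }
   ```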
   However, Calcite rules expect the row schemas of the original and the transformed plan to be equal.
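   This is the invariant behind the `RelOptUtil.verifyTypeEquivalence` assertion in the stack trace quoted below; conceptually it amounts to a check like the following (illustrative sketch, not Calcite's exact code):
   ```
   import org.apache.calcite.plan.RelOptUtil;
   import org.apache.calcite.rel.RelNode;
   import org.apache.calcite.rel.type.RelDataType;
   
   class RowTypeCheckSketch {
     // A rule's rewritten plan must expose the same row type as the node it replaces.
     static void checkRowTypePreserved(RelNode original, RelNode rewritten) {
       RelDataType expected = original.getRowType();
       RelDataType actual = rewritten.getRowType();
       if (!RelOptUtil.areRowTypesEqual(expected, actual, false /* compareNames */)) {
         throw new AssertionError("Cannot add expression of different type to set:\n"
             + "set type is " + expected.getFullTypeString() + "\n"
             + "expression type is " + actual.getFullTypeString());
       }
     }
   }
   ```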
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_6.q -pl itests/qtest -Pitests
   ```




Issue Time Tracking
-------------------

            Worklog Id:     (was: 859585)
    Remaining Estimate: 0h
            Time Spent: 10m

> Incremental rebuild of materialized view fails when definition has aggregate on decimal column
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27305
>                 URL: https://issues.apache.org/jira/browse/HIVE-27305
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Materialized views
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.materializedview.rewriting.sql=false;
> create table t1(a int, b decimal(7,2)) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into t1(a, b) values(1, 1);
> create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as
> select t1.a, sum(t1.b) from t1
> group by t1.a;
> insert into t1(a,b) values(2, 5);
> explain cbo alter materialized view mat1 rebuild;
> {code}
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER $f0, DECIMAL(17, 2) $f1) NOT NULL
> expression type is RecordType(INTEGER $f0, DECIMAL(18, 2) $f1) NOT NULL
> set is rel#388:HiveAggregate.HIVE.[].any(input=HepRelVertex#387,group={0},agg#0=sum($1))
> expression is HiveProject($f0=[$3], $f1=[CASE(IS NULL($1), $4, IS NULL($4), $1, +($4, $1))])
>   HiveFilter(condition=[OR($2, IS NULL($2))])
>     HiveJoin(condition=[IS NOT DISTINCT FROM($0, $3)], joinType=[right], algorithm=[none], cost=[not available])
>       HiveProject(a=[$0], _c1=[$1], $f2=[true])
>         HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
>       HiveAggregate(group=[{0}], agg#0=[sum($1)])
>         HiveProject($f0=[$0], $f1=[$1])
>           HiveFilter(condition=[<(1, $4.writeid)])
>             HiveTableScan(table=[[default, t1]], table:alias=[t1])
> 	at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:380)
> 	at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:58)
> 	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268)
> 	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283)
> 	at org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveAggregateIncrementalRewritingRuleBase.onMatch(HiveAggregateIncrementalRewritingRuleBase.java:161)
> 	at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
> 	at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
> 	at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
> 	at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
> 	at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
> 	at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
> 	at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2468)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2427)
> 	at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyIncrementalRebuild(AlterMaterializedViewRebuildAnalyzer.java:460)
> 	at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyAggregateInsertIncremental(AlterMaterializedViewRebuildAnalyzer.java:352)
> 	at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyRecordIncrementalRebuildPlan(AlterMaterializedViewRebuildAnalyzer.java:311)
> 	at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyMaterializedViewRewriting(AlterMaterializedViewRebuildAnalyzer.java:278)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1722)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1591)
> 	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
> 	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
> 	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
> 	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1343)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12824)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
> 	at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:135)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
> 	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
> 	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
> 	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
> 	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
> 	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
> 	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
> 	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
> 	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
> 	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:733)
> 	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:703)
> 	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
> 	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> 	at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
> 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> 	at org.junit.runners.Suite.runChild(Suite.java:128)
> 	at org.junit.runners.Suite.runChild(Suite.java:27)
> 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> 	at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:95)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138)
> 	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)