You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Austin Haas (JIRA)" <ji...@apache.org> on 2018/01/15 17:28:00 UTC
[jira] [Commented] (BEAM-3481) Query with subquery and aggregates cannot be implemented.

    [ https://issues.apache.org/jira/browse/BEAM-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326427#comment-16326427 ] 

Austin Haas commented on BEAM-3481:
-----------------------------------

CURRENT_TIME may be a red herring. You can put a constant there, too. For example, 500:

 
{noformat}
[main] INFO org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner - SQL:
SELECT COUNT(`t1`.`?p`) AS `n`, 500
FROM (SELECT `contains`.`p` AS `?p`, 500
FROM `contains` AS `contains`
GROUP BY `contains`.`p`) AS `t1`
[main] INFO org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner - SQLPlan>
LogicalProject(n=[$0], EXPR$1=[500])
  LogicalAggregate(group=[{}], n=[COUNT()])
    LogicalProject(?p=[$0])
      LogicalProject(?p=[$0], EXPR$1=[500])
        LogicalAggregate(group=[{0}])
          LogicalProject(?p=[$0])
            LogicalTableScan(table=[[contains]])
{noformat}
Note how the plan differs from above: there are two LogicalProject expressions between the LogicalAggregate expressions.

 

> Query with subquery and aggregates cannot be implemented.
> ---------------------------------------------------------
>
>                 Key: BEAM-3481
>                 URL: https://issues.apache.org/jira/browse/BEAM-3481
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>    Affects Versions: 2.2.0
>            Reporter: Austin Haas
>            Assignee: Xu Mingmin
>            Priority: Major
>
> This query results in the error below:
> {noformat}
> "SELECT (COUNT(`p`))
>  FROM (SELECT `p`
>        FROM `contains`
>        GROUP BY `p`) AS `t1`"{noformat}
> This works correctly:
> {noformat}
> "SELECT (COUNT(`p`))
>  FROM (SELECT `p`, CURRENT_TIME
>        FROM `contains`
>        GROUP BY `p`) AS `t1`"{noformat}
> Error:
>  
> {noformat}
> [nREPL-worker-5] INFO org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner - SQL:
> SELECT COUNT(`t1`.`p`)
> FROM (SELECT `contains`.`p`
> FROM `contains` AS `contains`
> GROUP BY `contains`.`p`) AS `t1`
> [nREPL-worker-5] INFO org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner - SQLPlan>
> LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
>  LogicalAggregate(group=[{0}])
>  LogicalProject(p=[$0])
>  LogicalTableScan(table=[[contains]])
> CannotPlanException Node [rel#157:Subset#3.BEAM_LOGICAL.[]] could not be implemented; planner state:
> Root: rel#157:Subset#3.BEAM_LOGICAL.[]
> Original rel:
> LogicalAggregate(subset=[rel#157:Subset#3.BEAM_LOGICAL.[]], group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = {1.125 rows, 0.0 cpu, 0.0 io}, id = 155
>  LogicalAggregate(subset=[rel#154:Subset#2.NONE.[]], group=[{0}]): rowcount = 10.0, cumulative cost = {10.0 rows, 0.0 cpu, 0.0 io}, id = 153
>  LogicalProject(subset=[rel#152:Subset#1.NONE.[]], p=[$0]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io}, id = 151
>  LogicalTableScan(subset=[rel#150:Subset#0.NONE.[]], table=[[contains]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 146
> Sets:
> Set#0, type: RecordType(VARCHAR p, VARCHAR s, BIGINT c)
>  rel#150:Subset#0.NONE.[], best=null, importance=0.6561
>  rel#146:LogicalTableScan.NONE.[](table=[contains]), rowcount=100.0, cumulative cost={inf}
>  rel#162:Subset#0.BEAM_LOGICAL.[], best=rel#164, importance=0.32805
>  rel#164:BeamIOSourceRel.BEAM_LOGICAL.[](table=[contains]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> Set#1, type: RecordType(VARCHAR p)
>  rel#152:Subset#1.NONE.[], best=null, importance=0.7290000000000001
>  rel#151:LogicalProject.NONE.[](input=rel#150:Subset#0.NONE.[],p=$0), rowcount=100.0, cumulative cost={inf}
>  rel#159:Subset#1.BEAM_LOGICAL.[], best=rel#163, importance=0.36450000000000005
>  rel#163:BeamProjectRel.BEAM_LOGICAL.[](input=rel#162:Subset#0.BEAM_LOGICAL.[],p=$0), rowcount=100.0, cumulative cost={200.0 rows, 201.0 cpu, 0.0 io}
> Set#2, type: RecordType(VARCHAR p)
>  rel#154:Subset#2.NONE.[], best=null, importance=0.81
>  rel#153:LogicalAggregate.NONE.[](input=rel#152:Subset#1.NONE.[],group={0}), rowcount=10.0, cumulative cost={inf}
>  rel#161:Subset#2.BEAM_LOGICAL.[], best=rel#160, importance=0.405
>  rel#160:BeamAggregationRel.BEAM_LOGICAL.[](group={0},window=org.apache.beam.sdk.transforms.windowing.GlobalWindows,trigger=Repeatedly.forever(AfterWatermark.pastEndOfWindow())), rowcount=10.0, cumulative cost={210.0 rows, 201.0 cpu, 0.0 io}
> Set#3, type: RecordType(BIGINT EXPR$0)
>  rel#156:Subset#3.NONE.[], best=null, importance=0.9
>  rel#155:LogicalAggregate.NONE.[](input=rel#154:Subset#2.NONE.[],group={},EXPR$0=COUNT()), rowcount=1.0, cumulative cost={inf}
>  rel#157:Subset#3.BEAM_LOGICAL.[], best=null,
>  importance=1.0
>  rel#158:AbstractConverter.BEAM_LOGICAL.[](input=rel#156:Subset#3.NONE.[],convention=BEAM_LOGICAL,sort=[]), rowcount=1.0, cumulative cost={inf}
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit (RelSubset.java:441)
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)