You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/21 08:38:21 UTC

[GitHub] [doris] englefly opened a new pull request, #15240: [fix](nereids) binding priority in BindSlotReference::bindSortWithAggregateFunction()

englefly opened a new pull request, #15240:
URL: https://github.com/apache/doris/pull/15240

   # Proposed changes
   we have bug in bind sort with aggregate function.
   
    ```
                   select
                           col1 * -1 as col1    # inner_col1 * -1 as alias_col1
                   from
                           t
                   group by col1
                   order by col1;     # order by order_col1
   ```
   order_col1 could be bound with alias_col1 and inner_col1. But alias_col1 has higher priority.
   without priority, the bound becomes ambiguous.
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow commented on a diff in pull request #15240: [fix](nereids) binding priority in agg-sort, having, group_by_key

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #15240:
URL: https://github.com/apache/doris/pull/15240#discussion_r1059777460


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/CheckAfterRewrite.java:
##########
@@ -45,11 +48,31 @@ public class CheckAfterRewrite extends OneAnalysisRuleFactory {
     public Rule build() {
         return any().then(plan -> {
             checkAllSlotReferenceFromChildren(plan);
+            checkNoUnbounded(plan);

Review Comment:
   we have check bound in org.apache.doris.nereids.rules.analysis.CheckAnalysis#checkBound.
   But it only check unbound slot.
   If u want to check UnboundFunction, update it~



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/CheckAfterRewrite.java:
##########
@@ -45,11 +48,31 @@ public class CheckAfterRewrite extends OneAnalysisRuleFactory {
     public Rule build() {
         return any().then(plan -> {
             checkAllSlotReferenceFromChildren(plan);
+            checkNoUnbounded(plan);
             checkMetricTypeIsUsedCorrectly(plan);
             return null;
         }).toRule(RuleType.CHECK_ANALYSIS);
     }
 
+    private void checkNoUnbounded(Plan plan) {
+        List<Expression> unboundedExprs = Lists.newArrayList();
+        boolean unbounded = plan.getExpressions().stream().anyMatch(
+                expr -> {
+                    if (expr.getInputSlots().stream().anyMatch(UnboundSlot.class::isInstance)) {
+                        unboundedExprs.add(expr);
+                        return true;
+                    } else if (expr.getInputSlots().stream().anyMatch(UnboundFunction.class::isInstance)) {
+                        unboundedExprs.add(expr);
+                        return true;
+                    }
+                    return false;

Review Comment:
   both `UnboundSlot` and `UnboundFunction` are implement `Unbound`.
   just need to match `Unbound`



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/BindSlotReference.java:
##########
@@ -194,11 +195,55 @@ public List<Rule> buildRules() {
 
                     // The columns referenced in group by are first obtained from the child's output,
                     // and then from the node's output
+                    Set<String> duplicatedSlotNames = new HashSet<>();
                     Map<String, Expression> childOutputsToExpr = agg.child().getOutput().stream()
-                            .collect(Collectors.toMap(Slot::getName, Slot::toSlot, (oldExpr, newExpr) -> oldExpr));
+                            .collect(Collectors.toMap(Slot::getName, Slot::toSlot,
+                                    (oldExpr, newExpr) -> {
+                                        duplicatedSlotNames.add(((Slot) oldExpr).getName());
+                                        return oldExpr;
+                                }));

Review Comment:
   i don't think the original code is right. and i also don't think remove duplicatedSlotNames output is right.
   u could try
   ```sql
   -- t1: a, b, c
   -- t2: a, b, c 
   
   SELECT SUM(t1.c) FROM t1 JOIN t2 GROUP BY t1.a;
   ```
   
   the map is not necessary.
   the right way is binding twice.
   first, bind with children's output. if expr still has unboundSlot, we bind the **ORIGINAL** unbound expr with aggregate's output. if expr still has unboundSlot, just throw AnalysisException.
   
   But, i know why we cannot do it now. the BIND_SLOT rule and BIND_FUNCTION rule is seperated and both throw unbound exception immediately. So we need to refactor all bind rules **LATER**. **Do not need to fix it in this PR**
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #15240: [fix](nereids) binding priority in agg-sort, having, group_by_key

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #15240:
URL: https://github.com/apache/doris/pull/15240#discussion_r1060299884


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/CheckAfterRewrite.java:
##########
@@ -45,11 +48,31 @@ public class CheckAfterRewrite extends OneAnalysisRuleFactory {
     public Rule build() {
         return any().then(plan -> {
             checkAllSlotReferenceFromChildren(plan);
+            checkNoUnbounded(plan);
             checkMetricTypeIsUsedCorrectly(plan);
             return null;
         }).toRule(RuleType.CHECK_ANALYSIS);
     }
 
+    private void checkNoUnbounded(Plan plan) {
+        List<Expression> unboundedExprs = Lists.newArrayList();
+        boolean unbounded = plan.getExpressions().stream().anyMatch(
+                expr -> {
+                    if (expr.getInputSlots().stream().anyMatch(UnboundSlot.class::isInstance)) {
+                        unboundedExprs.add(expr);
+                        return true;
+                    } else if (expr.getInputSlots().stream().anyMatch(UnboundFunction.class::isInstance)) {
+                        unboundedExprs.add(expr);
+                        return true;
+                    }
+                    return false;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #15240: [fix](nereids) binding priority in BindSlotReference::bindSortWithAggregateFunction()

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15240:
URL: https://github.com/apache/doris/pull/15240#issuecomment-1361212241

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.6 seconds
    load time: 627 seconds
    storage size: 17123660563 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221221114420_clickbench_pr_66296.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15240: [fix](nereids) binding priority in BindSlotReference::bindSortWithAggregateFunction()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15240:
URL: https://github.com/apache/doris/pull/15240#issuecomment-1367758277

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow merged pull request #15240: [fix](nereids) binding priority in agg-sort, having, group_by_key

Posted by GitBox <gi...@apache.org>.
morrySnow merged PR #15240:
URL: https://github.com/apache/doris/pull/15240


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #15240: [fix](nereids) binding priority in BindSlotReference::bindSortWithAggregateFunction()

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #15240:
URL: https://github.com/apache/doris/pull/15240#discussion_r1059232132


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/BindSlotReference.java:
##########
@@ -373,13 +374,45 @@ private Plan bindSortWithAggregateFunction(
             LogicalSort<? extends Plan> sort, Aggregate<? extends Plan> aggregate, CascadesContext ctx) {
         // We should deduplicate the slots, otherwise the binding process will fail due to the
         // ambiguous slots exist.
-        Set<Slot> boundSlots = Stream.concat(Stream.of(aggregate), aggregate.children().stream())
-                .flatMap(plan -> plan.getOutput().stream())
-                .collect(Collectors.toSet());
+        //Set<Slot> boundSlots = Stream.concat(Stream.of(aggregate), aggregate.children().stream())
+        //        .flatMap(plan -> plan.getOutput().stream())
+        //        .collect(Collectors.toSet());
+        //List<OrderKey> sortItemList = sort.getOrderKeys()
+        //        .stream()
+        //        .map(orderKey -> {
+        //            Expression item = new SlotBinder(toScope(new ArrayList<>(boundSlots)), sort, ctx)
+        //                    .bind(orderKey.getExpr());
+        //            return new OrderKey(item, orderKey.isAsc(), orderKey.isNullFirst());
+        //        }).collect(Collectors.toList());
+        //return new LogicalSort<>(sortItemList, sort.child());
+        //try bound order key with agg output, if failed, try to bound with output of agg.child
+        //example:
+        //        select
+        //        col1 * -1 as col1    # inner_col1 * -1 as alias_col1
+        //        from
+        //                (
+        //                        select 1 as col1
+        //                        union
+        //                        select -2 as col1
+        //                ) t
+        //        group by col1
+        //        order by col1;     # order by order_col1
+        // bind order_col1 with alias_col1, then, bind it with inner_col1
+        List<Slot> boundSlots = Lists.newArrayList();
+        Set<String> slotNames = new HashSet<>();
+
+        Stream.concat(Stream.of(aggregate), aggregate.children().stream())
+                .flatMap(plan -> plan.getOutput().stream()).forEach(slot -> {
+                    if (!slotNames.contains(slot.getName())) {
+                        slotNames.add(slot.getName());
+                        boundSlots.add(slot);
+                    }
+                });

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #15240: [fix](nereids) binding priority in agg-sort, having, group_by_key

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #15240:
URL: https://github.com/apache/doris/pull/15240#discussion_r1060337672


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/CheckAfterRewrite.java:
##########
@@ -45,11 +48,31 @@ public class CheckAfterRewrite extends OneAnalysisRuleFactory {
     public Rule build() {
         return any().then(plan -> {
             checkAllSlotReferenceFromChildren(plan);
+            checkNoUnbounded(plan);

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow commented on a diff in pull request #15240: [fix](nereids) binding priority in BindSlotReference::bindSortWithAggregateFunction()

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #15240:
URL: https://github.com/apache/doris/pull/15240#discussion_r1058913444


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/BindSlotReference.java:
##########
@@ -373,13 +374,45 @@ private Plan bindSortWithAggregateFunction(
             LogicalSort<? extends Plan> sort, Aggregate<? extends Plan> aggregate, CascadesContext ctx) {
         // We should deduplicate the slots, otherwise the binding process will fail due to the
         // ambiguous slots exist.
-        Set<Slot> boundSlots = Stream.concat(Stream.of(aggregate), aggregate.children().stream())
-                .flatMap(plan -> plan.getOutput().stream())
-                .collect(Collectors.toSet());
+        //Set<Slot> boundSlots = Stream.concat(Stream.of(aggregate), aggregate.children().stream())
+        //        .flatMap(plan -> plan.getOutput().stream())
+        //        .collect(Collectors.toSet());
+        //List<OrderKey> sortItemList = sort.getOrderKeys()
+        //        .stream()
+        //        .map(orderKey -> {
+        //            Expression item = new SlotBinder(toScope(new ArrayList<>(boundSlots)), sort, ctx)
+        //                    .bind(orderKey.getExpr());
+        //            return new OrderKey(item, orderKey.isAsc(), orderKey.isNullFirst());
+        //        }).collect(Collectors.toList());
+        //return new LogicalSort<>(sortItemList, sort.child());

Review Comment:
   remove useless comments



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/BindSlotReference.java:
##########
@@ -373,13 +374,45 @@ private Plan bindSortWithAggregateFunction(
             LogicalSort<? extends Plan> sort, Aggregate<? extends Plan> aggregate, CascadesContext ctx) {
         // We should deduplicate the slots, otherwise the binding process will fail due to the
         // ambiguous slots exist.
-        Set<Slot> boundSlots = Stream.concat(Stream.of(aggregate), aggregate.children().stream())
-                .flatMap(plan -> plan.getOutput().stream())
-                .collect(Collectors.toSet());
+        //Set<Slot> boundSlots = Stream.concat(Stream.of(aggregate), aggregate.children().stream())
+        //        .flatMap(plan -> plan.getOutput().stream())
+        //        .collect(Collectors.toSet());
+        //List<OrderKey> sortItemList = sort.getOrderKeys()
+        //        .stream()
+        //        .map(orderKey -> {
+        //            Expression item = new SlotBinder(toScope(new ArrayList<>(boundSlots)), sort, ctx)
+        //                    .bind(orderKey.getExpr());
+        //            return new OrderKey(item, orderKey.isAsc(), orderKey.isNullFirst());
+        //        }).collect(Collectors.toList());
+        //return new LogicalSort<>(sortItemList, sort.child());
+        //try bound order key with agg output, if failed, try to bound with output of agg.child
+        //example:
+        //        select
+        //        col1 * -1 as col1    # inner_col1 * -1 as alias_col1
+        //        from
+        //                (
+        //                        select 1 as col1
+        //                        union
+        //                        select -2 as col1
+        //                ) t
+        //        group by col1
+        //        order by col1;     # order by order_col1
+        // bind order_col1 with alias_col1, then, bind it with inner_col1
+        List<Slot> boundSlots = Lists.newArrayList();
+        Set<String> slotNames = new HashSet<>();
+
+        Stream.concat(Stream.of(aggregate), aggregate.children().stream())
+                .flatMap(plan -> plan.getOutput().stream()).forEach(slot -> {
+                    if (!slotNames.contains(slot.getName())) {
+                        slotNames.add(slot.getName());
+                        boundSlots.add(slot);
+                    }
+                });

Review Comment:
   we should not rely on the item order in stream api. instead, the better way is bind two round using difference scopes. the first round use aggregate output and the second round use aggregate's child if still has unbound slot in orderkeys



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15240: [fix](nereids) binding priority in BindSlotReference::bindSortWithAggregateFunction()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15240:
URL: https://github.com/apache/doris/pull/15240#issuecomment-1367758285

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #15240: [fix](nereids) binding priority in agg-sort, having, group_by_key

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #15240:
URL: https://github.com/apache/doris/pull/15240#discussion_r1059846750


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/BindSlotReference.java:
##########
@@ -194,11 +195,55 @@ public List<Rule> buildRules() {
 
                     // The columns referenced in group by are first obtained from the child's output,
                     // and then from the node's output
+                    Set<String> duplicatedSlotNames = new HashSet<>();
                     Map<String, Expression> childOutputsToExpr = agg.child().getOutput().stream()
-                            .collect(Collectors.toMap(Slot::getName, Slot::toSlot, (oldExpr, newExpr) -> oldExpr));
+                            .collect(Collectors.toMap(Slot::getName, Slot::toSlot,
+                                    (oldExpr, newExpr) -> {
+                                        duplicatedSlotNames.add(((Slot) oldExpr).getName());
+                                        return oldExpr;
+                                }));

Review Comment:
   groupByKey绑定到output 的alias 有问题,会导致NormalizeAggregate的逻辑出错。
   `SELECT SUM(t1.c) FROM t1 JOIN t2 GROUP BY t1.a;` 这个sql 现在的逻辑执行也没有问题。
   `replacedGroupBy={t1.a}`, 在L265 会绑定到join的output



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow commented on a diff in pull request #15240: [fix](nereids) binding priority in agg-sort, having, group_by_key

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #15240:
URL: https://github.com/apache/doris/pull/15240#discussion_r1060066842


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/BindSlotReference.java:
##########
@@ -194,11 +195,55 @@ public List<Rule> buildRules() {
 
                     // The columns referenced in group by are first obtained from the child's output,
                     // and then from the node's output
+                    Set<String> duplicatedSlotNames = new HashSet<>();
                     Map<String, Expression> childOutputsToExpr = agg.child().getOutput().stream()
-                            .collect(Collectors.toMap(Slot::getName, Slot::toSlot, (oldExpr, newExpr) -> oldExpr));
+                            .collect(Collectors.toMap(Slot::getName, Slot::toSlot,
+                                    (oldExpr, newExpr) -> {
+                                        duplicatedSlotNames.add(((Slot) oldExpr).getName());
+                                        return oldExpr;
+                                }));

Review Comment:
   yes, u r right. it can handle this sql~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15240: [fix](nereids) binding priority in agg-sort, having, group_by_key

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15240:
URL: https://github.com/apache/doris/pull/15240#issuecomment-1369805767

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org