Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/29 08:01:27 UTC

[GitHub] [doris] XieJiann opened a new pull request, #15479: fix colocate join

XieJiann opened a new pull request, #15479:
URL: https://github.com/apache/doris/pull/15479

   Signed-off-by: xiejiann <ji...@gmail.com>
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Have unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has documentation been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] XieJiann commented on a diff in pull request #15479: [fix](Nereids): generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
XieJiann commented on code in PR #15479:
URL: https://github.com/apache/doris/pull/15479#discussion_r1059810215


##########
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/mv/SelectMvIndexTest.java:
##########
@@ -760,7 +760,7 @@ public void testBitmapUnionInSubquery() throws Exception {
         createMv(createUserTagMVSql);
         String query = "select user_id from " + USER_TAG_TABLE_NAME + " where user_id in (select user_id from "
                 + USER_TAG_TABLE_NAME + " group by user_id having bitmap_union_count(to_bitmap(tag_id)) >1 ) ;";
-        testMvWithTwoTable(query, "user_tags", "user_tags_mv");
+        testMvWithTwoTable(query, "user_tags_mv", "user_tags");

Review Comment:
   Because we now generate a broadcast join, the join order in the new plan has changed:
   ```
   ---------------------------------------------------new plan---------------------------------------------------
   PhysicalHashJoin ( type=LEFT_SEMI_JOIN, hashJoinCondition=[(user_id#1 = user_id#5)], otherJoinCondition=[], stats=(rows=1, width=2, penalty=0.0) )
   |--PhysicalProject ( projects=[user_id#1], stats=(rows=1, width=1, penalty=0.0) )
   |  +--PhysicalOlapScan ( qualified=default_cluster:db1.user_tags, output=[time_col#0, user_id#1, user_name#2, tag_id#3], stats=(rows=1, width=1, penalty=0.0) )
   +--PhysicalDistribute ( distributionSpec=DistributionSpecReplicated, stats=(rows=1, width=1, penalty=1.0) )
      +--PhysicalProject ( projects=[user_id#5], stats=(rows=1, width=1, penalty=1.0) )
         +--PhysicalFilter ( predicates=(bitmap_union_count(to_bitmap(cast(tag_id as VARCHAR(*))))#8 > 1), stats=(rows=1, width=1, penalty=1.0) )
            +--PhysicalHashAggregate ( aggPhase=LOCAL, aggMode=INPUT_TO_RESULT, maybeUseStreaming=false, groupByExpr=[user_id#5], outputExpr=[user_id#5, bitmap_union_count(to_bitmap(cast(tag_id as VARCHAR(*)))#9) AS `bitmap_union_count(to_bitmap(cast(tag_id as VARCHAR(*))))`#8], partitionExpr=Optional[[user_id#5]], requireProperties=[DistributionSpecHash ( orderedShuffledColumns=[5], shuffleType=AGGREGATE, tableId=-1, partitionIds=[], equivalenceExprIds=[[5]], exprIdToEquivalenceSet={5=0} ) Order: ([])], stats=(rows=1, width=1, penalty=1.0) )
               +--PhysicalDistribute ( distributionSpec=DistributionSpecHash ( orderedShuffledColumns=[5], shuffleType=ENFORCED, tableId=-1, partitionIds=[], equivalenceExprIds=[[5]], exprIdToEquivalenceSet={5=0} ), stats=(rows=1, width=1, penalty=0.0) )
                  +--PhysicalProject ( projects=[user_id#5, mv_bitmap_union_tag_id#10 AS `to_bitmap(cast(tag_id as VARCHAR(*)))`#9], stats=(rows=1, width=1, penalty=0.0) )
                     +--PhysicalOlapScan ( qualified=default_cluster:db1.user_tags, output=[time_col#4, user_id#5, user_name#6, tag_id#7], stats=(rows=1, width=1, penalty=0.0) )
   
   
   ---------------------------------------------------old plan---------------------------------------------------
   PhysicalHashJoin ( type=RIGHT_SEMI_JOIN, hashJoinCondition=[(user_id#1 = user_id#5)], otherJoinCondition=[], stats=(rows=1, width=2, penalty=0.0) )
   |--PhysicalProject ( projects=[user_id#5], stats=(rows=1, width=1, penalty=1.0) )
   |  +--PhysicalFilter ( predicates=(bitmap_union_count(to_bitmap(cast(tag_id as VARCHAR(*))))#8 > 1), stats=(rows=1, width=1, penalty=1.0) )
   |     +--PhysicalHashAggregate ( aggPhase=LOCAL, aggMode=INPUT_TO_RESULT, maybeUseStreaming=false, groupByExpr=[user_id#5], outputExpr=[user_id#5, bitmap_union_count(to_bitmap(cast(tag_id as VARCHAR(*)))#9) AS `bitmap_union_count(to_bitmap(cast(tag_id as VARCHAR(*))))`#8], partitionExpr=Optional[[user_id#5]], requireProperties=[DistributionSpecHash ( orderedShuffledColumns=[5], shuffleType=AGGREGATE, tableId=-1, partitionIds=[], equivalenceExprIds=[[5]], exprIdToEquivalenceSet={5=0} ) Order: ([])], stats=(rows=1, width=1, penalty=1.0) )
   |        +--PhysicalDistribute ( distributionSpec=DistributionSpecHash ( orderedShuffledColumns=[5], shuffleType=ENFORCED, tableId=-1, partitionIds=[], equivalenceExprIds=[[5]], exprIdToEquivalenceSet={5=0} ), stats=(rows=1, width=1, penalty=0.0) )
   |           +--PhysicalProject ( projects=[user_id#5, mv_bitmap_union_tag_id#10 AS `to_bitmap(cast(tag_id as VARCHAR(*)))`#9], stats=(rows=1, width=1, penalty=0.0) )
   |              +--PhysicalOlapScan ( qualified=default_cluster:db1.user_tags, output=[time_col#4, user_id#5, user_name#6, tag_id#7], stats=(rows=1, width=1, penalty=0.0) )
   +--PhysicalDistribute ( distributionSpec=DistributionSpecHash ( orderedShuffledColumns=[1], shuffleType=ENFORCED, tableId=-1, partitionIds=[], equivalenceExprIds=[[1]], exprIdToEquivalenceSet={1=0} ), stats=(rows=1, width=1, penalty=0.0) )
      +--PhysicalProject ( projects=[user_id#1], stats=(rows=1, width=1, penalty=0.0) )
         +--PhysicalOlapScan ( qualified=default_cluster:db1.user_tags, output=[time_col#0, user_id#1, user_name#2, tag_id#3], stats=(rows=1, width=1, penalty=0.0) )
   ```
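   Both plans return the same rows. The old plan is a RIGHT_SEMI_JOIN whose children are (subquery, user_tags), and the new plan is a LEFT_SEMI_JOIN over (user_tags, subquery) with the subquery side broadcast (DistributionSpecReplicated). Either way the output is the user_tags rows that have a match, which is consistent with the test only swapping the expected index names. A minimal Java sketch, modeling each input as a plain list of user_id values (the class and method names are hypothetical, not Nereids APIs):

   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   // Hypothetical model: each relation is just a list of user_id values.
   public class SemiJoinSwapSketch {
       // LEFT SEMI JOIN: emit each row of `left` that has at least one match in `right`.
       public static List<Integer> leftSemiJoin(List<Integer> left, List<Integer> right) {
           Set<Integer> keys = new HashSet<>(right); // build side
           List<Integer> out = new ArrayList<>();
           for (Integer id : left) {                 // probe side, order preserved
               if (keys.contains(id)) {
                   out.add(id);                      // at most once per left row
               }
           }
           return out;
       }

       // RIGHT SEMI JOIN: emit each row of `right` that has at least one match in `left`.
       public static List<Integer> rightSemiJoin(List<Integer> left, List<Integer> right) {
           Set<Integer> keys = new HashSet<>(left);  // build side is now the left child
           List<Integer> out = new ArrayList<>();
           for (Integer id : right) {
               if (keys.contains(id)) {
                   out.add(id);
               }
           }
           return out;
       }
   }
   ```

   With outer user_tags ids [1, 2, 2, 3] and subquery ids [2, 3, 4], `leftSemiJoin(outer, sub)` and `rightSemiJoin(sub, outer)` both yield [2, 2, 3]; only which child sits on the left changes.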




[GitHub] [doris] morrySnow commented on a diff in pull request #15479: [fix](Nereids): generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #15479:
URL: https://github.com/apache/doris/pull/15479#discussion_r1059781227


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/properties/DistributionSpecHash.java:
##########
@@ -174,7 +182,13 @@ public boolean satisfy(DistributionSpec required) {
             return containsSatisfy(requiredHash.getOrderedShuffledColumns());
         }
 
-        if (requiredHash.shuffleType == ShuffleType.NATURAL && this.shuffleType != ShuffleType.NATURAL) {
+        // when this shuffle type is natural, we allow contains satisfied for possible colocate-join
+        if (this.shuffleType == ShuffleType.NATURAL) {

Review Comment:
   How do we handle the case where the left child's output shuffle type is BUCKETED and a bucket shuffle join is possible?
   If BUCKETED still requires equalsSatisfy, a Distribute may already have been added on the left side, leaving no chance to turn it into a bucket shuffle join. Am I right?
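   The diff relaxes `satisfy` for storage-bucketed (NATURAL) output. The underlying reasoning: if data is hash-distributed on columns S and the requirement is a hash distribution on columns R with S ⊆ R, then rows that agree on R also agree on S and already sit on the same node, so no exchange is needed and a colocate join stays possible. A minimal, simplified Java model of that rule, assuming the helper semantics in the comments (the class, the `columns` lists of expr ids, and the `JOIN` requirement type are illustrative assumptions, not the Doris source; equivalence sets and the check on the other join child are not modeled):

   ```java
   import java.util.HashSet;
   import java.util.List;

   // Simplified, hypothetical model of hash-distribution satisfaction; not the Doris source.
   // Equivalence sets are ignored: columns are compared directly by expr id.
   public class HashSatisfySketch {
       // NATURAL, BUCKETED, ENFORCED and AGGREGATE appear in the thread; JOIN is assumed.
       public enum ShuffleType { NATURAL, BUCKETED, ENFORCED, AGGREGATE, JOIN }

       public static final class HashSpec {
           public final List<Integer> columns;   // expr ids the data is hashed on
           public final ShuffleType shuffleType;

           public HashSpec(List<Integer> columns, ShuffleType shuffleType) {
               this.columns = columns;
               this.shuffleType = shuffleType;
           }

           // True when this spec hashes on a subset of `required`: rows that agree
           // on `required` then agree on this spec's columns, so they are co-located.
           boolean containsSatisfy(List<Integer> required) {
               return new HashSet<>(required).containsAll(columns);
           }

           // Stricter check: exactly the same columns in the same order.
           boolean equalsSatisfy(List<Integer> required) {
               return columns.equals(required);
           }

           public boolean satisfy(HashSpec required) {
               if (required.shuffleType == ShuffleType.AGGREGATE) {
                   return containsSatisfy(required.columns);
               }
               if (this.shuffleType == ShuffleType.NATURAL) {
                   // Storage-bucketed output: accept the looser check so the planner
                   // can keep a colocate join instead of adding a PhysicalDistribute.
                   return containsSatisfy(required.columns);
               }
               // Everything else (e.g. BUCKETED) needs an exact column match here.
               return equalsSatisfy(required.columns);
           }
       }
   }
   ```

   Under this model, a BUCKETED left child hashed on [1] does not satisfy a join requirement on [1, 2], so an exchange would be added; that is the situation the question above is about.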



##########
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/mv/SelectMvIndexTest.java:
##########
@@ -760,7 +760,7 @@ public void testBitmapUnionInSubquery() throws Exception {
         createMv(createUserTagMVSql);
         String query = "select user_id from " + USER_TAG_TABLE_NAME + " where user_id in (select user_id from "
                 + USER_TAG_TABLE_NAME + " group by user_id having bitmap_union_count(to_bitmap(tag_id)) >1 ) ;";
-        testMvWithTwoTable(query, "user_tags", "user_tags_mv");
+        testMvWithTwoTable(query, "user_tags_mv", "user_tags");

Review Comment:
   Why swap them?




[GitHub] [doris] github-actions[bot] commented on pull request #15479: [enhancement](Nereids) generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15479:
URL: https://github.com/apache/doris/pull/15479#issuecomment-1371741242

   PR approved by anyone and no changes requested.



[GitHub] [doris] morrySnow merged pull request #15479: [enhancement](Nereids) generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
morrySnow merged PR #15479:
URL: https://github.com/apache/doris/pull/15479



[GitHub] [doris] hello-stephen commented on pull request #15479: [fix](Nereids): generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15479:
URL: https://github.com/apache/doris/pull/15479#issuecomment-1367246796

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.67 seconds
    load time: 634 seconds
    storage size: 17123673499 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221229111314_clickbench_pr_70987.html



[GitHub] [doris] morrySnow commented on a diff in pull request #15479: [enhancement](Nereids) generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #15479:
URL: https://github.com/apache/doris/pull/15479#discussion_r1061135195


##########
fe/fe-core/src/test/java/org/apache/doris/nereids/sqltest/JoinTest.java:
##########
@@ -33,4 +34,32 @@ void testJoinUsing() {
                         innerLogicalJoin().when(j -> j.getHashJoinConjuncts().size() == 1)
                 );
     }
+
+    @Test
+    void testColocatedJoin() {

Review Comment:
   Check the plan with a plan pattern instead of comparing the plan tree string.
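   A printed plan tree embeds details such as stats and penalties, so a test that compares strings can fail when costing changes even though the join shape is the same. A minimal Java sketch of the idea, using a hypothetical mini plan tree and matcher (`Node`, `Pattern` and `any` are illustrative names, not the Nereids pattern API):

   ```java
   import java.util.List;
   import java.util.function.Predicate;
   import java.util.stream.Collectors;

   // Hypothetical mini plan tree and matcher; NOT the Nereids pattern API.
   public class PlanPatternSketch {
       public static final class Node {
           public final String op;          // operator name, e.g. "PhysicalHashJoin"
           public final String detail;      // e.g. join type or distribution spec
           public final double penalty;     // a stats value that may shift as costing evolves
           public final List<Node> children;

           public Node(String op, String detail, double penalty, List<Node> children) {
               this.op = op;
               this.detail = detail;
               this.penalty = penalty;
               this.children = children;
           }

           // Like an explain string, this includes stats, so any costing change alters it.
           @Override
           public String toString() {
               String kids = children.stream().map(Node::toString).collect(Collectors.joining(", "));
               return op + "(" + detail + ", penalty=" + penalty + ")[" + kids + "]";
           }
       }

       public static final class Pattern {
           final String op;
           final Predicate<Node> when;
           final List<Pattern> children;

           public Pattern(String op, Predicate<Node> when, List<Pattern> children) {
               this.op = op;
               this.when = when;
               this.children = children;
           }

           // Checks operator names, the predicate, and child shape; ignores stats.
           public boolean matches(Node node) {
               if (!node.op.equals(op) || !when.test(node) || node.children.size() != children.size()) {
                   return false;
               }
               for (int i = 0; i < children.size(); i++) {
                   if (!children.get(i).matches(node.children.get(i))) {
                       return false;
                   }
               }
               return true;
           }
       }

       // Leaf pattern: matches a childless node with the given operator name.
       public static Pattern any(String op) {
           return new Pattern(op, n -> true, List.of());
       }
   }
   ```

   For a colocate join, a pattern such as "PhysicalHashJoin over two PhysicalOlapScan children" asserts that no PhysicalDistribute was inserted on either side, and it keeps matching if only the penalty changes.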




[GitHub] [doris] morrySnow commented on a diff in pull request #15479: [fix](Nereids): generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #15479:
URL: https://github.com/apache/doris/pull/15479#discussion_r1060827696


##########
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/mv/SelectMvIndexTest.java:
##########
@@ -760,7 +760,7 @@ public void testBitmapUnionInSubquery() throws Exception {
         createMv(createUserTagMVSql);
         String query = "select user_id from " + USER_TAG_TABLE_NAME + " where user_id in (select user_id from "
                 + USER_TAG_TABLE_NAME + " group by user_id having bitmap_union_count(to_bitmap(tag_id)) >1 ) ;";
-        testMvWithTwoTable(query, "user_tags", "user_tags_mv");
+        testMvWithTwoTable(query, "user_tags_mv", "user_tags");

Review Comment:
   Is it stable?




[GitHub] [doris] github-actions[bot] commented on pull request #15479: [enhancement](Nereids) generate colocate join when property is different with require property

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15479:
URL: https://github.com/apache/doris/pull/15479#issuecomment-1371741215

   PR approved by at least one committer and no changes requested.


