Posted to commits@doris.apache.org by "morrySnow (via GitHub)" <gi...@apache.org> on 2023/04/12 09:40:02 UTC

[GitHub] [doris] morrySnow opened a new pull request, #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

morrySnow opened a new pull request, #18596:
URL: https://github.com/apache/doris/pull/18596

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Has unit tests been added
   * [ ] Has document been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow commented on a diff in pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "morrySnow (via GitHub)" <gi...@apache.org>.
morrySnow commented on code in PR #18596:
URL: https://github.com/apache/doris/pull/18596#discussion_r1166496454


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsMathUtil.java:
##########
@@ -59,4 +64,21 @@ public static double divide(double a, double b) {
         return a / nonZeroDivisor(b);
     }
 
+    /**
+     * compute the multi columns unite ndv
+     */
+    public static double multiNdv(List<Double> ndvs) {
+        if (CollectionUtils.isEmpty(ndvs)) {
+            return -1;

Review Comment:
   I don't think returning 0 is a good idea. If we return 0 when we have no column stats, the bloom filter will be sized significantly smaller than what is actually needed.
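
To illustrate the concern in this comment, here is a minimal sketch, not the PR's actual code, of how a caller might size a bloom filter from an NDV estimate. The class name, `DEFAULT_BITS`, and `bloomFilterBits` are hypothetical; the formula is the textbook optimum m = -n * ln(p) / (ln 2)^2. With a negative sentinel the caller can fall back to a default size, while an NDV of 0 would collapse the filter to the minimum.

```java
final class BloomFilterSizeSketch {
    // Hypothetical fallback size (in bits) used when no NDV estimate is available.
    static final long DEFAULT_BITS = 1L << 20;

    /**
     * Returns the textbook optimal bit count for ndv distinct keys at
     * false-positive rate fpp, or DEFAULT_BITS when ndv is negative,
     * which this sketch treats as "unknown".
     */
    static long bloomFilterBits(long ndv, double fpp) {
        if (ndv < 0) {
            // Unknown NDV: keep the default instead of shrinking the filter.
            return DEFAULT_BITS;
        }
        double bits = -ndv * Math.log(fpp) / (Math.log(2) * Math.log(2));
        // Never return fewer than one bit, even for ndv == 0.
        return Math.max(1L, (long) Math.ceil(bits));
    }
}
```

Under these assumptions, an unknown NDV reported as 0 would be indistinguishable from a genuinely empty build side, which is why a distinct sentinel is useful.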
   





[GitHub] [doris] github-actions[bot] commented on pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18596:
URL: https://github.com/apache/doris/pull/18596#issuecomment-1507976106

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] morrySnow commented on a diff in pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "morrySnow (via GitHub)" <gi...@apache.org>.
morrySnow commented on code in PR #18596:
URL: https://github.com/apache/doris/pull/18596#discussion_r1166499454


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/processor/post/RuntimeFilterGenerator.java:
##########
@@ -111,8 +114,21 @@ public PhysicalPlan visitPhysicalHashJoin(PhysicalHashJoin<? extends Plan, ? ext
                         continue;
                     }
                     Slot olapScanSlot = aliasTransferMap.get(unwrappedSlot).second;
+                    long buildSideNdv = -1L;
+                    AbstractPlan right = (AbstractPlan) join.right();
+                    if (right.getStats() != null) {
+                        List<Double> ndvs = join.getHashJoinConjuncts().stream()
+                                .map(Expression::getInputSlots)
+                                .flatMap(Set::stream)
+                                .filter(s -> right.getOutputExprIdSet().contains(s.getExprId()))
+                                .map(s -> right.getStats().columnStatistics().get(s))
+                                .filter(Objects::nonNull)
+                                .map(cs -> cs.ndv)
+                                .collect(Collectors.toList());
+                        buildSideNdv = (long) StatsMathUtil.multiNdv(ndvs);

Review Comment:
   done





[GitHub] [doris] Gabriel39 merged pull request #18596: [enhancement](Nereids) optimize bloom filter size computing strategy

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 merged PR #18596:
URL: https://github.com/apache/doris/pull/18596




[GitHub] [doris] morrySnow commented on a diff in pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "morrySnow (via GitHub)" <gi...@apache.org>.
morrySnow commented on code in PR #18596:
URL: https://github.com/apache/doris/pull/18596#discussion_r1166500043


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsMathUtil.java:
##########
@@ -59,4 +64,21 @@ public static double divide(double a, double b) {
         return a / nonZeroDivisor(b);
     }
 
+    /**
+     * compute the multi columns unite ndv
+     */
+    public static double multiNdv(List<Double> ndvs) {

Review Comment:
   done





[GitHub] [doris] englefly commented on a diff in pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on code in PR #18596:
URL: https://github.com/apache/doris/pull/18596#discussion_r1166320661


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/processor/post/RuntimeFilterGenerator.java:
##########
@@ -111,8 +114,21 @@ public PhysicalPlan visitPhysicalHashJoin(PhysicalHashJoin<? extends Plan, ? ext
                         continue;
                     }
                     Slot olapScanSlot = aliasTransferMap.get(unwrappedSlot).second;
+                    long buildSideNdv = -1L;
+                    AbstractPlan right = (AbstractPlan) join.right();
+                    if (right.getStats() != null) {
+                        List<Double> ndvs = join.getHashJoinConjuncts().stream()
+                                .map(Expression::getInputSlots)
+                                .flatMap(Set::stream)
+                                .filter(s -> right.getOutputExprIdSet().contains(s.getExprId()))
+                                .map(s -> right.getStats().columnStatistics().get(s))
+                                .filter(Objects::nonNull)
+                                .map(cs -> cs.ndv)
+                                .collect(Collectors.toList());
+                        buildSideNdv = (long) StatsMathUtil.multiNdv(ndvs);

Review Comment:
   buildSideNdv should be no greater than the build side row count.
   buildSideNdv = Math.min(buildSideNdv, rowCount)
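
A minimal sketch of the cap discussed in this comment, assuming a negative value marks an unknown NDV or row count. The class and method names are hypothetical and not taken from the PR. Note that bounding a value from above calls for `Math.min`; `Math.max` would instead raise the NDV to at least the row count.

```java
final class BuildSideNdvSketch {
    // Assumed sentinel for "no estimate available".
    static final long UNKNOWN = -1L;

    /** Caps an NDV estimate at the row count, preserving the unknown sentinel. */
    static long clampNdv(long ndv, long rowCount) {
        if (ndv < 0) {
            return UNKNOWN;   // nothing to clamp
        }
        if (rowCount < 0) {
            return ndv;       // row count unknown: leave the estimate as-is
        }
        // A column cannot have more distinct values than rows.
        return Math.min(ndv, rowCount);
    }
}
```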





[GitHub] [doris] github-actions[bot] commented on pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18596:
URL: https://github.com/apache/doris/pull/18596#issuecomment-1507976140

   PR approved by anyone and no changes requested.




[GitHub] [doris] morrySnow commented on pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "morrySnow (via GitHub)" <gi...@apache.org>.
morrySnow commented on PR #18596:
URL: https://github.com/apache/doris/pull/18596#issuecomment-1505206097

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18596:
URL: https://github.com/apache/doris/pull/18596#issuecomment-1507976133

   PR approved by anyone and no changes requested.




[GitHub] [doris] morrySnow commented on pull request #18596: [enhancement](Nereids) optimize bloom filter size computing strategy

Posted by "morrySnow (via GitHub)" <gi...@apache.org>.
morrySnow commented on PR #18596:
URL: https://github.com/apache/doris/pull/18596#issuecomment-1508495203

   run buildall




[GitHub] [doris] englefly commented on a diff in pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on code in PR #18596:
URL: https://github.com/apache/doris/pull/18596#discussion_r1166324139


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsMathUtil.java:
##########
@@ -59,4 +64,21 @@ public static double divide(double a, double b) {
         return a / nonZeroDivisor(b);
     }
 
+    /**
+     * compute the multi columns unite ndv
+     */
+    public static double multiNdv(List<Double> ndvs) {
+        if (CollectionUtils.isEmpty(ndvs)) {
+            return -1;

Review Comment:
   If ndvs.isEmpty(), returning 0 is reasonable.





[GitHub] [doris] englefly commented on a diff in pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on code in PR #18596:
URL: https://github.com/apache/doris/pull/18596#discussion_r1166322983


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsMathUtil.java:
##########
@@ -59,4 +64,21 @@ public static double divide(double a, double b) {
         return a / nonZeroDivisor(b);
     }
 
+    /**
+     * compute the multi columns unite ndv
+     */
+    public static double multiNdv(List<Double> ndvs) {

Review Comment:
   How about renaming it to accumulativeNdv or jointNdv?





[GitHub] [doris] github-actions[bot] commented on pull request #18596: [enhancement](Nereids) optimize bloom filter size reducing strategy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18596:
URL: https://github.com/apache/doris/pull/18596#issuecomment-1507976107

   PR approved by at least one committer and no changes requested.

