Posted to commits@doris.apache.org by "englefly (via GitHub)" <gi...@apache.org> on 2023/06/09 08:24:24 UTC

[GitHub] [doris] englefly opened a new pull request, #20642: Distr bytes

englefly opened a new pull request, #20642:
URL: https://github.com/apache/doris/pull/20642

   ## Proposed changes
   
   Issue Number: close #xxx
   
   <!--Describe your changes.-->
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #20642: [tpcds](nereids) estimate distribution cost by byte size instead of row count

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on code in PR #20642:
URL: https://github.com/apache/doris/pull/20642#discussion_r1226270669


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostModelV1.java:
##########
@@ -161,27 +161,27 @@ public Cost visitPhysicalPartitionTopN(PhysicalPartitionTopN<? extends Plan> par
     @Override
     public Cost visitPhysicalDistribute(
             PhysicalDistribute<? extends Plan> distribute, PlanContext context) {
+        int kBytes = 1024;
         Statistics childStatistics = context.getChildStatistics(0);
         DistributionSpec spec = distribute.getDistributionSpec();
+        int beNumber = ConnectContext.get().getEnv().getClusterInfo().getBackendsNumber(true);
+        beNumber = Math.max(1, beNumber);
+        double dataSize = childStatistics.computeSize() / kBytes; // in K bytes
         // shuffle
         if (spec instanceof DistributionSpecHash) {
             return CostV1.of(
                     0,
                     0,
-                    childStatistics.getRowCount());

Review Comment:
   OK, let's do it in the next PR.





[GitHub] [doris] englefly merged pull request #20642: [tpcds](nereids) estimate distribution cost by byte size instead of row count

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly merged PR #20642:
URL: https://github.com/apache/doris/pull/20642




[GitHub] [doris] github-actions[bot] commented on pull request #20642: [tpcds](nereids) estimate distribution cost by byte size instead of row count

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20642:
URL: https://github.com/apache/doris/pull/20642#issuecomment-1586494040

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] github-actions[bot] commented on pull request #20642: [tpcds](nereids) estimate distribution cost by byte size instead of row count

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20642:
URL: https://github.com/apache/doris/pull/20642#issuecomment-1586494067

   PR approved by anyone and no changes requested.




[GitHub] [doris] englefly commented on pull request #20642: Distr bytes

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on PR #20642:
URL: https://github.com/apache/doris/pull/20642#issuecomment-1585390417

   run buildall




[GitHub] [doris] xzj7019 commented on a diff in pull request #20642: [tpcds](nereids) estimate distribution cost by byte size instead of row count

Posted by "xzj7019 (via GitHub)" <gi...@apache.org>.
xzj7019 commented on code in PR #20642:
URL: https://github.com/apache/doris/pull/20642#discussion_r1226268711


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostModelV1.java:
##########
@@ -191,7 +191,7 @@ public Cost visitPhysicalDistribute(
             return CostV1.of(
                     0,
                     0,
-                    childStatistics.getRowCount() * Math.pow(beNumber, 0.5));

Review Comment:
   In cost model v1, we use row count as the unified metric for measuring cost. So here we had better keep a row-count-based cost as well. If we need to distinguish the distribute cost for different cases, we can use a factor parameter that takes dataSize as its input and has a minimum value of 1.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostModelV1.java:
##########
@@ -161,27 +161,27 @@ public Cost visitPhysicalPartitionTopN(PhysicalPartitionTopN<? extends Plan> par
     @Override
     public Cost visitPhysicalDistribute(
             PhysicalDistribute<? extends Plan> distribute, PlanContext context) {
+        int kBytes = 1024;
         Statistics childStatistics = context.getChildStatistics(0);
         DistributionSpec spec = distribute.getDistributionSpec();
+        int beNumber = ConnectContext.get().getEnv().getClusterInfo().getBackendsNumber(true);
+        beNumber = Math.max(1, beNumber);
+        double dataSize = childStatistics.computeSize() / kBytes; // in K bytes
         // shuffle
         if (spec instanceof DistributionSpecHash) {
             return CostV1.of(
                     0,
                     0,
-                    childStatistics.getRowCount());

Review Comment:
   In cost model v1, we use row count as the unified metric for measuring cost. So here we had better keep a row-count-based cost as well. If we need to distinguish the distribute cost for different cases, we can use a factor parameter that takes dataSize as its input and has a minimum value of 1.
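
   A minimal sketch of what such a factor could look like, not the code that was merged. The class name `DistributeCostSketch`, the method names, and the `referenceSizeKb` baseline are all hypothetical and do not exist in Doris. The only points taken from the comment are that the cost stays row-count based and that the size-derived factor never drops below 1.

```java
// Hypothetical illustration only: none of these names are part of Doris.
public final class DistributeCostSketch {
    private DistributeCostSketch() {
    }

    /**
     * Factor derived from the data size, clamped to a minimum of 1.
     * referenceSizeKb is an assumed baseline: payloads at or below it
     * leave the row-count cost unchanged.
     */
    public static double sizeFactor(double dataSizeKb, double referenceSizeKb) {
        if (referenceSizeKb <= 0) {
            throw new IllegalArgumentException("referenceSizeKb must be positive");
        }
        return Math.max(1.0, dataSizeKb / referenceSizeKb);
    }

    /** Row-count-based distribute cost, scaled by the size factor. */
    public static double distributeCost(double rowCount, double dataSizeKb, double referenceSizeKb) {
        return rowCount * sizeFactor(dataSizeKb, referenceSizeKb);
    }

    public static void main(String[] args) {
        // Small payload: the factor clamps to 1, so the cost equals the row count.
        System.out.println(distributeCost(1000, 512, 1024));
        // Payload four times the baseline: the cost is four times the row count.
        System.out.println(distributeCost(1000, 4096, 1024));
    }
}
```

   With this shape the cost keeps the same unit as the rest of cost model v1 (rows), and the data size can only raise the cost of a distribute, never lower it below the plain row count.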





[GitHub] [doris] englefly commented on pull request #20642: [tpcds](nereids) estimate distribution cost by byte size instead of row count

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on PR #20642:
URL: https://github.com/apache/doris/pull/20642#issuecomment-1585445794

   run buildall




[GitHub] [doris] englefly commented on pull request #20642: [tpcds](nereids) estimate distribution cost by byte size instead of row count

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on PR #20642:
URL: https://github.com/apache/doris/pull/20642#issuecomment-1585558221

   run buildall

