Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/27 14:55:19 UTC

[GitHub] [arrow-datafusion] AssHero opened a new pull request, #4394: improve hashjoin execution metrics

AssHero opened a new pull request, #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394

   # Which issue does this PR close?
   Closes #4009 
   
   # Rationale for this change
   Improve the hash join execution metrics by including the hash map build time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on pull request #4394: improve hashjoin execution metrics

Posted by GitBox <gi...@apache.org>.
mingmwang commented on PR #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394#issuecomment-1330031929

   LGTM




[GitHub] [arrow-datafusion] alamb merged pull request #4394: improve hashjoin execution metrics

Posted by GitBox <gi...@apache.org>.
alamb merged PR #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4394: improve hashjoin execution metrics

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394#discussion_r1033989776


##########
datafusion/core/src/physical_plan/joins/hash_join.rs:
##########
@@ -1551,11 +1557,13 @@ impl HashJoinStream {
                             | JoinType::RightAnti => {}
                         }
                     }
-                    Some(result.map(|x| x.0))
+                    let final_result = Some(result.map(|x| x.0));
+                    timer.done();

Review Comment:
   I think `drop` will handle this for you, so the explicit `done()` call is not needed.
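
A minimal sketch of the point made in this comment, assuming a timer guard that records its elapsed time when it is dropped. `TimeMetric` and `TimerGuard` are hypothetical stand-ins written for illustration, not DataFusion's actual metrics types. The sketch shows that relying on `Drop` alone and calling `done()` explicitly both record the metric exactly once.

```rust
use std::cell::Cell;
use std::time::Instant;

/// Hypothetical stand-in for a time metric (not DataFusion's type):
/// tracks total nanoseconds and how many times a timer recorded into it.
#[derive(Default)]
struct TimeMetric {
    nanos: Cell<u128>,
    recordings: Cell<u32>,
}

impl TimeMetric {
    /// Start a timer that records into this metric when stopped or dropped.
    fn timer(&self) -> TimerGuard<'_> {
        TimerGuard { metric: self, start: Some(Instant::now()) }
    }
}

/// Hypothetical guard: holds the start instant until it has been recorded.
struct TimerGuard<'a> {
    metric: &'a TimeMetric,
    start: Option<Instant>,
}

impl TimerGuard<'_> {
    /// Record the elapsed time once; later calls are no-ops.
    fn stop(&mut self) {
        if let Some(start) = self.start.take() {
            let elapsed = start.elapsed().as_nanos();
            self.metric.nanos.set(self.metric.nanos.get() + elapsed);
            self.metric.recordings.set(self.metric.recordings.get() + 1);
        }
    }

    /// Explicitly finish the timer. The guard is consumed, so `Drop` still
    /// runs afterwards, but `start` is already `None` and nothing is recorded twice.
    fn done(mut self) {
        self.stop();
    }
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        self.stop();
    }
}

/// Number of recordings when the guard simply goes out of scope.
fn recordings_with_drop_only() -> u32 {
    let metric = TimeMetric::default();
    {
        let _timer = metric.timer();
        // ... timed work would go here ...
    } // `_timer` is dropped here, which records the elapsed time
    metric.recordings.get()
}

/// Number of recordings when `done()` is called explicitly.
fn recordings_with_explicit_done() -> u32 {
    let metric = TimeMetric::default();
    let timer = metric.timer();
    timer.done();
    metric.recordings.get()
}

fn main() {
    println!("drop only: {}", recordings_with_drop_only());
    println!("explicit done: {}", recordings_with_explicit_done());
}
```

With a guard shaped like this, an explicit `done()` is only useful when the timer should stop before the end of the enclosing scope.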



##########
datafusion/core/src/physical_plan/joins/hash_join.rs:
##########
@@ -1487,10 +1492,12 @@ impl HashJoinStream {
         &mut self,
         cx: &mut std::task::Context<'_>,
     ) -> Poll<Option<ArrowResult<RecordBatch>>> {
+        let build_timer = self.join_metrics.build_time.timer();

Review Comment:
   This will time the overall wall-clock time (not the CPU time) of the build, which is fine as long as that is what you are trying to measure 👍 
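
To illustrate the distinction drawn here, the following sketch measures elapsed time with `std::time::Instant`. The helper `wall_clock` is a hypothetical name introduced for this example. A sleeping thread uses almost no CPU, yet the measured duration still includes the whole sleep, which is what makes it a wall-clock measurement.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Measure the wall-clock time taken by `work` (hypothetical helper).
fn wall_clock<F: FnOnce()>(work: F) -> Duration {
    let start = Instant::now();
    work();
    start.elapsed()
}

fn main() {
    // The thread is idle while sleeping, but the elapsed time still counts it.
    let elapsed = wall_clock(|| thread::sleep(Duration::from_millis(50)));
    println!("wall-clock elapsed: {:?}", elapsed);
}
```

In an async operator the same effect applies to time spent waiting for input: a timer held across such a wait includes it.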





[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4394: improve hashjoin execution metrics

Posted by GitBox <gi...@apache.org>.
mingmwang commented on code in PR #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394#discussion_r1033389506


##########
datafusion/core/src/physical_plan/joins/hash_join.rs:
##########
@@ -376,7 +381,8 @@ impl ExecutionPlan for HashJoinExec {
     ) -> Result<SendableRecordBatchStream> {
         let on_left = self.on.iter().map(|on| on.0.clone()).collect::<Vec<_>>();
         let on_right = self.on.iter().map(|on| on.1.clone()).collect::<Vec<_>>();
-
+        let hashjoin_metrics = HashJoinMetrics::new(partition, &self.metrics);
+        let timer = hashjoin_metrics.build_time.timer();
         let left_fut = match self.mode {

Review Comment:
   Is it correct to calculate the build time here? The `left_fut` returned here is a `Future`.
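
The concern raised here follows from Rust futures being lazy: constructing a future runs none of its body, so a timer that starts and stops around the construction would not cover the actual work. The sketch below is a self-contained illustration using only the standard library. `NoopWaker` and `future_is_lazy` are hypothetical names, and the `async` block merely stands in for the build-side future.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough to poll a future that never suspends.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns `(ran_before_poll, ran_after_poll)` for a simple future.
fn future_is_lazy() -> (bool, bool) {
    let ran = Cell::new(false);

    // Hypothetical stand-in for the build-side future: creating it does no work.
    let build = async {
        ran.set(true);
    };
    let ran_before_poll = ran.get();

    // The body only executes once the future is polled.
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut build = pin!(build);
    let poll = build.as_mut().poll(&mut cx);
    assert!(poll.is_ready());

    (ran_before_poll, ran.get())
}

fn main() {
    let (before, after) = future_is_lazy();
    println!("ran before poll: {before}, ran after poll: {after}");
}
```

Under this assumption, the build time is only captured if the timer is alive while the future is being polled, for example by starting it inside the future or in the stream's poll function.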





[GitHub] [arrow-datafusion] AssHero commented on a diff in pull request #4394: improve hashjoin execution metrics

Posted by GitBox <gi...@apache.org>.
AssHero commented on code in PR #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394#discussion_r1033497522


##########
datafusion/core/src/physical_plan/joins/hash_join.rs:
##########
@@ -376,7 +381,8 @@ impl ExecutionPlan for HashJoinExec {
     ) -> Result<SendableRecordBatchStream> {
         let on_left = self.on.iter().map(|on| on.0.clone()).collect::<Vec<_>>();
         let on_right = self.on.iter().map(|on| on.1.clone()).collect::<Vec<_>>();
-
+        let hashjoin_metrics = HashJoinMetrics::new(partition, &self.metrics);
+        let timer = hashjoin_metrics.build_time.timer();
         let left_fut = match self.mode {

Review Comment:
   > Is it correct to calculate the build time here? The `left_fut` returned here is a `Future`.
   
   Thanks! I'll check this code later.





[GitHub] [arrow-datafusion] ursabot commented on pull request #4394: improve hashjoin execution metrics

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394#issuecomment-1331012335

   Benchmark runs are scheduled for baseline = fa4bea871086db70a8d19820a2f266de826836e1 and contender = 66c95e70ae2ff9f3f89b91898ede875d316e731f. 66c95e70ae2ff9f3f89b91898ede875d316e731f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7298ca8b049542f5b78d4b293cdc350f...5a0a5d5a89494f499a89e73a3c071acf/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/d4815997bf3542af8785fe0e57335647...df1640af4ca444dba8472af1c6438999/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f49c2361daed4f918f99f2061b20e298...618d9e51446d4e8d8bdccefbb61946f4/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/bfdd8e16a101423b8632c90b9cdce3af...2d1b843fd3484afa874073def8583e92/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow-datafusion] Dandandan commented on pull request #4394: improve hashjoin execution metrics

Posted by GitBox <gi...@apache.org>.
Dandandan commented on PR #4394:
URL: https://github.com/apache/arrow-datafusion/pull/4394#issuecomment-1331027053

   Thanks @AssHero !

