Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/22 04:55:27 UTC

[GitHub] [arrow] michalursa opened a new pull request, #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

michalursa opened a new pull request, #13686:
URL: https://github.com/apache/arrow/pull/13686

   The hash join implementation in the HashJoinBasicImpl class was missing initialization when there are no batches on the build side.
   Specifically, a few data structures, mainly the two RowEncoder instances that hold the key and payload columns for build-side rows, were not initialized inside BuildHashTable_exec_task, the method responsible for transforming the batches accumulated on the build side of the hash join into a hash table.
   
   The initialization of RowEncoder inserts a single special row containing null values for all columns. This special row is accessed when outputting probe-side rows with no matches in a left outer or full outer join (in that case these joins must output nulls in place of all fields that would come from the build side).
   
   Interestingly, the initialization was present in a similar case when batches were present on build side but all of them included zero rows. I modified the code to use the same code path for both these logically equivalent cases: a) zero build side batches and b) non-zero batches but with zero rows each.
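
To make the role of the special null row concrete, here is a minimal, self-contained sketch. It is not Arrow's code: `ToyBuildSide`, `Batch`, `Build`, and `ProbeLeftOuter` are hypothetical names invented for illustration, and a single nullable `int` payload stands in for the encoded key and payload columns. The only point it is meant to show is that the null row is inserted unconditionally, so "no batches" and "only empty batches" go through the same path.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical build-side batch: parallel key and payload columns.
struct Batch {
  std::vector<int> keys;
  std::vector<int> payloads;
};

// Toy stand-in for the build side of a hash join (not Arrow's RowEncoder).
class ToyBuildSide {
 public:
  // Row 0 is reserved for the special all-null row.
  static constexpr std::size_t kNullRow = 0;

  void Build(const std::vector<Batch>& batches) {
    // Initialization runs before any batch is inspected, so it also
    // happens when `batches` is empty or holds only zero-row batches.
    payloads_.assign(1, std::nullopt);
    index_.clear();
    for (const Batch& batch : batches) {
      assert(batch.keys.size() == batch.payloads.size());
      for (std::size_t i = 0; i < batch.keys.size(); ++i) {
        index_[batch.keys[i]].push_back(payloads_.size());
        payloads_.push_back(batch.payloads[i]);
      }
    }
  }

  // Left outer probe: one output row per match, or one row carrying the
  // null payload when the probe key has no match on the build side.
  std::vector<std::pair<int, std::optional<int>>> ProbeLeftOuter(
      const std::vector<int>& probe_keys) const {
    std::vector<std::pair<int, std::optional<int>>> out;
    for (int key : probe_keys) {
      auto it = index_.find(key);
      if (it == index_.end()) {
        // Reads the special row; this lookup would throw if Build()
        // had never inserted it.
        out.emplace_back(key, payloads_.at(kNullRow));
      } else {
        for (std::size_t row : it->second) {
          out.emplace_back(key, payloads_.at(row));
        }
      }
    }
    return out;
  }

 private:
  std::vector<std::optional<int>> payloads_;
  std::unordered_map<int, std::vector<std::size_t>> index_;
};
```

In this sketch, calling `Build({})` and calling `Build({Batch{}, Batch{}})` leave the object in the same state, and a left outer probe against either one yields a null payload for every probe row.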


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13686:
URL: https://github.com/apache/arrow/pull/13686#discussion_r929426092


##########
cpp/src/arrow/compute/exec/hash_join_node_test.cc:
##########
@@ -1298,6 +1298,15 @@ void TestHashJoinDictionaryHelper(
     }
   }
 
+  // Instead of sending 2 batches of size 0 we should not send any batches
+  // at all to more accurately simulate real world use cases
+  if (l_length == 0) {
+    l_batches.batches.resize(0);
+  }
+  if (r_length == 0) {
+    r_batches.batches.resize(0);
+  }
+

Review Comment:
   Since we now handle both cases (no batches and only empty batches) can we test both scenarios?





[GitHub] [arrow] github-actions[bot] commented on pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13686:
URL: https://github.com/apache/arrow/pull/13686#issuecomment-1192181494

   https://issues.apache.org/jira/browse/ARROW-15938




[GitHub] [arrow] kszucs commented on pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
kszucs commented on PR #13686:
URL: https://github.com/apache/arrow/pull/13686#issuecomment-1196965049

   Tests do pass on https://github.com/apache/arrow/pull/13725's fork, so I'm going to merge it. 




[GitHub] [arrow] kszucs merged pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
kszucs merged PR #13686:
URL: https://github.com/apache/arrow/pull/13686




[GitHub] [arrow] ursabot commented on pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13686:
URL: https://github.com/apache/arrow/pull/13686#issuecomment-1197507972

   Benchmark runs are scheduled for baseline = 545b4313d6db2dfcc4ea0aa4ac23785d64450e1d and contender = 2ace2cdf06b7a82e1c2024bef89a7d70ec4031ce. 2ace2cdf06b7a82e1c2024bef89a7d70ec4031ce is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:1.79% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/a75c77ea8e474504b7cfe6cbafacc4de...e8c945c7cd0b4b918808a26a7056a573/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/5f429c5c1f5c426b825237d97f2614b5...f5353bd53f304211bb3d14fa788c9209/)
   [Finished :arrow_down:0.54% :arrow_up:0.27%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/dda8b8ba11ac4c02a1f75d9997f07612...3d0fc87b08eb436e9e8f57d59f5d217f/)
   [Finished :arrow_down:0.57% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ee2fe44af3bc4affbc1e415225362ce4...f77dd4ea7a8f4ffa817fc0b1cf54c240/)
   Buildkite builds:
   [Failed] [`2ace2cdf` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1208)
   [Finished] [`2ace2cdf` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1220)
   [Finished] [`2ace2cdf` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1202)
   [Finished] [`2ace2cdf` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1222)
   [Failed] [`545b4313` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1207)
   [Finished] [`545b4313` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1219)
   [Finished] [`545b4313` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1201)
   [Finished] [`545b4313` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1221)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] kszucs commented on pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
kszucs commented on PR #13686:
URL: https://github.com/apache/arrow/pull/13686#issuecomment-1196652760

   @michalursa @westonpace what's the status of it? Can we include it in the release?




[GitHub] [arrow] westonpace commented on pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #13686:
URL: https://github.com/apache/arrow/pull/13686#issuecomment-1196855486

   Sorry, I've been out these past few days (sick). I had assumed anything non-blocking would just be pushed to 10.0.0 automatically. However, this PR is good to go. I don't seem to have permission to push to this branch, so I addressed the feedback in a follow-up here: https://github.com/apache/arrow/pull/13725
   
   I think we can proceed with merging this, assuming the CI is green for https://github.com/apache/arrow/pull/13725




[GitHub] [arrow] ursabot commented on pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13686:
URL: https://github.com/apache/arrow/pull/13686#issuecomment-1197508089

   ['Python', 'R'] benchmarks have high level of regressions.
   [test-mac-arm](https://conbench.ursa.dev/compare/runs/5f429c5c1f5c426b825237d97f2614b5...f5353bd53f304211bb3d14fa788c9209/)
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/dda8b8ba11ac4c02a1f75d9997f07612...3d0fc87b08eb436e9e8f57d59f5d217f/)
   




[GitHub] [arrow] github-actions[bot] commented on pull request #13686: ARROW-15938: [C++][Compute] Fixing HashJoinBasicImpl in case of zero batches on build side

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13686:
URL: https://github.com/apache/arrow/pull/13686#issuecomment-1192181508

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

