You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "micah-white (via GitHub)" <gi...@apache.org> on 2023/05/09 22:33:45 UTC

[GitHub] [arrow] micah-white commented on a diff in pull request #35522: GH-35521: [C++] Hash null bitmap only if null count is 0

micah-white commented on code in PR #35522:
URL: https://github.com/apache/arrow/pull/35522#discussion_r1189188194


##########
cpp/src/arrow/scalar.cc:
##########
@@ -153,9 +153,10 @@ struct ScalarHashImpl {
 
   Status ArrayHash(const ArrayData& a) {
     RETURN_NOT_OK(StdHash(a.length) & StdHash(a.GetNullCount()));
-    if (a.buffers[0] != nullptr) {
+    if (a.GetNullCount() != 0) {

Review Comment:
   IIUC, the null count should get cached from calling `GetNullCount()` above, so this line does not have performance degradations.



##########
cpp/src/arrow/scalar_test.cc:
##########
@@ -1106,6 +1106,25 @@ class TestListScalar : public ::testing::Test {
     ASSERT_RAISES(Invalid, scalar.ValidateFull());
   }
 
+  void TestHashing() {
+    ScalarType empty_bitmap_scalar(ArrayFromJSON(int16(), "[1, 2, 3]"));
+    ASSERT_OK(empty_bitmap_scalar.ValidateFull());
+    ASSERT_TRUE(empty_bitmap_scalar.value->data()->buffers[0] == nullptr);
+    
+    auto data = empty_bitmap_scalar.value->data()->buffers[1];
+    std::vector<uint8_t> bitmap_data = {0,0,0};
+    auto null_bitmap = std::make_shared<Buffer>(bitmap_data.data(), 3);
+
+    std::shared_ptr<Int16Array> arr(new Int16Array(3, data, null_bitmap, 0));
+    ASSERT_TRUE(arr->null_count() == 0);
+    // this line fails - I don't know how to create an array with a null bitmap
+    // that is all 0s.
+    ASSERT_TRUE(arr->data()->buffers[0] != nullptr);
+    ScalarType set_bitmap_scalar(arr);

Review Comment:
   As I said in the description, I don't know how to make an array with a 0s null bitmap. I ran into this bug when using the `filter()` function on a `LargeListArray` of strings. The scalars in the array would have the null bitmap set even if there were no nulls, changing the hash values. I assume using that exact method is too circumspect to be used here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org