You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/05/10 11:41:49 UTC

[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35522: GH-35521: [C++] Hash null bitmap only if null count is 0

jorisvandenbossche commented on code in PR #35522:
URL: https://github.com/apache/arrow/pull/35522#discussion_r1189435573


##########
cpp/src/arrow/scalar.cc:
##########
@@ -153,9 +153,10 @@ struct ScalarHashImpl {
 
   Status ArrayHash(const ArrayData& a) {
     RETURN_NOT_OK(StdHash(a.length) & StdHash(a.GetNullCount()));
-    if (a.buffers[0] != nullptr) {
+    if (a.GetNullCount() != 0) {
       // We can't visit values without unboxing the whole array, so only hash
-      // the null bitmap for now.
+      // the null bitmap for now. Only hash the null bitmap if the null count
+      // is 0 to ensure hash consistency.

Review Comment:
   ```suggestion
         // is not 0 to ensure hash consistency.
   ```



##########
cpp/src/arrow/scalar_test.cc:
##########
@@ -1106,6 +1106,25 @@ class TestListScalar : public ::testing::Test {
     ASSERT_RAISES(Invalid, scalar.ValidateFull());
   }
 
+  void TestHashing() {
+    ScalarType empty_bitmap_scalar(ArrayFromJSON(int16(), "[1, 2, 3]"));
+    ASSERT_OK(empty_bitmap_scalar.ValidateFull());
+    ASSERT_TRUE(empty_bitmap_scalar.value->data()->buffers[0] == nullptr);
+    
+    auto data = empty_bitmap_scalar.value->data()->buffers[1];
+    std::vector<uint8_t> bitmap_data = {0,0,0};
+    auto null_bitmap = std::make_shared<Buffer>(bitmap_data.data(), 3);
+
+    std::shared_ptr<Int16Array> arr(new Int16Array(3, data, null_bitmap, 0));
+    ASSERT_TRUE(arr->null_count() == 0);
+    // this line fails - I don't know how to create an array with a null bitmap
+    // that is all 0s.
+    ASSERT_TRUE(arr->data()->buffers[0] != nullptr);
+    ScalarType set_bitmap_scalar(arr);

Review Comment:
   Yeah, it seems we explicitly don't keep the bitmap around if there are no nulls in the Array construction:
   
   https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/cpp/src/arrow/array/data.cc#L53-L55
   
   One way around this would be to create a ListArray, and then get one element of it as a scalar (like my code snippet in python)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org