You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "felipecrv (via GitHub)" <gi...@apache.org> on 2023/09/09 16:37:55 UTC

[GitHub] [arrow] felipecrv opened a new pull request, #37642: GH-34315: [C++] Correct is_null kernel for Union and RunEndEncoded logical nulls

felipecrv opened a new pull request, #37642:
URL: https://github.com/apache/arrow/pull/37642

   ### Rationale for this change
   
   Currently the is_null kernel always returns all-False for union and run-end-encoded types, since those don't have a top-level validity buffer. Update the kernel to take the logical nulls into account.
   
   
   ### Are these changes tested?
   
   Yes, both in Python and C++
   
   ### Are there any user-facing changes?
   
   Yes, this changes (corrects) the behaviour of the is_null kernel.
   
   Closes #35036


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on a diff in pull request #37642: GH-34315: [C++] Correct is_null kernel for Union and RunEndEncoded logical nulls

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #37642:
URL: https://github.com/apache/arrow/pull/37642#discussion_r1321613572


##########
cpp/src/arrow/compute/kernels/scalar_validity.cc:
##########
@@ -82,6 +85,72 @@ static void SetNanBits(const ArraySpan& arr, uint8_t* out_bitmap, int64_t out_of
   }
 }
 
+static void SetSparseUnionLogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap,

Review Comment:
   Would it be worth reusing the `choose` kernel for this case?



##########
cpp/src/arrow/compute/kernels/scalar_validity.cc:
##########
@@ -82,6 +85,72 @@ static void SetNanBits(const ArraySpan& arr, uint8_t* out_bitmap, int64_t out_of
   }
 }
 
+static void SetSparseUnionLogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap,
+                                          int64_t out_offset) {
+  const auto* sparse_union_type =
+      ::arrow::internal::checked_cast<const SparseUnionType*>(span.type);
+  DCHECK_LE(span.child_data.size(), 128);
+
+  const int8_t* types = span.GetValues<int8_t>(1);  // NOLINT
+  for (int64_t i = 0; i < span.length; i++) {
+    const int8_t child_id = sparse_union_type->child_ids()[types[i]];
+    if (span.child_data[child_id].IsNull(i + span.offset)) {
+      bit_util::SetBit(out_bitmap, i + out_offset);
+    }
+  }
+}
+
+static void SetDenseUnionLogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap,
+                                         int64_t out_offset) {
+  const auto* dense_union_type =
+      ::arrow::internal::checked_cast<const DenseUnionType*>(span.type);
+  DCHECK_LE(span.child_data.size(), 128);

Review Comment:
   Someday if we have more dense union utilities the same trick as for sparse unions could be used here



##########
cpp/src/arrow/compute/kernels/scalar_validity.cc:
##########
@@ -82,6 +85,72 @@ static void SetNanBits(const ArraySpan& arr, uint8_t* out_bitmap, int64_t out_of
   }
 }
 
+static void SetSparseUnionLogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap,
+                                          int64_t out_offset) {
+  const auto* sparse_union_type =
+      ::arrow::internal::checked_cast<const SparseUnionType*>(span.type);
+  DCHECK_LE(span.child_data.size(), 128);
+
+  const int8_t* types = span.GetValues<int8_t>(1);  // NOLINT
+  for (int64_t i = 0; i < span.length; i++) {
+    const int8_t child_id = sparse_union_type->child_ids()[types[i]];
+    if (span.child_data[child_id].IsNull(i + span.offset)) {
+      bit_util::SetBit(out_bitmap, i + out_offset);
+    }
+  }
+}
+
+static void SetDenseUnionLogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap,
+                                         int64_t out_offset) {
+  const auto* dense_union_type =
+      ::arrow::internal::checked_cast<const DenseUnionType*>(span.type);
+  DCHECK_LE(span.child_data.size(), 128);
+
+  const int8_t* types = span.GetValues<int8_t>(1);      // NOLINT
+  const int32_t* offsets = span.GetValues<int32_t>(2);  // NOLINT
+  for (int64_t i = 0; i < span.length; i++) {
+    const int8_t child_id = dense_union_type->child_ids()[types[i]];
+    const int32_t offset = offsets[i];
+    if (span.child_data[child_id].IsNull(offset)) {
+      bit_util::SetBit(out_bitmap, i + out_offset);
+    }
+  }
+}
+
+template <typename RunEndCType>
+void SetREELogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap,

Review Comment:
   Here it seems we could trivially create an `ree($index_type, bool)`, then return that directly or decode it using existing utilities



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org