You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "felipecrv (via GitHub)" <gi...@apache.org> on 2023/04/18 13:44:48 UTC

[GitHub] [arrow] felipecrv opened a new issue, #35207: [C++] Don't assume `!MayHaveNulls()` implies no logical nulls for all types

felipecrv opened a new issue, #35207:
URL: https://github.com/apache/arrow/issues/35207

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The assumption that an array without a validity bitmap has no logical nulls breaks for some types that are not `NA` `RUN_END_ENCODED`, `DENSE_UNION`, `SPARSE_UNION`.
   
   *We need to review usages of `MayHaveNulls` case by case and use more focused functions -- `HasValidityBitmap` and/or `MayHaveLogicalNulls`.*
   
   The docstring in `ArrayData::MayHaveLogicalNulls` describes the situation:
   
   ```cpp
       /// \brief Return true if the validity bitmap may have 0's in it, or if the
       /// child arrays (in the case of types without a validity bitmap) may have
       /// nulls
       ///
       /// This is not a drop-in replacement for MayHaveNulls, as historically
       /// MayHaveNulls() has been used to check for the presence of a validity
       /// bitmap that needs to be checked.
       ///
       /// Code that previously used MayHaveNulls() and then dealt with the validity
       /// bitmap directly can be fixed to handle all types correctly without
       /// performance degradation when handling most types by adopting
       /// HasValidityBitmap and MayHaveLogicalNulls.
       ///
       /// Before:
       ///
       ///     uint8_t* validity = array.MayHaveNulls() ? array.buffers[0].data : NULLPTR;
       ///     for (int64_t i = 0; i < array.length; ++i) {
       ///       if (validity && !bit_util::GetBit(validity, i)) {
       ///         continue;  // skip a NULL
       ///       }
       ///       ...
       ///     }
       ///
       /// After:
       ///
       ///     bool all_valid = !array.MayHaveLogicalNulls();
       ///     uint8_t* validity = array.HasValidityBitmap() ? array.buffers[0].data : NULLPTR;
       ///     for (int64_t i = 0; i < array.length; ++i) {
       ///       bool is_valid = all_valid ||
       ///                       (validity && bit_util::GetBit(validity, i)) ||
       ///                       array.IsValid(i);
       ///       if (!is_valid) {
       ///         continue;  // skip a NULL
       ///       }
       ///       ...
       ///     }
       bool MayHaveLogicalNulls() const {
         if (buffers[0] != NULLPTR) {
           return null_count.load() != 0;
         }
         const auto t = type->id();
         if (t == Type::SPARSE_UNION || t == Type::DENSE_UNION) {
           return internal::UnionMayHaveLogicalNulls(*this);
         }
         if (t == Type::RUN_END_ENCODED) {
           return internal::RunEndEncodedMayHaveLogicalNulls(*this);
         }
         return null_count.load() != 0;
       }
       ```
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org