Posted to dev@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/08/26 15:28:00 UTC

[jira] [Created] (ARROW-6359) [C++] Raw data equality in arrays vs. semantic value equality

Wes McKinney created ARROW-6359:
-----------------------------------

             Summary: [C++] Raw data equality in arrays vs. semantic value equality
                 Key: ARROW-6359
                 URL: https://issues.apache.org/jira/browse/ARROW-6359
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Wes McKinney


I have observed a conflict in requirements / expectations in our {{Equals}} functions. The initial implementations of these functions would compare the raw bytes found in non-null data slots, in addition to the validity bitmaps in each array and their respective children, taking slice offsets and so forth into account.

Recently we have been adding type-specific value comparison semantics to these comparisons, notably propagating {{NaN != NaN}}. This has led to such issues as ARROW-6043. 

Rather than creating "one true way" to compare array contents, I would suggest introducing functions that perform slightly different comparisons:

* Raw data comparison, skipping masked null values
* Raw data comparison, comparing all buffer contents (up to the semantic "extent" of an array -- so ignoring the contents of padding, or excess buffer contents when dealing with slices)
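
A rough sketch of what the two comparisons listed above might look like, assuming a simplified stand-in for a sliced primitive array. The {{DoubleSlice}} struct and both function names are hypothetical and only illustrate the idea; they do not reflect Arrow's actual data structures, which use packed validity bitmaps and shared buffers.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a slice of a double array.
struct DoubleSlice {
  std::vector<double> values;  // backing buffer, may extend past the slice
  std::vector<bool> valid;     // one flag per backing slot, true = non-null
  std::size_t offset;          // first backing slot that belongs to the slice
  std::size_t length;          // number of slots in the slice
};

// Raw data comparison, skipping masked null values: the bytes stored
// under null slots are ignored.
bool RawEqualsSkipNulls(const DoubleSlice& a, const DoubleSlice& b) {
  if (a.length != b.length) return false;
  for (std::size_t i = 0; i < a.length; ++i) {
    const bool a_valid = a.valid[a.offset + i];
    const bool b_valid = b.valid[b.offset + i];
    if (a_valid != b_valid) return false;
    if (a_valid && std::memcmp(&a.values[a.offset + i],
                               &b.values[b.offset + i],
                               sizeof(double)) != 0) {
      return false;
    }
  }
  return true;
}

// Raw data comparison of all buffer contents within the slice's extent:
// bytes under null slots are compared too, but slots outside
// [offset, offset + length) are not.
bool RawEqualsAllSlots(const DoubleSlice& a, const DoubleSlice& b) {
  if (a.length != b.length) return false;
  for (std::size_t i = 0; i < a.length; ++i) {
    if (a.valid[a.offset + i] != b.valid[b.offset + i]) return false;
  }
  if (a.length == 0) return true;
  return std::memcmp(a.values.data() + a.offset,
                     b.values.data() + b.offset,
                     a.length * sizeof(double)) == 0;
}
```

Two slices that agree on every non-null slot but hold different leftover bytes under a null slot would be equal under the first function and unequal under the second.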

Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)