Posted to issues@arrow.apache.org by "Benjamin Kietzman (Jira)" <ji...@apache.org> on 2019/08/26 16:09:00 UTC

[jira] [Commented] (ARROW-6359) [C++] Raw data equality in arrays vs. semantic value equality

    [ https://issues.apache.org/jira/browse/ARROW-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915914#comment-16915914 ] 

Benjamin Kietzman commented on ARROW-6359:
------------------------------------------

Comparing values under an unset bit in the validity (null) bitmap seems like an antipattern. Specifically, it sounds like it would lead to an expanding set of APIs that have to guarantee what is stored in a null slot.

Is there a use case for this other than {{NaN == NaN}} vs {{NaN != NaN}}?
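
To make the {{NaN}} case concrete, here is a minimal sketch in plain C++ (no Arrow APIs involved). The helper names {{RawEqual}} and {{SemanticEqual}} are hypothetical and exist only for illustration; they show how bitwise and IEEE 754 equality can disagree for the same pair of doubles.

```cpp
#include <cstring>
#include <limits>

// Hypothetical helper: bitwise ("raw data") equality.
// Compares the bytes of the two doubles, so identical NaN bit patterns
// compare equal, while +0.0 and -0.0 compare unequal (their sign bits differ).
bool RawEqual(double a, double b) {
  return std::memcmp(&a, &b, sizeof(double)) == 0;
}

// Hypothetical helper: semantic (IEEE 754) equality.
// NaN is never equal to anything, including itself, and +0.0 == -0.0.
bool SemanticEqual(double a, double b) {
  return a == b;
}
```

With {{nan = std::numeric_limits<double>::quiet_NaN()}}, {{RawEqual(nan, nan)}} holds while {{SemanticEqual(nan, nan)}} does not; for {{0.0}} and {{-0.0}} the two functions disagree in the opposite direction.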

> [C++] Raw data equality in arrays vs. semantic value equality
> -------------------------------------------------------------
>
>                 Key: ARROW-6359
>                 URL: https://issues.apache.org/jira/browse/ARROW-6359
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> I have observed a conflict in requirements / expectations in our {{Equals}} functions. The initial implementations of these functions would compare the raw bytes found in non-null data slots, in addition to the validity bitmaps in each array and their respective children, taking slice offsets and so forth into account.
> Recently we have been adding type-specific value comparison semantics to these comparisons, notably propagating {{NaN != NaN}}. This has led to such issues as ARROW-6043. 
> Rather than creating "one true way" to compare array contents, I would suggest introducing functions that perform slightly different comparisons:
> * Raw data comparison, skipping masked null values
> * Raw data comparison, comparing all buffer contents (up to the semantic "extent" of an array -- so ignoring the contents of padding, or excess buffer contents when dealing with slices)
> Thoughts?
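
The two comparison modes proposed in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not Arrow's implementation: {{Int32Slice}}, {{EqualsSkippingNulls}} and {{EqualsAllSlots}} are hypothetical names, and the validity bitmap is modelled as a {{std::vector<bool>}} rather than a packed buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a sliced int32 array (not Arrow's ArrayData).
struct Int32Slice {
  std::vector<int32_t> values;  // whole values buffer, may extend past the slice
  std::vector<bool> valid;      // one flag per buffer slot; false means null
  std::size_t offset;           // first slot that belongs to the slice
  std::size_t length;           // number of slots in the slice
};

// Mode 1: raw data comparison, skipping masked null values.
// Whatever is stored under a null slot is ignored.
bool EqualsSkippingNulls(const Int32Slice& a, const Int32Slice& b) {
  if (a.length != b.length) return false;
  for (std::size_t i = 0; i < a.length; ++i) {
    const bool a_valid = a.valid[a.offset + i];
    const bool b_valid = b.valid[b.offset + i];
    if (a_valid != b_valid) return false;
    if (a_valid && a.values[a.offset + i] != b.values[b.offset + i]) return false;
  }
  return true;
}

// Mode 2: raw data comparison of all slot contents within the slice extent,
// including the values stored under null slots. Buffer contents outside
// [offset, offset + length) are never inspected.
bool EqualsAllSlots(const Int32Slice& a, const Int32Slice& b) {
  if (a.length != b.length) return false;
  for (std::size_t i = 0; i < a.length; ++i) {
    if (a.valid[a.offset + i] != b.valid[b.offset + i]) return false;
    if (a.values[a.offset + i] != b.values[b.offset + i]) return false;
  }
  return true;
}
```

Two slices that agree on every valid slot but hold different values under a null slot are equal under the first mode and unequal under the second, which is the distinction the issue is asking about.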



--
This message was sent by Atlassian Jira
(v8.3.2#803003)