Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/07/26 14:45:00 UTC

[jira] [Commented] (ARROW-11762) [C++][Dataset] Refactor Partitioning to explicitly treat null and absent fields identically

    [ https://issues.apache.org/jira/browse/ARROW-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387396#comment-17387396 ] 

Ben Kietzman commented on ARROW-11762:
--------------------------------------

You're correct about those filter expressions, but I was referring to the guarantees produced by partitions. Specifically, it's currently legal for a HivePartitioning to parse either of {{/a=0/}} or {{/a=0/b=__HIVE_DEFAULT_PARTITION__/}} as either {{a == 0}} or {{a == 0 and is_null(b)}}. The former guarantee carries no explicit information about field {{b}}, and we currently treat that omission as equivalent to specifying that {{b}} is null. This is not optimal; we'd prefer to be specific.
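
To make the "be specific" preference concrete, here is a rough standalone sketch of a parser that always yields the explicit form. It is a toy model and not the Arrow API: {{Guarantee}}, {{ParseExplicit}}, {{ToString}} and {{kNullFallback}} are hypothetical names, and it assumes the partition field names are known up front.

```cpp
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a partition guarantee (hypothetical, not an Arrow type):
// one entry per schema field, where std::nullopt means "this field is null".
using Guarantee = std::vector<std::pair<std::string, std::optional<std::string>>>;

// Directory value that marks a null partition key in this sketch.
const std::string kNullFallback = "__HIVE_DEFAULT_PARTITION__";

// Parse a Hive-style path such as "/a=0/b=1/" against a list of field names.
// A field that is absent from the path and a field that carries the null
// fallback both come out as an explicit null.
Guarantee ParseExplicit(const std::string& path,
                        const std::vector<std::string>& fields) {
  std::map<std::string, std::string> seen;
  std::istringstream segments(path);
  std::string segment;
  while (std::getline(segments, segment, '/')) {
    auto eq = segment.find('=');
    if (eq == std::string::npos) continue;  // skip empty or non key=value segments
    seen[segment.substr(0, eq)] = segment.substr(eq + 1);
  }

  Guarantee out;
  for (const auto& name : fields) {
    auto it = seen.find(name);
    if (it == seen.end() || it->second == kNullFallback) {
      out.emplace_back(name, std::nullopt);  // absent and null are treated alike
    } else {
      out.emplace_back(name, it->second);
    }
  }
  return out;
}

// Render the guarantee as a readable conjunction.
std::string ToString(const Guarantee& guarantee) {
  std::string out;
  for (const auto& [name, value] : guarantee) {
    if (!out.empty()) out += " and ";
    out += value ? name + " == " + *value : "is_null(" + name + ")";
  }
  return out;
}
```

With fields {{a}} and {{b}}, both {{/a=0/}} and {{/a=0/b=__HIVE_DEFAULT_PARTITION__/}} produce the same guarantee in this model, so nothing downstream has to decide whether a missing {{b}} means null.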

> [C++][Dataset] Refactor Partitioning to explicitly treat null and absent fields identically
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11762
>                 URL: https://issues.apache.org/jira/browse/ARROW-11762
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 3.0.0
>            Reporter: Ben Kietzman
>            Assignee: Weston Pace
>            Priority: Major
>             Fix For: 6.0.0
>
>
> ARROW-10438 adds support for partition expressions with explicit absence of a partition key by including an {{is_null(field_ref("absent key field name"))}} in the conjunction. Whenever possible, this should be preferred to an equivalent conjunction which simply omits an equality expression for the missing field.
> Additionally, since an absent partition key is semantically equivalent to a null-valued partition key, we should ensure there is no difference in behavior. Currently, {{equal(field_ref("a"), literal(0))}} and {{and_(equal(field_ref("a"), literal(0)), is_null("b"))}} are formatted differently.


