Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/02/02 01:22:00 UTC
[jira] [Commented] (ARROW-10438) [C++][Dataset] Partitioning::Format on nulls
[ https://issues.apache.org/jira/browse/ARROW-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276774#comment-17276774 ]
Weston Pace commented on ARROW-10438:
-------------------------------------
[~jorisvandenbossche], I spoke with [~bkietz] briefly about this and, at the risk of putting words in his mouth, he also agreed with `I am not sure we should exactly follow the (potentially non-ideal) behaviour of Hive, here.`
Ben's assumption was that we would simply omit the directory when the partition key is null. If a `__HIVE_DEFAULT_PARTITION__` directory is present, we would just read it in as that string and let the user convert it to null at a later projection stage if that is what they want.
That does make inference a little difficult in this case (right now HivePartitioning will attempt to infer int32 if possible).
It also puts the responsibility back on the user if they want to create a dataset compatible with other Hive tools.
We agreed it would be good to revisit the topic with you and see if you had any strong opinions.
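To make the proposal above concrete, here is a rough sketch in plain Python (no Arrow dependency). The helper names `format_path`, `parse_path`, `sentinel_to_null` and `infer_type` are hypothetical stand-ins invented for illustration, not Arrow APIs, and the inference rule is a toy approximation of "infer int32 if possible". It only aims to show the three pieces discussed: omitting the directory on null, reading the Hive placeholder back as an ordinary string, and why that string gets in the way of int32 inference.

```python
from typing import Dict, List, Optional

# Hive's placeholder directory value for a null partition key.
HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def format_path(keys: Dict[str, Optional[object]]) -> str:
    """Hypothetical stand-in for Partitioning::Format: skip null keys."""
    return "/".join(
        f"{name}={value}" for name, value in keys.items() if value is not None
    )


def parse_path(path: str) -> Dict[str, str]:
    """Read each key=value segment back as a plain string, placeholder included."""
    return dict(
        segment.split("=", 1) for segment in path.split("/") if "=" in segment
    )


def sentinel_to_null(keys: Dict[str, str]) -> Dict[str, Optional[str]]:
    """The optional later projection step: map the Hive placeholder to None."""
    return {
        name: (None if value == HIVE_NULL else value)
        for name, value in keys.items()
    }


def infer_type(values: List[str]) -> str:
    """Toy inference: int32 only if every observed value fits a 32-bit int."""
    try:
        fits = all(-2**31 <= int(v) < 2**31 for v in values)
    except ValueError:
        # At least one value (e.g. the Hive placeholder) is not an integer.
        return "string"
    return "int32" if fits else "string"
```

With this sketch, writing `{"year": 2021, "month": None}` produces only the `year=2021` segment, while a directory written by another Hive tool round-trips as the literal placeholder string until the user chooses to project it to null. A single placeholder directory is also enough to push an otherwise integer key to string, which is the inference difficulty mentioned above.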
> [C++][Dataset] Partitioning::Format on nulls
> --------------------------------------------
>
> Key: ARROW-10438
> URL: https://issues.apache.org/jira/browse/ARROW-10438
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 2.0.0
> Reporter: Ben Kietzman
> Assignee: Weston Pace
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Writing a dataset with null partition keys is currently untested. Ensure the behavior is documented and correct.