Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/01/21 18:41:00 UTC

[jira] [Created] (ARROW-15407) [Python] Change the default write partitioning flavor to hive

Weston Pace created ARROW-15407:
-----------------------------------

             Summary: [Python] Change the default write partitioning flavor to hive
                 Key: ARROW-15407
                 URL: https://issues.apache.org/jira/browse/ARROW-15407
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Weston Pace


Hive partitioning round-trips smoothly because the reader does not have to specify the column names on read, as it must with directory partitioning.  We already default to hive in some places (e.g. parquet.write_to_dataset), but not in dataset.write_dataset.
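
To illustrate why hive paths round-trip without extra information, here is a minimal pure-Python sketch of the two path layouts. The helpers build_path and parse_path are hypothetical names made up for this example; they are not pyarrow functions and they do not reflect how pyarrow implements partitioning.

```python
from typing import Dict, List, Optional


def build_path(values: Dict[str, object], flavor: str) -> str:
    """Build the partition directory for one set of partition values."""
    if flavor == "hive":
        # Hive layout embeds the column name in every path segment.
        return "/".join(f"{name}={value}" for name, value in values.items())
    if flavor == "directory":
        # Directory layout stores only the values; names are lost.
        return "/".join(str(value) for value in values.values())
    raise ValueError(f"unknown flavor: {flavor!r}")


def parse_path(
    path: str, flavor: str, field_names: Optional[List[str]] = None
) -> Dict[str, str]:
    """Recover partition values (as strings) from a partition directory."""
    segments = path.split("/")
    if flavor == "hive":
        # Column names come from the path itself.
        return dict(segment.split("=", 1) for segment in segments)
    if flavor == "directory":
        if field_names is None:
            # The reader has to supply the names out of band.
            raise ValueError("directory partitioning needs field_names on read")
        return dict(zip(field_names, segments))
    raise ValueError(f"unknown flavor: {flavor!r}")


values = {"year": 2021, "month": 1}
hive_path = build_path(values, "hive")
directory_path = build_path(values, "directory")
```

With these helpers, parse_path(hive_path, "hive") succeeds on its own, while parse_path(directory_path, "directory") fails unless the caller also passes field_names=["year", "month"].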

To alleviate backwards-compatibility issues, Joris suggested a deprecation cycle.

First stage:

  * If a partitioning is specified and it is not a list of columns then do nothing.
  * If a partitioning is specified and it is a list of columns but the user has explicitly set partitioning_flavor then do nothing.
  * If a partitioning is specified and it is a list of columns and the user has not explicitly set partitioning_flavor then default to directory and emit a warning:

"The default partitioning_flavor will be changing from 'directory' to 'hive' in future releases.  To silence this warning please explicitly set the partitioning_flavor"

Second stage:

  * If a partitioning is specified and it is not a list of columns then do nothing. (same as before)
  * If a partitioning is specified and it is a list of columns but the user has explicitly set partitioning_flavor then do nothing. (same as before)
  * If a partitioning is specified and it is a list of columns and the user has not explicitly set partitioning_flavor then default to hive.
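
The two stages above can be sketched as a single decision function. This is only an illustration under stated assumptions: resolve_flavor and its stage argument are hypothetical, "not explicitly set" is modelled as partitioning_flavor=None, and FutureWarning is an assumed warning category that the proposal does not specify.

```python
import warnings

# Warning text proposed for the first stage.
FLAVOR_WARNING = (
    "The default partitioning_flavor will be changing from 'directory' to "
    "'hive' in future releases.  To silence this warning please explicitly "
    "set the partitioning_flavor"
)


def resolve_flavor(partitioning, partitioning_flavor=None, stage=1):
    """Return the flavor to use; hypothetical helper, not pyarrow API."""
    is_column_list = isinstance(partitioning, list) and all(
        isinstance(name, str) for name in partitioning
    )
    if not is_column_list:
        # Not a list of columns: do nothing in either stage.
        return partitioning_flavor
    if partitioning_flavor is not None:
        # The user chose a flavor explicitly: do nothing in either stage.
        return partitioning_flavor
    if stage == 1:
        # First stage: keep the old default but warn about the change.
        # FutureWarning is an assumption made for this sketch.
        warnings.warn(FLAVOR_WARNING, FutureWarning, stacklevel=2)
        return "directory"
    # Second stage: silently default to hive.
    return "hive"
```

Callers that already pass partitioning_flavor explicitly take the second branch, so they see the same result in both stages and never trigger the warning.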



--
This message was sent by Atlassian Jira
(v8.20.1#820001)