Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/09/28 12:58:00 UTC

[jira] [Updated] (ARROW-14149) [C++][R] Support a "modified" hive style directory naming scheme

     [ https://issues.apache.org/jira/browse/ARROW-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-14149:
------------------------------------
    Component/s: C++

> [C++][R] Support a "modified" hive style directory naming scheme
> ----------------------------------------------------------------
>
>                 Key: ARROW-14149
>                 URL: https://issues.apache.org/jira/browse/ARROW-14149
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ryan Hafen
>            Priority: Minor
>
> I am working on a project where I need to create and analyze Parquet files using Apache Arrow, but the environment I'm working in does not allow "=" in file paths, which the hive naming convention requires, e.g. "year=2007". While I can specify a partitioning that does not use the hive convention, I then lose the variable names. This is problematic when I share the datasets with others, because they have to specify the partitioning variables when opening the dataset but have no way of knowing what those variables are.
>  
> Would it be possible to allow a modified hive-style directory naming convention that still preserves the variable name in the directory name? For example, allowing a delimiter other than "="?
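
To make the request concrete, here is a minimal sketch of what a "modified" hive-style scheme could look like: directory segments of the form name<delimiter>value, with the delimiter configurable. This is plain Python and not Arrow's API; the function names, the "~" delimiter, and the example path are all hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def format_partition_dir(name, value, delimiter="="):
    """Build one directory segment, e.g. 'year=2007' or 'year~2007'."""
    return f"{name}{delimiter}{value}"


def parse_partition_path(path, delimiter="="):
    """Recover {name: value} from segments shaped like name<delimiter>value.

    Hypothetical helper, not part of Arrow. Values are returned as strings;
    no type inference is attempted.
    """
    fields = {}
    # Skip the last component, which is the data file itself.
    for segment in PurePosixPath(path).parts[:-1]:
        name, sep, value = segment.partition(delimiter)
        # Segments without the delimiter (e.g. a root directory) are ignored.
        if sep and name:
            fields[name] = value
    return fields


if __name__ == "__main__":
    # "~" stands in for any character the target environment permits.
    example = "data/year~2007/month~12/part-0.parquet"
    print(parse_partition_path(example, delimiter="~"))
```

Because the variable names stay in the directory names, a reader of the dataset would only need to know the delimiter, not the list of partitioning variables.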



--
This message was sent by Atlassian Jira
(v8.3.4#803005)