Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/01/13 22:37:00 UTC

[jira] [Resolved] (ARROW-10485) [R] Accept partitioning in open_dataset when file paths are hive-style

     [ https://issues.apache.org/jira/browse/ARROW-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson resolved ARROW-10485.
-------------------------------------
    Resolution: Fixed

Issue resolved by pull request 12133
[https://github.com/apache/arrow/pull/12133]

> [R] Accept partitioning in open_dataset when file paths are hive-style
> ----------------------------------------------------------------------
>
>                 Key: ARROW-10485
>                 URL: https://issues.apache.org/jira/browse/ARROW-10485
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: MacOS Catalina 10.15.7 (19H2), R 4.01, arrow R package v2.0.0
>            Reporter: John Sheffield
>            Assignee: Neal Richardson
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 7.0.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> A dataset written with hive_style = TRUE (now the default) has to be opened without an explicit partitioning argument to work as expected. If partitioning is passed to open_dataset(), even with the correct partition field, any query that filters on that field returns 0 rows.
>  
> From my perspective as a user, I'd want this to raise an error (not just a warning), probably when first calling open_dataset().
> {code:r}
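> library(dplyr)  # provides %>% and collect(), which are used unqualified below
> data("mtcars")
> # Write a hive-style partitioned dataset
> arrow::write_dataset(
>     dataset = mtcars, path = "mtcarstest", partitioning = "cyl",
>     format = "parquet", hive_style = TRUE)
> # Opened with explicit partitioning: the filter on cyl returns 0 rows
> mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl")
> # Opened without partitioning: the filter on cyl works as expected
> mtc2 <- arrow::open_dataset("mtcarstest")
> mtc1 %>%
>      dplyr::filter(cyl == 4) %>%
>      collect()
> mtc2 %>%
>      dplyr::filter(cyl == 4) %>%
>      collect()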
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)