
Posted to issues@arrow.apache.org by "Lorenzo Gaborini (Jira)" <ji...@apache.org> on 2022/06/02 07:48:00 UTC

[jira] [Created] (ARROW-16720) [R] Cannot read datasets partitioned by columns starting with dots

Lorenzo Gaborini created ARROW-16720:
----------------------------------------

             Summary: [R] Cannot read datasets partitioned by columns starting with dots
                 Key: ARROW-16720
                 URL: https://issues.apache.org/jira/browse/ARROW-16720
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 8.0.0
            Reporter: Lorenzo Gaborini


Reprex:
{code:r}
library(dplyr)
library(arrow)

packageVersion("arrow")
#> [1] '8.0.0'

path_arrow_tmp <- tempfile()

mtcars %>% 
   dplyr::group_by(cyl) %>% 
   arrow::write_dataset(
      path = path_arrow_tmp
   )

base::list.files(path_arrow_tmp, recursive = TRUE, all.files = TRUE)
#> [1] "cyl=4/part-0.parquet" "cyl=6/part-0.parquet" "cyl=8/part-0.parquet"

mtcars_load <- path_arrow_tmp %>% 
   arrow::open_dataset() %>% 
   dplyr::collect()

setequal(mtcars$mpg, mtcars_load$mpg)
#> [1] TRUE
setequal(mtcars$wt, mtcars_load$wt)
#> [1] TRUE
setequal(mtcars$cyl, mtcars_load$cyl)
#> [1] TRUE

# Now partition by a copy of cyl whose name starts with a dot

path_arrow_tmp_grp <- tempfile()

mtcars %>% 
   dplyr::mutate(
      .cyl = cyl
   ) %>% 
   dplyr::group_by(.cyl) %>% 
   arrow::write_dataset(
      path = path_arrow_tmp_grp
   )

# The partition directories and files are written as expected
base::list.files(path_arrow_tmp_grp, recursive = TRUE, all.files = TRUE)
#> [1] ".cyl=4/part-0.parquet" ".cyl=6/part-0.parquet" ".cyl=8/part-0.parquet"

# ...but open_dataset() discovers 0 files
path_arrow_tmp_grp %>% 
   arrow::open_dataset()
#> FileSystemDataset with 0 Parquet files

# Specifying the partitioning manually infers the .cyl field, but still finds 0 files

path_arrow_tmp_grp %>% 
   arrow::open_dataset(
      partitioning = ".cyl",
      hive_style = TRUE
   )
#> FileSystemDataset with 0 Parquet files
#> .cyl: int32
{code}
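A possible workaround, sketched under assumptions that this report does not confirm. Arrow's dataset discovery appears to skip path components starting with "." or "_" by default, which would explain why the ".cyl=..." directories are never listed. The sketch assumes an arrow version whose open_dataset() accepts factory_options = list(selector_ignore_prefixes = ...). That argument may not exist in 8.0.0, the affected version. The sketch narrows the ignored prefixes to "_" only, so dot-prefixed partition directories are discovered.

{code:r}
library(dplyr)
library(arrow)

path_arrow_tmp_grp <- tempfile()

# Same write as in the reprex: partition by a dot-prefixed column
mtcars %>%
   dplyr::mutate(.cyl = cyl) %>%
   dplyr::group_by(.cyl) %>%
   arrow::write_dataset(path = path_arrow_tmp_grp)

# ASSUMPTION: factory_options / selector_ignore_prefixes is available in the
# installed arrow version (it may be missing in 8.0.0). Ignoring only "_"
# prefixes lets discovery list the ".cyl=..." directories.
mtcars_load <- path_arrow_tmp_grp %>%
   arrow::open_dataset(
      factory_options = list(selector_ignore_prefixes = "_")
   ) %>%
   dplyr::collect()
{code}

If this works in a given version, it suggests that the default ignore prefixes, not hive partition parsing, are what drop the files.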



--
This message was sent by Atlassian Jira
(v8.20.7#820007)