Posted to issues@arrow.apache.org by "Lorenzo Gaborini (Jira)" <ji...@apache.org> on 2022/06/02 07:48:00 UTC
[jira] [Created] (ARROW-16720) [R] Cannot read datasets partitioned by columns starting with dots
Lorenzo Gaborini created ARROW-16720:
----------------------------------------
Summary: [R] Cannot read datasets partitioned by columns starting with dots
Key: ARROW-16720
URL: https://issues.apache.org/jira/browse/ARROW-16720
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 8.0.0
Reporter: Lorenzo Gaborini
Reprex:
{code:r}
library(dplyr)
library(arrow)
packageVersion("arrow")
#> [1] '8.0.0'
path_arrow_tmp <- tempfile()
mtcars %>%
  dplyr::group_by(cyl) %>%
  arrow::write_dataset(
    path = path_arrow_tmp
  )
base::list.files(path_arrow_tmp, recursive = TRUE, all.files = TRUE)
#> [1] "cyl=4/part-0.parquet" "cyl=6/part-0.parquet" "cyl=8/part-0.parquet"
mtcars_load <- path_arrow_tmp %>%
  arrow::open_dataset() %>%
  dplyr::collect()
setequal(mtcars$mpg, mtcars_load$mpg)
#> [1] TRUE
setequal(mtcars$wt, mtcars_load$wt)
#> [1] TRUE
setequal(mtcars$cyl, mtcars_load$cyl)
#> [1] TRUE
# Same data, but partitioned by a column whose name starts with a dot
path_arrow_tmp_grp <- tempfile()
mtcars %>%
  dplyr::mutate(
    .cyl = cyl
  ) %>%
  dplyr::group_by(.cyl) %>%
  arrow::write_dataset(
    path = path_arrow_tmp_grp
  )
# the files are there
base::list.files(path_arrow_tmp_grp, recursive = TRUE, all.files = TRUE)
#> [1] ".cyl=4/part-0.parquet" ".cyl=6/part-0.parquet" ".cyl=8/part-0.parquet"
# but open_dataset() detects 0 files
path_arrow_tmp_grp %>%
  arrow::open_dataset()
#> FileSystemDataset with 0 Parquet files
# Specifying the partitioning manually does not help
path_arrow_tmp_grp %>%
  arrow::open_dataset(
    partitioning = ".cyl",
    hive_style = TRUE
  )
#> FileSystemDataset with 0 Parquet files
#> .cyl: int32
{code}
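A possible cause, offered as an assumption rather than a confirmed diagnosis: Arrow's dataset discovery skips files and directories whose names start with "." or "_" by default, so the ".cyl=4" style directories may be treated as hidden and never scanned. The base R sketch below only mimics that filter on the paths listed above; is_ignored() is a hypothetical helper, not part of arrow.
{code:r}
# Hypothetical helper (not an arrow function): TRUE when any segment of a
# relative path starts with one of the single-character prefixes.
is_ignored <- function(paths, prefixes = c(".", "_")) {
  vapply(
    strsplit(paths, "/", fixed = TRUE),
    function(segments) any(substr(segments, 1, 1) %in% prefixes),
    logical(1)
  )
}

is_ignored(c("cyl=4/part-0.parquet", ".cyl=4/part-0.parquet"))
#> [1] FALSE  TRUE
{code}
If that assumption holds, it would explain why the same data is read correctly when partitioned by "cyl" but not by ".cyl".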