Posted to jira@arrow.apache.org by "Mauricio 'Pachá' Vargas Sepúlveda (Jira)" <ji...@apache.org> on 2021/04/09 15:43:00 UTC

[jira] [Created] (ARROW-12315) add max_partitions argument to write_dataset()

Mauricio 'Pachá' Vargas Sepúlveda created ARROW-12315:
---------------------------------------------------------

             Summary: add max_partitions argument to write_dataset()
                 Key: ARROW-12315
                 URL: https://issues.apache.org/jira/browse/ARROW-12315
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 3.0.0
            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
             Fix For: 4.0.0


The Python docs show that write_dataset() accepts a max_partitions argument, so we can allow, say, 1025 partitions:
https://arrow.apache.org/docs/_modules/pyarrow/dataset.html

In R this argument doesn't exist. It would be good to add it for arrow v4.0.0.

This is useful, for example, with international trade datasets:
```
library(arrow)
library(dplyr)

# d = UN COMTRADE - World's bilateral flows 2019
# 13,050,535 x 22 data.frame
d %>%
  group_by(Year, `Reporter ISO`, `Partner ISO`) %>%
  write_dataset("parquet", hive_style = FALSE)

#> Error: Invalid: Fragment would be written into 12808 partitions. This exceeds the maximum of 1024
```
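
To make the error above concrete: each distinct combination of the grouping columns becomes its own partition directory, so the partition count equals the number of distinct key combinations. The following is only a rough sketch in plain Python (no Arrow involved). The toy rows and the helper names count_partitions and exceeds_limit are made up for illustration and are not part of any Arrow API; the default of 1024 is taken from the error message.

```python
# Hypothetical toy rows standing in for the trade data. The column names
# follow the example above; the values are invented for illustration.
rows = [
    {"Year": 2019, "Reporter ISO": "CHL", "Partner ISO": "USA"},
    {"Year": 2019, "Reporter ISO": "CHL", "Partner ISO": "CHN"},
    {"Year": 2019, "Reporter ISO": "USA", "Partner ISO": "CHL"},
    {"Year": 2019, "Reporter ISO": "CHL", "Partner ISO": "USA"},  # repeated key
]
keys = ["Year", "Reporter ISO", "Partner ISO"]


def count_partitions(rows, keys):
    """Count distinct combinations of the partition key values."""
    return len({tuple(row[k] for k in keys) for row in rows})


def exceeds_limit(rows, keys, max_partitions=1024):
    """Return True if the rows need more than max_partitions directories."""
    return count_partitions(rows, keys) > max_partitions


n = count_partitions(rows, keys)
print(n, exceeds_limit(rows, keys))
```

With the real data the same count would be 12808, which is why a configurable limit would help here.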



--
This message was sent by Atlassian Jira
(v8.3.4#803005)