Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2021/11/10 14:02:00 UTC
[jira] [Resolved] (ARROW-12315) [R] add max_partitions argument to write_dataset()
[ https://issues.apache.org/jira/browse/ARROW-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Keane resolved ARROW-12315.
------------------------------------
Resolution: Fixed
Issue resolved by pull request 9972
[https://github.com/apache/arrow/pull/9972]
> [R] add max_partitions argument to write_dataset()
> --------------------------------------------------
>
> Key: ARROW-12315
> URL: https://issues.apache.org/jira/browse/ARROW-12315
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Affects Versions: 3.0.0
> Reporter: Mauricio 'PachĂĄ' Vargas SepĂșlveda
> Priority: Minor
> Labels: pull-request-available
> Fix For: 7.0.0
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> The Python docs show that we can pass more than 1024 partitions (say, 1025):
> https://arrow.apache.org/docs/_modules/pyarrow/dataset.html
> In R this argument doesn't exist; it would be good to add it for arrow v4.0.0.
> This is useful, for example, with international trade datasets:
> {code:r}
> library(arrow)
> library(dplyr)
>
> # d = UN COMTRADE - World's bilateral flows 2019
> # 13,050,535 x 22 data.frame
> d %>%
>   group_by(Year, `Reporter ISO`, `Partner ISO`) %>%
>   write_dataset("parquet", hive_style = FALSE)
> Error: Invalid: Fragment would be written into 12808 partitions. This exceeds the maximum of 1024
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)