Posted to jira@arrow.apache.org by "Mauricio 'Pachá' Vargas Sepúlveda (Jira)" <ji...@apache.org> on 2021/05/21 23:48:00 UTC

[jira] [Commented] (ARROW-12315) [R] add max_partitions argument to write_dataset()

    [ https://issues.apache.org/jira/browse/ARROW-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349564#comment-17349564 ] 

Mauricio 'Pachá' Vargas Sepúlveda commented on ARROW-12315:
-----------------------------------------------------------

Related to ARROW-12373: the PR for this ticket adds a check so that negative max_partitions values (-1, -2, -3, ..., -n) are no longer converted into huge unsigned integers (for example, -3 wraps around to 18,446,744,073,709,551,613). Instead, it returns an error message explaining that the value is not feasible.
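The check described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the helper name `validate_max_partitions()` is hypothetical, and the upper bound of `.Machine$integer.max` is an assumption chosen because R integers are 32-bit.

```r
# Hypothetical helper, NOT part of the arrow package: validate a
# max_partitions value in R before it reaches C++, where a negative
# number would wrap around when cast to an unsigned 64-bit integer.
validate_max_partitions <- function(max_partitions) {
  # Must be a single, finite, whole number
  if (!is.numeric(max_partitions) || length(max_partitions) != 1L ||
      !is.finite(max_partitions) ||
      max_partitions != trunc(max_partitions)) {
    stop("`max_partitions` must be a single whole number", call. = FALSE)
  }
  # Reject zero/negative values (and values too large for an R integer)
  if (max_partitions < 1 || max_partitions > .Machine$integer.max) {
    stop("`max_partitions` must be between 1 and ",
         .Machine$integer.max, "; got ", max_partitions, call. = FALSE)
  }
  as.integer(max_partitions)
}

# Usage
validate_max_partitions(2048)     # returns 2048L
try(validate_max_partitions(-3))  # error instead of a wrapped-around value
```

Failing fast on the R side keeps the error message close to the user's call, rather than letting an invalid value reach the dataset writer.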

> [R] add max_partitions argument to write_dataset()
> --------------------------------------------------
>
>                 Key: ARROW-12315
>                 URL: https://issues.apache.org/jira/browse/ARROW-12315
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>    Affects Versions: 3.0.0
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Assignee: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Python docs show that we can pass a max_partitions value (say, 1025):
> https://arrow.apache.org/docs/_modules/pyarrow/dataset.html
> but in R this argument doesn't exist. It would be good to add it for arrow v4.0.0.
> This is useful, for example, with international trade datasets:
> {code:r}
> library(arrow)
> library(dplyr)
>
> # d = UN COMTRADE - World's bilateral flows 2019
> # 13,050,535 x 22 data.frame
> d %>%
>   group_by(Year, `Reporter ISO`, `Partner ISO`) %>%
>   write_dataset("parquet", hive_style = FALSE)
> Error: Invalid: Fragment would be written into 12808 partitions. This exceeds the maximum of 1024
> {code}
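To check in advance whether a grouped write is likely to hit the 1024 limit, one can count the distinct combinations of the grouping columns, since each combination maps to its own partition directory. A minimal base-R sketch follows; the toy data frame and the helper `count_partitions()` are hypothetical stand-ins for the COMTRADE table, and the count is only an upper bound on what the dataset writer checks (the error message suggests the limit is enforced per fragment being written).

```r
# Toy stand-in for the COMTRADE table (made-up values, not real data)
d <- data.frame(
  Year           = rep(2019L, 6),
  `Reporter ISO` = c("chl", "chl", "arg", "arg", "bra", "bra"),
  `Partner ISO`  = c("arg", "bra", "chl", "bra", "chl", "chl"),
  check.names    = FALSE  # keep the spaces in the column names
)

# Hypothetical helper: number of distinct combinations of the key columns,
# i.e. the number of partition directories a grouped write would create
count_partitions <- function(df, keys) {
  nrow(unique(df[keys]))
}

keys <- c("Year", "Reporter ISO", "Partner ISO")
n_parts <- count_partitions(d, keys)
print(n_parts)  # 5: the ("bra", "chl") pair appears twice

# Compare against the default limit reported in the error message
n_parts > 1024L
```

For the ticket's data the error reports 12808 partitions, so, assuming the argument is exposed in R as proposed here, the write would need something like `max_partitions = 12808` (or higher) passed to `write_dataset()`.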



--
This message was sent by Atlassian Jira
(v8.3.4#803005)