
Posted to jira@arrow.apache.org by "Zsolt Kegyes-Brassai (Jira)" <ji...@apache.org> on 2021/03/20 19:20:00 UTC

[jira] [Created] (ARROW-12036) [R] dataset by a single parquet file

Zsolt Kegyes-Brassai created ARROW-12036:
--------------------------------------------

             Summary: [R] dataset by a single parquet file
                 Key: ARROW-12036
                 URL: https://issues.apache.org/jira/browse/ARROW-12036
             Project: Apache Arrow
          Issue Type: Wish
            Reporter: Zsolt Kegyes-Brassai


I like using {{dplyr}} in conjunction with [datasets|https://arrow.apache.org/docs/r/articles/dataset.html]; it results in clean code.

There are times when I would like to use the same workflow for just a single (larger) parquet file, and in most of those cases it doesn’t make sense to create a separate folder for just one file.

({{read_parquet()}} only provides an option for selecting columns; it offers no filtering or grouping.)

Is it possible, and does it make sense, to extend {{open_dataset()}} with an option to specify just a single file?
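
To illustrate the workflow described above, here is a minimal sketch of the separate-folder workaround next to the column-only selection of {{read_parquet()}}. The sample data ({{mtcars}}), the directory name and the file name are placeholders chosen for illustration only; they are not taken from a real use case.

```r
library(arrow)
library(dplyr)

# Hypothetical folder holding exactly one parquet file (the workaround).
ds_dir <- file.path(tempdir(), "single_file_dataset")
dir.create(ds_dir, showWarnings = FALSE)
pq_file <- file.path(ds_dir, "mtcars.parquet")
write_parquet(mtcars, pq_file)

# read_parquet() can restrict the columns, but all rows are loaded.
cols_only <- read_parquet(pq_file, col_select = c("mpg", "cyl"))

# Opening the folder as a dataset allows dplyr verbs before collect().
ds <- open_dataset(ds_dir)
result <- ds %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, gear) %>%
  collect()
```

The wish is to be able to point {{open_dataset()}} at the file itself instead of at the folder created only to hold it.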


