Posted to jira@arrow.apache.org by "Zsolt Kegyes-Brassai (Jira)" <ji...@apache.org> on 2021/03/20 19:20:00 UTC
[jira] [Created] (ARROW-12036) [R] dataset by a single parquet file
Zsolt Kegyes-Brassai created ARROW-12036:
--------------------------------------------
Summary: [R] dataset by a single parquet file
Key: ARROW-12036
URL: https://issues.apache.org/jira/browse/ARROW-12036
Project: Apache Arrow
Issue Type: Wish
Reporter: Zsolt Kegyes-Brassai
I like using {{dplyr}} in conjunction with [datasets|https://arrow.apache.org/docs/r/articles/dataset.html]; it results in clean code.
There are times when I would like to use the same workflow for a single (larger) Parquet file, and in most of those cases it doesn't make sense to create a separate folder for just one file.
({{read_parquet()}} provides options only for selecting columns; it offers no filtering or grouping.)
Is it possible, and does it make sense, to extend {{open_dataset()}} with an option to specify just a single file?
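A minimal sketch of the workflow in question, for illustration only. It assumes an {{arrow}} version in which {{open_dataset()}} accepts the path of a single Parquet file, which may not hold for the version this request was filed against. The temporary file and the {{mtcars}} data are hypothetical stand-ins for the larger file.

```r
library(arrow)
library(dplyr)

# Hypothetical example file, written to a temporary location
path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, path)

# read_parquet() can restrict the columns, but every row is loaded into memory
cols_only <- read_parquet(path, col_select = c("mpg", "cyl"))

# Assumption: open_dataset() is pointed at the file itself, not at a folder
ds <- open_dataset(path)

# Same dplyr workflow as with a multi-file dataset; rows are filtered
# before the result is collected into a data frame
result <- ds %>%
  filter(cyl == 4) %>%
  select(mpg, cyl) %>%
  collect()
```

If the call fails on an older version, the fallback is the one described above: place the file in its own folder and open that folder.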