Posted to jira@arrow.apache.org by "Martin du Toit (Jira)" <ji...@apache.org> on 2022/03/07 12:00:00 UTC
[jira] [Created] (ARROW-15856) [R] S3FileSystem - open_dataset
Martin du Toit created ARROW-15856:
--------------------------------------
Summary: [R] S3FileSystem - open_dataset
Key: ARROW-15856
URL: https://issues.apache.org/jira/browse/ARROW-15856
Project: Apache Arrow
Issue Type: New Feature
Components: R
Affects Versions: 7.0.0
Reporter: Martin du Toit
Hi
I can successfully create an S3FileSystem that connects via MinIO.
I can create a SubTreeFileSystem: s3://investmentaccountingdata/rawdata/transactions/transactions-xxx/v1.1/
I can list the files in the SubTreeFileSystem, and I can open a dataset from that list of files:
{code:java}
# List every file under the subtree, then open the dataset from that explicit list.
list_files <- sfs$ls(recursive=TRUE)
ds <- arrow::open_dataset(sources = list_files, schema = schema_file, format = csv_format, filesystem = sfs)
{code}
This all works when I provide the explicit list of files, but I want to specify a path higher up so that the sub folders are picked up as partitions. The code I use for that works perfectly when I run it on a local disk.
How can I call open_dataset with a folder as the source instead of a list of files?
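For illustration, here is a minimal sketch of the kind of call being asked about, assuming that open_dataset accepts a SubTreeFileSystem object directly as its source and then scans the directory recursively. To keep it runnable without a bucket, a temporary local directory stands in for the S3 path; the `year=...` folder names, the file names, and the sample data are hypothetical. Whether the same call behaves this way against an S3/MinIO-backed SubTreeFileSystem in 7.0.0 is not confirmed here.

```r
library(arrow)

# Stand-in for the bucket: a temporary local tree with hive-style sub folders.
# (Folder names, file names and data are made up for this sketch.)
root <- file.path(tempdir(), "transactions-demo")
for (yr in c(2021, 2022)) {
  part_dir <- file.path(root, paste0("year=", yr))
  dir.create(part_dir, recursive = TRUE, showWarnings = FALSE)
  write.csv(
    data.frame(id = 1:2, amount = c(10.5, 20.0)),
    file.path(part_dir, "part-0.csv"),
    row.names = FALSE
  )
}

# Same class of object as the S3-backed SubTreeFileSystem in the report,
# only backed by a LocalFileSystem here.
sfs <- SubTreeFileSystem$create(root, LocalFileSystem$create())

# Pass the filesystem object itself as the source instead of a list of files.
# The default partitioning is hive-style, so "year" is read from the folder names.
ds <- open_dataset(sfs, format = "csv")

# Column names of the dataset, including the partition column.
print(ds$schema$names)
```

With the objects from the report, the analogous call would presumably be `open_dataset(sfs, schema = schema_file, format = csv_format)`. If the sub folders are not named in the `key=value` style, the partition field names would have to be supplied through the `partitioning` argument, and an explicit schema would likely need to include those partition fields as well.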