Posted to jira@arrow.apache.org by "Martin du Toit (Jira)" <ji...@apache.org> on 2022/03/07 12:00:00 UTC

[jira] [Created] (ARROW-15856) [R] S3FileSystem - open_dataset

Martin du Toit created ARROW-15856:
--------------------------------------

             Summary: [R] S3FileSystem - open_dataset
                 Key: ARROW-15856
                 URL: https://issues.apache.org/jira/browse/ARROW-15856
             Project: Apache Arrow
          Issue Type: New Feature
          Components: R
    Affects Versions: 7.0.0
            Reporter: Martin du Toit


Hi

I can successfully create an S3FileSystem that connects via MinIO.

I can create a SubTreeFileSystem: s3://investmentaccountingdata/rawdata/transactions/transactions-xxx/v1.1/

I can list the files in the SubTreeFileSystem, and I can open a dataset from that list of files:
{code:java}
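# sfs is the SubTreeFileSystem; schema_file and csv_format are defined elsewhere (not shown)
# List every file under the subtree, then open those files as one dataset
list_files <- sfs$ls(recursive = TRUE)
ds <- arrow::open_dataset(sources = list_files, schema = schema_file, format = csv_format, filesystem = sfs)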
{code}
This all works fine when I provide the list of files, but I want to specify a path higher up so that the subfolders are included as partitions. The same code works perfectly when I run it against a local disk.
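
To make the local-disk behaviour concrete, here is a minimal sketch of the pattern described above: a top-level folder is passed as the source and hive-style subfolders become a partition column. The folder layout, column names, and the `year` partition key are hypothetical and chosen only for illustration; they are not the real transaction data or schema.

```r
library(arrow)

# Hypothetical local layout (illustration only):
#   <root>/year=2021/part.csv
#   <root>/year=2022/part.csv
root <- file.path(tempdir(), "transactions-demo")
for (yr in c(2021, 2022)) {
  part_dir <- file.path(root, paste0("year=", yr))
  dir.create(part_dir, recursive = TRUE, showWarnings = FALSE)
  write.csv(
    data.frame(id = 1:3, amount = c(10.5, 20, 30.25)),
    file.path(part_dir, "part.csv"),
    row.names = FALSE
  )
}

# Pass the top-level folder as the source. The hive-style subfolder
# names (year=2021, year=2022) are picked up as a partition column.
ds <- open_dataset(root, format = "csv")

# The schema should list id, amount, and the partition column year.
print(ds$schema)
```

The `open_dataset` documentation also lists a FileSystem object that references a directory as an accepted `sources` value, so passing the SubTreeFileSystem itself instead of a file list may be worth trying; that is not verified here against MinIO.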

How can I call open_dataset with a folder as the source?

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)