Posted to jira@arrow.apache.org by "Nic Crane (Jira)" <ji...@apache.org> on 2021/05/14 14:55:00 UTC
[jira] [Comment Edited] (ARROW-12792) [R] DatasetFactory could sniff file formats
[ https://issues.apache.org/jira/browse/ARROW-12792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344638#comment-17344638 ]
Nic Crane edited comment on ARROW-12792 at 5/14/21, 2:54 PM:
-------------------------------------------------------------
Ah, didn't know it could be created with a reference to a directory.
I suppose an alternative might be that DatasetFactory$create doesn't have a default value for format, so that it has to be specified in the code instead? It sounds like that would also help solve ARROW-12791, though I guess expediency isn't necessarily a reason to implement something.
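For illustration, here is a minimal sketch of what specifying the format explicitly looks like from the R side today. It assumes the arrow R package is installed with dataset support; the temporary directory and file names simply mirror the example in the issue description below.

```r
library(arrow)

# Write two small CSV files into a temporary directory
tf <- tempfile()
dir.create(tf)
write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))

# Pass the format explicitly instead of relying on the default
files <- file.path(tf, c("file1.csv", "file2.csv"))
ds <- open_dataset(files, format = "csv")
print(ds)

# Clean up the temporary directory
unlink(tf, recursive = TRUE)
```

If format had no default, a call without it would presumably have to fail early rather than silently treat the files as Parquet.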
was (Author: thisisnic):
Ah, didn't know it could be created with a reference to a directory.
I suppose an alternative might be that DatasetFactory$create doesn't have a default value for format and it should instead be specified in the code?
> [R] DatasetFactory could sniff file formats
> -------------------------------------------
>
> Key: ARROW-12792
> URL: https://issues.apache.org/jira/browse/ARROW-12792
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Nic Crane
> Assignee: Nic Crane
> Priority: Minor
>
> I was running the following code:
> {code:java}
> library(arrow)
>
> # Create a temporary directory and remove it, with its contents, on exit
> tf <- tempfile()
> dir.create(tf)
> on.exit(unlink(tf, recursive = TRUE))
> write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
> write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))
> # ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))
> ds <- open_dataset(
>   c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")),
>   schema = Table$create(mtcars)$schema
> )
> {code}
> But when I print the ds object, it reports that the files are Parquet files, not CSVs:
> {code:java}
> > ds
> FileSystemDataset with 2 Parquet files
> mpg: double
> cyl: double
> disp: double
> hp: double
> drat: double
> wt: double
> qsec: double
> vs: double
> am: double
> gear: double
> carb: double
> {code}
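As a rough sketch of what "sniffing" could mean in the simplest case, the format could be guessed from the file extensions before the dataset is created. The helper below is hypothetical and not part of arrow: the name sniff_format and the extension table are assumptions for illustration, and a real implementation might inspect file contents rather than names.

```r
# Hypothetical helper (not an arrow function): guess a dataset format
# from the extensions of the supplied file paths.
sniff_format <- function(paths) {
  ext <- unique(tolower(tools::file_ext(paths)))
  if (length(ext) != 1) {
    stop("Cannot infer a single format from extensions: ",
         paste(ext, collapse = ", "))
  }
  # Extensions mapped to format names accepted by open_dataset()
  known <- c(parquet = "parquet", csv = "csv", tsv = "tsv",
             feather = "feather", arrow = "arrow", ipc = "ipc")
  if (!ext %in% names(known)) {
    stop("Unrecognised file extension: '", ext, "'")
  }
  unname(known[ext])
}

fmt <- sniff_format(c("file1.csv", "file2.csv"))
print(fmt)
```

The result could then be passed on, for example as open_dataset(files, format = sniff_format(files)). Mixed or unknown extensions raise an error here rather than falling back to Parquet, which is the behaviour the report above ran into.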