Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/06/04 20:12:00 UTC
[jira] [Assigned] (ARROW-12791) [R] Better error handling for DatasetFactory$Finish() when no format specified
[ https://issues.apache.org/jira/browse/ARROW-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson reassigned ARROW-12791:
---------------------------------------
Assignee: Nic Crane
> [R] Better error handling for DatasetFactory$Finish() when no format specified
> ------------------------------------------------------------------------------
>
> Key: ARROW-12791
> URL: https://issues.apache.org/jira/browse/ARROW-12791
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Nic Crane
> Assignee: Nic Crane
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 50m
> Remaining Estimate: 0h
>
> When I call the following code:
>
> {code:r}
> tf <- tempfile()
> dir.create(tf)
> on.exit(unlink(tf))
> write_csv_arrow(mtcars[1:5,], file.path(tf, "file1.csv"))
> write_csv_arrow(mtcars[6:11,], file.path(tf, "file2.csv"))
> ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))
> {code}
> I get the following error:
> {code:r}
> Error: IOError: Could not open parquet input source '/tmp/RtmpSug6P8/file714931976ac54/file1.csv': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
> {code}
> However, the documentation for open_dataset() says nothing about the input source needing to be a Parquet file rather than a CSV.
> I think this happens because DatasetFactory$Finish() is called with a NULL schema while the input files have no inherent schema (i.e. they are CSVs), so the default Parquet reader is used.
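> For reference, a workaround that works for me (using the same tempfile paths as above) is to pass the format explicitly, since open_dataset() documents a format argument:
> {code:r}
> library(arrow)
>
> # Explicitly telling the factory the files are CSV avoids the
> # misleading "Parquet magic bytes not found" error.
> ds <- open_dataset(
>   c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")),
>   format = "csv"
> )
> {code}
> Ideally, though, Finish() would fail with a message pointing the user at the format argument instead of the Parquet footer error.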
--
This message was sent by Atlassian Jira
(v8.3.4#803005)