Posted to jira@arrow.apache.org by "Nic Crane (Jira)" <ji...@apache.org> on 2021/05/14 13:47:00 UTC

[jira] [Updated] (ARROW-12792) [R] FileSystemDataset incorrectly reports CSVs as parquet files

     [ https://issues.apache.org/jira/browse/ARROW-12792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nic Crane updated ARROW-12792:
------------------------------
    Description: 
I was running the following code:
{code:java}
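library(arrow)

# Write two small CSV files into a temporary directory
tf <- tempfile()
dir.create(tf)
# recursive = TRUE is needed for unlink() to delete a directory
on.exit(unlink(tf, recursive = TRUE))
write_csv_arrow(mtcars[1:5,], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11,], file.path(tf, "file2.csv"))
# ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))
# Open both files as one dataset with an explicit schema and no format argument
ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")),
                   schema = Table$create(mtcars)$schema
                   )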
{code}
But when I print the ds object, it reports that the files are Parquet files, not CSVs:
{code:java}
> ds
 FileSystemDataset with 2 Parquet files
 mpg: double
 cyl: double
 disp: double
 hp: double
 drat: double
 wt: double
 qsec: double
 vs: double
 am: double
 gear: double
 carb: double{code}

  was:
I was running the following code:
{code:java}
tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf))
write_csv_arrow(mtcars[1:5,], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11,], file.path(tf, "file2.csv"))
# ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))
ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")), 
                   schema = Table$create(mtcars)$schema
                   )
{code}
But when I print the ds object, it reports that the files are Parquet files not CSVs

> ds
FileSystemDataset with 2 Parquet files
mpg: double
cyl: double
disp: double
hp: double
drat: double
wt: double
qsec: double
vs: double
am: double
gear: double
carb: double


> [R] FileSystemDataset incorrectly reports CSVs as parquet files
> ---------------------------------------------------------------
>
>                 Key: ARROW-12792
>                 URL: https://issues.apache.org/jira/browse/ARROW-12792
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Nic Crane
>            Assignee: Nic Crane
>            Priority: Major


