Posted to dev@arrow.apache.org by "Ryan Patrick Kyle (Jira)" <ji...@apache.org> on 2019/10/08 17:39:00 UTC

[jira] [Created] (ARROW-6819) arrow::read_parquet ignores as_data_frame when sparklyr package is attached

Ryan Patrick Kyle created ARROW-6819:
----------------------------------------

             Summary: arrow::read_parquet ignores as_data_frame when sparklyr package is attached
                 Key: ARROW-6819
                 URL: https://issues.apache.org/jira/browse/ARROW-6819
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 0.15.0
         Environment: R version 3.6.1 (2019-07-05) on x86_64, darwin15.6.0 (Mac OS 10.13.4)
            Reporter: Ryan Patrick Kyle


I am currently using v0.15.0 of the arrow package, installed from source from CRAN. I also have v1.0.4 of the sparklyr package installed. When I read Parquet data with both packages attached, the read_parquet function appears to ignore the as_data_frame argument (which defaults to TRUE).

[https://github.com/apache/arrow/blob/3d55122c56a508894823a1b79bca71f519fdd52f/r/R/parquet.R#L35-L47]

I am not certain, but I suspect the issue may lie in the way Table__to_dataframe coerces Arrow Table objects into tibbles, since the following statement also appears to produce a tibble (I expected a data.frame to be returned):

arrow:::Table__to_dataframe(tab, use_threads=FALSE)

A reproducible example follows.

# This works as expected
library(arrow)

temp <- tempfile()
# mode = "wb" downloads the Parquet file as binary, which matters on Windows
download.file("https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true", temp, mode = "wb")

read_parquet(temp, as_data_frame = TRUE)

# This does not work as expected
library(sparklyr)
library(arrow)

read_parquet(temp, as_data_frame = TRUE)
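To see which class each call above returns, and to get a plain data.frame as a possible workaround, something like the following sketch may help. The helper to_plain_data_frame is hypothetical (it is not part of arrow or sparklyr), and the tibble-like object is a base-R stand-in so the sketch runs without either package installed; whether this resolves the behaviour reported here is an assumption, not something confirmed in this report.

```r
# Hypothetical helper (not part of arrow or sparklyr): report the class of
# an object, then return it as a plain base-R data.frame.
to_plain_data_frame <- function(x) {
  message("class before coercion: ", paste(class(x), collapse = ", "))
  as.data.frame(x)
}

# Stand-in for a tibble-classed result, built with base R only.
# In a real session this would be the value returned by read_parquet(temp).
tibble_like <- structure(
  data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE),
  class = c("tbl_df", "tbl", "data.frame")
)

plain <- to_plain_data_frame(tibble_like)
class(plain)
```

In a session with arrow attached, the same helper could be applied as to_plain_data_frame(read_parquet(temp)), once before and once after attaching sparklyr, to compare the reported classes.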

 


