Posted to dev@arrow.apache.org by "Keith Hughitt (Jira)" <ji...@apache.org> on 2020/02/10 17:28:00 UTC

[jira] [Created] (ARROW-7825) Have arrow::read_parquet respect options(stringsAsFactors = FALSE)

Keith Hughitt created ARROW-7825:
------------------------------------

             Summary: Have arrow::read_parquet respect options(stringsAsFactors = FALSE)
                 Key: ARROW-7825
                 URL: https://issues.apache.org/jira/browse/ARROW-7825
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
    Affects Versions: 0.16.0
         Environment: Linux 64-bit 5.4.15
            Reporter: Keith Hughitt


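Same issue as reported for feather::read_feather (https://issues.apache.org/jira/browse/ARROW-7823).

For the R arrow package, the "read_parquet()" function currently ignores "options(stringsAsFactors = FALSE)". In the example below it returns the Species column as a factor, while base R's "read.delim()" and readr's "read_tsv()" return it as a character vector under the same option. This makes the behavior unexpected and inconsistent.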

*Example:*

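{code:r}
library(arrow)
library(readr)
options(stringsAsFactors = FALSE)

# Write the same six rows of iris as TSV (via readr) and as Parquet (via arrow)
write_tsv(head(iris), 'test.tsv')
write_parquet(head(iris), 'test.parquet')

# Base R reader: Species comes back as a character vector
head(read.delim('test.tsv', sep='\t')$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"

# readr reader: Species comes back as a character vector
head(read_tsv('test.tsv', col_types = cols())$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"

# arrow reader: Species comes back as a factor despite the option
head(read_parquet('test.parquet')$Species)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica
{code}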
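
Until read_parquet() respects the option, one possible interim workaround is to convert factor columns to character after reading. The following is a minimal sketch using base R only. The helper name factors_to_character() is hypothetical and is not part of the arrow package, and head(iris) stands in for the data frame returned by read_parquet() so that the snippet runs without a Parquet file:

{code:r}
# Hypothetical helper (not part of arrow): convert every factor column of a
# data frame to character, approximating stringsAsFactors = FALSE after reading.
factors_to_character <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))  # which columns are factors
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}

# Stand-in for read_parquet('test.parquet'): Species is a factor here
df <- factors_to_character(head(iris))
class(df$Species)
# [1] "character"
{code}

With the reproduction above, the helper would be called as factors_to_character(read_parquet('test.parquet')). Note that it converts every factor column, including columns that were already factors in the data frame that was written, so it may be broader than some users want.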

*Versions:*

- R 3.6.2
- arrow_0.15.1.9000
