Posted to dev@arrow.apache.org by "Keith Hughitt (Jira)" <ji...@apache.org> on 2020/02/10 17:28:00 UTC
[jira] [Created] (ARROW-7825) Have arrow::read_parquet respect options(stringsAsFactors = FALSE)
Keith Hughitt created ARROW-7825:
------------------------------------
Summary: Have arrow::read_parquet respect options(stringsAsFactors = FALSE)
Key: ARROW-7825
URL: https://issues.apache.org/jira/browse/ARROW-7825
Project: Apache Arrow
Issue Type: Improvement
Components: R
Affects Versions: 0.16.0
Environment: Linux 64-bit 5.4.15
Reporter: Keith Hughitt
This is the same issue as reported for feather::read_feather (https://issues.apache.org/jira/browse/ARROW-7823).
In the R arrow package, the "read_parquet()" function currently does not respect "options(stringsAsFactors = FALSE)": it returns a factor column where "read.delim()" and "readr::read_tsv()" return character vectors for the same data, which is unexpected and inconsistent.
*Example:*
{code:java}
library(arrow)
library(readr)
# Ask R not to convert strings to factors by default
options(stringsAsFactors = FALSE)
# Write the same data as both TSV and Parquet
write_tsv(head(iris), 'test.tsv')
write_parquet(head(iris), 'test.parquet')
# Base R and readr both return a character vector
head(read.delim('test.tsv', sep='\t')$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
head(read_tsv('test.tsv', col_types = cols())$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
# read_parquet() returns a factor instead
head(read_parquet('test.parquet')$Species)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica
{code}
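Until this is addressed, one possible workaround is to convert factor columns back to character after reading. The sketch below is only an illustration: the helper name factors_to_character is hypothetical and not part of the arrow package, and it converts every factor column, which may not be what you want if some columns are meant to stay factors.

```r
library(arrow)

# Hypothetical helper (not part of arrow): convert every factor column
# of a data frame to character, leaving all other columns untouched.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

# Write a small Parquet file to a temporary location and read it back
path <- tempfile(fileext = ".parquet")
write_parquet(head(iris), path)

df <- factors_to_character(read_parquet(path))
class(df$Species)
# [1] "character"
```

Using df[] <- lapply(...) keeps the class of the object returned by read_parquet() and only replaces the column contents.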
*Versions:*
- R 3.6.2
- arrow_0.15.1.9000