Posted to issues@arrow.apache.org by "Eu Jing Chua (Jira)" <ji...@apache.org> on 2021/04/29 16:43:00 UTC
[jira] [Created] (ARROW-12603) open_dataset ignoring provided schema when using select
Eu Jing Chua created ARROW-12603:
------------------------------------
Summary: open_dataset ignoring provided schema when using select
Key: ARROW-12603
URL: https://issues.apache.org/jira/browse/ARROW-12603
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 4.0.0
Environment: R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Reporter: Eu Jing Chua
While the following snippet works with arrow 3.0.0, it fails after updating to arrow 4.0.0.
An example CSV that reproduces the problem is available [here|https://raw.githubusercontent.com/reichlab/covid19-forecast-hub/master/data-processed/Karlen-pypm/2021-04-25-Karlen-pypm.csv]. The directory layout is:
{code:bash}
.
├── data
│   └── 2021-04-25-Karlen-pypm.csv
└── test.R
{code}
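To set up this layout, the example CSV can be fetched into {{data/}} first. The following is only a minimal sketch using base R; the helper name {{fetch_example_csv}} is hypothetical (it is not part of arrow), and it assumes the file is still available at the URL linked above.

```r
# URL of the example CSV linked in this report.
csv_url <- paste0(
  "https://raw.githubusercontent.com/reichlab/covid19-forecast-hub/",
  "master/data-processed/Karlen-pypm/2021-04-25-Karlen-pypm.csv"
)

# Hypothetical helper (not part of arrow): download the file into `dir`,
# creating the directory if needed, and return the local path.
fetch_example_csv <- function(url = csv_url, dir = "data") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dir, basename(url))
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")
  }
  dest
}
```

Calling {{fetch_example_csv()}} from the directory that contains {{test.R}} should leave the file at {{data/2021-04-25-Karlen-pypm.csv}}.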
{code:r}
library(arrow)
library(tidyverse)
sch <- schema(forecast_date=string(),
              target=string(),
              target_end_date=string(),
              location=string(),
              type=string(),
              quantile=string(),
              value=string())
ds = open_dataset("data", format = "csv", schema = sch)
ds %>% select(target) %>% collect()
{code}
The error is:
{{Error: Invalid: In CSV column #3: CSV conversion error to int64: invalid value 'US'}}
However, both of the following calls run without error and return a data frame with the right schema:
{code:r}
ds %>% collect()
ds %>% select(target, location) %>% collect()
{code}