Posted to jira@arrow.apache.org by "N D (Jira)" <ji...@apache.org> on 2021/12/15 23:00:00 UTC
[jira] [Created] (ARROW-15123) [R] Schema order not respected and file header ignored
N D created ARROW-15123:
---------------------------
Summary: [R] Schema order not respected and file header ignored
Key: ARROW-15123
URL: https://issues.apache.org/jira/browse/ARROW-15123
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 6.0.1, 6.0.0
Reporter: N D
Attachments: reprex-arrow-6-read.tar.gz
In `arrow` 6.0.0+ for R, when I read a CSV file using a schema whose column order doesn't match the column order in the CSV, the data is read incorrectly.
The header row is included as an observation in the resulting dataset, and the columns are renamed *but not reordered* to match the schema. So I end up with the "quantile" column called "location", etc., as below.
{code:java}
[1] "last few obs in sorted order with arrow"
# A tibble: 6 × 7
forecast_date target target_end_date location type quantile value
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2021-12-12 9 day ahead… 2021-12-21 0.99 946.43313… 06 quant…
2 2021-12-12 9 day ahead… 2021-12-21 0.99 956.43294… 39 quant…
3 2021-12-12 9 day ahead… 2021-12-21 0.99 97.948144… 41 quant…
4 2021-12-12 9 day ahead… 2021-12-21 0.99 98.573545… 49 quant…
5 2021-12-12 9 day ahead… 2021-12-21 0.99 98.978636… 33 quant…
6 forecast_date target target_end_date quantile value location type
[1] "dimensions with arrow"
[1] 45361 7 {code}
The [file in question|https://raw.githubusercontent.com/reichlab/covid19-forecast-hub/master/data-processed/JHUAPL-Gecko/2021-12-12-JHUAPL-Gecko.csv] has 45360 observations plus 1 header line, so the 45361 rows reported above include the header.
Reprex attached with working (`packageVersion("arrow") == 4.0.1`; 5.0.0 also works) and non-working (`packageVersion("arrow") == 6.0.1`) examples. Run examples using `make run-broken` and `make run-works`.
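A possible workaround, sketched here and not verified against the attached reprex: the `read_csv_arrow()` documentation says that when column names are supplied through `schema` and the file has a header row, `skip = 1` is needed to drop that row. The sketch below assumes the file is read with `read_csv_arrow()` (the reprex may use a different entry point), declares the schema in the file's own column order as seen in the header row of the output above, and reorders the columns afterwards. The temporary file and its single data row are made up for illustration.

```r
library(arrow)

# Made-up one-row stand-in for the real file. The header order is copied
# from the header row that shows up as data in the output above.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "forecast_date,target,target_end_date,quantile,value,location,type",
  "2021-12-12,1 day ahead demo,2021-12-13,0.99,98.5,06,quantile"
), csv_path)

# Schema declared in the same order as the columns in the file.
# All columns are strings here, mirroring the <chr> columns in the report.
file_schema <- schema(
  forecast_date = utf8(),
  target = utf8(),
  target_end_date = utf8(),
  quantile = utf8(),
  value = utf8(),
  location = utf8(),
  type = utf8()
)

# skip = 1 drops the header row, since the names now come from the schema.
df <- read_csv_arrow(csv_path, schema = file_schema, skip = 1)

# Reorder to the column order wanted downstream.
wanted_order <- c("forecast_date", "target", "target_end_date",
                  "location", "type", "quantile", "value")
df <- df[, wanted_order]
```

Keeping the schema in file order sidesteps the positional renaming described above, and the final reordering step restores the layout the original schema asked for.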