Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/08 21:42:00 UTC
[jira] [Updated] (ARROW-16863) [R] open_dataset() silently drops the missing values from a csv file
[ https://issues.apache.org/jira/browse/ARROW-16863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Farmer updated ARROW-16863:
--------------------------------
Component/s: R
> [R] open_dataset() silently drops the missing values from a csv file
> --------------------------------------------------------------------
>
> Key: ARROW-16863
> URL: https://issues.apache.org/jira/browse/ARROW-16863
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Zsolt Kegyes-Brassai
> Priority: Major
>
> {{open_dataset()}} +silently+ drops empty/missing values when reading a CSV file. The empty value in this example was generated when writing a data frame containing an NA value with {{write_csv_arrow()}}.
>
> {code:java}
> df_numbers <- tibble::tibble(number = c(1, 2, "error", 4, 5, NA, 7, 8))
> arrow::write_csv_arrow(df_numbers, "numbers.csv")
> readLines("numbers.csv")
> #> [1] "\"number\"" "\"1\"" "\"2\"" "\"error\"" "\"4\""
> #> [6] "\"5\"" "" "\"7\"" "\"8\""
> arrow::open_dataset("numbers.csv", format = "csv") |> dplyr::collect()
> #> # A tibble: 7 x 1
> #> number
> #> <chr>
> #> 1 1
> #> 2 2
> #> 3 error
> #> 4 4
> #> 5 5
> #> 6 7
> #> 7 8
> {code}
>
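The output above suggests that the blank line written for the NA value is the row that goes missing. Assuming that is the cause, a minimal base R sketch can flag such lines in a single-column file before it is read. `count_blank_data_lines()` is a hypothetical helper, not an arrow function, and the temporary file only reproduces the lines shown by `readLines()` in the report.

```r
# Hypothetical helper (not part of arrow): count blank lines after the header.
count_blank_data_lines <- function(path) {
  lines <- readLines(path)
  data_lines <- lines[-1]            # drop the header line
  sum(!nzchar(trimws(data_lines)))   # lines that are empty or whitespace-only
}

# Recreate the file contents shown in the report in a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c('"number"', '"1"', '"2"', '"error"', '"4"', '"5"', '', '"7"', '"8"'),
  csv_path
)

n_blank <- count_blank_data_lines(csv_path)   # 1 blank data line
n_data  <- length(readLines(csv_path)) - 1    # 8 data lines in total

if (n_blank > 0) {
  message(n_blank, " of ", n_data, " data lines are blank and may be skipped by a CSV reader")
}
```

Here `n_data - n_blank` is 7, which matches the 7 rows returned in the report, so comparing the number of data lines in the file with the number of rows collected is one way to notice the drop.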
--
This message was sent by Atlassian Jira
(v8.20.10#820010)