Posted to jira@arrow.apache.org by "Zsolt Kegyes-Brassai (Jira)" <ji...@apache.org> on 2022/06/20 09:36:00 UTC
[jira] [Created] (ARROW-16863) [R] open_dataset() silently drops the missing values from a csv file
Zsolt Kegyes-Brassai created ARROW-16863:
--------------------------------------------
Summary: [R] open_dataset() silently drops the missing values from a csv file
Key: ARROW-16863
URL: https://issues.apache.org/jira/browse/ARROW-16863
Project: Apache Arrow
Issue Type: Bug
Reporter: Zsolt Kegyes-Brassai
{{open_dataset()}} +silently+ drops empty/missing values when reading a CSV file. In the example below, the empty line in the file was produced by {{write_csv_arrow()}} when writing a data frame containing an NA value; reading the file back returns 7 rows instead of the 8 that were written.
{code:r}
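# Mixing numbers with "error" coerces the column to character; the 6th value is NA
df_numbers <- tibble::tibble(number = c(1, 2, "error", 4, 5, NA, 7, 8))
arrow::write_csv_arrow(df_numbers, "numbers.csv")
# The NA is written as an empty line (7th element of the output below)
readLines("numbers.csv")
#> [1] "\"number\"" "\"1\"" "\"2\"" "\"error\"" "\"4\""
#> [6] "\"5\"" "" "\"7\"" "\"8\""
# Reading the file back returns 7 rows instead of 8: the empty line is dropped
arrow::open_dataset("numbers.csv", format = "csv") |> dplyr::collect()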
#> # A tibble: 7 x 1
#>   number
#>   <chr>
#> 1 1
#> 2 2
#> 3 error
#> 4 4
#> 5 5
#> 6 7
#> 7 8
{code}