Posted to jira@arrow.apache.org by "Zsolt Kegyes-Brassai (Jira)" <ji...@apache.org> on 2022/06/20 09:36:00 UTC

[jira] [Created] (ARROW-16863) [R] open_dataset() silently drops the missing values from a csv file

Zsolt Kegyes-Brassai created ARROW-16863:
--------------------------------------------

             Summary: [R] open_dataset() silently drops the missing values from a csv file
                 Key: ARROW-16863
                 URL: https://issues.apache.org/jira/browse/ARROW-16863
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Zsolt Kegyes-Brassai


{{open_dataset()}} +silently+ drops empty/missing values when reading a CSV file. In the example below, the empty line in the file was produced by {{write_csv_arrow()}} when writing a data frame containing an NA value; reading the file back returns 7 rows instead of the original 8.

 
{code:java}
df_numbers <- tibble::tibble(number = c(1, 2, "error", 4, 5, NA, 7, 8))
arrow::write_csv_arrow(df_numbers, "numbers.csv")
readLines("numbers.csv")
#> [1] "\"number\"" "\"1\""      "\"2\""      "\"error\""  "\"4\""     
#> [6] "\"5\""      ""           "\"7\""      "\"8\""
arrow::open_dataset("numbers.csv", format = "csv") |> dplyr::collect()
#> # A tibble: 7 x 1
#>   number
#>   <chr> 
#> 1 1     
#> 2 2     
#> 3 error 
#> 4 4     
#> 5 5     
#> 6 7     
#> 7 8
{code}