Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/08 21:42:00 UTC

[jira] [Updated] (ARROW-16863) [R] open_dataset() silently drops the missing values from a csv file

     [ https://issues.apache.org/jira/browse/ARROW-16863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer updated ARROW-16863:
--------------------------------
    Component/s: R

> [R] open_dataset() silently drops the missing values from a csv file
> --------------------------------------------------------------------
>
>                 Key: ARROW-16863
>                 URL: https://issues.apache.org/jira/browse/ARROW-16863
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Zsolt Kegyes-Brassai
>            Priority: Major
>
> {{open_dataset()}} +silently+ drops empty/missing values when reading a csv file. In the example below, the empty line in the file was generated by {{write_csv_arrow()}} when writing a data frame containing an NA value.
>  
> {code:java}
> df_numbers <- tibble::tibble(number = c(1, 2, "error", 4, 5, NA, 7, 8))
> arrow::write_csv_arrow(df_numbers, "numbers.csv")
> readLines("numbers.csv")
> #> [1] "\"number\"" "\"1\""      "\"2\""      "\"error\""  "\"4\""     
> #> [6] "\"5\""      ""           "\"7\""      "\"8\""
> arrow::open_dataset("numbers.csv", format = "csv") |> dplyr::collect()
> #> # A tibble: 7 x 1
> #>   number
> #>   <chr> 
> #> 1 1     
> #> 2 2     
> #> 3 error 
> #> 4 4     
> #> 5 5     
> #> 6 7     
> #> 7 8
> {code}
>  
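> One way to check whether the empty line is being skipped as a blank row is to read the same file with {{read_csv_arrow()}} and its {{skip_empty_rows}} argument set to {{FALSE}} (it defaults to {{TRUE}}). This is only a sketch: it assumes the installed arrow version writes the NA as an empty line, as shown above, and whether the value comes back as NA or as an empty string depends on the {{na}} settings. The name {{df_check}} is introduced here for illustration.
>  
> {code:java}
> # Recreate the file from the report.
> df_numbers <- tibble::tibble(number = c(1, 2, "error", 4, 5, NA, 7, 8))
> arrow::write_csv_arrow(df_numbers, "numbers.csv")
>
> # Read it back without skipping empty rows; in a one-column file,
> # a blank line is then parsed as a single empty value.
> df_check <- arrow::read_csv_arrow("numbers.csv", skip_empty_rows = FALSE)
>
> # Compare the row count with the 8 values that were written,
> # and count how many of them are read back as NA.
> nrow(df_check)
> sum(is.na(df_check$number))
> {code}
>  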


