Posted to jira@arrow.apache.org by "Jacob Wujciak-Jens (Jira)" <ji...@apache.org> on 2022/07/28 15:11:00 UTC

[jira] [Updated] (ARROW-17241) Support scientific notation for integers in csv reader

     [ https://issues.apache.org/jira/browse/ARROW-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Wujciak-Jens updated ARROW-17241:
---------------------------------------
    Component/s: R

> Support scientific notation for integers in csv reader
> ------------------------------------------------------
>
>                 Key: ARROW-17241
>                 URL: https://issues.apache.org/jira/browse/ARROW-17241
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>         Environment: arrow R package 8.0.0
>            Reporter: Hugo Gruson
>            Priority: Minor
>
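> The CSV reader does not appear to support scientific notation for integer columns. In the following reprex, the value '1e+06' is rejected when the column is read as int32 (col_types = "i"), while the same file parses correctly when the column is read as a double (col_types = "d").
> Could support for scientific notation in integer columns be added, please?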
>  
> {noformat}
> testcsv <- tempfile(fileext = ".csv")
> c(1, 2, 1e6) |> 
>   as.data.frame() |> 
>   setNames("int") |> 
>   write.csv(testcsv, row.names = FALSE)
> arrow::read_csv_arrow(testcsv, col_types = "i", col_names = "int", skip = 1)
> #> Error:
> #> ! Invalid: In CSV column #0: CSV conversion error to int32: invalid value '1e+06'
> #> Backtrace:
> #>     ▆
> #>  1. └─arrow (local) `<fn>`(...)
> #>  2.   └─base::tryCatch(...)
> #>  3.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
> #>  4.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
> #>  5.         └─value[[3L]](cond)
> #>  6.           └─arrow:::handle_csv_read_error(e, schema, call)
> #>  7.             └─rlang::abort(msg, call = call)
> arrow::read_csv_arrow(testcsv, col_types = "d", col_names = "int", skip = 1)
> #> # A tibble: 3 × 1
> #>       int
> #>     <dbl>
> #> 1       1
> #> 2       2
> #> 3 1000000
> {noformat}
>  
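Until integer columns accept scientific notation, one possible workaround is to read the column as a double, which the reprex shows already works, and convert it afterwards. The following is only a sketch under that assumption: the helper name as_whole_integer is hypothetical, it is not part of the arrow package, and it uses base R only.

```r
# Hypothetical helper (not part of arrow): convert doubles to integers,
# failing loudly if a value is fractional or outside the int32 range.
as_whole_integer <- function(x) {
  stopifnot(is.numeric(x))
  # NA values pass through; everything else must be whole and in range.
  ok <- is.na(x) | (x == trunc(x) & abs(x) <= .Machine$integer.max)
  if (!all(ok)) {
    stop("values cannot be represented as integers: ",
         paste(x[!ok], collapse = ", "))
  }
  as.integer(x)
}

# Stand-in for the values of a column that was read with col_types = "d".
dbl <- c(1, 2, 1e6)
int <- as_whole_integer(dbl)
```

With the file from the reprex, the helper would be applied to the int column of the result of the col_types = "d" call. Checking for fractional and out-of-range values before calling as.integer() avoids silently truncating input that was never a valid integer.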



--
This message was sent by Atlassian Jira
(v8.20.10#820010)