Posted to jira@arrow.apache.org by "Hugo Gruson (Jira)" <ji...@apache.org> on 2022/07/28 14:58:00 UTC

[jira] [Created] (ARROW-17241) Support scientific notation for integers in csv reader

Hugo Gruson created ARROW-17241:
-----------------------------------

             Summary: Support scientific notation for integers in csv reader
                 Key: ARROW-17241
                 URL: https://issues.apache.org/jira/browse/ARROW-17241
             Project: Apache Arrow
          Issue Type: New Feature
         Environment: arrow R package 8.0.0
            Reporter: Hugo Gruson


The CSV reader does not appear to support scientific notation for integer columns: in the following reprex, reading the value '1e+06' into an int32 column fails with a conversion error. The same file reads correctly when the column type is double.

Could support for scientific notation in integer columns be added, please?

 
{noformat}
testcsv <- tempfile(fileext = ".csv")

c(1, 2, 1e6) |> 
  as.data.frame() |> 
  setNames("int") |> 
  write.csv(testcsv, row.names = FALSE)

arrow::read_csv_arrow(testcsv, col_types = "i", col_names = "int", skip = 1)
#> Error:
#> ! Invalid: In CSV column #0: CSV conversion error to int32: invalid value '1e+06'

#> Backtrace:
#>     ▆
#>  1. └─arrow (local) `<fn>`(...)
#>  2.   └─base::tryCatch(...)
#>  3.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  4.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  5.         └─value[[3L]](cond)
#>  6.           └─arrow:::handle_csv_read_error(e, schema, call)
#>  7.             └─rlang::abort(msg, call = call)

arrow::read_csv_arrow(testcsv, col_types = "d", col_names = "int", skip = 1)
#> # A tibble: 3 × 1
#>       int
#>     <dbl>
#> 1       1
#> 2       2
#> 3 1000000
{noformat}
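
Until something like this is supported, one possible workaround is to read the column as double, which the reprex above shows does parse '1e+06', and then convert it to integer in R. The following is only a sketch under that assumption. as_integer_checked() is a hypothetical helper written for this example, not part of arrow; it refuses to convert values that are fractional or outside R's integer range, so the conversion cannot silently lose information.

{noformat}
library(arrow)

# Write a small file containing an integer in scientific notation.
testcsv <- tempfile(fileext = ".csv")
writeLines(c("int", "1", "2", "1e+06"), testcsv)

# Read the column as double ("d"), as in the second call of the reprex.
df <- read_csv_arrow(testcsv, col_types = "d", col_names = "int", skip = 1)

# Hypothetical helper (not an arrow function): convert to integer only if
# every non-missing value is whole and fits in R's 32-bit integer range.
as_integer_checked <- function(x) {
  ok <- is.na(x) | (x == trunc(x) & abs(x) <= .Machine$integer.max)
  if (!all(ok)) {
    stop("column contains values that cannot be represented as integers")
  }
  as.integer(x)
}

df$int <- as_integer_checked(df$int)
{noformat}

This does the conversion in R after the file has been read, so it is not a substitute for native support in the reader: the column passes through double first, and the check has to be repeated for every affected column.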
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)