Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:37:04 UTC

[jira] [Closed] (ARROW-10132) [Rust] Considers scientific notation when inferring schema from csv

     [ https://issues.apache.org/jira/browse/ARROW-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Lamb closed ARROW-10132.
-------------------------------
    Resolution: Invalid

> [Rust] Considers scientific notation when inferring schema from csv
> -------------------------------------------------------------------
>
>                 Key: ARROW-10132
>                 URL: https://issues.apache.org/jira/browse/ARROW-10132
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>    Affects Versions: 1.0.1
>         Environment: Ubuntu
>            Reporter: Ziru Niu
>            Priority: Minor
>              Labels: easyfix
>
>  
> ||col||
> |1.2|
> |1.3e-2|
> |1.4|
> Currently this column is inferred as Utf8, because csv::reader::DECIMAL_RE is defined as r"^-?(\d+\.\d+)$", which does not match "1.3e-2". Could we change it to r"^-?(\d+\.\d+)(e-?(\d+))?$", or something similar, so that real numbers written in scientific notation are inferred as floats?
>  
> (The regex I proposed does not handle "5e-4" correctly, though, since it still requires a decimal point before the exponent.)
>  
> I would also like values such as "3." and ".3" to be inferred as floats. I will work out an exact regex when I have time.
>  
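For illustration only, here is a minimal Rust sketch of one possible broader pattern, ^-?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$, which accepts "1.3e-2", "5e-4", "3." and ".3". The pattern is an assumption, not the fix adopted by the project. The check is hand-written instead of using the regex crate so that it compiles with no dependencies; the Inferred enum and the infer function are hypothetical stand-ins, not the arrow crate's API. As a further assumption, a field with neither a '.' nor an exponent is reported as Int64, so plain integers keep their own type.

```rust
/// Hypothetical inferred type; a stand-in for Arrow's DataType in this sketch.
#[derive(Debug, PartialEq)]
enum Inferred {
    Int64,
    Float64,
    Utf8,
}

/// Skips a run of ASCII digits starting at `i` and returns the index after it.
fn digits(b: &[u8], mut i: usize) -> usize {
    while i < b.len() && b[i].is_ascii_digit() {
        i += 1;
    }
    i
}

/// Hypothetical classifier for one CSV field. It accepts the same strings as the
/// assumed pattern ^-?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$ and splits them into
/// Int64 (no '.' and no exponent) and Float64 (everything else that matches).
fn infer(s: &str) -> Inferred {
    let b = s.as_bytes();
    let mut i = 0;
    if i < b.len() && b[i] == b'-' {
        i += 1; // optional leading minus sign
    }
    // Mantissa: integer digits, then an optional '.' with optional fraction digits.
    let int_end = digits(b, i);
    let has_int = int_end > i;
    i = int_end;
    let mut has_dot = false;
    let mut has_frac = false;
    if i < b.len() && b[i] == b'.' {
        has_dot = true;
        let frac_end = digits(b, i + 1);
        has_frac = frac_end > i + 1;
        i = frac_end;
    }
    if !has_int && !has_frac {
        return Inferred::Utf8; // e.g. "", "-", ".", "e5"
    }
    // Optional exponent: 'e' or 'E', an optional sign, then at least one digit.
    let mut has_exp = false;
    if i < b.len() && (b[i] == b'e' || b[i] == b'E') {
        let mut j = i + 1;
        if j < b.len() && (b[j] == b'-' || b[j] == b'+') {
            j += 1;
        }
        let exp_end = digits(b, j);
        if exp_end == j {
            return Inferred::Utf8; // "1e" or "1e-" has no exponent digits
        }
        has_exp = true;
        i = exp_end;
    }
    if i != b.len() {
        return Inferred::Utf8; // trailing characters such as in "1.2.3"
    }
    if has_dot || has_exp {
        Inferred::Float64
    } else {
        Inferred::Int64
    }
}

fn main() {
    // The values from the example column, the cases raised above, and some non-numbers.
    for s in ["1.2", "1.3e-2", "1.4", "5e-4", "3.", ".3", "42", ".", "1e", "abc"] {
        println!("{:>7} -> {:?}", s, infer(s));
    }
}
```

Accepting an uppercase 'E' and a '+' after the exponent marker goes slightly beyond the original proposal, which only allowed "e-"; how permissive schema inference should be is a judgment call.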



--
This message was sent by Atlassian Jira
(v8.3.4#803005)