Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:37:04 UTC
[jira] [Commented] (ARROW-10132) [Rust] Considers scientific notation when inferring schema from csv
[ https://issues.apache.org/jira/browse/ARROW-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332133#comment-17332133 ]
Andrew Lamb commented on ARROW-10132:
-------------------------------------
Migrated to github: https://github.com/apache/arrow-rs/issues/108
> [Rust] Considers scientific notation when inferring schema from csv
> -------------------------------------------------------------------
>
> Key: ARROW-10132
> URL: https://issues.apache.org/jira/browse/ARROW-10132
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Affects Versions: 1.0.1
> Environment: Ubuntu
> Reporter: Ziru Niu
> Priority: Minor
> Labels: easyfix
>
>
> ||col||
> |1.2|
> |1.3e-2|
> |1.4|
> Currently this column is inferred as Utf8, because csv::reader::DECIMAL_RE is defined as r"^-?(\d+\.\d+)$". Maybe we could change it to r"^-?(\d+\.\d+)(e-?(\d+))?$" or something similar, so that real numbers written in scientific notation are inferred as float?
>
> (The regex proposed above does not match "5e-4", though, because it still requires a decimal point.)
>
> I would also like "3." and ".3" to be inferred as float. I will come up with an exact regex when I have time.
>
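For illustration, here is a minimal sketch of the matching rules the description asks for, written with the standard library only so that it does not depend on any particular regex crate version. The function name looks_like_float is hypothetical and is not part of the arrow crate. The rules are assumptions drawn from the examples above: an optional leading "-", a mantissa with digits on at least one side of an optional ".", and an optional exponent introduced by "e" or "E" with an optional sign. Plain integers such as "5" are deliberately rejected so that they can still be inferred as an integer type.

```rust
/// Hypothetical helper (not part of the arrow crate): returns true when `s`
/// looks like a float literal, including scientific notation.
///
/// Assumed to correspond roughly to the pattern
/// ^-?((\d+\.\d*|\.\d+)([eE][-+]?\d+)?|\d+[eE][-+]?\d+)$
/// restricted to ASCII digits.
fn looks_like_float(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;

    // Optional leading minus sign, as in the existing DECIMAL_RE.
    if i < bytes.len() && bytes[i] == b'-' {
        i += 1;
    }

    // Digits before the decimal point (may be empty, e.g. ".3").
    let int_start = i;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        i += 1;
    }
    let int_digits = i - int_start;

    // Optional decimal point and the digits after it (may be empty, e.g. "3.").
    let mut has_point = false;
    let mut frac_digits = 0;
    if i < bytes.len() && bytes[i] == b'.' {
        has_point = true;
        i += 1;
        let frac_start = i;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
        frac_digits = i - frac_start;
    }

    // The mantissa needs at least one digit, so "." and "e5" are rejected.
    if int_digits + frac_digits == 0 {
        return false;
    }

    // Optional exponent: e or E, an optional sign, then at least one digit.
    let mut has_exp = false;
    if i < bytes.len() && (bytes[i] == b'e' || bytes[i] == b'E') {
        has_exp = true;
        i += 1;
        if i < bytes.len() && (bytes[i] == b'-' || bytes[i] == b'+') {
            i += 1;
        }
        let exp_start = i;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
        if i == exp_start {
            return false;
        }
    }

    // The whole input must be consumed, and a bare integer is not a float.
    i == bytes.len() && (has_point || has_exp)
}
```

With these rules, all three values in the example column ("1.2", "1.3e-2", "1.4") are accepted, as are "5e-4", "3." and ".3", while "5" and "1.2e" are rejected. Whether a leading "+" or an uppercase "E" should be accepted is a design choice the description leaves open.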
--
This message was sent by Atlassian Jira
(v8.3.4#803005)