Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/11/23 19:50:00 UTC

[jira] [Commented] (IMPALA-10350) Impala loses double precision because of DECIMAL->DOUBLE cast

    [ https://issues.apache.org/jira/browse/IMPALA-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237668#comment-17237668 ] 

Tim Armstrong commented on IMPALA-10350:
----------------------------------------

You need a specialized algorithm (like the one used in strtof) to do this accurately. A dumb but correct solution would be to convert the decimal to a string, then parse the string. I found a couple of high-performance examples of the core algorithm with Apache licenses:

https://github.com/lemire/fast_double_parser/blob/master/include/fast_double_parser.h#L884
https://github.com/google/wuffs/blob/7ea23d56e3fe9e4adff95ebf11bc18b9cb06e0d5/internal/cgen/base/floatconv-submodule-code.c#L1262
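
For illustration, the "dumb but correct" string round trip could look roughly like the sketch below. This is not Impala code: DecimalToString and DecimalToDoubleViaString are hypothetical helpers, the decimal is assumed to fit in an int64_t (Impala's DecimalValue can be wider), and the result is only as accurate as the platform's strtod, which is assumed to round correctly and to run in the default "C" locale.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: formats unscaled * 10^-scale as a decimal string,
// e.g. (12345, 2) -> "123.45". Assumes scale >= 0.
std::string DecimalToString(int64_t unscaled, int scale) {
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  if (scale > 0) {
    const std::size_t frac_digits = static_cast<std::size_t>(scale);
    if (digits.size() <= frac_digits) {
      // Left-pad with zeros so at least one digit precedes the decimal point.
      digits = std::string(frac_digits - digits.size() + 1, '0') + digits;
    }
    digits.insert(digits.size() - frac_digits, 1, '.');
  }
  return negative ? "-" + digits : digits;
}

// Hypothetical helper: lets strtod do the decimal-to-binary rounding instead
// of dividing a (possibly already rounded) double by a power of ten.
double DecimalToDoubleViaString(int64_t unscaled, int scale) {
  return std::strtod(DecimalToString(unscaled, scale).c_str(), nullptr);
}
```

With the value from the issue, DecimalToDoubleViaString(-43149576573887316, 17) parses the text "-0.43149576573887316" directly, so the only rounding step is the one inside strtod.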

Note that both of these implement float parsing, but they do it by first converting the input to a representation similar to Impala's DecimalValue (i.e. an integer plus a scale) and then converting that to double.
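
To make the "integer plus a scale" idea concrete, here is a minimal sketch of the exact fast path that parsers of this kind commonly start with. It is not taken from either linked file or from Impala: TryDecimalToDoubleFastPath is a hypothetical name, the decimal is assumed to fit in an int64_t, and the arithmetic is assumed to be plain IEEE 754 double precision without excess intermediate precision. The idea is that when the integer is at most 2^53 and the power of ten is at most 10^22, both operands are exactly representable as doubles, so a single division is rounded only once and gives the correctly rounded result.

```cpp
#include <cstdint>

// Hypothetical helper: computes unscaled * 10^-scale as a double when the
// exact fast path applies. Returns false when it does not, in which case the
// caller needs a slower exact method (for example parsing the string form).
bool TryDecimalToDoubleFastPath(int64_t unscaled, int scale, double* out) {
  // 10^0 .. 10^22 are all exactly representable as doubles.
  static const double kPowersOfTen[] = {
      1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
      1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
  // Every integer with magnitude up to 2^53 converts to double exactly.
  const uint64_t kMaxExactInteger = uint64_t{1} << 53;

  const bool negative = unscaled < 0;
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  if (scale < 0 || scale > 22 || magnitude > kMaxExactInteger) return false;

  // Both operands are exact, so this division is the only rounding step.
  const double result = static_cast<double>(magnitude) / kPowersOfTen[scale];
  *out = negative ? -result : result;
  return true;
}
```

The value from this issue, with unscaled integer 43149576573887316 and scale 17, is larger than 2^53, so the fast path declines it. Inputs like that are where the integer-to-double conversion would already round before the division, and where the more involved parts of the linked algorithms come in.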

> Impala loses double precision because of DECIMAL->DOUBLE cast
> -------------------------------------------------------------
>
>                 Key: IMPALA-10350
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10350
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: correctness, ramp-up
>         Attachments: test.c
>
>
> Impala might lose precision of double values. Reproduction:
> {noformat}
> create table double_tbl (d double) stored as textfile;
> insert into double_tbl values (-0.43149576573887316);
> {noformat}
>  Then inspect the data file:
> {noformat}
> $ hdfs dfs -cat /test-warehouse/double_tbl/424097c644088674-c55b910100000000_175064830_data.0.txt
>  -0.4314957657388731
> {noformat}
> The same happens if we store our data in Parquet.
> Hive writes don't lose precision. If the data was written by Hive then Impala can read the values correctly:
> {noformat}
> $ bin/run-jdbc-client.sh -t NOSASL -q "select * from double_tbl;"
> Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
> Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
> Executing: select * from double_tbl
> ----[START]----
> -0.43149576573887316
> ----[END]----
> {noformat}


