You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@impala.apache.org by Aleksei Maželis <ol...@gmail.com> on 2017/09/11 07:37:57 UTC

Querying column of double data type from (Kudu-based) table: artifact digits

Hi,

While querying, via Impala, a double-type data column from Kudu table, I
have noticed that the original value is sometimes miss-represented with an
artifact digit. Say, whereas "68.01" is stored in a column, querying it
returns "68.01000000000001". What would be the proper way of handling (or
going around) that?

This can be reproduced e.g. as:

> create TABLE test
(
  col1 BIGINT PRIMARY KEY,
  col2 STRING,
  col3 double
) PARTITION BY HASH(col1) PARTITIONS 2
STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')

> describe test;
Query: describe test
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| name | type   | comment | primary_key | nullable | default_value |
encoding      | compression         | block_size |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| col1 | bigint |         | true        | false    |               |
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
| col2 | string |         | false       | true     |               |
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
| col3 | double |         | false       | true     |               |
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+

> insert into test values (1, "test", 68.01);

> select * from test;
Query: select * from test
+------+------+-------------------+
| col1 | col2 | col3              |
+------+------+-------------------+
| 1    | test | 68.01000000000001 |
+------+------+-------------------+

Finally, checking that the digit at the end is an artifact:

> select (col3-68.01)*10e20 from test;
Query: select (col3-68.01)*10e20 from test
+--------------------------+
| (col3 - 68.01) * 1.0e+21 |
+--------------------------+
| 0                        |
+--------------------------+

Regards,
Aleksei Maželis

Re: Querying column of double data type from (Kudu-based) table: artifact digits

Posted by Aleksei Maželis <ol...@gmail.com>.
Ok, in the absence of Decimal support in Kudu ATM, I guess casting is the
only way. Thanks for elaborating!

Regards,
Aleksei

On Mon, Sep 11, 2017 at 7:21 PM, Alexander Behm <al...@cloudera.com>
wrote:

> Like Jim said, this is purely a "display" issue. If you want a precise
> representation you can cast to a decimal:
>
> select cast(col as decimal(4, 2)) from test
>
> On Mon, Sep 11, 2017 at 8:53 AM, Jim Apple <jb...@cloudera.com> wrote:
>
>> 68.01 is not exactly representable in IEEE double-precision floating
>> point. In Python:
>>
>> >>> 68.01 * 100 - 6801
>> 9.094947017729282e-13
>>
>> One workaround is to use integers and store 6801 instead of 68.01. In
>> Impala, in non-Kudu tables, you can use DECIMAL.
>>
>>
>> On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <ol...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> While querying, via Impala, a double-type data column from Kudu table, I
>>> have noticed that the original value is sometimes miss-represented with an
>>> artifact digit. Say, whereas "68.01" is stored in a column, querying it
>>> returns "68.01000000000001". What would be the proper way of handling (or
>>> going around) that?
>>>
>>> This can be reproduced e.g. as:
>>>
>>> > create TABLE test
>>> (
>>>   col1 BIGINT PRIMARY KEY,
>>>   col2 STRING,
>>>   col3 double
>>> ) PARTITION BY HASH(col1) PARTITIONS 2
>>> STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
>>> 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
>>>
>>> > describe test;
>>> Query: describe test
>>> +------+--------+---------+-------------+----------+--------
>>> -------+---------------+---------------------+------------+
>>> | name | type   | comment | primary_key | nullable | default_value |
>>> encoding      | compression         | block_size |
>>> +------+--------+---------+-------------+----------+--------
>>> -------+---------------+---------------------+------------+
>>> | col1 | bigint |         | true        | false    |               |
>>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
>>> | col2 | string |         | false       | true     |               |
>>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
>>> | col3 | double |         | false       | true     |               |
>>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
>>> +------+--------+---------+-------------+----------+--------
>>> -------+---------------+---------------------+------------+
>>>
>>> > insert into test values (1, "test", 68.01);
>>>
>>> > select * from test;
>>> Query: select * from test
>>> +------+------+-------------------+
>>> | col1 | col2 | col3              |
>>> +------+------+-------------------+
>>> | 1    | test | 68.01000000000001 |
>>> +------+------+-------------------+
>>>
>>> Finally, checking that the digit at the end is an artifact:
>>>
>>> > select (col3-68.01)*10e20 from test;
>>> Query: select (col3-68.01)*10e20 from test
>>> +--------------------------+
>>> | (col3 - 68.01) * 1.0e+21 |
>>> +--------------------------+
>>> | 0                        |
>>> +--------------------------+
>>>
>>> Regards,
>>> Aleksei Maželis
>>>
>>
>>
>

Re: Querying column of double data type from (Kudu-based) table: artifact digits

Posted by Alexander Behm <al...@cloudera.com>.
Like Jim said, this is purely a "display" issue. If you want a precise
representation you can cast to a decimal:

select cast(col as decimal(4, 2)) from test

On Mon, Sep 11, 2017 at 8:53 AM, Jim Apple <jb...@cloudera.com> wrote:

> 68.01 is not exactly representable in IEEE double-precision floating
> point. In Python:
>
> >>> 68.01 * 100 - 6801
> 9.094947017729282e-13
>
> One workaround is to use integers and store 6801 instead of 68.01. In
> Impala, in non-Kudu tables, you can use DECIMAL.
>
>
> On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <ol...@gmail.com>
> wrote:
>
>> Hi,
>>
>> While querying, via Impala, a double-type data column from Kudu table, I
>> have noticed that the original value is sometimes miss-represented with an
>> artifact digit. Say, whereas "68.01" is stored in a column, querying it
>> returns "68.01000000000001". What would be the proper way of handling (or
>> going around) that?
>>
>> This can be reproduced e.g. as:
>>
>> > create TABLE test
>> (
>>   col1 BIGINT PRIMARY KEY,
>>   col2 STRING,
>>   col3 double
>> ) PARTITION BY HASH(col1) PARTITIONS 2
>> STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
>> 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
>>
>> > describe test;
>> Query: describe test
>> +------+--------+---------+-------------+----------+--------
>> -------+---------------+---------------------+------------+
>> | name | type   | comment | primary_key | nullable | default_value |
>> encoding      | compression         | block_size |
>> +------+--------+---------+-------------+----------+--------
>> -------+---------------+---------------------+------------+
>> | col1 | bigint |         | true        | false    |               |
>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
>> | col2 | string |         | false       | true     |               |
>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
>> | col3 | double |         | false       | true     |               |
>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
>> +------+--------+---------+-------------+----------+--------
>> -------+---------------+---------------------+------------+
>>
>> > insert into test values (1, "test", 68.01);
>>
>> > select * from test;
>> Query: select * from test
>> +------+------+-------------------+
>> | col1 | col2 | col3              |
>> +------+------+-------------------+
>> | 1    | test | 68.01000000000001 |
>> +------+------+-------------------+
>>
>> Finally, checking that the digit at the end is an artifact:
>>
>> > select (col3-68.01)*10e20 from test;
>> Query: select (col3-68.01)*10e20 from test
>> +--------------------------+
>> | (col3 - 68.01) * 1.0e+21 |
>> +--------------------------+
>> | 0                        |
>> +--------------------------+
>>
>> Regards,
>> Aleksei Maželis
>>
>
>

Re: Querying column of double data type from (Kudu-based) table: artifact digits

Posted by Jim Apple <jb...@cloudera.com>.
68.01 is not exactly representable in IEEE double-precision floating point.
In Python:

>>> 68.01 * 100 - 6801
9.094947017729282e-13

One workaround is to use integers and store 6801 instead of 68.01. In
Impala, in non-Kudu tables, you can use DECIMAL.


On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <ol...@gmail.com> wrote:

> Hi,
>
> While querying, via Impala, a double-type data column from Kudu table, I
> have noticed that the original value is sometimes miss-represented with an
> artifact digit. Say, whereas "68.01" is stored in a column, querying it
> returns "68.01000000000001". What would be the proper way of handling (or
> going around) that?
>
> This can be reproduced e.g. as:
>
> > create TABLE test
> (
>   col1 BIGINT PRIMARY KEY,
>   col2 STRING,
>   col3 double
> ) PARTITION BY HASH(col1) PARTITIONS 2
> STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
> 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
>
> > describe test;
> Query: describe test
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
> | name | type   | comment | primary_key | nullable | default_value |
> encoding      | compression         | block_size |
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
> | col1 | bigint |         | true        | false    |               |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
> | col2 | string |         | false       | true     |               |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
> | col3 | double |         | false       | true     |               |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
>
> > insert into test values (1, "test", 68.01);
>
> > select * from test;
> Query: select * from test
> +------+------+-------------------+
> | col1 | col2 | col3              |
> +------+------+-------------------+
> | 1    | test | 68.01000000000001 |
> +------+------+-------------------+
>
> Finally, checking that the digit at the end is an artifact:
>
> > select (col3-68.01)*10e20 from test;
> Query: select (col3-68.01)*10e20 from test
> +--------------------------+
> | (col3 - 68.01) * 1.0e+21 |
> +--------------------------+
> | 0                        |
> +--------------------------+
>
> Regards,
> Aleksei Maželis
>