You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@impala.apache.org by Aleksei Maželis <ol...@gmail.com> on 2017/09/11 07:37:57 UTC
Querying column of double data type from (Kudu-based) table: artifact digits
Hi,
While querying, via Impala, a double-type data column from Kudu table, I
have noticed that the original value is sometimes miss-represented with an
artifact digit. Say, whereas "68.01" is stored in a column, querying it
returns "68.01000000000001". What would be the proper way of handling (or
going around) that?
This can be reproduced e.g. as:
> create TABLE test
(
col1 BIGINT PRIMARY KEY,
col2 STRING,
col3 double
) PARTITION BY HASH(col1) PARTITIONS 2
STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
> describe test;
Query: describe test
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| name | type | comment | primary_key | nullable | default_value |
encoding | compression | block_size |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| col1 | bigint | | true | false | |
AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| col2 | string | | false | true | |
AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| col3 | double | | false | true | |
AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
> insert into test values (1, "test", 68.01);
> select * from test;
Query: select * from test
+------+------+-------------------+
| col1 | col2 | col3 |
+------+------+-------------------+
| 1 | test | 68.01000000000001 |
+------+------+-------------------+
Finally, checking that the digit at the end is an artifact:
> select (col3-68.01)*10e20 from test;
Query: select (col3-68.01)*10e20 from test
+--------------------------+
| (col3 - 68.01) * 1.0e+21 |
+--------------------------+
| 0 |
+--------------------------+
Regards,
Aleksei Maželis
Re: Querying column of double data type from (Kudu-based) table:
artifact digits
Posted by Aleksei Maželis <ol...@gmail.com>.
Ok, in the absence of Decimal support in Kudu ATM, I guess casting is the
only way. Thanks for elaborating!
Regards,
Aleksei
On Mon, Sep 11, 2017 at 7:21 PM, Alexander Behm <al...@cloudera.com>
wrote:
> Like Jim said, this is purely a "display" issue. If you want a precise
> representation you can cast to a decimal:
>
> select cast(col as decimal(4, 2)) from test
>
> On Mon, Sep 11, 2017 at 8:53 AM, Jim Apple <jb...@cloudera.com> wrote:
>
>> 68.01 is not exactly representable in IEEE double-precision floating
>> point. In Python:
>>
>> >>> 68.01 * 100 - 6801
>> 9.094947017729282e-13
>>
>> One workaround is to use integers and store 6801 instead of 68.01. In
>> Impala, in non-Kudu tables, you can use DECIMAL.
>>
>>
>> On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <ol...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> While querying, via Impala, a double-type data column from Kudu table, I
>>> have noticed that the original value is sometimes miss-represented with an
>>> artifact digit. Say, whereas "68.01" is stored in a column, querying it
>>> returns "68.01000000000001". What would be the proper way of handling (or
>>> going around) that?
>>>
>>> This can be reproduced e.g. as:
>>>
>>> > create TABLE test
>>> (
>>> col1 BIGINT PRIMARY KEY,
>>> col2 STRING,
>>> col3 double
>>> ) PARTITION BY HASH(col1) PARTITIONS 2
>>> STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
>>> 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
>>>
>>> > describe test;
>>> Query: describe test
>>> +------+--------+---------+-------------+----------+--------
>>> -------+---------------+---------------------+------------+
>>> | name | type | comment | primary_key | nullable | default_value |
>>> encoding | compression | block_size |
>>> +------+--------+---------+-------------+----------+--------
>>> -------+---------------+---------------------+------------+
>>> | col1 | bigint | | true | false | |
>>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
>>> | col2 | string | | false | true | |
>>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
>>> | col3 | double | | false | true | |
>>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
>>> +------+--------+---------+-------------+----------+--------
>>> -------+---------------+---------------------+------------+
>>>
>>> > insert into test values (1, "test", 68.01);
>>>
>>> > select * from test;
>>> Query: select * from test
>>> +------+------+-------------------+
>>> | col1 | col2 | col3 |
>>> +------+------+-------------------+
>>> | 1 | test | 68.01000000000001 |
>>> +------+------+-------------------+
>>>
>>> Finally, checking that the digit at the end is an artifact:
>>>
>>> > select (col3-68.01)*10e20 from test;
>>> Query: select (col3-68.01)*10e20 from test
>>> +--------------------------+
>>> | (col3 - 68.01) * 1.0e+21 |
>>> +--------------------------+
>>> | 0 |
>>> +--------------------------+
>>>
>>> Regards,
>>> Aleksei Maželis
>>>
>>
>>
>
Re: Querying column of double data type from (Kudu-based) table:
artifact digits
Posted by Alexander Behm <al...@cloudera.com>.
Like Jim said, this is purely a "display" issue. If you want a precise
representation you can cast to a decimal:
select cast(col as decimal(4, 2)) from test
On Mon, Sep 11, 2017 at 8:53 AM, Jim Apple <jb...@cloudera.com> wrote:
> 68.01 is not exactly representable in IEEE double-precision floating
> point. In Python:
>
> >>> 68.01 * 100 - 6801
> 9.094947017729282e-13
>
> One workaround is to use integers and store 6801 instead of 68.01. In
> Impala, in non-Kudu tables, you can use DECIMAL.
>
>
> On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <ol...@gmail.com>
> wrote:
>
>> Hi,
>>
>> While querying, via Impala, a double-type data column from Kudu table, I
>> have noticed that the original value is sometimes miss-represented with an
>> artifact digit. Say, whereas "68.01" is stored in a column, querying it
>> returns "68.01000000000001". What would be the proper way of handling (or
>> going around) that?
>>
>> This can be reproduced e.g. as:
>>
>> > create TABLE test
>> (
>> col1 BIGINT PRIMARY KEY,
>> col2 STRING,
>> col3 double
>> ) PARTITION BY HASH(col1) PARTITIONS 2
>> STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
>> 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
>>
>> > describe test;
>> Query: describe test
>> +------+--------+---------+-------------+----------+--------
>> -------+---------------+---------------------+------------+
>> | name | type | comment | primary_key | nullable | default_value |
>> encoding | compression | block_size |
>> +------+--------+---------+-------------+----------+--------
>> -------+---------------+---------------------+------------+
>> | col1 | bigint | | true | false | |
>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
>> | col2 | string | | false | true | |
>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
>> | col3 | double | | false | true | |
>> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
>> +------+--------+---------+-------------+----------+--------
>> -------+---------------+---------------------+------------+
>>
>> > insert into test values (1, "test", 68.01);
>>
>> > select * from test;
>> Query: select * from test
>> +------+------+-------------------+
>> | col1 | col2 | col3 |
>> +------+------+-------------------+
>> | 1 | test | 68.01000000000001 |
>> +------+------+-------------------+
>>
>> Finally, checking that the digit at the end is an artifact:
>>
>> > select (col3-68.01)*10e20 from test;
>> Query: select (col3-68.01)*10e20 from test
>> +--------------------------+
>> | (col3 - 68.01) * 1.0e+21 |
>> +--------------------------+
>> | 0 |
>> +--------------------------+
>>
>> Regards,
>> Aleksei Maželis
>>
>
>
Re: Querying column of double data type from (Kudu-based) table:
artifact digits
Posted by Jim Apple <jb...@cloudera.com>.
68.01 is not exactly representable in IEEE double-precision floating point.
In Python:
>>> 68.01 * 100 - 6801
9.094947017729282e-13
One workaround is to use integers and store 6801 instead of 68.01. In
Impala, in non-Kudu tables, you can use DECIMAL.
On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <ol...@gmail.com> wrote:
> Hi,
>
> While querying, via Impala, a double-type data column from Kudu table, I
> have noticed that the original value is sometimes miss-represented with an
> artifact digit. Say, whereas "68.01" is stored in a column, querying it
> returns "68.01000000000001". What would be the proper way of handling (or
> going around) that?
>
> This can be reproduced e.g. as:
>
> > create TABLE test
> (
> col1 BIGINT PRIMARY KEY,
> col2 STRING,
> col3 double
> ) PARTITION BY HASH(col1) PARTITIONS 2
> STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
> 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
>
> > describe test;
> Query: describe test
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
> | name | type | comment | primary_key | nullable | default_value |
> encoding | compression | block_size |
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
> | col1 | bigint | | true | false | |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
> | col2 | string | | false | true | |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
> | col3 | double | | false | true | |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
>
> > insert into test values (1, "test", 68.01);
>
> > select * from test;
> Query: select * from test
> +------+------+-------------------+
> | col1 | col2 | col3 |
> +------+------+-------------------+
> | 1 | test | 68.01000000000001 |
> +------+------+-------------------+
>
> Finally, checking that the digit at the end is an artifact:
>
> > select (col3-68.01)*10e20 from test;
> Query: select (col3-68.01)*10e20 from test
> +--------------------------+
> | (col3 - 68.01) * 1.0e+21 |
> +--------------------------+
> | 0 |
> +--------------------------+
>
> Regards,
> Aleksei Maželis
>