Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/03/04 09:41:00 UTC

[jira] [Resolved] (IMPALA-11137) ORC and Avro testdata on date_tbl are unusable

     [ https://issues.apache.org/jira/browse/IMPALA-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-11137.
-------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

> ORC and Avro testdata on date_tbl are unusable
> ----------------------------------------------
>
>                 Key: IMPALA-11137
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11137
>             Project: IMPALA
>          Issue Type: Test
>          Components: Infrastructure
>    Affects Versions: Impala 4.0.0, Impala 3.4.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>
> Currently, the ORC and Avro testdata for the date_tbl table is inconsistent with the other formats (text, parquet, kudu).
> {code:sql}
> [localhost:21050] functional_orc_def> select * from date_tbl order by id_col;
> Query: select * from date_tbl order by id_col
> Query submitted at: 2022-02-20 11:19:36 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a14cc5049351c48a:703197c000000000
> +--------+------------+------------+
> | id_col | date_col   | date_part  |
> +--------+------------+------------+
> | 0      | NULL       | 0001-01-01 |
> | 1      | 0001-12-29 | 0001-01-01 |
> | 2      | 0001-12-30 | 0001-01-01 |
> | 3      | 1400-01-08 | 0001-01-01 |
> | 4      | 2017-11-28 | 0001-01-01 |
> | 5      | 9999-12-31 | 0001-01-01 |
> | 6      | NULL       | 0001-01-01 |
> | 10     | 2017-11-28 | 1399-06-27 |
> | 11     | NULL       | 1399-06-27 |
> | 12     | 2018-12-31 | 1399-06-27 |
> | 20     | 0001-06-19 | 2017-11-27 |
> | 21     | 0001-06-20 | 2017-11-27 |
> | 22     | 0001-06-21 | 2017-11-27 |
> | 23     | 0001-06-22 | 2017-11-27 |
> | 24     | 0001-06-23 | 2017-11-27 |
> | 25     | 0001-06-24 | 2017-11-27 |
> | 26     | 0001-06-25 | 2017-11-27 |
> | 27     | 0001-06-26 | 2017-11-27 |
> | 28     | 0001-06-27 | 2017-11-27 |
> | 29     | 2017-11-28 | 2017-11-27 |
> | 30     | 9999-12-01 | 9999-12-31 |
> | 31     | 9999-12-31 | 9999-12-31 |
> +--------+------------+------------+
> WARNINGS: ORC file 'hdfs://localhost:20500/test-warehouse/managed/date_tbl_orc_def/date_part=0001-01-01/base_0000005/bucket_00000_0' column '8' contains an out of range date. The valid date range is 0001-01-01..9999-12-31. 
> [localhost:21050] default> select * from functional_avro_snap.date_tbl order by id_col;
> Query: select * from functional_avro_snap.date_tbl order by id_col
> Query submitted at: 2022-02-20 15:38:04 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=2e4f2a5a7375ff57:083fe1a300000000
> +--------+------------+------------+
> | id_col | date_col   | date_part  |
> +--------+------------+------------+
> | 10     | 2017-11-28 | 1399-06-27 |
> | 11     | NULL       | 1399-06-27 |
> | 12     | 2018-12-31 | 1399-06-27 |
> | 20     | 0001-06-19 | 2017-11-27 |
> | 21     | 0001-06-20 | 2017-11-27 |
> | 22     | 0001-06-21 | 2017-11-27 |
> | 23     | 0001-06-22 | 2017-11-27 |
> | 24     | 0001-06-23 | 2017-11-27 |
> | 25     | 0001-06-24 | 2017-11-27 |
> | 26     | 0001-06-25 | 2017-11-27 |
> | 27     | 0001-06-26 | 2017-11-27 |
> | 28     | 0001-06-27 | 2017-11-27 |
> | 29     | 2017-11-28 | 2017-11-27 |
> | 30     | 9999-12-01 | 9999-12-31 |
> | 31     | 9999-12-31 | 9999-12-31 |
> +--------+------------+------------+
> WARNINGS: Problem parsing file hdfs://localhost:20500/test-warehouse/date_tbl_avro_snap/date_part=0001-01-01/000000_0 at 307
> Avro file 'hdfs://localhost:20500/test-warehouse/date_tbl_avro_snap/date_part=0001-01-01/000000_0' is corrupt: out of range date value -719164 at offset 307. The valid date range is -719162..2932896 (0001-01-01..9999-12-31).
> Fetched 15 row(s) in 0.44s
> {code}
> They should be consistent with other formats, e.g. parquet:
> {code:sql}
> [localhost:21050] default> select * from functional_parquet.date_tbl order by id_col;
> Query: select * from functional_parquet.date_tbl order by id_col
> Query submitted at: 2022-02-20 15:39:06 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=e84991542712d687:fbc35db000000000
> +--------+------------+------------+
> | id_col | date_col   | date_part  |
> +--------+------------+------------+
> | 0      | 0001-01-01 | 0001-01-01 |
> | 1      | 0001-12-31 | 0001-01-01 |
> | 2      | 0002-01-01 | 0001-01-01 |
> | 3      | 1399-12-31 | 0001-01-01 |
> | 4      | 2017-11-28 | 0001-01-01 |
> | 5      | 9999-12-31 | 0001-01-01 |
> | 6      | NULL       | 0001-01-01 |
> | 10     | 2017-11-28 | 1399-06-27 |
> | 11     | NULL       | 1399-06-27 |
> | 12     | 2018-12-31 | 1399-06-27 |
> | 20     | 0001-06-21 | 2017-11-27 |
> | 21     | 0001-06-22 | 2017-11-27 |
> | 22     | 0001-06-23 | 2017-11-27 |
> | 23     | 0001-06-24 | 2017-11-27 |
> | 24     | 0001-06-25 | 2017-11-27 |
> | 25     | 0001-06-26 | 2017-11-27 |
> | 26     | 0001-06-27 | 2017-11-27 |
> | 27     | 0001-06-28 | 2017-11-27 |
> | 28     | 0001-06-29 | 2017-11-27 |
> | 29     | 2017-11-28 | 2017-11-27 |
> | 30     | 9999-12-01 | 9999-12-31 |
> | 31     | 9999-12-31 | 9999-12-31 |
> +--------+------------+------------+
> Fetched 22 row(s) in 0.22s
> {code}
> Both tables are generated by Hive, and the mismatch makes them unusable in tests. As mentioned in IMPALA-9555, Hive still writes DATE values in the legacy format (Julian calendar) by default, whereas Impala uses the proleptic Gregorian calendar. We don't have any legacy testdata in our minicluster, so I think we can change our Hive configs to use the proleptic Gregorian calendar by default. CC [~attilaj]
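
The calendar mismatch described above can be reproduced with a little arithmetic. The sketch below is only an illustration, not Hive or Impala code: it assumes the legacy writer encodes these pre-1582 dates as days since 1970-01-01 counted on the Julian calendar, and that the reader decodes the stored number on the proleptic Gregorian calendar (which is what Python's datetime.date implements). The function names julian_to_epoch_day and read_as_proleptic_gregorian are hypothetical helpers made up for this example.

```python
from datetime import date
from typing import Optional

# Julian Day Number of 1970-01-01 on the proleptic Gregorian calendar.
EPOCH_JDN = 2440588
# Python's proleptic Gregorian ordinal for 1970-01-01 (0001-01-01 is ordinal 1).
EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def julian_to_epoch_day(year: int, month: int, day: int) -> int:
    """Days since 1970-01-01 for a date expressed on the Julian calendar.

    Assumption: this stands in for what a legacy (Julian-based) writer
    would store for dates before the 1582 calendar switchover.
    """
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Standard Julian-calendar-to-Julian-Day-Number formula.
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return jdn - EPOCH_JDN


def read_as_proleptic_gregorian(epoch_day: int) -> Optional[date]:
    """Decode a stored day count on the proleptic Gregorian calendar.

    Returns None when the value falls outside 0001-01-01..9999-12-31,
    mimicking a reader that rejects out-of-range dates.
    """
    ordinal = epoch_day + EPOCH_ORDINAL
    if ordinal < date.min.toordinal() or ordinal > date.max.toordinal():
        return None
    return date.fromordinal(ordinal)


# Dates taken from the first partition of the parquet output above.
for y, m, d in [(1, 1, 1), (1, 12, 31), (1399, 12, 31)]:
    stored = julian_to_epoch_day(y, m, d)
    print(f"{y:04d}-{m:02d}-{d:02d}", stored, read_as_proleptic_gregorian(stored))
```

Under these assumptions, 0001-01-01 is stored as -719164, the same value reported in the Avro warning and two days below the valid minimum of -719162. 0001-12-31 decodes as 0001-12-29 and 1399-12-31 decodes as 1400-01-08, matching the shifted rows in the ORC output.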


