Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/06/09 01:57:00 UTC

[jira] [Updated] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs

     [ https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-11344:
------------------------------------
    Description: 
While looking into IMPALA-11296, I found another bug in the same scenario (scanning only the missing columns of ORC files) on the current master branch.

Create an ORC table whose underlying files are missing a field:
{code:sql}
hive> create external table missing_field_orc (f0 int) stored as orc;
hive> insert into table missing_field_orc select 1;
hive> alter table missing_field_orc add columns (f1 int);
hive> select f1 from missing_field_orc;
+-------+
|  f1   |
+-------+
| NULL  |
+-------+
hive> select f0, f1 from missing_field_orc;
+-----+-------+
| f0  |  f1   |
+-----+-------+
| 1   | NULL  |
+-----+-------+
{code}
Run the same queries in Impala:
{code:sql}
impala> VERSION;
Shell version: impala shell build version not available
Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)

impala> invalidate metadata missing_field_orc;
impala> select f1 from missing_field_orc;
ERROR: Parse error in possibly corrupt ORC file: 'hdfs://localhost:20500/test-warehouse/missing_field_orc/000000_0'. No columns found for this scan.

impala> select f0, f1 from missing_field_orc;
+----+------+
| f0 | f1   |
+----+------+
| 1  | NULL |
+----+------+
{code}
When only the column 'f1' is selected, the query fails with an error. It should return NULL, as Hive does.
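
The expected behavior can be illustrated with a toy model. This is only a sketch of the semantics, not Impala's ORC scanner code: the function `scan` and its parameters are hypothetical names introduced for illustration. The assumption is that the row count comes from the file itself, so a query that selects only columns absent from the file schema should still produce one all-NULL row per stored row rather than fail.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[int]]


def scan(
    file_rows: Sequence[Dict[str, int]],
    file_columns: Sequence[str],
    selected: Sequence[str],
) -> List[Row]:
    """Toy columnar scan over one file (hypothetical, not Impala code).

    Columns in `selected` that are missing from `file_columns` are
    filled with None (NULL) instead of raising an error.
    """
    present = set(file_columns)
    # The number of output rows depends only on the rows stored in the
    # file, even when none of the selected columns exist in it.
    return [
        {col: (row[col] if col in present else None) for col in selected}
        for row in file_rows
    ]


# Mirrors the table above: the file holds f0 = 1, and f1 was added later.
rows = [{"f0": 1}]
only_missing = scan(rows, ["f0"], ["f1"])
both = scan(rows, ["f0"], ["f0", "f1"])
```

In this model, selecting only `f1` yields a single row with `f1` set to NULL, which matches the Hive output shown above.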


> Selecting only the missing fields of ORC files should return NULLs
> ------------------------------------------------------------------
>
>                 Key: IMPALA-11344
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11344
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
