Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/07/07 07:25:00 UTC

[jira] [Commented] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs

    [ https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563605#comment-17563605 ] 

ASF subversion and git services commented on IMPALA-11344:
----------------------------------------------------------

Commit 5f2e8ddd9d162023665bfd7ff1429bd91dfcbd50 in impala's branch refs/heads/master from ttttttz
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5f2e8ddd9 ]

IMPALA-11344: Missing slots in all cases should be allowed to be read

When selecting only the missing fields of ORC files, and the missing
fields include non-partition fields, the query fails with `Parse error
in possibly corrupt ORC file: '$filename'. No columns found for this
scan`. Reading missing slots should be allowed in all cases.

Testing:
- Added a test to test_scanners.py that ensures the query can be
  executed successfully when selecting only the missing fields of
  ORC files.
Change-Id: I15dca47ba5f7a93bfd5fcba3cab4ac6d64459023
Reviewed-on: http://gerrit.cloudera.org:8080/18652
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Selecting only the missing fields of ORC files should return NULLs
> ------------------------------------------------------------------
>
>                 Key: IMPALA-11344
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11344
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: zhi tang
>            Priority: Critical
>              Labels: newbie, ramp-up
>
> While looking into the bug of IMPALA-11296, I found a bug in the same scenario (scanning only the missing columns of ORC files) on the current master branch.
> Create an ORC table whose underlying files lack some of the table's fields:
> {code:sql}
> hive> create external table missing_field_orc (f0 int) stored as orc;
> hive> insert into table missing_field_orc select 1;
> hive> alter table missing_field_orc add columns (f1 int);
> hive> select f1 from missing_field_orc;
> +-------+
> |  f1   |
> +-------+
> | NULL  |
> +-------+
> hive> select f0, f1 from missing_field_orc;
> +-----+-------+
> | f0  |  f1   |
> +-----+-------+
> | 1   | NULL  |
> +-----+-------+
> {code}
> Run the same queries in Impala:
> {code:sql}
> impala> VERSION;
> Shell version: impala shell build version not available
> Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)
> impala> invalidate metadata missing_field_orc;
> impala> select f1 from missing_field_orc;
> ERROR: Parse error in possibly corrupt ORC file: 'hdfs://localhost:20500/test-warehouse/missing_field_orc/000000_0'. No columns found for this scan.
> impala> select f0, f1 from missing_field_orc;
> +----+------+
> | f0 | f1   |
> +----+------+
> | 1  | NULL |
> +----+------+
> {code}
> When selecting only the column 'f1', the query fails with an error. It should return NULL.
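
The expected semantics can be illustrated with a small Python model. This is a hypothetical sketch, not Impala's scanner (which is written in C++): the function `scan`, its parameters, and the `legacy` flag are invented for illustration only. A column that the table declares but a file lacks (for example, one added later with ALTER TABLE ... ADD COLUMNS) is a "missing" slot and reads as NULL (`None`). Setting `legacy=True` mimics the failure reported above, which happens only when every selected column is missing from the file.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model only; names do not correspond to Impala internals.
Row = Tuple[Optional[int], ...]


def scan(file_columns: Sequence[str],
         file_rows: List[Dict[str, int]],
         table_columns: Sequence[str],
         selected: Sequence[str],
         legacy: bool = False) -> List[Row]:
    """Project the `selected` table columns out of one data file."""
    unknown = [c for c in selected if c not in table_columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    file_cols = set(file_columns)
    present = [c for c in selected if c in file_cols]
    if legacy and selected and not present:
        # Models the pre-fix error: only missing slots were selected.
        raise RuntimeError("No columns found for this scan")
    # Fixed behavior: one output row per file row; missing slots are NULL.
    return [tuple(row[c] if c in file_cols else None for c in selected)
            for row in file_rows]


if __name__ == "__main__":
    # The file written before ALTER TABLE only contains f0.
    print(scan(["f0"], [{"f0": 1}], ["f0", "f1"], ["f1"]))        # [(None,)]
    print(scan(["f0"], [{"f0": 1}], ["f0", "f1"], ["f0", "f1"]))  # [(1, None)]
```

The row count still comes from the file even when none of its physical columns are projected, which is why a scan of only missing slots can return one NULL per row instead of failing.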



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
