Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/09/27 01:21:00 UTC

[jira] [Commented] (IMPALA-10894) Pushing down predicates in reading "original files" of ACID tables

    [ https://issues.apache.org/jira/browse/IMPALA-10894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420430#comment-17420430 ] 

Quanlong Huang commented on IMPALA-10894:
-----------------------------------------

Note that {{test_full_acid_original_files}} in tests/query_test/test_acid.py guards this behavior. If we remove this if-branch in HdfsOrcScanner::PrepareSearchArguments() ([https://github.com/apache/impala/blob/20e07f8645b402f08af91e6b078d1fdd2a0d06f6/be/src/exec/hdfs-orc-scanner.cc#L1056-L1063]), the test fails with:
{code:java}
select row__id.*, id from alltypes_promoted_nopart
where id < 10;

-- 2021-09-27 09:15:39,556 INFO     MainThread: Started query 334ebd3b54dd92af:eafa89fd00000000
-- 2021-09-27 09:15:39,704 ERROR    MainThread: Comparing QueryTestResults (expected vs actual):
0,0,536870912,4030,0,0 != 0,0,536870912,0,0,0
0,0,536870912,4031,0,1 != 0,0,536870912,1,0,1
0,0,536870912,4032,0,2 != 0,0,536870912,2,0,2
0,0,536870912,4033,0,3 != 0,0,536870912,3,0,3
0,0,536870912,4034,0,4 != 0,0,536870912,4,0,4
0,0,536870912,4035,0,5 != 0,0,536870912,5,0,5
0,0,536870912,4036,0,6 != 0,0,536870912,6,0,6
0,0,536870912,4037,0,7 != 0,0,536870912,7,0,7
0,0,536870912,4038,0,8 != 0,0,536870912,8,0,8
0,0,536870912,4039,0,9 != 0,0,536870912,9,0,9{code}
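The row-id column (the fourth value in each row) is wrong: the expected values are 4030 through 4039, but the actual values are 0 through 9.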
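
To illustrate the failure mode described in the issue text quoted below, here is a minimal Python sketch. It is a toy model, not Impala or ORC code: {{read_batches}} and {{assign_row_ids}} are hypothetical helpers, the per-row filtering is a simplification of what a real reader does, and the file layout is an assumption chosen only so that the numbers resemble the log above.

```python
# Toy model, not Impala code: an "original file" stores no ACID columns,
# so a row id has to be derived from each row's position in the file.

def read_batches(rows, batch_size, predicate=None):
    """Hypothetical reader: yields batches, dropping rows that fail `predicate`."""
    selected = [r for r in rows if predicate is None or predicate(r)]
    for start in range(0, len(selected), batch_size):
        yield selected[start:start + batch_size]


def assign_row_ids(batches):
    """Derive row ids by counting returned rows (only valid if no row was skipped)."""
    next_row_id = 0
    out = []
    for batch in batches:
        for value in batch:
            out.append((next_row_id, value))
            next_row_id += 1
    return out


# Assumed layout for illustration: ids 10..4039 come first, then ids 0..9,
# so the rows with id < 10 sit at file positions 4030..4039.
file_rows = list(range(10, 4040)) + list(range(10))

# No pushdown: read every row, assign row ids, then apply the predicate.
no_pushdown = [
    (rid, v) for rid, v in assign_row_ids(read_batches(file_rows, 1024)) if v < 10
]

# Pushdown: the reader filters first, so counting returned rows starts at 0.
with_pushdown = assign_row_ids(read_batches(file_rows, 1024, lambda v: v < 10))

print(no_pushdown[0], with_pushdown[0])
```

In this sketch the row ids stay correct only while the counter sees every row of the file. Once the reader drops rows before the counter runs, the derived ids no longer match the file positions.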

> Pushing down predicates in reading "original files" of ACID tables
> ------------------------------------------------------------------
>
>                 Key: IMPALA-10894
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10894
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> "Original files" don't store the special ACID columns, so we generate the row id from each row's index in the file. The ORC reader doesn't provide an interface for retrieving the row index of a row in the file. When predicates are pushed down into the ORC reader, the returned batches skip some rows, so we can't calculate a row's actual index in the file from its index in the batch.
> Currently we skip pushing down predicates when reading such files.


