Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/05/19 01:15:00 UTC

[jira] [Commented] (IMPALA-11208) CollectionItemsRead profile counter might be wrong in ORC scanner

    [ https://issues.apache.org/jira/browse/IMPALA-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539152#comment-17539152 ] 

ASF subversion and git services commented on IMPALA-11208:
----------------------------------------------------------

Commit 6ea15409b879a1286e72848defdda8d5d8568c19 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6ea15409b ]

IMPALA-11208: Fix uninitialized counter of CollectionItemsRead in orc-scanner

CollectionItemsRead in the runtime profile counts the total number of
nested collection items read by the scan node. It is only created for
scans that support nested types, e.g. Parquet or ORC.

Each scanner thread maintains its own local counter and merges it into
the HdfsScanNode counter for each row batch. However, the local counter
in orc-scanner is uninitialized, leading to garbage values. This patch
simply initializes it to 0 and adds test coverage.
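
For illustration only, the local-counter pattern described above can be
sketched as follows. This is not the Impala source: NodeCounter,
Scanner, ReadCollection, CommitRowBatch and local_items_read_ are
hypothetical names chosen for this sketch. It only shows why the
per-scanner counter needs an explicit initial value before its first
merge into the shared counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the scan node's shared profile counter.
struct NodeCounter {
  std::atomic<int64_t> value{0};
  void Add(int64_t delta) { value.fetch_add(delta, std::memory_order_relaxed); }
};

// Hypothetical stand-in for a per-thread scanner.
class Scanner {
 public:
  explicit Scanner(NodeCounter* node_counter) : node_counter_(node_counter) {}

  // Called whenever the scanner materializes items of a nested collection.
  void ReadCollection(int64_t num_items) {
    assert(num_items >= 0);
    local_items_read_ += num_items;
  }

  // Called once per row batch: merge the local count into the shared
  // counter, then reset the local count for the next batch.
  void CommitRowBatch() {
    node_counter_->Add(local_items_read_);
    local_items_read_ = 0;
  }

 private:
  NodeCounter* node_counter_;
  // Without "= 0" this member would hold an indeterminate value, and the
  // first merge would add garbage to the shared counter.
  int64_t local_items_read_ = 0;
};
```

In this sketch the default member initializer plays the role of the fix:
a plain int64_t member that no constructor sets has an indeterminate
value, which would match the arbitrary negative or very large positive
numbers shown in the issue description.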

Tests:
Add profile verification for this counter to some existing query tests.
Note that there are some implementation differences between the Parquet
and ORC scanners (e.g. in predicate pushdown), so the counter results
differ for some queries. I picked only queries that have consistent
counters.

Change-Id: Id7783d1460ac9b98e94d3a31028b43f5a9884f99
Reviewed-on: http://gerrit.cloudera.org:8080/18528
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> CollectionItemsRead profile counter might be wrong in ORC scanner
> -----------------------------------------------------------------
>
>                 Key: IMPALA-11208
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11208
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> I ran some TPCH(30) queries locally using orc/snap/block format. The CollectionItemsRead profile counter looks wrong to me:
> {code:java}
> - CollectionItemsRead: -1679471728382351781 (-1679471728382351781) {code}
> It can also be a very large positive value, e.g.
> {code:java}
> - CollectionItemsRead: 1296851974.72B (1296851974721369461) {code}


