Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/03/27 23:02:00 UTC
[jira] [Commented] (IMPALA-8205) Illegal statistics for numFalse and numTrue
[ https://issues.apache.org/jira/browse/IMPALA-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705722#comment-17705722 ]
ASF subversion and git services commented on IMPALA-8205:
---------------------------------------------------------
Commit c6223b2aeb8ae23a094551aa2abc8fab75e13165 in impala's branch refs/heads/branch-4.1.2 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c6223b2ae ]
IMPALA-11953: Declare num_trues and num_falses in TIntermediateColumnStats as optional
TIntermediateColumnStats is the representation of incremental stats
which are stored in HMS partition properties using keys like
"impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1",
"impala_intermediate_stats_chunk2", etc.
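The chunked layout described above can be sketched roughly as follows. This is a hypothetical illustration, not Impala's implementation. Only the key prefix comes from the commit message. The helper names and CHUNK_SIZE are assumptions, and the real per-value limit and the serialization/encoding Impala applies before chunking are not stated here.

```python
from typing import Dict

# Key prefix taken from the commit message; CHUNK_SIZE is an illustrative
# assumption, not the limit Impala actually uses.
KEY_PREFIX = "impala_intermediate_stats_chunk"
CHUNK_SIZE = 4000


def split_into_chunks(serialized: str, chunk_size: int = CHUNK_SIZE) -> Dict[str, str]:
    """Spread one serialized stats string over numbered partition-property keys."""
    return {
        f"{KEY_PREFIX}{i}": serialized[start:start + chunk_size]
        for i, start in enumerate(range(0, len(serialized), chunk_size))
    }


def join_chunks(params: Dict[str, str]) -> str:
    """Reassemble chunk0, chunk1, ... in numeric order, ignoring other keys."""
    pieces = []
    i = 0
    while f"{KEY_PREFIX}{i}" in params:
        pieces.append(params[f"{KEY_PREFIX}{i}"])
        i += 1
    return "".join(pieces)
```

Reassembly walks the indexes 0, 1, 2, ... instead of sorting the key names, because a lexicographic sort would place "chunk10" before "chunk2".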
Fields in TIntermediateColumnStats should be optional to ensure
backward compatibility. IMPALA-8205 added two required fields, num_trues
and num_falses, to TIntermediateColumnStats. This breaks incremental
stats loading in newer versions of Impala if the stats were generated by
older Impala versions (< 4.0). This patch changes the fields to be
optional.
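Why a required field breaks old data can be shown with a toy reader. This is a hypothetical sketch, not Thrift's generated code or API: `decode`, `MissingRequiredField`, and the stand-in field `num_nulls` are illustrative names. It only models the rule that a strict reader rejects a record missing a required field, while an optional field may be absent and reads back as unknown.

```python
from typing import Dict, Optional


class MissingRequiredField(Exception):
    """Raised when a field the reader marks as required is absent."""


def decode(payload: Dict[str, int], schema: Dict[str, bool]) -> Dict[str, Optional[int]]:
    """Read the fields named in `schema` (name -> is_required) from `payload`."""
    result: Dict[str, Optional[int]] = {}
    for name, required in schema.items():
        if name in payload:
            result[name] = payload[name]
        elif required:
            # An old writer never emitted this field, so a strict reader fails here.
            raise MissingRequiredField(name)
        else:
            # Optional field: absence is legal and surfaces as "unknown".
            result[name] = None
    return result


# Written by an older (< 4.0) version, before num_trues/num_falses existed.
# "num_nulls" is a hypothetical stand-in for the pre-existing fields.
OLD_PAYLOAD = {"num_nulls": 0}

REQUIRED_SCHEMA = {"num_nulls": True, "num_trues": True, "num_falses": True}
OPTIONAL_SCHEMA = {"num_nulls": True, "num_trues": False, "num_falses": False}
```

With REQUIRED_SCHEMA, `decode(OLD_PAYLOAD, ...)` raises on `num_trues`. With OPTIONAL_SCHEMA, the same payload loads and the two new counts come back as None.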
Tests:
- Verified that incremental stats generated by a CDH Impala cluster can
be loaded by a CDP Impala cluster with this fix.
Change-Id: I4f74d5d0676e7ce9eb4ea8061a15610846db3ca5
Reviewed-on: http://gerrit.cloudera.org:8080/19555
Reviewed-by: Riza Suminto <ri...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
> Illegal statistics for numFalse and numTrue
> -------------------------------------------
>
> Key: IMPALA-8205
> URL: https://issues.apache.org/jira/browse/IMPALA-8205
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: wuchang
> Assignee: wuchang
> Priority: Major
> Labels: impala, numFalse, numTrue, statistics
> Fix For: Impala 4.0.0
>
>
> When Impala computes statistics, it sets *numFalse = -1* and *numTrue = 1* when the statistics are missing.
> A value of *-1* for *numFalse* breaks query engines such as Presto; a PR that reports and hotfixes this already exists: [presto-11859|https://github.com/prestodb/presto/pull/11859]
> A value of *1* for *numTrue* is also unreasonable, because there is no way to tell whether it is the real numTrue statistic or a marker for a missing statistic.
> Likewise, *nullCount* previously used -1 to indicate its absence, which also caused problems for Presto, and Presto had to add a hotfix for it ([presto-11549|https://github.com/prestodb/presto/pull/11549]). Fortunately, Impala has fixed that bug.
> These statistics should be set to null when they are absent, instead of -1 and 1.
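The difference between sentinels and null can be illustrated from a consumer's side. This is a hypothetical sketch, not Presto's or Impala's code: the function names are made up, and it shows only one way a naive consumer can go wrong. The sentinel pair numTrue = 1, numFalse = -1 sums to zero, so code that takes the stored values at face value divides by zero. When absence is encoded as None, the consumer can tell "unknown" apart from a real count.

```python
from typing import Optional


def naive_true_fraction(num_trues: int, num_falses: int) -> float:
    # Treats whatever is stored as a real count, including sentinel values.
    return num_trues / (num_trues + num_falses)


def true_fraction(num_trues: Optional[int], num_falses: Optional[int]) -> Optional[float]:
    """Fraction of non-null values that are true, or None when it is unknown."""
    if num_trues is None or num_falses is None:
        return None  # statistics absent: caller falls back to its default estimate
    if num_trues < 0 or num_falses < 0:
        raise ValueError("boolean counts must be non-negative")
    total = num_trues + num_falses
    if total == 0:
        return None  # no non-null values to take a fraction of
    return num_trues / total
```

Here `naive_true_fraction(1, -1)` raises ZeroDivisionError, whereas `true_fraction(None, None)` simply reports that the statistic is unknown.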
--
This message was sent by Atlassian Jira
(v8.20.10#820010)