Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/03/27 23:02:00 UTC
[jira] [Commented] (IMPALA-11953) num_trues and num_falses in TIntermediateColumnStats should be optional
[ https://issues.apache.org/jira/browse/IMPALA-11953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705721#comment-17705721 ]
ASF subversion and git services commented on IMPALA-11953:
----------------------------------------------------------
Commit c6223b2aeb8ae23a094551aa2abc8fab75e13165 in impala's branch refs/heads/branch-4.1.2 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c6223b2ae ]
IMPALA-11953: Declare num_trues and num_falses in TIntermediateColumnStats as optional
TIntermediateColumnStats is the representation of incremental stats
which are stored in HMS partition properties using keys like
"impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1",
"impala_intermediate_stats_chunk2", etc.
Fields in TIntermediateColumnStats should be optional to ensure
backward compatibility. IMPALA-8205 added two required fields, num_trues
and num_falses, to TIntermediateColumnStats. This breaks incremental
stats loading in newer versions of Impala when the stats were generated
by older Impala versions (< 4.0), because the stored data lacks these
fields. This patch changes the fields to be optional.
Tests:
- Verified that incremental stats generated by a CDH Impala cluster can
be loaded by a CDP Impala cluster with this fix.
Change-Id: I4f74d5d0676e7ce9eb4ea8061a15610846db3ca5
Reviewed-on: http://gerrit.cloudera.org:8080/19555
Reviewed-by: Riza Suminto <ri...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
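The failure mode described in the commit message can be illustrated with a small simulation. This is only a sketch in plain Python, not Thrift's actual API: the `decode` helper, the `MissingRequiredField` exception, and the two schema tables are hypothetical names introduced here, and only a subset of the struct's fields is modeled. It shows why data written before a field existed can be read when that field is optional but not when it is required.

```python
class MissingRequiredField(Exception):
    """Raised when serialized data lacks a field the schema marks as required."""


# Hypothetical schema tables: field name -> "is this field required?".
# SCHEMA_REQUIRED mimics the struct after IMPALA-8205; SCHEMA_OPTIONAL mimics
# the struct after this patch. Only a few fields are modeled.
SCHEMA_REQUIRED = {
    "num_nulls": False,
    "num_rows": False,
    "num_trues": True,
    "num_falses": True,
}
SCHEMA_OPTIONAL = {name: False for name in SCHEMA_REQUIRED}


def decode(serialized, schema):
    """Validate required fields, then return every schema field (None if unset)."""
    for name, required in schema.items():
        if required and name not in serialized:
            raise MissingRequiredField(
                f"Required field '{name}' was not found in serialized data!"
            )
    return {name: serialized.get(name) for name in schema}


# Stats as an older writer would have stored them: no num_trues / num_falses.
old_stats = {"num_nulls": 0, "num_rows": 100}

try:
    decode(old_stats, SCHEMA_REQUIRED)
except MissingRequiredField as err:
    print("required schema:", err)

print("optional schema:", decode(old_stats, SCHEMA_OPTIONAL))
```

With the optional schema, the missing fields simply come back unset, so the reader has to tolerate their absence.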
> num_trues and num_falses in TIntermediateColumnStats should be optional
> -----------------------------------------------------------------------
>
> Key: IMPALA-11953
> URL: https://issues.apache.org/jira/browse/IMPALA-11953
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Blocker
> Fix For: Impala 4.3.0
>
>
> IMPALA-8205 adds two required fields for TIntermediateColumnStats:
> {code:java}
> struct TIntermediateColumnStats {
>   // One byte for each bucket of the NDV HLL computation
>   1: optional binary intermediate_ndv
>   // If true, intermediate_ndv is RLE-compressed
>   2: optional bool is_ndv_encoded
>   // Number of nulls seen so far (or -1 if nulls are not counted)
>   3: optional i64 num_nulls
>   // The maximum width, in bytes, of the column
>   4: optional i32 max_width
>   // The average width (in bytes) of the column
>   5: optional double avg_width
>   // The number of rows counted, needed to compute NDVs from intermediate_ndv
>   6: optional i64 num_rows
> +
> +  // The number of true and false value, of the column
> +  7: required i64 num_trues
> +  8: required i64 num_falses
> }{code}
> TIntermediateColumnStats is the representation of incremental stats which are stored in HMS partition properties using keys like "impala_intermediate_stats_num_chunks" and "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1", "impala_intermediate_stats_chunk2", etc.
> After upgrading to Impala 4.0, existing incremental stats can't be parsed because these fields are missing from the serialized data.
> {noformat}
> W0227 09:06:49.057451 31105 HdfsPartition.java:1337] Failed to set partition stats for table reptest.test partition loaddate=2022
> Java exception follows:
> org.apache.impala.common.InternalException: Required field 'num_trues' was not found in serialized data! Struct: org.apache.impala.thrift.TIntermediateColumnStats$TIntermediateColumnStatsStandardScheme@377da96a
> at org.apache.impala.common.JniUtil.deserializeThrift(JniUtil.java:138)
> at org.apache.impala.catalog.PartitionStatsUtil.partStatsBytesFromParameters(PartitionStatsUtil.java:114)
> at org.apache.impala.catalog.HdfsPartition$Builder.extractAndCompressPartStats(HdfsPartition.java:1334)
> at org.apache.impala.catalog.HdfsPartition$Builder.setMsPartition(HdfsPartition.java:1310)
> at org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder(HdfsTable.java:906)
> at org.apache.impala.catalog.HdfsTable.createPartitionBuilder(HdfsTable.java:895)
> at org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:698)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1244)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1138)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:114)
> at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
> at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){noformat}
> numTrues and numFalses are not used in planning. We should change them to optional to unblock the migration.
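The chunked storage mentioned in the issue can be pictured with a short sketch. The two key names come from the issue text; everything else is an assumption made for illustration: the `reassemble_stats` helper is hypothetical, and the sketch assumes the chunks are pieces of one Base64 string that must be concatenated in index order before decoding. It is not a copy of Impala's PartitionStatsUtil.

```python
import base64
from typing import Dict, Optional

# Key names as quoted in the issue description.
NUM_CHUNKS_KEY = "impala_intermediate_stats_num_chunks"
CHUNK_KEY_PREFIX = "impala_intermediate_stats_chunk"


def reassemble_stats(params: Dict[str, str]) -> Optional[bytes]:
    """Join chunk0..chunkN-1 in order and decode them (Base64 is an assumption)."""
    if NUM_CHUNKS_KEY not in params:
        return None  # no incremental stats stored for this partition
    num_chunks = int(params[NUM_CHUNKS_KEY])
    parts = []
    for i in range(num_chunks):
        key = f"{CHUNK_KEY_PREFIX}{i}"
        if key not in params:
            raise KeyError(f"missing chunk {key}")
        parts.append(params[key])
    return base64.b64decode("".join(parts))


# Build example partition parameters from a stand-in payload.
payload = b"serialized-stats-bytes"
encoded = base64.b64encode(payload).decode("ascii")
pieces = [encoded[i:i + 8] for i in range(0, len(encoded), 8)]
params = {NUM_CHUNKS_KEY: str(len(pieces))}
for i, piece in enumerate(pieces):
    params[f"{CHUNK_KEY_PREFIX}{i}"] = piece

assert reassemble_stats(params) == payload
```

The bytes recovered this way are what the struct deserializer then has to parse, which is the step where a missing required field causes the failure in the stack trace above.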