Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/03/28 22:00:00 UTC

[jira] [Updated] (IMPALA-11953) num_trues and num_falses in TIntermediateColumnStats should be optional

     [ https://issues.apache.org/jira/browse/IMPALA-11953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-11953:
------------------------------------
    Fix Version/s: Impala 4.1.2

> num_trues and num_falses in TIntermediateColumnStats should be optional
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-11953
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11953
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Blocker
>             Fix For: Impala 4.1.2, Impala 4.3.0
>
>
> IMPALA-8205 added two required fields to TIntermediateColumnStats:
> {code:java}
> struct TIntermediateColumnStats {
>    // One byte for each bucket of the NDV HLL computation
>   1: optional binary intermediate_ndv
>   // If true, intermediate_ndv is RLE-compressed
>   2: optional bool is_ndv_encoded
>   // Number of nulls seen so far (or -1 if nulls are not counted)
>   3: optional i64 num_nulls
>   // The maximum width, in bytes, of the column
>   4: optional i32 max_width
>   // The average width (in bytes) of the column
>   5: optional double avg_width
>   // The number of rows counted, needed to compute NDVs from intermediate_ndv
>   6: optional i64 num_rows
> +
> +  // The number of true and false value, of the column
> +  7: required i64 num_trues
> +  8: required i64 num_falses
>  }{code}
> TIntermediateColumnStats is the serialized representation of incremental stats, which are stored in HMS partition properties under keys such as "impala_intermediate_stats_num_chunks" and "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1", "impala_intermediate_stats_chunk2", etc.
> After upgrading Impala to 4.0, incremental stats written by earlier versions can't be parsed because these two fields are missing from the serialized data.
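
To make the storage layout concrete, here is a minimal Python sketch of how such chunked properties could be put back together before deserialization. Only the property keys come from the description above. The function name join_stats_chunks is hypothetical, and the sketch assumes the chunks are pieces of one base64 string that are concatenated in index order; treat that as an assumption, not as a statement of what PartitionStatsUtil does.

```python
import base64

# Property keys quoted in the issue description.
NUM_CHUNKS_KEY = "impala_intermediate_stats_num_chunks"
CHUNK_KEY_PREFIX = "impala_intermediate_stats_chunk"


def join_stats_chunks(params):
    """Rebuild the serialized stats bytes from partition properties.

    Hypothetical helper. Assumes the chunks are slices of a single
    base64 string and must be concatenated in index order.
    Returns None when the partition carries no incremental stats.
    """
    if NUM_CHUNKS_KEY not in params:
        return None
    num_chunks = int(params[NUM_CHUNKS_KEY])
    parts = []
    for i in range(num_chunks):
        key = CHUNK_KEY_PREFIX + str(i)
        if key not in params:
            raise KeyError("missing stats chunk: " + key)
        parts.append(params[key])
    # The decoded bytes would then be handed to the Thrift deserializer.
    return base64.b64decode("".join(parts))
```

The bytes returned here are what would be passed to the Thrift deserializer, which is the step that fails in the log below.
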
> {noformat}
> W0227 09:06:49.057451 31105 HdfsPartition.java:1337] Failed to set partition stats for table reptest.test partition loaddate=2022
> Java exception follows:
> org.apache.impala.common.InternalException: Required field 'num_trues' was not found in serialized data! Struct: org.apache.impala.thrift.TIntermediateColumnStats$TIntermediateColumnStatsStandardScheme@377da96a
> 	at org.apache.impala.common.JniUtil.deserializeThrift(JniUtil.java:138)
> 	at org.apache.impala.catalog.PartitionStatsUtil.partStatsBytesFromParameters(PartitionStatsUtil.java:114)
> 	at org.apache.impala.catalog.HdfsPartition$Builder.extractAndCompressPartStats(HdfsPartition.java:1334)
> 	at org.apache.impala.catalog.HdfsPartition$Builder.setMsPartition(HdfsPartition.java:1310)
> 	at org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder(HdfsTable.java:906)
> 	at org.apache.impala.catalog.HdfsTable.createPartitionBuilder(HdfsTable.java:895)
> 	at org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:698)
> 	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1244)
> 	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1138)
> 	at org.apache.impala.catalog.TableLoader.load(TableLoader.java:114)
> 	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
> 	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748){noformat}
> numTrues and numFalses are not used in planning, so nothing depends on their being present. They should be changed to optional to unblock the upgrade.
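
The difference between a required and an optional field can be illustrated with a small stand-alone Python model. This is not Thrift and does not use its wire format. decode_struct, MissingRequiredField, the two schemas and the sample values are all hypothetical; only the field names and the error text are taken from the struct and the log above. The sketch assumes that a payload written by an older version simply lacks the two new fields.

```python
class MissingRequiredField(Exception):
    """Raised when a required field is absent from the payload."""


def decode_struct(serialized, schema):
    """Toy decoder: schema maps field name -> True if required.

    Absent optional fields become None. Absent required fields
    raise, mirroring the validation error in the log above.
    """
    result = {}
    for name, required in schema.items():
        if name in serialized:
            result[name] = serialized[name]
        elif required:
            raise MissingRequiredField(
                "Required field '%s' was not found in serialized data!" % name)
        else:
            result[name] = None
    return result


# Hypothetical payload written before num_trues/num_falses existed.
OLD_PAYLOAD = {"num_nulls": 3, "max_width": 8, "avg_width": 4.0, "num_rows": 100}

# Schema with the two new fields marked required (current behaviour).
SCHEMA_REQUIRED = {
    "num_nulls": False, "max_width": False, "avg_width": False,
    "num_rows": False, "num_trues": True, "num_falses": True,
}

# Schema with every field optional (the proposed change).
SCHEMA_OPTIONAL = {name: False for name in SCHEMA_REQUIRED}
```

With SCHEMA_REQUIRED, decoding OLD_PAYLOAD raises on num_trues. With SCHEMA_OPTIONAL, the same payload decodes and the two new fields are left unset, which is acceptable here because they are not used in planning.
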



