Posted to issues-all@impala.apache.org by "logan.zheng (Jira)" <ji...@apache.org> on 2020/10/13 13:29:00 UTC
[jira] [Issue Comment Deleted] (IMPALA-10230) column stats num_nulls less than -1
[ https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
logan.zheng updated IMPALA-10230:
---------------------------------
Comment: was deleted
(was: ## reproduce this issue
Impala 3.3+
### 1 create table
create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY (ds int) STORED AS PARQUET;
### 2 create data
insert overwrite table test_column_stats partition(ds=20200101)
select 'tt' str1 ,'20200101' as str2 ,1 as int1;
insert overwrite table test_column_stats partition(ds=20200103)
select 'tt2' str1 ,'20200103' as str2 ,1 as int1;
insert overwrite table test_column_stats partition(ds=20200104)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;
insert overwrite table test_column_stats partition(ds=20200105)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;
### 3 compute incremental stats
compute incremental stats test_column_stats partition(ds=20200101);
compute incremental stats test_column_stats partition(ds=20200103);
compute incremental stats test_column_stats partition(ds=20200104);
### 4 update metastore
```
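-- Join conditions added so the tables are not cross-joined.
-- (TBL_ID was 92746 in the reporter's metastore.)
SELECT d.`NAME`, t.`TBL_NAME`, p.*, pp.*
FROM `PARTITIONS` p, `TBLS` t, `DBS` d, `PARTITION_PARAMS` pp
WHERE d.`NAME`='default' AND t.`TBL_NAME`='test_column_stats'
AND t.DB_ID=d.DB_ID
AND p.TBL_ID=t.TBL_ID
AND p.PART_ID=pp.PART_ID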
```
```
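-- Caution: as written, this overwrites the stats chunk of every partition in the metastore;
-- consider restricting it to the PART_ID values returned by the query above.
update PARTITION_PARAMS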
set PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=='
where PARAM_KEY='impala_intermediate_stats_chunk0'
```
##### PARAM_VALUE holds a serialized TPartitionStats object; the key point is num_nulls=-1
```
// Intermediate state for the computation of per-column stats. Impala can aggregate these
// structures together to produce final stats for a column.
struct TIntermediateColumnStats {
// One byte for each bucket of the NDV HLL computation
1: optional binary intermediate_ndv
// If true, intermediate_ndv is RLE-compressed
2: optional bool is_ndv_encoded
// Number of nulls seen so far (or -1 if nulls are not counted)
3: optional i64 num_nulls
// The maximum width, in bytes, of the column
4: optional i32 max_width
// The average width (in bytes) of the column
5: optional double avg_width
// The number of rows counted, needed to compute NDVs from intermediate_ndv
6: optional i64 num_rows
}
// Per-partition statistics
struct TPartitionStats {
// Number of rows gathered per-partition by non-incremental stats.
// TODO: This can probably be removed in favour of the intermediate_col_stats, but doing
// so would interfere with the non-incremental stats path
1: required TTableStats stats
// Intermediate state for incremental statistics, one entry per column name.
2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}
```
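To check what the PARAM_VALUE from step 4 contains, the payload can be walked by hand. The sketch below assumes the value is a base64-encoded Thrift compact-protocol serialization of TPartitionStats (inferred from the byte layout, not stated in this report), and that each map value starts with fields 1, 2 and 3 in order with is_ndv_encoded set to true, which happens to hold for this payload. The helper names are hypothetical, and this is not a general Thrift decoder.

```python
import base64

# PARAM_VALUE copied verbatim from the UPDATE statement in step 4.
PARAM_VALUE = (
    "HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/"
    "ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=="
)


def read_varint(buf, pos):
    """Decode an unsigned LEB128 varint; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag_decode(n):
    """Map a zigzag-encoded unsigned value back to a signed integer."""
    return (n >> 1) ^ -(n & 1)


def num_nulls_for(raw, column):
    """Read num_nulls for one column, assuming the fixed layout described above."""
    pos = raw.index(column) + len(column)   # the value struct follows the map key
    assert raw[pos] == 0x18                 # field 1, binary: intermediate_ndv
    length, pos = read_varint(raw, pos + 1)
    pos += length                           # skip the NDV bytes
    assert raw[pos] == 0x11                 # field 2, boolean true: is_ndv_encoded
    assert raw[pos + 1] == 0x16             # field 3, i64: num_nulls
    value, _ = read_varint(raw, pos + 2)
    return zigzag_decode(value)


raw = base64.b64decode(PARAM_VALUE)
stats = {c.decode(): num_nulls_for(raw, c) for c in (b"int1", b"str1", b"str2")}
print(stats)  # {'int1': 0, 'str1': -1, 'str2': 0}
```

Under these assumptions, only str1 carries num_nulls=-1 in this payload; int1 and str2 carry 0.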
### 5 restart catalog and coordinator
clear the table partition cache
### 6 execute compute incremental stats
compute incremental stats test_column_stats partition(ds=20200105);
The statement then fails with this exception:
```
[localhost:21000] default> compute incremental stats test_column_stats partition(ds=20200105);
Query: compute incremental stats test_column_stats partition(ds=20200107)
ERROR: TableLoadingException: Failed to load metadata for table: default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5}
```)
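One way a value below -1 could arise is if per-partition num_nulls values are summed while some of them hold the -1 "not counted" sentinel. The sketch below only illustrates that arithmetic; it is not Impala's code, the function names are hypothetical, and whether this is the actual cause of the exception above is an assumption.

```python
UNKNOWN = -1  # sentinel from the struct comment: nulls were not counted


def naive_total_nulls(per_partition):
    """Plain sum: every UNKNOWN partition drags the total further below zero."""
    return sum(per_partition)


def guarded_total_nulls(per_partition):
    """Report UNKNOWN for the total if any partition did not count nulls."""
    if any(n < 0 for n in per_partition):
        return UNKNOWN
    return sum(per_partition)


def is_valid(num_nulls):
    """Hypothetical stand-in for a check that rejects values below -1."""
    return num_nulls >= UNKNOWN


counts = [UNKNOWN, UNKNOWN]
print(naive_total_nulls(counts), is_valid(naive_total_nulls(counts)))      # -2 False
print(guarded_total_nulls(counts), is_valid(guarded_total_nulls(counts)))  # -1 True
```

The guarded variant keeps the aggregate inside the range that the sentinel convention allows, at the cost of discarding the counts from partitions that did record nulls.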
> column stats num_nulls less than -1
> -----------------------------------
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 3.4.0
> Reporter: logan zheng
> Priority: Critical
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> After upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running "compute incremental stats default.test partition(xx=yyyy)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, has been in use for a long time, and already had stats computed.
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)