Posted to issues-all@impala.apache.org by "logan.zheng (Jira)" <ji...@apache.org> on 2020/10/13 13:29:00 UTC

[jira] [Issue Comment Deleted] (IMPALA-10230) column stats num_nulls less than -1

     [ https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

logan.zheng updated IMPALA-10230:
---------------------------------
    Comment: was deleted

(was: ## reproduce this issue
Impala 3.3+
### 1. create table
create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY (ds int) STORED AS PARQUET;
### 2. create data
insert overwrite table test_column_stats partition(ds=20200101)
select 'tt' as str1, '20200101' as str2, 1 as int1;

insert overwrite table test_column_stats partition(ds=20200103)
select 'tt2' as str1, '20200103' as str2, 1 as int1;

insert overwrite table test_column_stats partition(ds=20200104)
select 'tt2' as str1, '20200104' as str2, 1 as int1;

insert overwrite table test_column_stats partition(ds=20200105)
select 'tt2' as str1, '20200104' as str2, 1 as int1;

### 3. compute incremental stats
compute incremental stats test_column_stats partition(ds=20200101);
compute incremental stats test_column_stats partition(ds=20200103);
compute incremental stats test_column_stats partition(ds=20200104);


### 4 update metastore 
```
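-- Join the metastore tables explicitly so only this table's partitions are returned.
SELECT d.`NAME`, t.`TBL_NAME`, p.*, pp.*
FROM `DBS` d
JOIN `TBLS` t ON t.DB_ID = d.DB_ID
JOIN `PARTITIONS` p ON p.TBL_ID = t.TBL_ID
JOIN `PARTITION_PARAMS` pp ON pp.PART_ID = p.PART_ID
WHERE d.`NAME` = 'default'
  AND t.`TBL_NAME` = 'test_column_stats'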
```
```
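-- Caution: with no PART_ID filter, this overwrites the value for every partition that has this key.
update PARTITION_PARAMS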
set PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=='
where PARAM_KEY='impala_intermediate_stats_chunk0'
```
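
The PARAM_VALUE in the update statement above appears to be Base64 text. As a rough check, and assuming it is simply the Base64 encoding of the serialized stats bytes, the following Python sketch decodes it and looks for the table's column names in the raw bytes. It does not parse the Thrift structure, and the variable names are illustrative only.

```python
import base64

# PARAM_VALUE copied verbatim from the update statement above.
PARAM_VALUE = "HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=="

# Assumption: the value is plain Base64 over the serialized bytes.
raw = base64.b64decode(PARAM_VALUE)

# Column names of test_column_stats; serialized strings should show up
# as plain bytes somewhere in the decoded blob.
expected_columns = [b"str1", b"str2", b"int1"]
found_columns = [name for name in expected_columns if name in raw]

print("decoded bytes:", len(raw))
print("column names found:", found_columns)
```

If all three names are found, the blob very likely holds per-column entries for this table, which is consistent with the struct definitions quoted below.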
##### PARAM_VALUE holds a serialized TPartitionStats object; the key point is num_nulls=-1

```
// Intermediate state for the computation of per-column stats. Impala can aggregate these
// structures together to produce final stats for a column.
struct TIntermediateColumnStats {
  // One byte for each bucket of the NDV HLL computation
  1: optional binary intermediate_ndv

  // If true, intermediate_ndv is RLE-compressed
  2: optional bool is_ndv_encoded

  // Number of nulls seen so far (or -1 if nulls are not counted)
  3: optional i64 num_nulls

  // The maximum width, in bytes, of the column
  4: optional i32 max_width

  // The average width (in bytes) of the column
  5: optional double avg_width

  // The number of rows counted, needed to compute NDVs from intermediate_ndv
  6: optional i64 num_rows
}

// Per-partition statistics
struct TPartitionStats {
  // Number of rows gathered per-partition by non-incremental stats.
  // TODO: This can probably be removed in favour of the intermediate_col_stats, but doing
  // so would interfere with the non-incremental stats path
  1: required TTableStats stats

  // Intermediate state for incremental statistics, one entry per column name.
  2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}

```
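
The struct comment says num_nulls is -1 when nulls are not counted. A minimal Python sketch of one way a value below -1 could arise, assuming (this comment does not confirm it) that the per-partition values are simply added together when intermediate stats are merged. All function names here are hypothetical and are not Impala's.

```python
from typing import Iterable

NOT_COUNTED = -1  # sentinel from the struct comment: nulls are not counted


def naive_sum(per_partition_nulls: Iterable[int]) -> int:
    """Add the values as-is, so sentinels accumulate into -2, -3, ..."""
    return sum(per_partition_nulls)


def sentinel_aware_sum(per_partition_nulls: Iterable[int]) -> int:
    """Return NOT_COUNTED if any partition lacks a null count, else the total."""
    values = list(per_partition_nulls)
    if any(v == NOT_COUNTED for v in values):
        return NOT_COUNTED
    return sum(values)


def is_valid_num_nulls(num_nulls: int) -> bool:
    """A null count is plausible only if it is the sentinel or non-negative."""
    return num_nulls >= NOT_COUNTED


# Two partitions whose stats carry the sentinel.
old_partitions = [NOT_COUNTED, NOT_COUNTED]
print(naive_sum(old_partitions), is_valid_num_nulls(naive_sum(old_partitions)))
print(sentinel_aware_sum(old_partitions))
```

Under that assumption, two partitions carrying the sentinel already produce -2 with the naive sum, while the sentinel-aware variant keeps the result at -1.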

### 5. restart catalog and coordinator
Clear the table partition cache.

### 6. execute compute incremental stats
compute incremental stats test_column_stats partition(ds=20200105);
The following exception then appears:
```
[localhost:21000] default> compute incremental stats test_column_stats partition(ds=20200105);
Query: compute incremental stats test_column_stats partition(ds=20200105)
ERROR: TableLoadingException: Failed to load metadata for table: default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5}
```)

> column stats num_nulls less than -1
> -----------------------------------
>
>                 Key: IMPALA-10230
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10230
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.4.0
>            Reporter: logan zheng
>            Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> After upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running "compute incremental stats default.test partition(xx=yyyy)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use for a long time, and already had stats computed.
>  
>  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
