Posted to issues@impala.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2020/06/03 09:52:00 UTC
[jira] [Resolved] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan
[ https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer resolved IMPALA-8409.
-------------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> STRINGs without stats have too low row-size in explain plan
> -----------------------------------------------------------
>
> Key: IMPALA-8409
> URL: https://issues.apache.org/jira/browse/IMPALA-8409
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.2.0
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Minor
> Labels: explain, statistics
> Fix For: Impala 4.0
>
>
> A STRING column without an avg_size statistic contributes 11 bytes to the estimated row-size, although its slot takes 12 bytes in the tuple (plus more elsewhere in memory if the string is not empty). The cause is that -1 (meaning unknown) is added to the 12-byte slot size.
> I don't think this causes problems in practice, as the estimate is probably way off without statistics anyway, but row-size >= tuple size seems like a meaningful invariant that we shouldn't break.
> Reproduce:
> {code}
> create table test_row_size (s string);
> explain select * from test_row_size;
> Result:
> ...
> WARNING: The following tables are missing relevant table and/or column statistics.
> default.test_row_size
> ...
> 00:SCAN HDFS [default.test_row_size]
> partitions=1/1 files=0 size=0B
> row-size=11B cardinality=0
> {code}
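
The arithmetic behind the 11B figure can be sketched as below. This is only an illustration of the description above, not Impala's frontend code: the function and constant names are hypothetical, and the guard shown is one possible way to keep the invariant, not necessarily the fix that went into Impala 4.0.

```python
# Illustrative sketch only; all names here are hypothetical, not Impala's API.
STRING_SLOT_SIZE = 12   # bytes a STRING slot takes in the tuple (per the issue)
UNKNOWN_AVG_SIZE = -1   # sentinel for "no avg_size statistic" (per the issue)


def buggy_string_row_size(avg_size):
    # Adds the sentinel as if it were a real size: 12 + (-1) = 11.
    return STRING_SLOT_SIZE + avg_size


def guarded_string_row_size(avg_size):
    # Assumed guard: treat an unknown (negative) avg_size as 0 so the
    # estimate never drops below the slot size.
    return STRING_SLOT_SIZE + max(avg_size, 0)


print(buggy_string_row_size(UNKNOWN_AVG_SIZE))    # 11, matches row-size=11B
print(guarded_string_row_size(UNKNOWN_AVG_SIZE))  # 12, row-size >= tuple size
```

With the guard, a column that does have statistics is unaffected: an avg_size of 5 still yields 17 bytes in both versions.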