Posted to issues-all@impala.apache.org by "Vincent Tran (Jira)" <ji...@apache.org> on 2020/08/25 19:30:00 UTC

[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

    [ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184697#comment-17184697 ] 

Vincent Tran commented on IMPALA-7876:
--------------------------------------

I can reproduce this on ~3.2.0. I think it may be unrelated to the width of the table.

The spec for my table is below:

 
{noformat}
default> show create table one_gram_p;
Query: show create table one_gram_p
CREATE TABLE default.one_gram_p ( 
 ngram STRING, 
 match_count INT, 
 volume_count INT 
 ) 
 PARTITIONED BY ( 
 year STRING 
 ) 
 STORED AS TEXTFILE 
 LOCATION 'hdfs:////user/hive/warehouse/one_gram_p' 
 TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK', 'impala.enable.stats.extrapolation'='true', 'impala.lastComputeStatsTime'='1598383227', 'numRows'='1430731493', 'totalSize'='22081529047')
{noformat}
 

I need to check against the master branch next.

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> ------------------------------------------------------------------
>
>                 Key: IMPALA-7876
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7876
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Andre Araujo
>            Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
