Posted to issues-all@impala.apache.org by "Vincent Tran (Jira)" <ji...@apache.org> on 2020/08/25 19:30:00 UTC
[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
[ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184697#comment-17184697 ]
Vincent Tran commented on IMPALA-7876:
--------------------------------------
I can reproduce this on ~3.2.0. I think this may be unrelated to the width of the table.
The specs for my table are below:
{noformat}
default> show create table one_gram_p;
Query: show create table one_gram_p
CREATE TABLE default.one_gram_p (
ngram STRING,
match_count INT,
volume_count INT
)
PARTITIONED BY (
year STRING
)
STORED AS TEXTFILE
LOCATION 'hdfs:////user/hive/warehouse/one_gram_p'
TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK', 'impala.enable.stats.extrapolation'='true', 'impala.lastComputeStatsTime'='1598383227', 'numRows'='1430731493', 'totalSize'='22081529047')
{noformat}
I need to check against the master branch next.
> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> ------------------------------------------------------------------
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Andre Araujo
> Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
> +-------------------------------------------+
> | summary |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0 | -1 | 84 | 20.35GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}