Posted to issues@hive.apache.org by "Zoltan Haindrich (JIRA)" <ji...@apache.org> on 2018/04/03 10:51:00 UTC
[jira] [Commented] (HIVE-19095) Improve analyze statement execution time for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-19095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423837#comment-16423837 ]
Zoltan Haindrich commented on HIVE-19095:
-----------------------------------------
I think stat task execution is even worse than those 2 minutes, because the metastore updates the stat data one by one. I'm not sure whether that can be improved, but it's definitely worth a look.
> Improve analyze statement execution time for partitioned tables
> ---------------------------------------------------------------
>
> Key: HIVE-19095
> URL: https://issues.apache.org/jira/browse/HIVE-19095
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Priority: Major
>
> tpcds@1TB:
> {{analyze table web_returns compute statistics for columns}}
> both compile and stat task execution are slow;
> there were ~2000 calls to get_partitions_ps_with_auth, which took 2 minutes in total
> the stat task seems to be slow because the metastore appears to update the stats one by one for each partition
> {flushCache=1, optimizer=565, open_txns=8, TezCompiler=5248, get_table_req=69, get_partitions_ps_with_auth=130333}
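As a rough check on the numbers above, here is a minimal Python sketch that works out the average cost of one get_partitions_ps_with_auth call. It assumes the counters in the map are milliseconds (the issue does not state the unit, but 130333 ms matches the reported "2 minutes") and it treats "~2000" as exactly 2000 calls. The names perf_ms, APPROX_CALLS and summarize are made up for this illustration.

```python
# Counters copied from the issue description.
# ASSUMPTION: values are milliseconds; the issue does not state the unit.
perf_ms = {
    "flushCache": 1,
    "optimizer": 565,
    "open_txns": 8,
    "TezCompiler": 5248,
    "get_table_req": 69,
    "get_partitions_ps_with_auth": 130333,
}

# ASSUMPTION: the issue says "~2000 calls"; 2000 is used as a round figure.
APPROX_CALLS = 2000


def summarize(perf, calls, key="get_partitions_ps_with_auth"):
    """Return total seconds and average milliseconds per call for one counter."""
    total_ms = perf[key]
    return {
        "total_seconds": total_ms / 1000.0,
        "avg_ms_per_call": total_ms / calls,
    }


summary = summarize(perf_ms, APPROX_CALLS)
print(f"total: {summary['total_seconds']:.1f} s")
print(f"average per call: {summary['avg_ms_per_call']:.1f} ms")
```

Under these assumptions each call averages roughly 65 ms, so the 2 minutes would come from the number of calls, not from any single slow call.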
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)