Posted to issues@hive.apache.org by "Zoltan Haindrich (JIRA)" <ji...@apache.org> on 2018/04/03 10:51:00 UTC

[jira] [Commented] (HIVE-19095) Improve analyze statement execution time for partitioned tables

    [ https://issues.apache.org/jira/browse/HIVE-19095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423837#comment-16423837 ] 

Zoltan Haindrich commented on HIVE-19095:
-----------------------------------------

I think stat task execution is even worse than these 2 minutes, because the metastore updates the stat data one by one. I'm not sure whether that could be improved, but it's definitely worth a look...
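To make the round-trip cost of one-by-one updates concrete, here is a minimal Java sketch. It rests on two assumptions:

- The metastore client is hypothetical. `StatsStore`, `updatePartitionStats` and `updatePartitionStatsBatch` are illustrative names, not Hive's actual metastore API, and every method call is assumed to cost exactly one round trip.
- The count of about 2000 partitions is chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class StatsUpdateSketch {

    // Hypothetical metastore client: every method invocation is one round trip.
    // This is NOT the real Hive IMetaStoreClient API.
    interface StatsStore {
        void updatePartitionStats(String partition, long rowCount);
        void updatePartitionStatsBatch(List<String> partitions, List<Long> rowCounts);
    }

    // In-memory stand-in that only counts round trips.
    static class CountingStore implements StatsStore {
        int roundTrips = 0;

        @Override
        public void updatePartitionStats(String partition, long rowCount) {
            roundTrips++; // one call per partition
        }

        @Override
        public void updatePartitionStatsBatch(List<String> partitions, List<Long> rowCounts) {
            roundTrips++; // one call for the whole chunk
        }
    }

    // Behaviour described in the comment: one stats update per partition.
    static int oneByOne(List<String> parts, List<Long> rows) {
        CountingStore store = new CountingStore();
        for (int i = 0; i < parts.size(); i++) {
            store.updatePartitionStats(parts.get(i), rows.get(i));
        }
        return store.roundTrips;
    }

    // Possible improvement (assumed, not confirmed): send updates in chunks of batchSize.
    static int batched(List<String> parts, List<Long> rows, int batchSize) {
        CountingStore store = new CountingStore();
        for (int start = 0; start < parts.size(); start += batchSize) {
            int end = Math.min(start + batchSize, parts.size());
            store.updatePartitionStatsBatch(parts.subList(start, end), rows.subList(start, end));
        }
        return store.roundTrips;
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        List<Long> rows = new ArrayList<>();
        for (int i = 0; i < 2000; i++) { // illustrative partition count
            parts.add("p=" + i);         // hypothetical partition spec
            rows.add(1000L);
        }
        System.out.println("one-by-one round trips: " + oneByOne(parts, rows));
        System.out.println("batched (500 per call) round trips: " + batched(parts, rows, 500));
    }
}
```

With 2000 partitions and chunks of 500, the one-by-one loop makes 2000 round trips and the batched loop makes 4. Whether the real metastore can accept such a batch efficiently is exactly the open question raised in the comment.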

> Improve analyze statement execution time for partitioned tables
> ---------------------------------------------------------------
>
>                 Key: HIVE-19095
>                 URL: https://issues.apache.org/jira/browse/HIVE-19095
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> tpcds@1TB:
> {{analyze table web_returns compute statistics for columns}}
> both compilation and stat task execution are slow;
> there were ~2000 calls to get_partitions_ps_with_auth, which took about 2 minutes in total
> the stat task seems to be slow, apparently because the metastore updates the stats one partition at a time
> {flushCache=1, optimizer=565, open_txns=8, TezCompiler=5248, get_table_req=69, get_partitions_ps_with_auth=130333}
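The timing line above can be turned into a per-call figure with a bit of arithmetic. The Java sketch below parses it and reports the slowest entry and its average cost per call. It assumes the following:

- The values are milliseconds. This is consistent with get_partitions_ps_with_auth=130333 matching "about 2 minutes".
- That entry is the total across all of the "~2000" calls, so 2000 is used as an approximate divisor.
- `PerfLogBreakdown` and its helper methods are illustrative names only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerfLogBreakdown {

    // Approximate call count taken from the issue text ("~2000 calls").
    static final int APPROX_CALLS = 2000;

    // Parses "{k1=v1, k2=v2}" into an insertion-ordered map of name -> value.
    static Map<String, Long> parse(String log) {
        Map<String, Long> timings = new LinkedHashMap<>();
        String body = log.trim();
        body = body.substring(1, body.length() - 1); // strip the surrounding braces
        for (String entry : body.split(",")) {
            String[] kv = entry.trim().split("=");
            timings.put(kv[0], Long.parseLong(kv[1]));
        }
        return timings;
    }

    // Returns the key with the largest value.
    static String slowest(Map<String, Long> timings) {
        String best = null;
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            if (best == null || e.getValue() > timings.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Average milliseconds per call, assuming totalMs is summed over `calls` calls.
    static double avgPerCallMs(long totalMs, int calls) {
        return (double) totalMs / calls;
    }

    public static void main(String[] args) {
        String log = "{flushCache=1, optimizer=565, open_txns=8, TezCompiler=5248, "
                + "get_table_req=69, get_partitions_ps_with_auth=130333}";
        Map<String, Long> t = parse(log);
        String key = slowest(t);
        long total = t.get(key);
        System.out.printf("slowest: %s = %d ms (%.1f s)%n", key, total, total / 1000.0);
        System.out.printf("avg per call: %.1f ms%n", avgPerCallMs(total, APPROX_CALLS));
    }
}
```

Under those assumptions, it reports get_partitions_ps_with_auth at about 130.3 s in total and roughly 65 ms per call. An individual call is therefore not slow; the cost comes from the number of round trips.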



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)