
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/10/11 11:24:00 UTC

[jira] [Work logged] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

     [ https://issues.apache.org/jira/browse/HIVE-25580?focusedWorklogId=663450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663450 ]

ASF GitHub Bot logged work on HIVE-25580:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Oct/21 11:23
            Start Date: 11/Oct/21 11:23
    Worklog Time Spent: 10m 
      Work Description: pvary merged pull request #2692:
URL: https://github.com/apache/hive/pull/2692


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 663450)
    Time Spent: 0.5h  (was: 20m)

> Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-25580
>                 URL: https://issues.apache.org/jira/browse/HIVE-25580
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the response time of getTableColumnStatistics and getPartitionColumnStatistics increases.
> The root cause is a full table scan caused by the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: [pool-6-thread-199]: Execution Time = 6351 ms
> {code}
> The time is spent [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>       query = pm.newQuery(MPartitionColumnStatistics.class);
>       query.setResult("DISTINCT engine");
>       Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we restrict the query to the rows of the requested catalog, database and table (cat/db/table) instead of scanning the whole table.
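
The suggestion above amounts to adding a cat/db/table predicate to the DISTINCT engine lookup. The following is a minimal sketch of that idea, not the change merged in the pull request. The FILTER and PARAMS strings are JDOQL text of the kind that could be passed to javax.jdo.Query.setFilter and declareParameters before setResult("DISTINCT engine"). The field names catName, dbName and tableName are assumptions about MPartitionColumnStatistics, and the StatsRow class and distinctEngines method are hypothetical in-memory stand-ins that only illustrate what the restricted query would return.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ScopedEngineLookup {

    // Hypothetical stand-in for one row of PART_COL_STATS (field names are assumptions).
    static final class StatsRow {
        final String catName;
        final String dbName;
        final String tableName;
        final String engine;

        StatsRow(String catName, String dbName, String tableName, String engine) {
            this.catName = catName;
            this.dbName = dbName;
            this.tableName = tableName;
            this.engine = engine;
        }
    }

    // JDOQL filter and parameter declaration that would limit the query to one table.
    // With a JDO query object this could be applied roughly as:
    //   query.setFilter(FILTER);
    //   query.declareParameters(PARAMS);
    //   query.setResult("DISTINCT engine");
    //   query.execute(catName, dbName, tableName);
    static final String FILTER = "catName == t1 && dbName == t2 && tableName == t3";
    static final String PARAMS =
        "java.lang.String t1, java.lang.String t2, java.lang.String t3";

    // In-memory equivalent of the restricted query: distinct engines for one cat/db/table.
    static Set<String> distinctEngines(List<StatsRow> rows, String cat, String db, String table) {
        Set<String> engines = new LinkedHashSet<>();
        for (StatsRow row : rows) {
            if (row.catName.equals(cat) && row.dbName.equals(db) && row.tableName.equals(table)) {
                engines.add(row.engine);
            }
        }
        return engines;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<StatsRow> rows = Arrays.asList(
            new StatsRow("hive", "db1", "t1", "hive"),
            new StatsRow("hive", "db1", "t1", "impala"),
            new StatsRow("hive", "db1", "t1", "hive"),
            new StatsRow("hive", "db2", "t9", "spark"));
        System.out.println(distinctEngines(rows, "hive", "db1", "t1"));
    }
}
```

With such a predicate in place, the database can use an index on the catalog, database and table columns (if one exists) rather than reading every row to compute the distinct engine values.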



--
This message was sent by Atlassian Jira
(v8.3.4#803005)