You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/01/02 00:21:00 UTC

[jira] [Work logged] (HIVE-26649) Hive metabase performance issues due to slow queries

     [ https://issues.apache.org/jira/browse/HIVE-26649?focusedWorklogId=836359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836359 ]

ASF GitHub Bot logged work on HIVE-26649:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jan/23 00:20
            Start Date: 02/Jan/23 00:20
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on PR #3714:
URL: https://github.com/apache/hive/pull/3714#issuecomment-1368574883

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




Issue Time Tracking
-------------------

            Worklog Id:     (was: 836359)
    Remaining Estimate: 95h 20m  (was: 95.5h)
            Time Spent: 40m  (was: 0.5h)

> Hive metabase performance issues due to slow queries
> ----------------------------------------------------
>
>                 Key: HIVE-26649
>                 URL: https://issues.apache.org/jira/browse/HIVE-26649
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.3.3
>         Environment: metastore db :mysql 5.X
> hive:2.3.3
>            Reporter: yihangqiao
>            Assignee: yihangqiao
>            Priority: Major
>              Labels: metastore, patch, performance, pull-request-available
>         Attachments: image-2022-10-19-14-42-33-073.png
>
>   Original Estimate: 96h
>          Time Spent: 40m
>  Remaining Estimate: 95h 20m
>
> When the Hive metabase uses Mysql, during the peak period of Hive statement query, the metastore initiates a large amount of DirectSQL, which will cause performance problems in the metabase. The fundamental reason is that some DirectSQL performance problems cause a large number of slow queries at the DB level.
> For example for the following Hive query:
> {code:java}
> select * from imd_fcac_safe.fcac_dw_loan_details where ds='2021-10-10' and sysid='MCFCM' {code}
> where ds and sysid are the primary and secondary partitions of the imd_fcac_safe.fcac_dw_loan_details table, respectively
>  
> The Hive statement will generate the DirectSQL query as follows:
> {code:java}
> explain select PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID     and TBLS.TBL_NAME = 'fcac_dw_loan_details'   inner join DBS on TBLS.DB_ID = DBS.DB_ID      and DBS.NAME = 'imd_fcac_safe' inner join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 inner join PARTITION_KEY_VALS FILTER1 on FILTER1.PART_ID = PARTITIONS.PART_ID and FILTER1.INTEGER_IDX = 1 where ( ((FILTER0.PART_KEY_VAL = '2021-10-10') and (FILTER1.PART_KEY_VAL = 'MCFCM')) ) {code}
> !image-2022-10-19-14-42-33-073.png!
>  
> Problems with this statement
> There is no TBL_ID field in the PARTITION_KEY_VALS table, which will cause the partition of the same name of the unrelated table to be described when performing an associated query; there is no index column in the PARTITION_KEY_VAL table, so it cannot be accelerated by the index.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)