Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/06/02 14:11:00 UTC
[jira] [Work logged] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS

     [ https://issues.apache.org/jira/browse/HIVE-25189?focusedWorklogId=605225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605225 ]

ASF GitHub Bot logged work on HIVE-25189:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jun/21 14:10
            Start Date: 02/Jun/21 14:10
    Worklog Time Spent: 10m 
      Work Description: scarlin-cloudera opened a new pull request #2342:
URL: https://github.com/apache/hive/pull/2342


   Added code to ensure that the validWriteIdLists for all tables are fetched
   before any table request is made.
   
   The tables are now gathered at parsing time within FromClauseParser.g.
   One of the initial steps while parsing is to make an HMS call to fetch
   the validWriteIdList for all parsed tables. This is done within a newly
   added class called CacheTableHelper. The only purpose of this class is
   to populate the query HMS cache for future calls to HMS. The methods of
   this class do not return any values.
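
   The flow described above can be sketched roughly as follows. This is only
   an illustration and not the code in this PR: WriteIdSource,
   QueryCacheSketch, CacheTableHelperSketch and CountingSource are
   hypothetical names, and the write id lists are plain strings here. It
   shows table names being collected during parsing and then fetched in one
   batched call by a helper whose methods return nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the metastore client; this is NOT Hive's real API.
interface WriteIdSource {
    // Returns one write id list (a plain string here) per requested table, in one call.
    Map<String, String> fetchValidWriteIds(List<String> tableNames);
}

// Hypothetical per-query cache keyed by fully qualified table name.
class QueryCacheSketch {
    private final Map<String, String> writeIdsByTable = new HashMap<>();

    void putAll(Map<String, String> entries) { writeIdsByTable.putAll(entries); }

    String get(String tableName) { return writeIdsByTable.get(tableName); }

    int size() { return writeIdsByTable.size(); }
}

// Sketch of a helper whose methods only populate the cache and return nothing.
class CacheTableHelperSketch {
    private final Set<String> parsedTables = new LinkedHashSet<>();

    // Called for each table name seen while parsing the FROM clause.
    void addParsedTable(String tableName) { parsedTables.add(tableName); }

    // Issues a single batched request for all parsed tables and stores the result.
    void populateCache(WriteIdSource source, QueryCacheSketch cache) {
        if (parsedTables.isEmpty()) {
            return;
        }
        cache.putAll(source.fetchValidWriteIds(new ArrayList<>(parsedTables)));
    }
}

// Fake source that counts how many round trips were made.
class CountingSource implements WriteIdSource {
    int calls = 0;

    @Override
    public Map<String, String> fetchValidWriteIds(List<String> tableNames) {
        calls++;
        Map<String, String> result = new HashMap<>();
        for (String table : tableNames) {
            result.put(table, "writeIds(" + table + ")");
        }
        return result;
    }
}

class PrefetchDemo {
    public static void main(String[] args) {
        CountingSource source = new CountingSource();
        QueryCacheSketch cache = new QueryCacheSketch();
        CacheTableHelperSketch helper = new CacheTableHelperSketch();
        helper.addParsedTable("db.t1");
        helper.addParsedTable("db.t2");
        helper.populateCache(source, cache);
        System.out.println("round trips: " + source.calls + ", cached tables: " + cache.size());
    }
}
```

   Later lookups in this sketch read from the cache instead of going back to
   the source, which is the effect the batching is meant to have.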
   
   One complication with sending the request up front arises when the query
   contains views. A view has underlying physical tables that need fetching, but
   we won't know these tables until after an HMS call. This is handled by
   reusing the underlying tables recorded the most recent time a query used the
   view. The idea is that views don't change very often, so we will most likely
   be able to cache the right tables by guessing that they are still in the
   view. If the view did change, we may fetch validWriteIdLists for tables that
   are no longer used, or we may miss tables added in the new definition of the
   view. If that happens, the only downside is an extra HMS call as we get the
   updated information for the view (which cannot be optimized). The cached
   view information is then updated so that the next query will be optimized.
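
   A minimal sketch of the guessing scheme described above, assuming a simple
   map from view name to the tables seen the last time the view was resolved.
   ViewTableGuesser and its method names are hypothetical and are not taken
   from the patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: remembers which physical tables a view used last time.
class ViewTableGuesser {
    private final Map<String, Set<String>> lastSeenTablesByView = new HashMap<>();

    // Best guess of the view's underlying tables; empty if the view was never seen.
    Set<String> tablesToPrefetch(String viewName) {
        Set<String> guess = lastSeenTablesByView.get(viewName);
        return guess == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(guess);
    }

    // Called once the real view definition is known. Records the actual tables
    // for the next query and reports whether the previous guess covered them all.
    // A false result corresponds to the extra metastore call mentioned above.
    boolean recordResolved(String viewName, Set<String> actualTables) {
        boolean guessCovered = tablesToPrefetch(viewName).containsAll(actualTables);
        lastSeenTablesByView.put(viewName, new LinkedHashSet<>(actualTables));
        return guessCovered;
    }
}
```

   In this sketch a stale guess costs nothing beyond the wasted or missing
   prefetch, because the record is overwritten as soon as the real definition
   is seen.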
   
   Because this view information is cached at the server level, the number of
   tables tracked is capped at 10000. Also, in the unlikely event that this
   feature causes a memory consumption problem, it can be turned off via the
   HIVE_OPTIMIZE_VIEW_CACHE_ENABLED conf variable.
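
   One way to bound a server-level cache like this is sketched below.
   Everything here is an assumption made for illustration: the description
   does not say how the cap is enforced, so the least-recently-used eviction,
   the synchronization, and the BoundedViewCache name are hypothetical. The
   enabled flag stands in for the conf variable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a size-capped, switchable cache shared across queries.
class BoundedViewCache<V> {
    private final boolean enabled;
    private final Map<String, V> entries;

    BoundedViewCache(boolean enabled, final int maxEntries) {
        this.enabled = enabled;
        // accessOrder=true keeps the least recently used entry first in iteration order.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the eldest entry once the cap is exceeded (assumed policy).
                return size() > maxEntries;
            }
        };
    }

    // Stores nothing when the feature is switched off.
    synchronized void put(String key, V value) {
        if (enabled) {
            entries.put(key, value);
        }
    }

    // Returns null when the key is absent or the feature is switched off.
    synchronized V get(String key) {
        return enabled ? entries.get(key) : null;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

   With a cap of 10000 this would be constructed as
   new BoundedViewCache<>(true, 10000). Passing false makes every lookup miss,
   which falls back to the unoptimized path.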
   
   ### What changes were proposed in this pull request?
   The validWriteIdLists are now fetched from HMS at the beginning of the query, in one request covering all parsed tables.
   
   ### Why are the changes needed?
   Slight performance boost. The get_table_req only has to be made once, with the validWriteIdList already available. Also, all validWriteIdLists are batched together in one HMS call.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Ran the unit tests.



Issue Time Tracking
-------------------

            Worklog Id:     (was: 605225)
    Remaining Estimate: 0h
            Time Spent: 10m

> Cache the validWriteIdList in query cache before fetching tables from HMS
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25189
>                 URL: https://issues.apache.org/jira/browse/HIVE-25189
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Steve Carlin
>            Assignee: Steve Carlin
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> For a small performance boost at compile time, we should fetch the validWriteIdList before fetching the tables. HMS allows these to be batched together in one call. This will prevent the getTable API from being called twice, which happens today because the first call passes in null for the validWriteIdList.


