
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/06/02 14:10:03 UTC

[GitHub] [hive] scarlin-cloudera opened a new pull request #2342: HIVE-25189: Fetch validWriteIdList for tables before table request.

scarlin-cloudera opened a new pull request #2342:
URL: https://github.com/apache/hive/pull/2342

Added code to ensure that the validWriteIdLists for all tables are fetched
before any table request is made.

The tables are now gathered at parse time within FromClauseParser.g.
One of the initial steps while parsing is an HMS call that fetches
the validWriteIdList for all parsed tables. This is done within a newly
added class called CacheTableHelper. The only purpose of this class is
to populate the query HMS cache for future calls to HMS. The methods of
this class do not return any values.
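
As a rough illustration of the "populate the cache, return nothing" idea, here is a minimal sketch. It is not the code in this PR: `WriteIdSource`, `WriteIdPrefetcher`, and `populateCache` are hypothetical names, and the metastore is reduced to a single assumed method that returns one validWriteIdList string per table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the metastore client; NOT the real HMS API.
interface WriteIdSource {
    // Assumed to return one validWriteIdList string per requested table in one round trip.
    Map<String, String> fetchValidWriteIdLists(List<String> fullTableNames);
}

// Illustrative helper: its only job is to warm a per-query cache, so it returns nothing.
class WriteIdPrefetcher {
    private final WriteIdSource source;
    private final Map<String, String> queryCache;

    WriteIdPrefetcher(WriteIdSource source, Map<String, String> queryCache) {
        this.source = source;
        this.queryCache = queryCache;
    }

    // Fetches write id lists for every parsed table that is not cached yet, in one batch.
    void populateCache(Set<String> parsedTables) {
        List<String> missing = new ArrayList<>();
        for (String table : new TreeSet<>(parsedTables)) {
            if (!queryCache.containsKey(table)) {
                missing.add(table);
            }
        }
        if (missing.isEmpty()) {
            return; // everything is already cached, so no metastore call is needed
        }
        queryCache.putAll(source.fetchValidWriteIdLists(missing));
    }
}
```

Later lookups for the same tables can then be answered from `queryCache` without another round trip.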

One complication with sending the request up front arises when the query
contains views. A view has underlying physical tables that need fetching, but
we won't know those tables until after an HMS call. This is handled by
reusing the underlying tables recorded the last time a query used the view.
The idea is that views don't change very often, so guessing that these tables
are still in the view will usually be right. If the view did change, we may
fetch validWriteIdLists for tables that are no longer used, or miss tables
added in the new definition of the view. In that case, the only downside is an
extra HMS call as we get the updated information for the view (which cannot
be optimized). The cached entry for the view is then updated so that the next
query will be optimized.
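
The guess-then-correct approach for views might look roughly like the sketch below. This is an assumption-laden illustration, not the PR's implementation: `ViewTableMemo`, `expandForPrefetch`, and `recordResolved` are hypothetical names, and table names are plain strings.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: remembers which physical tables each view resolved to last time.
class ViewTableMemo {
    private final Map<String, Set<String>> lastKnownTables = new ConcurrentHashMap<>();

    // Before any HMS call: add the tables each known view used the last time it was queried.
    // Names with no recorded entry (plain tables, or views seen for the first time) pass through.
    Set<String> expandForPrefetch(Set<String> parsedNames) {
        Set<String> result = new HashSet<>();
        for (String name : parsedNames) {
            result.add(name);
            Set<String> guessed = lastKnownTables.get(name);
            if (guessed != null) {
                result.addAll(guessed);
            }
        }
        return result;
    }

    // After the view has actually been resolved: record the real tables for the next query.
    void recordResolved(String viewName, Set<String> actualTables) {
        lastKnownTables.put(viewName,
            Collections.unmodifiableSet(new HashSet<>(actualTables)));
    }
}
```

If the guess is stale, the prefetch simply covers the wrong set of tables once; `recordResolved` then overwrites the entry so the following query guesses correctly.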

Because this view information is cached at the server level, the number of
tables tracked is capped at 10000. Also, in the unlikely event that this
feature causes a memory consumption problem, it can be turned off via the
HIVE_OPTIMIZE_VIEW_CACHE_ENABLED conf variable.
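
One way to bound a server-level cache and make it switchable is sketched below. The description above does not say how the cap is enforced, so the least-recently-used eviction here is purely an assumption, as are the names `BoundedMemo`, `enabled`, and `maxEntries`; the sketch also caps entries rather than tracked tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative bounded cache; the eviction policy (LRU) is an assumption, not taken from the PR.
class BoundedMemo<K, V> {
    private final boolean enabled;
    private final Map<K, V> entries;

    BoundedMemo(boolean enabled, final int maxEntries) {
        this.enabled = enabled;
        // Access-ordered LinkedHashMap: the eldest entry is the least recently used one.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Stores nothing when the feature is disabled, so memory use stays flat.
    synchronized void put(K key, V value) {
        if (enabled) {
            entries.put(key, value);
        }
    }

    // Returns null when the feature is disabled or the key is not present.
    synchronized V get(K key) {
        return enabled ? entries.get(key) : null;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

In this sketch, constructing the cache with `enabled` set to false plays the role of turning the conf variable off, and `maxEntries` plays the role of the cap.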


### What changes were proposed in this pull request?

The validWriteIdLists for all parsed tables will be fetched from HMS at the beginning of the query.

### Why are the changes needed?

Slight performance boost. The get_table_req only has to be done once, with the validWriteIdList. Also, all validWriteIdLists are batched together in one HMS call.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Ran the unit tests.
