Posted to dev@hive.apache.org by "Prasanth Jayachandran (JIRA)" <ji...@apache.org> on 2017/09/27 23:26:00 UTC
[jira] [Created] (HIVE-17626) Query reoptimization using cached runtime statistics
Prasanth Jayachandran created HIVE-17626:
--------------------------------------------
Summary: Query reoptimization using cached runtime statistics
Key: HIVE-17626
URL: https://issues.apache.org/jira/browse/HIVE-17626
Project: Hive
Issue Type: New Feature
Components: Logical Optimizer
Affects Versions: 3.0.0
Reporter: Prasanth Jayachandran
Something similar to "EXPLAIN ANALYZE", where the explain plan is annotated with both estimated and actual statistics. The runtime stats can be cached at the query level, and subsequent executions of the same query can use the cached statistics from the previous run for better optimization.
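A minimal sketch of the query-level cache idea, written in Python for brevity rather than Hive's Java. Every name here (RuntimeStatsCache, OperatorStats, the operator id "TS_0") is hypothetical and not taken from Hive. The issue does not say how "the same query" would be detected, so the whitespace- and case-insensitive key below is only an assumption (it would, for example, wrongly fold string literals that differ only in case).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class OperatorStats:
    """Estimated vs. actual row counts for one operator (hypothetical shape)."""
    estimated_rows: int
    actual_rows: int


class RuntimeStatsCache:
    """Caches per-operator runtime stats, keyed by a hash of the query text."""

    def __init__(self):
        self._by_query = {}

    @staticmethod
    def query_key(sql: str) -> str:
        # Assumption: collapse whitespace and case to decide two queries are "the same".
        normalized = " ".join(sql.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def record(self, sql, operator_id, stats):
        # Store the stats observed for one operator during a run of this query.
        self._by_query.setdefault(self.query_key(sql), {})[operator_id] = stats

    def lookup(self, sql, operator_id):
        # Return the cached stats from a previous run, or None on a miss.
        return self._by_query.get(self.query_key(sql), {}).get(operator_id)


cache = RuntimeStatsCache()
cache.record("SELECT * FROM t WHERE a > 1", "TS_0", OperatorStats(1000, 42))
hit = cache.lookup("select *  from t where a > 1", "TS_0")
```

On the second run the planner could compare hit.estimated_rows with hit.actual_rows and prefer the observed value when the two diverge.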
Some use cases:
1) Re-planning join queries (mapjoin failures can be converted to shuffle joins)
2) Better statistics for the table scan operator if dynamic partition pruning is involved
3) Better estimates for bloom filter initialization (setting expected entries during merge)
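To illustrate the first use case, here is a rough sketch of a join-strategy choice that prefers the size observed in a previous run over the optimizer's estimate. The function name, the strategy labels, and the threshold value are all assumptions made for the example, not Hive's actual logic or defaults.

```python
from typing import Optional

# Hypothetical size limit for the small side of a mapjoin.
MAPJOIN_THRESHOLD_BYTES = 10 * 1024 * 1024


def choose_join(estimated_bytes: int, cached_actual_bytes: Optional[int]) -> str:
    """Pick a join strategy, preferring the size observed in a previous run."""
    if cached_actual_bytes is not None:
        size = cached_actual_bytes
    else:
        size = estimated_bytes
    return "mapjoin" if size <= MAPJOIN_THRESHOLD_BYTES else "shuffle_join"


# First run: only the (too optimistic) estimate is available.
first_run = choose_join(estimated_bytes=2_000_000, cached_actual_bytes=None)

# Second run: the cached runtime size shows the small side is actually large.
second_run = choose_join(estimated_bytes=2_000_000, cached_actual_bytes=500_000_000)
```

With these inputs the first run picks a mapjoin based on the estimate, while the second run falls back to a shuffle join because the cached size exceeds the threshold.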
This can be extended to support a wider range of queries by caching fragments of operator plans that scan the same table(s) or match certain operator sequences.
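One way the fragment idea could look, sketched under heavy assumptions: a plan is modeled as a flat list of (operator kind, detail) pairs, and cached stats are keyed by a signature of an operator sequence. The helper names and the plan representation are hypothetical; real operator plans are trees, and matching them would be more involved.

```python
def fragment_signature(operators):
    """Build a cache key from an operator sequence of (kind, detail) pairs."""
    return "|".join(f"{kind}:{detail}" for kind, detail in operators)


def longest_cached_prefix(plan, cache):
    """Return (prefix_length, rows) for the longest plan prefix with cached stats."""
    for end in range(len(plan), 0, -1):
        rows = cache.get(fragment_signature(plan[:end]))
        if rows is not None:
            return end, rows
    return 0, None


# signature -> actual row count observed in an earlier query
fragment_stats = {}

# Query A ran a table scan plus filter on `sales` and observed 1200 rows.
fragment_a = [("TS", "sales"), ("FIL", "year = 2017")]
fragment_stats[fragment_signature(fragment_a)] = 1200

# Query B is a different query whose plan starts with the same fragment.
plan_b = [("TS", "sales"), ("FIL", "year = 2017"), ("GBY", "region")]
match = longest_cached_prefix(plan_b, fragment_stats)
```

Here query B can reuse the row count observed for the shared scan-and-filter fragment even though the full query text differs from query A.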