You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org> on 2011/08/01 18:44:20 UTC
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

    [ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073604#comment-13073604 ] 

jiraposter@reviews.apache.org commented on HIVE-1694:
-----------------------------------------------------



bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 61
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61>
bq.  >
bq.  >     Please run ant checkstyle and fix all the formatting discrepancies it reports for your new files.
bq.  >

Done! The code is still having checkstyle formatting errors only for places where we have used LinkedHashMap, HashMap and ArrayList. The error states "Declaring variables, return values or parameters of type 'HashMap' is not allowed".


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 184
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line184>
bq.  >
bq.  >     Don't you need to reuse the compact implementation here so that the index can be used for WHERE (not just GROUP BY)?
bq.  >

The AggregateIndexHandler now extends from CompactIndexHandler instead of TableBasedIndexHandler. We override only analyzeIndexDefinition(...) and getIndexBuilderMapRedTask(...) methods from CompactIndexHandler.


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 700
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27054#file27054line700>
bq.  >
bq.  >     This method is redundant now.

Removed. Sorry to have missed that.


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java, line 252
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27056#file27056line252>
bq.  >
bq.  >     I can't think of a case where it would be worse.

Ok.


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java, line 164
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27057#file27057line164>
bq.  >
bq.  >     Actually group-by is now preserved in all cases.

Forgot to change a few comments after fixing the bug. Done!


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java, line 66
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27058#file27058line66>
bq.  >
bq.  >     Please use HTML bullet syntax for Javadoc (otherwise it all gets run together into one line when rendered).
bq.  >

Done!


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java, line 93
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27060#file27060line93>
bq.  >
bq.  >     Shouldn't this be BIGINT?
bq.  >     
bq.  >     Also, I think you're supposed to use a TypeInfoFactory for this purpose.

Yes. Changed it to bigint. Also changed it in AggregateIndexHandler where I had declared the type to be "int".


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 603
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603>
bq.  >
bq.  >     Not sure why this new constructor is needed...after using it, all you do is get the table out of it.

The only other constructor option for tableSpec needs the ASTNode as one of its parameters. Since we need to construct a new tableSpec using only the index table name, and we do not have a ASTNode for this, I need this constructor. If you have any other way in mind, please let me know. That would be helpful.


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 27
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line27>
bq.  >
bq.  >     This should *not* be using the index, since the index is built on count(l_shipdate), and l_shipdate may contain nulls, whereas the query is referencing count(1), which is insensitive to nulls.

Yes true. :(
I have now changed the count(1) queries with count(l_shipdate) in ql_rewrite_gbtoidx.q file. Also, verified that the count(1) queries are not using the index.


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 61
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line61>
bq.  >
bq.  >     Need additional tests to verify all the cases where the optimization should *not* be used:
bq.  >     
bq.  >     * when configuration disables it
bq.  >     * when index partitions do not cover table partitions (I still don't see the code for this case)
bq.  >     * ... all the other conditions checked for in the code ...
bq.  >

Added new queries to verify that optimization is not used in case of:
* when configuration disables it
* ... all the other conditions checked for in the code ...

About "* when index partitions do not cover table partitions ", still pending (also the code for it). I will upload the new patch once this is done.


- Prajakta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
-----------------------------------------------------------


On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1194/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-07-26 14:44:01)
bq.  
bq.  
bq.  Review request for hive and John Sichi.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. 
bq.  
bq.  
bq.  This addresses bug HIVE-1694.
bq.      https://issues.apache.org/jira/browse/HIVE-1694
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
bq.    ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
bq.    ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
bq.    ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/1194/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prajakta
bq.  
bq.



> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Prajakta Kalmegh
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql
>
>
> The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler & execution engine for SELECT queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query execution.
> The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans & operator implementations for above mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira