You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Prajakta Kalmegh <pk...@gmail.com> on 2011/08/01 18:42:36 UTC

Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 61
> > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61>
> >
> >     Please run ant checkstyle and fix all the formatting discrepancies it reports for your new files.
> >

Done! The code is still having checkstyle formatting errors only for places where we have used LinkedHashMap, HashMap and ArrayList. The error states "Declaring variables, return values or parameters of type 'HashMap' is not allowed".


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 184
> > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line184>
> >
> >     Don't you need to reuse the compact implementation here so that the index can be used for WHERE (not just GROUP BY)?
> >

The AggregateIndexHandler now extends from CompactIndexHandler instead of TableBasedIndexHandler. We override only analyzeIndexDefinition(...) and getIndexBuilderMapRedTask(...) methods from CompactIndexHandler.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 700
> > <https://reviews.apache.org/r/1194/diff/1/?file=27054#file27054line700>
> >
> >     This method is redundant now.

Removed. Sorry to have missed that.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java, line 252
> > <https://reviews.apache.org/r/1194/diff/1/?file=27056#file27056line252>
> >
> >     I can't think of a case where it would be worse.

Ok.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java, line 164
> > <https://reviews.apache.org/r/1194/diff/1/?file=27057#file27057line164>
> >
> >     Actually group-by is now preserved in all cases.

Forgot to change a few comments after fixing the bug. Done!


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java, line 66
> > <https://reviews.apache.org/r/1194/diff/1/?file=27058#file27058line66>
> >
> >     Please use HTML bullet syntax for Javadoc (otherwise it all gets run together into one line when rendered).
> >

Done!


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java, line 93
> > <https://reviews.apache.org/r/1194/diff/1/?file=27060#file27060line93>
> >
> >     Shouldn't this be BIGINT?
> >     
> >     Also, I think you're supposed to use a TypeInfoFactory for this purpose.

Yes. Changed it to bigint. Also changed it in AggregateIndexHandler where I had declared the type to be "int".


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 603
> > <https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603>
> >
> >     Not sure why this new constructor is needed...after using it, all you do is get the table out of it.

The only other constructor option for tableSpec needs the ASTNode as one of its parameters. Since we need to construct a new tableSpec using only the index table name, and we do not have a ASTNode for this, I need this constructor. If you have any other way in mind, please let me know. That would be helpful.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 27
> > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line27>
> >
> >     This should *not* be using the index, since the index is built on count(l_shipdate), and l_shipdate may contain nulls, whereas the query is referencing count(1), which is insensitive to nulls.

Yes true. :(
I have now changed the count(1) queries with count(l_shipdate) in ql_rewrite_gbtoidx.q file. Also, verified that the count(1) queries are not using the index.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 61
> > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line61>
> >
> >     Need additional tests to verify all the cases where the optimization should *not* be used:
> >     
> >     * when configuration disables it
> >     * when index partitions do not cover table partitions (I still don't see the code for this case)
> >     * ... all the other conditions checked for in the code ...
> >

Added new queries to verify that optimization is not used in case of:
* when configuration disables it
* ... all the other conditions checked for in the code ...

About "* when index partitions do not cover table partitions ", still pending (also the code for it). I will upload the new patch once this is done.


- Prajakta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
-----------------------------------------------------------


On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1194/
> -----------------------------------------------------------
> 
> (Updated 2011-07-26 14:44:01)
> 
> 
> Review request for hive and John Sichi.
> 
> 
> Summary
> -------
> 
> This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. 
> 
> 
> This addresses bug HIVE-1694.
>     https://issues.apache.org/jira/browse/HIVE-1694
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
>   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
>   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
>   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1194/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Prajakta
> 
>