Posted to dev@hive.apache.org by Amareshwari Sriramadasu <am...@apache.org> on 2011/05/09 15:36:29 UTC

Review Request: HIVE-2056

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/
-----------------------------------------------------------

Review request for hive.


Summary
-------

The attached patch generates a single M/R job for a multi group by query whose group by clauses share a non-null common key set. It adds the configuration parameter hive.multigroupby.singlemr to turn the optimization on and off.


This addresses bug HIVE-2056.
    https://issues.apache.org/jira/browse/HIVE-2056


Diffs
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1100910 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby10.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8_noskew.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby9.q 1100910 
  trunk/ql/src/test/results/clientpositive/groupby10.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby8.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby9.q.out 1100910 

Diff: https://reviews.apache.org/r/700/diff


Testing
-------

Updated jira with performance tests.


Thanks,

Amareshwari


Re: Review Request: HIVE-2056

Posted by Amareshwari Sriramadasu <am...@apache.org>.

> On 2011-05-09 17:07:16, namit jain wrote:
> > Update hive-default.xml with the new parameter.
> > Add the new parameter to the title of the jira.

Done


> On 2011-05-09 17:07:16, namit jain wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 5518
> > <https://reviews.apache.org/r/700/diff/1/?file=18439#file18439line5518>
> >
> >     Add a comment - this optimization is not enabled
> >     if one of the sub-queries does not involve an
> >     aggregation

Done


> On 2011-05-09 17:07:16, namit jain wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 5524
> > <https://reviews.apache.org/r/700/diff/1/?file=18439#file18439line5524>
> >
> >     The code is not performing a prefix match.
> >     I mean,
> >     
> >     if the query is:
> >     
> >     from T
> >     insert overwrite T1 select ... group by c1
> >     insert overwrite T1 select ... group by c2, c1
> >     
> >     
> >     c1 will still be returned.
> >     
> >     Is that desirable?
> >     
> >     I don't think this will work - can you add a testcase
> >     for this - I mean, with an explain which shows that
> >     the parameter does not make a difference
> >

Agreed. I missed this.
Updated the patch to use prefix matching.
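
To illustrate the difference raised in the review comment, here is a minimal Java sketch. It is not the code from the patch or from SemanticAnalyzer.java; the class name GroupByPrefixSketch and both helper methods are hypothetical, and it assumes each group by clause is given as a non-empty list of column names. The first helper matches keys irrespective of position (so c1 is returned for "group by c1" and "group by c2, c1"), while the second only keeps the leading keys that agree in every clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the HIVE-2056 patch.
public class GroupByPrefixSketch {

    // Position-insensitive match: keeps every key of the first clause
    // that also appears anywhere in all the other clauses.
    static List<String> commonKeysAnyPosition(List<List<String>> groupBys) {
        List<String> common = new ArrayList<>(groupBys.get(0));
        for (List<String> keys : groupBys.subList(1, groupBys.size())) {
            common.retainAll(keys);
        }
        return common;
    }

    // Prefix match: keeps only the leading keys that are identical,
    // at the same position, in every clause.
    static List<String> commonPrefix(List<List<String>> groupBys) {
        List<String> prefix = new ArrayList<>(groupBys.get(0));
        for (List<String> keys : groupBys.subList(1, groupBys.size())) {
            int i = 0;
            while (i < prefix.size() && i < keys.size()
                    && prefix.get(i).equals(keys.get(i))) {
                i++;
            }
            prefix = new ArrayList<>(prefix.subList(0, i));
        }
        return prefix;
    }

    public static void main(String[] args) {
        // group by c1  versus  group by c2, c1
        List<List<String>> clauses = Arrays.asList(
                Arrays.asList("c1"),
                Arrays.asList("c2", "c1"));
        System.out.println(commonKeysAnyPosition(clauses)); // [c1]
        System.out.println(commonPrefix(clauses));          // []
    }
}
```

With prefix matching, the two clauses in the reviewer's example share no common leading key, so under this sketch they would not qualify for the single M/R job plan.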


- Amareshwari


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/#review651
-----------------------------------------------------------




Re: Review Request: HIVE-2056

Posted by namit jain <nj...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/#review651
-----------------------------------------------------------


Update hive-default.xml with the new parameter.
Add the new parameter to the title of the jira.


trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/700/#comment1306>

    Add a comment - this optimization is not enabled
    if one of the sub-queries does not involve an
    aggregation



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/700/#comment1307>

    The code is not performing a prefix match.
    I mean,
    
    if the query is:
    
    from T
    insert overwrite T1 select ... group by c1
    insert overwrite T1 select ... group by c2, c1
    
    
    c1 will still be returned.
    
    Is that desirable?
    
    I don't think this will work - can you add a testcase
    for this - I mean, with an explain which shows that
    the parameter does not make a difference
    


- namit




Re: Review Request: HIVE-2056

Posted by Amareshwari Sriramadasu <am...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/
-----------------------------------------------------------

(Updated 2011-05-11 13:14:36.754720)


Review request for hive.


Changes
-------

Updated the patch to do prefix matching and added a testcase.


Summary
-------

The attached patch generates a single M/R job for a multi group by query whose group by clauses share a non-null common key set. It adds the configuration parameter hive.multigroupby.singlemr to turn the optimization on and off.


This addresses bug HIVE-2056.
    https://issues.apache.org/jira/browse/HIVE-2056


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1100910 
  trunk/conf/hive-default.xml 1100910 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby10.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8_noskew.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby9.q 1100910 
  trunk/ql/src/test/queries/clientpositive/multigroupby_singlemr.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby10.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby8.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby9.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/multigroupby_singlemr.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/700/diff


Testing (updated)
-------

Updated jira with performance tests.

All unit tests passed with the patch.


Thanks,

Amareshwari