Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2011/03/15 17:27:29 UTC

[jira] Created: (HIVE-2056) Generate single MR job for multi groupby query.

Generate single MR job for multi groupby query.
-----------------------------------------------

                 Key: HIVE-2056
                 URL: https://issues.apache.org/jira/browse/HIVE-2056
             Project: Hive
          Issue Type: Improvement
            Reporter: Amareshwari Sriramadasu
            Assignee: Amareshwari Sriramadasu




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013938#comment-13013938 ] 

Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

For a query of the form,
"From table T
 insert overwrite table test1 select col1, count(distinct colx) group by col1
 insert overwrite table test2 select col2, count(distinct colx) group by col2;" 
it is not possible to generate a single M/R job, because the input rows cannot be partitioned by both col1 and col2 in a single stage.
If the groupby keys are such that one keyset is a subset of the other, i.e. of the following form: 
"From table T 
insert overwrite table test1 select col1, count(distinct colx) group by col1 
insert overwrite table test2 select col1, col2, count(distinct colx) group by col1, col2;", 
we can run it in a single MR job by spraying over the common groupby keyset (i.e. col1). I will implement this and see if it reduces query execution time.

Thoughts? 
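
To illustrate the idea above, here is a minimal Java sketch that finds the groupby keys shared by every insert clause, i.e. the keyset the rows would be sprayed on. The names CommonGroupByKeys and commonKeys are hypothetical and are not taken from the Hive source. Treating each clause's keys as a set is an assumption about the proposal, not a description of the eventual patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hive.
final class CommonGroupByKeys {

    // Returns the keys that appear in every group-by clause,
    // in the order they appear in the first clause.
    static List<String> commonKeys(List<List<String>> clauses) {
        if (clauses.isEmpty()) {
            return new ArrayList<>();
        }
        Set<String> common = new LinkedHashSet<>(clauses.get(0));
        for (List<String> clause : clauses.subList(1, clauses.size())) {
            common.retainAll(clause);
        }
        return new ArrayList<>(common);
    }

    public static void main(String[] args) {
        // "group by col1" and "group by col1, col2": col1 is shared.
        System.out.println(commonKeys(
                List.of(List.of("col1"), List.of("col1", "col2"))));
        // "group by col1" and "group by col2": nothing is shared.
        System.out.println(commonKeys(
                List.of(List.of("col1"), List.of("col2"))));
    }
}
```

An empty result corresponds to the first query form, where no single partitioning serves both clauses.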



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>



[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030729#comment-13030729 ] 

jiraposter@reviews.apache.org commented on HIVE-2056:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/
-----------------------------------------------------------

Review request for hive.


Summary
-------

Attached patch generates a single M/R job for multi group by query with non-null common group by key set. Added configuration hive.multigroupby.singlemr to turn on and off the optimization. 


This addresses bug HIVE-2056.
    https://issues.apache.org/jira/browse/HIVE-2056


Diffs
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1100910 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby10.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8_noskew.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby9.q 1100910 
  trunk/ql/src/test/results/clientpositive/groupby10.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby8.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby9.q.out 1100910 

Diff: https://reviews.apache.org/r/700/diff


Testing
-------

Updated jira with performance tests.


Thanks,

Amareshwari



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056.txt
>
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2056:
------------------------------------------

    Attachment: patch-2056-1.txt

Thanks Namit for the review.

Updated the patch to do prefix matching and added a testcase.

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056.txt
>
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2056:
------------------------------------------

    Summary: Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.  (was: Generate single MR job for multi groupby query.)

> Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056-2.txt, patch-2056.txt
>
>



[jira] Commented: (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006995#comment-13006995 ] 

Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

Here is a request from one of our customers:

Here is a real example of the need for a multi group by in 1 M/R job. If
you look at the query below, we have two aggregates being generated out of a single fact table. The 1st aggregate
generates a unique count by date and the 2nd one generates a unique count by date and gender. We have a lot of
these aggregates to build. We would like this to be done in 1 M/R job instead of the three below. Is it possible to do
this in Hive?

// created two intermediate tables

hive> create table test_1 (dt string, bc_cnt bigint);
OK
Time taken: 9.004 seconds
hive> create table test_2 (dt string, gender string, bc_cnt bigint);
OK

// multi group by in insert statement

hive> from fact_table f
    > insert overwrite table test_1 select dt, count(distinct id) group by dt
    > insert overwrite table test_2 select dt,gender,count(distinct id) group by dt,gender;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 999
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>



Thanks

Sudhish



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2056:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Thanks Amareshwari

> Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056-2.txt, patch-2056.txt
>
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2056:
------------------------------------------

    Attachment: patch-2056-2.txt

Earlier patch missed a comment. Updated this diff on review board.

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056-2.txt, patch-2056.txt
>
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2056:
------------------------------------------

    Fix Version/s: 0.8.0
           Status: Patch Available  (was: Open)

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056.txt
>
>



[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030816#comment-13030816 ] 

Namit Jain commented on HIVE-2056:
----------------------------------

Changed to 'Cancel Patch' for the comments above

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056.txt
>
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2056:
-----------------------------

    Status: Open  (was: Patch Available)

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056.txt
>
>



[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030814#comment-13030814 ] 

jiraposter@reviews.apache.org commented on HIVE-2056:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/#review651
-----------------------------------------------------------


Update hive-default.xml with the new parameter.
Add the new parameter to the title of the jira.


trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/700/#comment1306>

    Add a comment - this optimization is not enabled
    if one of the sub-queries does not involve an
    aggregation



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/700/#comment1307>

    The code is not performing a prefix match.
    I mean,
    
    if the query is:
    
    from T
    insert overwrite T1 select ... group by c1
    insert overwrite T1 select ... group by c2, c1
    
    
    c1 will still be returned.
    
    Is that desirable?
    
    I don't think this will work - can you add a testcase
    for this - I mean, with an explain which shows that
    the parameter does not make a difference
    


- namit


On 2011-05-09 13:36:28, Amareshwari Sriramadasu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/700/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-05-09 13:36:28)
bq.  
bq.  
bq.  Review request for hive.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Attached patch generates a single M/R job for multi group by query with non-null common group by key set. Added configuration hive.multigroupby.singlemr to turn on and off the optimization. 
bq.  
bq.  
bq.  This addresses bug HIVE-2056.
bq.      https://issues.apache.org/jira/browse/HIVE-2056
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1100910 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby10.q 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby8.q 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby8_noskew.q 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby9.q 1100910 
bq.    trunk/ql/src/test/results/clientpositive/groupby10.q.out 1100910 
bq.    trunk/ql/src/test/results/clientpositive/groupby8.q.out 1100910 
bq.    trunk/ql/src/test/results/clientpositive/groupby9.q.out 1100910 
bq.  
bq.  Diff: https://reviews.apache.org/r/700/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Updated jira with performance tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Amareshwari
bq.  
bq.



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056.txt
>
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2056:
------------------------------------------

    Attachment: patch-2056.txt

Attached patch generates a single M/R job for multi group by query with non-null common group by key set. Added configuration hive.multigroupby.singlemr to turn on and off the optimization. 
It handles queries with no distinct expression or with a single common distinct expression; it does not handle multiple distinct expressions yet. I will do that in a follow-up if required.

Performance numbers:
||Number of rows in table||Query||Time taken by 3 M/R jobs plan||Time taken by single M/R job plan||
|100|query1|58.416 seconds|22.099 seconds|
|33682 million|query1|Did not succeed|11434.308 seconds|
|33682 million|query2|2hrs, 48mins, 15sec|16mins, 3sec|

Query1 did not succeed on the 33682 million row table with the existing plan: reducers failed with OOM after 12 hours. I tried many combinations of reducer counts and Xmx values, but in vain.
Verified correctness row by row for the 100 row table, and verified the number of rows in the result for the 33682 million row table.
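
For readers who want the timings above as ratios, the following Java sketch converts the reported durations to seconds and divides them. The class and method names (SpeedupSketch, toSeconds, speedup) are hypothetical; only the numbers come from the table.

```java
// Hypothetical helper for illustration only; the timings are the ones
// reported in the performance table.
final class SpeedupSketch {

    // Converts hours, minutes and seconds to seconds.
    static double toSeconds(int hours, int minutes, double seconds) {
        return hours * 3600 + minutes * 60 + seconds;
    }

    // Ratio of the 3-job plan's time to the single-job plan's time.
    static double speedup(double threeJobSeconds, double singleJobSeconds) {
        return threeJobSeconds / singleJobSeconds;
    }

    public static void main(String[] args) {
        // 100 rows, query1.
        System.out.printf("query1, 100 rows: %.2fx%n",
                speedup(58.416, 22.099));
        // 33682 million rows, query2.
        System.out.printf("query2, 33682 million rows: %.2fx%n",
                speedup(toSeconds(2, 48, 15), toSeconds(0, 16, 3)));
    }
}
```

That works out to roughly 2.6x for query1 on the 100 row table and roughly 10.5x for query2 on the 33682 million row table. No ratio can be given for query1 on the large table, since the 3 M/R job plan did not complete.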


> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-2056.txt
>
>



[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031685#comment-13031685 ] 

jiraposter@reviews.apache.org commented on HIVE-2056:
-----------------------------------------------------



bq.  On 2011-05-09 17:07:16, namit jain wrote:
bq.  > Change hive-default.xml with the new parameter.
bq.  > Add the new parameter in the name of the jira.

Done


bq.  On 2011-05-09 17:07:16, namit jain wrote:
bq.  > trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 5518
bq.  > <https://reviews.apache.org/r/700/diff/1/?file=18439#file18439line5518>
bq.  >
bq.  >     Add a comment - this optimization is not enabled
bq.  >     if one of the sub-queries does not involve an
bq.  >     aggregation

Done


bq.  On 2011-05-09 17:07:16, namit jain wrote:
bq.  > trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 5524
bq.  > <https://reviews.apache.org/r/700/diff/1/?file=18439#file18439line5524>
bq.  >
bq.  >     The code is not performing a prefix match.
bq.  >     I mean,
bq.  >     
bq.  >     if the query is:
bq.  >     
bq.  >     from T
bq.  >     insert overwrite T1 select ... group by c1
bq.  >     insert overwrite T1 select ... group by c2, c1
bq.  >     
bq.  >     
bq.  >     c1 will still be returned.
bq.  >     
bq.  >     Is that desirable?
bq.  >     
bq.  >     I don't think this will work - can you add a testcase
bq.  >     for this - I mean, with an explain which shows that
bq.  >     the parameter does not make a difference
bq.  >

Agreed. I missed this.
Updated the patch to use prefix matching.
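
For illustration, prefix matching over the groupby key lists could look like the Java sketch below. The names GroupByKeyPrefix and commonPrefix are hypothetical; this is not the code from the patch, only a sketch of the distinction raised in the review: keys must match position by position from the start of each group by clause, so "group by c1" and "group by c2, c1" share no prefix even though c1 appears in both.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the code from the patch.
final class GroupByKeyPrefix {

    // Returns the longest leading run of keys that is identical,
    // position by position, in every group-by clause.
    static List<String> commonPrefix(List<List<String>> clauses) {
        List<String> prefix = new ArrayList<>();
        if (clauses.isEmpty()) {
            return prefix;
        }
        List<String> first = clauses.get(0);
        for (int i = 0; i < first.size(); i++) {
            String key = first.get(i);
            for (List<String> clause : clauses) {
                // Stop at the first position where any clause is shorter
                // or names a different key.
                if (i >= clause.size() || !clause.get(i).equals(key)) {
                    return prefix;
                }
            }
            prefix.add(key);
        }
        return prefix;
    }

    public static void main(String[] args) {
        // The reviewer's case: c1 is common but is not a prefix of (c2, c1).
        System.out.println(commonPrefix(
                List.of(List.of("c1"), List.of("c2", "c1"))));
        // Here c1 leads both key lists.
        System.out.println(commonPrefix(
                List.of(List.of("c1"), List.of("c1", "c2"))));
    }
}
```

Under this sketch, an empty prefix would mean the single M/R plan is not applicable and the parameter makes no difference.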


- Amareshwari


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/#review651
-----------------------------------------------------------


On 2011-05-11 13:14:36, Amareshwari Sriramadasu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/700/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-05-11 13:14:36)
bq.  
bq.  
bq.  Review request for hive.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Attached patch generates a single M/R job for multi group by query with non-null common group by key set. Added configuration hive.multigroupby.singlemr to turn on and off the optimization. 
bq.  
bq.  
bq.  This addresses bug HIVE-2056.
bq.      https://issues.apache.org/jira/browse/HIVE-2056
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1100910 
bq.    trunk/conf/hive-default.xml 1100910 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby10.q 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby8.q 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby8_noskew.q 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/groupby9.q 1100910 
bq.    trunk/ql/src/test/queries/clientpositive/multigroupby_singlemr.q PRE-CREATION 
bq.    trunk/ql/src/test/results/clientpositive/groupby10.q.out 1100910 
bq.    trunk/ql/src/test/results/clientpositive/groupby8.q.out 1100910 
bq.    trunk/ql/src/test/results/clientpositive/groupby9.q.out 1100910 
bq.    trunk/ql/src/test/results/clientpositive/multigroupby_singlemr.q.out PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/700/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Updated jira with performance tests.
bq.  
bq.  All unit tests passed with the patch
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Amareshwari
bq.  
bq.



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056-2.txt, patch-2056.txt
>
>



[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030802#comment-13030802 ] 

Namit Jain commented on HIVE-2056:
----------------------------------

Great work, Amareshwari - I will take a look.

This should be very useful.

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056.txt
>
>



[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031779#comment-13031779 ] 

Namit Jain commented on HIVE-2056:
----------------------------------

+1

> Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056-2.txt, patch-2056.txt
>
>



[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2056:
------------------------------------------

    Status: Patch Available  (was: Open)

> Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056-2.txt, patch-2056.txt
>
>



[jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031684#comment-13031684 ] 

jiraposter@reviews.apache.org commented on HIVE-2056:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/700/
-----------------------------------------------------------

(Updated 2011-05-11 13:14:36.754720)


Review request for hive.


Changes
-------

Updated the patch to do prefix matching and added a testcase.


Summary
-------

Attached patch generates a single M/R job for multi group by query with non-null common group by key set. Added configuration hive.multigroupby.singlemr to turn on and off the optimization. 


This addresses bug HIVE-2056.
    https://issues.apache.org/jira/browse/HIVE-2056


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1100910 
  trunk/conf/hive-default.xml 1100910 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby10.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby8_noskew.q 1100910 
  trunk/ql/src/test/queries/clientpositive/groupby9.q 1100910 
  trunk/ql/src/test/queries/clientpositive/multigroupby_singlemr.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby10.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby8.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/groupby9.q.out 1100910 
  trunk/ql/src/test/results/clientpositive/multigroupby_singlemr.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/700/diff


Testing (updated)
-------

Updated jira with performance tests.

All unit tests passed with the patch


Thanks,

Amareshwari



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2056-1.txt, patch-2056-2.txt, patch-2056.txt
>
>

