Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2011/05/09 15:33:03 UTC
[jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query.
[ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2056:
------------------------------------------
Attachment: patch-2056.txt
The attached patch generates a single M/R job for a multi-group-by query whose group-by clauses share a non-empty common set of group-by keys. It adds the configuration property hive.multigroupby.singlemr to turn the optimization on and off.
It handles queries with no DISTINCT expression or with a single DISTINCT expression common to all clauses; multiple distinct expressions are not handled yet. I will add that in a follow-up if required.
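A minimal in-memory Python sketch of why one shuffle can serve several group-by clauses, assuming SUM aggregations and no DISTINCT. The names common_keys, separate_jobs and single_job, and the row/spec layout, are hypothetical and do not come from the patch or from Hive's code. Rows are routed once by the common key set; because every clause's full group-by key contains those columns, each output group falls entirely within one partition and can be finished there.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical illustration only; this is not Hive's implementation.
Row = Dict[str, int]
# A group-by spec: (group-by columns, value column to SUM).
GroupBySpec = Tuple[Sequence[str], str]


def common_keys(specs: Sequence[GroupBySpec]) -> List[str]:
    """Columns shared by every group-by clause, in the first clause's order."""
    shared = set(specs[0][0])
    for cols, _ in specs[1:]:
        shared &= set(cols)
    return [c for c in specs[0][0] if c in shared]


def separate_jobs(rows: List[Row], specs: Sequence[GroupBySpec]) -> List[Dict[Tuple, int]]:
    """Per-clause baseline: one full pass (one 'shuffle') per group-by clause."""
    results = []
    for cols, val in specs:
        agg: Dict[Tuple, int] = defaultdict(int)
        for r in rows:
            agg[tuple(r[c] for c in cols)] += r[val]
        results.append(dict(agg))
    return results


def single_job(rows: List[Row], specs: Sequence[GroupBySpec]) -> List[Dict[Tuple, int]]:
    """One shuffle on the common keys; each 'reducer' group computes every clause."""
    keys = common_keys(specs)
    if not keys:
        raise ValueError("no common group-by key; single-job plan does not apply")
    # Map + shuffle: route each row once, by its common-key value only.
    partitions: Dict[Tuple, List[Row]] = defaultdict(list)
    for r in rows:
        partitions[tuple(r[c] for c in keys)].append(r)
    results: List[Dict[Tuple, int]] = [{} for _ in specs]
    # Reduce: every full group-by key contains the common key, so each final
    # group lies entirely inside one partition and can be finished locally.
    for part_rows in partitions.values():
        for i, (cols, val) in enumerate(specs):
            local: Dict[Tuple, int] = defaultdict(int)
            for r in part_rows:
                local[tuple(r[c] for c in cols)] += r[val]
            results[i].update(local)
    return results


rows = [
    {"a": 1, "b": 1, "c": 1, "x": 5},
    {"a": 1, "b": 2, "c": 1, "x": 3},
    {"a": 2, "b": 1, "c": 2, "x": 4},
]
specs = [(["a", "b"], "x"), (["a", "c"], "x")]  # like GROUP BY a, b and GROUP BY a, c
print(common_keys(specs))                                   # ['a']
print(single_job(rows, specs) == separate_jobs(rows, specs))  # True
```

Both clauses here share the key a, so a single shuffle on a is enough; if the clauses shared no key (for example GROUP BY b and GROUP BY c), the sketch refuses, mirroring the non-empty common key requirement described above.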
Performance numbers:
||Number of rows in table||Query||Time taken by 3 M/R jobs plan||Time taken by single M/R job plan||
|100|query1|58.416 seconds|22.099 seconds|
|33682 million|query1|Did not succeed|11434.308 seconds|
|33682 million|query2|2 hrs 48 mins 15 secs|16 mins 3 secs|
With the existing plan, query1 did not succeed on the 33682-million-row table: the reducers failed with out-of-memory (OOM) errors after 12 hours. I tried many combinations of reducer counts and Xmx values, without success.
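The table mixes plain seconds with hours/minutes/seconds, so converting everything to seconds makes the two plans easier to compare. A small Python calculation using the timings from the table (the helper name to_seconds is illustrative only):

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: float = 0) -> float:
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (3 M/R jobs plan, single M/R job plan), taken from the performance table.
timings = {
    "query1, 100 rows": (58.416, 22.099),
    "query2, 33682 million rows": (to_seconds(2, 48, 15), to_seconds(0, 16, 3)),
}

for name, (three_jobs, single) in timings.items():
    print(f"{name}: {three_jobs:.0f} s -> {single:.0f} s, speedup {three_jobs / single:.1f}x")
```

This prints a speedup of about 2.6x for query1 on the 100-row table (58 s to 22 s) and about 10.5x for query2 on the large table (10095 s to 963 s). Query1 on the large table has no ratio because the 3-job plan never finished.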
I verified correctness row by row for the 100-row table, and by comparing the number of result rows for the 33682-million-row table.
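The two levels of verification differ in strength. A minimal Python sketch, treating each plan's output as a list of tuples; the helper names same_rows and same_row_count are hypothetical, since the comment does not say how the comparison was actually performed:

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical helpers for illustration only.


def same_rows(a: Iterable[Tuple], b: Iterable[Tuple]) -> bool:
    """Row-by-row check, ignoring output order (multiset equality)."""
    return Counter(a) == Counter(b)


def same_row_count(a: Iterable[Tuple], b: Iterable[Tuple]) -> bool:
    """Weaker check: only the number of result rows must match."""
    return sum(1 for _ in a) == sum(1 for _ in b)


old_plan = [("x", 3), ("y", 5)]
new_plan = [("y", 5), ("x", 3)]  # same rows, different order
wrong = [("x", 3), ("y", 6)]     # one wrong aggregate value

print(same_rows(old_plan, new_plan), same_row_count(old_plan, new_plan))  # True True
print(same_rows(old_plan, wrong), same_row_count(old_plan, wrong))        # False True
```

The row-count check can be computed in a single streaming pass, which suits a very large result, but as the second line shows it cannot detect an incorrect aggregate value; only the row-by-row comparison does.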
> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
> Key: HIVE-2056
> URL: https://issues.apache.org/jira/browse/HIVE-2056
> Project: Hive
> Issue Type: Improvement
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Attachments: patch-2056.txt
>
>