Posted to dev@hive.apache.org by "David Phillips (JIRA)" <ji...@apache.org> on 2009/01/09 17:02:01 UTC
[jira] Created: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Map-side aggregates output one row per reducer when not grouping
----------------------------------------------------------------
Key: HIVE-219
URL: https://issues.apache.org/jira/browse/HIVE-219
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: David Phillips
Priority: Critical
Example: SELECT count(1) FROM table;
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain resolved HIVE-219.
-----------------------------
Resolution: Duplicate
Fix Version/s: 0.2.0
Assignee: Namit Jain
Marking as a duplicate of 256.
[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Posted by "David Phillips (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662410#action_12662410 ]
David Phillips commented on HIVE-219:
-------------------------------------
This issue cannot be caught by the current unit tests as they use the local runner. Should we have a test driver like TestCliDriver that uses MiniMRCluster?
[jira] Updated: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-219:
--------------------------------
Fix Version/s: 0.3.0 (was: 0.6.0)
[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662456#action_12662456 ]
Ashish Thusoo commented on HIVE-219:
------------------------------------
Yes, we should move to MiniMR. The NFS-related problem also seems to be related to what we do in MapRedTask in local mode.
[jira] Updated: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joydeep Sen Sarma updated HIVE-219:
-----------------------------------
Priority: Blocker (was: Critical)
This is absolutely broken.
I am trying count(1) with hive.map.aggr = true, and there is no map-side aggregation happening, even though the explain output has a map-side Group By Operator:
{noformat}
Alias -> Map Operator Tree:
  mm_users_goodip_count
    Select Operator
      Group By Operator
        aggregations:
              expr: count(1)
        mode: hash
        Reduce Output Operator
          sort order:
          Map-reduce partition columns:
                expr: rand()
                type: double
          tag: -1
          value expressions:
                expr: 0
                type: bigint
{noformat}
It seems that the groupbyDesc does not have a 'keys' field specified (in other map-side aggregates, I can see the keys specified).
At any rate, the mapper emits one output row for each input row in this case. This is completely broken.
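To make the symptom concrete, here is a toy simulation in plain Python; it is not Hive code. It assumes, following the plan above, that with no grouping keys every map-side output row is routed to a reducer chosen by a random partition column (`rand()`), that each reducer writes its own count, and that the single-job plan has no final merge step. The function `keyless_count` and its `final_merge` flag are hypothetical names; `final_merge` stands in for the extra stage that the non-map-side plan appears to run.

```python
import random
from typing import List


def keyless_count(num_rows: int, num_reducers: int, final_merge: bool,
                  seed: int = 0) -> List[int]:
    """Toy model (not Hive code) of SELECT count(1) with no GROUP BY keys.

    Assumption: each map-side output row carries a count of 1 and is sent
    to a reducer picked at random, mirroring the rand() partition column.
    """
    rng = random.Random(seed)
    per_reducer = [0] * num_reducers
    # Map side + shuffle: route every row to a randomly chosen reducer.
    for _ in range(num_rows):
        per_reducer[rng.randrange(num_reducers)] += 1
    # Reduce side: each reducer that received input writes one row.
    reducer_rows = [count for count in per_reducer if count > 0]
    if final_merge:
        # Hypothetical second stage: one task sums the partial counts.
        return [sum(reducer_rows)]
    # No merge stage: the per-reducer partial counts become the result.
    return reducer_rows


if __name__ == "__main__":
    buggy = keyless_count(10_000, num_reducers=7, final_merge=False)
    fixed = keyless_count(10_000, num_reducers=7, final_merge=True)
    print(len(buggy), "rows, summing to", sum(buggy))
    print(fixed)  # [10000]
```

Without the merge stage the "result" has one row per reducer whose values add up to the true count, which is the shape of the output reported in this issue.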
[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662453#action_12662453 ]
Joydeep Sen Sarma commented on HIVE-219:
----------------------------------------
Yeah, we definitely need to use MiniMR; there are already one or more JIRAs on this.
[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping
Posted by "David Phillips (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662408#action_12662408 ]
David Phillips commented on HIVE-219:
-------------------------------------
With map-side aggregation:
{noformat}
hive> set hive.map.aggr=true;
hive> select count(1) from search_result;
Total MapReduce jobs = 1
Number of reducers = 7
...
OK
4012275
4011646
4011059
4008870
4011555
4014719
4013710
Time taken: 169.683 seconds
{noformat}
Without:
{noformat}
hive> set hive.map.aggr=false;
hive> select count(1) from search_result;
Total MapReduce jobs = 2
Number of reducers = 7
...
OK
28083834
Time taken: 204.804 seconds
{noformat}
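The seven rows from the map-side run add up exactly to the single total from the run without map-side aggregation. This suggests that each row is one reducer's partial count and that the final merge is what is missing. A quick check in Python, with the numbers copied from the two outputs above:

```python
# Rows printed with hive.map.aggr=true (one per reducer; "Number of reducers = 7").
map_side_rows = [4012275, 4011646, 4011059, 4008870, 4011555, 4014719, 4013710]
# Single row printed with hive.map.aggr=false.
expected_total = 28083834

print(len(map_side_rows))                    # 7
print(sum(map_side_rows))                    # 28083834
print(sum(map_side_rows) == expected_total)  # True
```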