Posted to dev@hive.apache.org by "David Phillips (JIRA)" <ji...@apache.org> on 2009/01/09 17:02:01 UTC

[jira] Created: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Map-side aggregates output one row per reducer when not grouping
----------------------------------------------------------------

                 Key: HIVE-219
                 URL: https://issues.apache.org/jira/browse/HIVE-219
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: David Phillips
            Priority: Critical


Example: SELECT count(1) FROM table;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-219.
-----------------------------

       Resolution: Duplicate
    Fix Version/s: 0.2.0
         Assignee: Namit Jain

Marking as a duplicate of HIVE-256.

> Map-side aggregates output one row per reducer when not grouping
> ----------------------------------------------------------------
>
>                 Key: HIVE-219
>                 URL: https://issues.apache.org/jira/browse/HIVE-219
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>
> Example: SELECT count(1) FROM table;


[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Posted by "David Phillips (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662410#action_12662410 ] 

David Phillips commented on HIVE-219:
-------------------------------------

This issue cannot be caught by the current unit tests as they use the local runner.  Should we have a test driver like TestCliDriver that uses MiniMRCluster?
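
A minimal sketch of why a single-reducer test run would hide this bug. It assumes the local runner executes only one reduce task, and it models the reported behaviour as "each row goes to a random reducer, each reducer prints its own count". The function name run_count and the row counts are made up for illustration.

```python
import random


def run_count(rows, num_reducers, seed=0):
    """Simulate the reported behaviour of SELECT count(1):
    every row is routed to a random reducer and each reducer
    emits its own count as a separate output row."""
    rng = random.Random(seed)
    counts = [0] * num_reducers
    for _ in rows:
        counts[rng.randrange(num_reducers)] += 1
    return counts


rows = range(1000)

# Assumption: the local runner behaves like a job with one reducer.
local_output = run_count(rows, num_reducers=1)

# Seven reducers, as in the job reported in this issue.
cluster_output = run_count(rows, num_reducers=7)

# One output row locally, so a test expecting a single row passes.
print(len(local_output), local_output)

# Seven output rows on a cluster; only their sum is the real count.
print(len(cluster_output), sum(cluster_output))
```

With one reducer the output is a single correct row, so a golden-file comparison cannot tell the difference. The extra rows only appear once more than one reducer runs.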



[jira] Updated: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-219:
--------------------------------

    Fix Version/s:     (was: 0.6.0)
                   0.3.0

> Map-side aggregates output one row per reducer when not grouping
> ----------------------------------------------------------------
>
>                 Key: HIVE-219
>                 URL: https://issues.apache.org/jira/browse/HIVE-219
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.3.0
>
>
> Example: SELECT count(1) FROM table;


[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662456#action_12662456 ] 

Ashish Thusoo commented on HIVE-219:
------------------------------------

Yes, we should move to miniMR. The NFS-related problem also seems to be related to what we do in MapRedTask in local mode.



[jira] Updated: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-219:
-----------------------------------

    Priority: Blocker  (was: Critical)

This is absolutely broken.

I am trying count(1) with hive.map.aggr = true, and no map-side aggregation is happening, even though the explain output has a map-side Group By operator:

      Alias -> Map Operator Tree:
        mm_users_goodip_count
            Select Operator
              Group By Operator
                aggregations:
                      expr: count(1)
                mode: hash
                Reduce Output Operator
                  sort order:
                  Map-reduce partition columns:
                        expr: rand()
                        type: double
                  tag: -1
                  value expressions:
                        expr: 0
                        type: bigint

It seems that the groupbyDesc does not have a 'keys' field specified (in other map-side aggregates I can see the keys specified).

At any rate, the mapper emits one output row for each input row in this case. This is completely broken.
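
A minimal sketch contrasting the behaviour described above with what map-side hash aggregation is expected to do when there are no grouping keys. This is an illustration only, not Hive's GroupByOperator code. The function names and the split sizes are made up, and the single group is represented by an empty tuple as the key.

```python
def map_without_aggregation(rows):
    """Behaviour described above: one output row per input row."""
    return [1 for _ in rows]


def map_with_hash_aggregation(rows):
    """Expected behaviour: keep one accumulator per group key.
    Without GROUP BY there is a single group, modelled here by
    the empty tuple, so each mapper emits one partial count."""
    table = {}
    for _ in rows:
        key = ()  # no grouping keys
        table[key] = table.get(key, 0) + 1
    return list(table.values())


def reduce_partials(partials_per_mapper):
    """Final aggregation: add up all partial counts."""
    return sum(sum(partials) for partials in partials_per_mapper)


# Three hypothetical input splits of 4, 5 and 6 rows.
splits = [range(4), range(5), range(6)]

broken = [map_without_aggregation(s) for s in splits]
expected = [map_with_hash_aggregation(s) for s in splits]

print(sum(len(p) for p in broken))    # rows shuffled without aggregation
print(sum(len(p) for p in expected))  # rows shuffled with aggregation
print(reduce_partials(broken), reduce_partials(expected))
```

Both variants give the same total once the partial counts are summed in one place. The difference is in how many rows leave the mappers: one per input row in the first case, one per mapper in the second.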

> Map-side aggregates output one row per reducer when not grouping
> ----------------------------------------------------------------
>
>                 Key: HIVE-219
>                 URL: https://issues.apache.org/jira/browse/HIVE-219
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Priority: Blocker
>
> Example: SELECT count(1) FROM table;


[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662453#action_12662453 ] 

Joydeep Sen Sarma commented on HIVE-219:
----------------------------------------

Yeah, we definitely need to use miniMR. There are one or more JIRAs on this already.



[jira] Commented: (HIVE-219) Map-side aggregates output one row per reducer when not grouping

Posted by "David Phillips (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662408#action_12662408 ] 

David Phillips commented on HIVE-219:
-------------------------------------

With map-side:

{noformat}
hive> set hive.map.aggr=true;
hive> select count(1) from search_result;
Total MapReduce jobs = 1
Number of reducers = 7
...
OK
4012275
4011646
4011059
4008870
4011555
4014719
4013710
Time taken: 169.683 seconds
{noformat}

Without:

{noformat}
hive> set hive.map.aggr=false;
hive> select count(1) from search_result;
Total MapReduce jobs = 2
Number of reducers = 7
...
OK
28083834
Time taken: 204.804 seconds
{noformat}
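
A quick arithmetic check on the two outputs above: the seven rows printed with hive.map.aggr=true add up to the single row printed with hive.map.aggr=false. This suggests each reducer printed its own partial count and the final sum across reducers never happened. The variable names are made up; the numbers are copied from the output above.

```python
# Rows printed with hive.map.aggr=true (one per reducer).
partial_counts = [
    4012275,
    4011646,
    4011059,
    4008870,
    4011555,
    4014719,
    4013710,
]

# Single row printed with hive.map.aggr=false.
expected_total = 28083834

total = sum(partial_counts)
print(total)
print(total == expected_total)
```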

