Posted to common-dev@hadoop.apache.org by "David Bowen (JIRA)" <ji...@apache.org> on 2007/03/22 19:25:32 UTC

[jira] Created: (HADOOP-1146) "Reduce input records" counter name is misleading

"Reduce input records" counter name is misleading
-------------------------------------------------

                 Key: HADOOP-1146
                 URL: https://issues.apache.org/jira/browse/HADOOP-1146
             Project: Hadoop
          Issue Type: Bug
            Reporter: David Bowen
         Assigned To: David Bowen


It has been pointed out that the counter name "reduce input records" is misleading; this number should be called "reduce input keys" or "reduce input groups".  It could also be useful to have the actual number of reduce input records, which should be the same as the number of map output records.
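
To make the distinction concrete, here is a toy Python sketch. It is not Hadoop code: the sample pairs and the helper name count_reduce_input are made up for illustration. It assumes the reduce input is sorted and grouped by key, so the number of groups is the number of distinct keys, while the number of records is the number of individual key/value pairs.

```python
from itertools import groupby
from operator import itemgetter


def count_reduce_input(pairs):
    """Return (groups, records) for a list of (key, value) pairs.

    Hypothetical helper: mimics sorting the reduce input by key and
    handing each key's values to the reducer as one group.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    groups = 0
    records = 0
    for _key, values in groupby(ordered, key=itemgetter(0)):
        groups += 1                        # one group per distinct key
        records += sum(1 for _ in values)  # every value is one input record
    return groups, records


# Made-up map output for a word count over "a b a c a b".
sample = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
groups, records = count_reduce_input(sample)
print(groups, records)
```

For this sample the two numbers differ (3 groups, 6 records), which is why a single counter labelled "records" but counting groups is misleading.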


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1146) "Reduce input records" counter name is misleading

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484220 ] 

Hadoop QA commented on HADOOP-1146:
-----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12354133/1146.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/522597. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch


[jira] Updated: (HADOOP-1146) "Reduce input records" counter name is misleading

Posted by "David Bowen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Bowen updated HADOOP-1146:
--------------------------------

    Attachment: 1146.patch


This patch:

   1. Renames the counter Reduce Input Records to Reduce Input Groups since that is what it counts.

   2. Adds a new counter called Reduce Input Records that does count the records.

   3. While testing on Wordcount, I noticed that Map Output Records and Reduce Input Records were not the same because of the use of a Combiner.  So I added two new counters to show this: Combine Input Records and Combine Output Records.

I'm not sure if we really need these Combine Input/Output record counters.  At the end of the job, they should be the same as Map Output Records and Reduce Input Records respectively, but they are possibly interesting to watch as the job proceeds.
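
The relationship between these counters can be sketched with a small Python simulation of Wordcount. This is not the patch and not Hadoop code: the function run_wordcount, the input splits, and the use of a dictionary of counters are all assumptions made for illustration, and it assumes the combiner runs exactly once over each map task's output.

```python
from collections import Counter, defaultdict


def run_wordcount(splits, use_combiner):
    """Simulate word count and return the job counters as a Counter.

    Assumption: the combiner runs exactly once over each map task's
    output; a real framework may schedule it differently.
    """
    counters = Counter()
    shuffled = defaultdict(list)  # key -> values reaching the reducer

    for split in splits:  # one simulated map task per input split
        map_output = [(word, 1) for word in split.split()]
        counters["Map Output Records"] += len(map_output)

        if use_combiner:
            counters["Combine Input Records"] += len(map_output)
            partial = Counter()
            for word, n in map_output:
                partial[word] += n  # pre-sum counts within this map task
            map_output = list(partial.items())
            counters["Combine Output Records"] += len(map_output)

        for word, n in map_output:
            shuffled[word].append(n)

    for _word, values in shuffled.items():
        counters["Reduce Input Groups"] += 1
        counters["Reduce Input Records"] += len(values)

    return counters


splits = ["a b a", "a c a"]  # made-up input, one string per map task
with_combiner = run_wordcount(splits, use_combiner=True)
without_combiner = run_wordcount(splits, use_combiner=False)
print(dict(with_combiner))
print(dict(without_combiner))
```

Under these assumptions, the run with the combiner has Combine Input Records equal to Map Output Records and Combine Output Records equal to Reduce Input Records, while Map Output Records and Reduce Input Records differ. Without the combiner, Map Output Records and Reduce Input Records are equal.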

Comments welcome.



[jira] Commented: (HADOOP-1146) "Reduce input records" counter name is misleading

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485592 ] 

Hadoop QA commented on HADOOP-1146:
-----------------------------------

Integrated in Hadoop-Nightly #42 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/42/)


[jira] Updated: (HADOOP-1146) "Reduce input records" counter name is misleading

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1146:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.13.0
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, David.


[jira] Updated: (HADOOP-1146) "Reduce input records" counter name is misleading

Posted by "David Bowen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Bowen updated HADOOP-1146:
--------------------------------

    Status: Patch Available  (was: Open)
