Posted to common-dev@hadoop.apache.org by "Ari Rabkin (JIRA)" <ji...@apache.org> on 2008/05/23 22:13:55 UTC

[jira] Created: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Pass the size of the MapReduce input to JobInProgress
-----------------------------------------------------

                 Key: HADOOP-3441
                 URL: https://issues.apache.org/jira/browse/HADOOP-3441
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.17.0
         Environment: all
            Reporter: Ari Rabkin
            Assignee: Ari Rabkin
            Priority: Minor
             Fix For: 0.18.0
         Attachments: addDataSize.patch

Currently, there's no easy way for the JobInProgress to know how large the job's input data is.

This patch corrects the problem by storing the size of each input split's data in the RawSplit. The per-split sizes are then totaled and made available via JobInProgress.getInputSize().

This is needed, among other reasons, so that the JobInProgress knows how much data it's being run on, which will help build smarter schedulers.
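The mechanism described above (record each split's size, then total the sizes for the job) can be sketched as follows. This is a minimal, self-contained sketch, not the code in addDataSize.patch: `SplitInfo` is a hypothetical stand-in for RawSplit, `InputLengthTotaller` and `getInputLength()` are hypothetical names, and the per-split value is assumed to come from `InputSplit.getLength()`. The total is computed once, when the job is initialized, so a scheduler can query it cheaply afterwards.

```java
import java.util.List;

/**
 * Hypothetical stand-in for the per-split metadata that RawSplit would carry.
 * Assumption: dataLength holds the value reported by InputSplit.getLength().
 */
final class SplitInfo {
    private final long dataLength; // bytes of input covered by this split

    SplitInfo(long dataLength) {
        if (dataLength < 0) {
            throw new IllegalArgumentException("split length must be non-negative");
        }
        this.dataLength = dataLength;
    }

    long getDataLength() {
        return dataLength;
    }
}

/**
 * Hypothetical sketch of the totaling step that would back a job-level
 * accessor such as getInputSize() (named "length" here to match InputSplit).
 */
final class InputLengthTotaller {
    private final long inputLength;

    InputLengthTotaller(List<SplitInfo> splits) {
        long total = 0;
        for (SplitInfo split : splits) {
            // Sum the per-split lengths once, at job initialization.
            total += split.getDataLength();
        }
        this.inputLength = total;
    }

    /** Total bytes of input across all of the job's splits. */
    long getInputLength() {
        return inputLength;
    }
}
```

For example, splits of 64 MB, 64 MB and 10 bytes yield a total of (128 << 20) + 10 bytes, and a job with no splits reports 0.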

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599748#action_12599748 ] 

Amar Kamat commented on HADOOP-3441:
------------------------------------

Shouldn't _input-size_ be part of job conf?


[jira] Updated: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-3441:
-------------------------------

    Comment: was deleted


[jira] Updated: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-3441:
-------------------------------

    Status: Open  (was: Patch Available)

No need for a patch.


[jira] Updated: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-3441:
-------------------------------

    Comment: was deleted


[jira] Updated: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-3441:
-------------------------------

    Status: Patch Available  (was: Open)


[jira] Resolved: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin resolved HADOOP-3441.
--------------------------------

    Resolution: Won't Fix


[jira] Commented: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599499#action_12599499 ] 

Doug Cutting commented on HADOOP-3441:
--------------------------------------

 - The field and methods should call it 'length', not 'size', to be consistent with the InputSplit API, which is the source of the data.
 - I'd prefer to see this change included in a patch that makes good use of the data. Adding features that merely have possible uses leads to bloat. So perhaps this patch should instead be bundled with a scheduler implementation that needs this information?


[jira] Updated: (HADOOP-3441) Pass the size of the MapReduce input to JobInProgress

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-3441:
-------------------------------

    Attachment: addDataSize.patch

Patch fixing the problem.
