Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2008/03/31 23:22:25 UTC

[jira] Created: (HADOOP-3140) JobTracker should not try to promote a (map) task if it dis not write to DFS at all

JobTracker should not try to promote a (map) task if it dis not write to DFS at all
-----------------------------------------------------------------------------------

                 Key: HADOOP-3140
                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Runping Qi



In most cases, map tasks do not write to dfs.
Thus, when they complete, they should not be put into commit_pending queue at all.
This will improve the task promotion significantly.

 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585273#action_12585273 ] 

Devaraj Das commented on HADOOP-3140:
-------------------------------------

Dhruba, is that a documented exception? I didn't see it in the FileSystem.getContentSummary API doc. If it is not documented, is it advisable to make client code depend on the exception? For example, what if getContentSummary later returns null for non-existent paths? So, unless FileSystem guarantees that an exception will be thrown for non-existent paths, I'd like to go along the lines of what Amar mentioned in the code snippet. Thoughts?



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Attachment: HADOOP-3140-v2.patch



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585363#action_12585363 ] 

Hadoop QA commented on HADOOP-3140:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12379270/HADOOP-3140-v2.patch
against trunk revision 643282.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2148/console

This message is automatically generated.



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584603#action_12584603 ] 

Arun C Murthy commented on HADOOP-3140:
---------------------------------------

Looks good, couple of comments:

1. I'm a little bothered by 
{noformat}
+    // If the TIP is already completed and the task reports as SUCCEEDED then 
+    // mark the task as KILLED.
+    // In case of task with no promotion the task tracker will mark the task 
+    // as SUCCEEDED.
+    if (wasComplete && (status.getRunState() == TaskStatus.State.SUCCEEDED)) {
+      status.setRunState(TaskStatus.State.KILLED);
+    }
     boolean change = tip.updateStatus(status);
     if (change) {
       TaskStatus.State state = status.getRunState();
{noformat}
Normally I'd expect the first check to be inside the 'if (change)', to make sure the same status isn't processed twice and doesn't wrongly manipulate the state of the TIP. I'm happy if you can confirm that this works... just being careful.

2. Please bump up TaskUmbilicalProtocol's version number.
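
For readers following the hunk quoted in item 1, the rule it adds can be sketched in isolation. This is only an illustration under stated assumptions, not the JobTracker code: the State enum and the reconcile method are hypothetical stand-ins for TaskStatus.State and the logic in the patch.

```java
// Hypothetical stand-ins; these are not the real Hadoop classes.
class LateSuccessSketch {
    enum State { RUNNING, COMMIT_PENDING, SUCCEEDED, KILLED }

    // If the TIP was already complete and this status still reports
    // SUCCEEDED, the status is rewritten to KILLED, as in the quoted hunk.
    // Any other combination is passed through unchanged.
    static State reconcile(boolean tipWasComplete, State reported) {
        if (tipWasComplete && reported == State.SUCCEEDED) {
            return State.KILLED;
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println(reconcile(false, State.SUCCEEDED)); // SUCCEEDED
        System.out.println(reconcile(true, State.SUCCEEDED));  // KILLED
        System.out.println(reconcile(true, State.RUNNING));    // RUNNING
    }
}
```

The sketch also makes the review concern concrete: the rewrite happens before any duplicate-status check, so a status delivered twice would go through reconcile both times.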



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Status: Open  (was: Patch Available)



[jira] Work started: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-3140 started by Amar Kamat.



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Fix Version/s: 0.17.0



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3140:
--------------------------------

      Resolution: Fixed
    Release Note: Tasks that don't generate any output are not inserted in the commit queue of the JobTracker. They are marked as SUCCESSFUL by the TaskTracker and the JobTracker updates their state short-circuiting the commit queue.
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Amar!



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Attachment: HADOOP-3140-v1.patch



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584430#action_12584430 ] 

Amar Kamat commented on HADOOP-3140:
------------------------------------

bq. In addition, we should discard outputs of failed tasks in TaskTracker.Child.main
Reiterating #4 from my earlier comment: here we might ignore the failed/killed tasks and never call discard. The output will be taken care of once the job completes. This is a simple approach. Another approach is to have a scavenger thread that periodically does this cleanup *offline*.



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584427#action_12584427 ] 

Devaraj Das commented on HADOOP-3140:
-------------------------------------

We actually don't need to discard output (at the cost of creating some temp garbage on the dfs). The jobtracker deletes the temp dir for the job at the end of the job (HADOOP-2391). That way we will save a bunch of namenode RPCs.



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585506#action_12585506 ] 

Devaraj Das commented on HADOOP-3140:
-------------------------------------

+1



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585275#action_12585275 ] 

Raghu Angadi commented on HADOOP-3140:
--------------------------------------

> So, unless FileSystem provides a guarantee that an exception will be thrown for non-existent paths, i'd like to go in the lines of what Amar mentioned in the code snippet. Thoughts?
Then, should the code handle the summary being null? (exists() returning true on the previous line does not mean the path still exists on the next line.)
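
The window between the exists() check and the summary call can be illustrated with a small stand-alone sketch. It deliberately uses java.nio.file on the local file system as a stand-in, not the Hadoop FileSystem API, and the class and method names are hypothetical. The explicit check handles the nominal "no output" case, while the catch clause only covers a directory that vanishes in between.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;

// Local-file-system analogue; the names here are illustrative only.
class OutputCheckSketch {
    // Returns true only if dir is a directory holding at least one entry.
    static boolean hasOutput(Path dir) {
        // Nominal path: a missing directory is answered by a plain check.
        if (dir == null || !Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return entries.iterator().hasNext();
        } catch (NoSuchFileException | NotDirectoryException vanished) {
            // The directory was removed or replaced after the check above.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("task-output");
        System.out.println(hasOutput(tmp)); // false: directory is empty
        Files.createFile(tmp.resolve("part-00000"));
        System.out.println(hasOutput(tmp)); // true: one file present
    }
}
```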




[jira] Assigned: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat reassigned HADOOP-3140:
----------------------------------

    Assignee: Amar Kamat



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-3140:
----------------------------------

    Description: 
In most cases, map tasks do not write to dfs.
Thus, when they complete, they should not be put into commit_pending queue at all.
This will improve the task promotion significantly.

 

  was:

In most cases, map tasks do not write to dfs.
Thus, when they complete, they should not be put into commit_pending queue at all.
This will improve the task promotion significantly.

 

        Summary: JobTracker should not try to promote a (map) task if it does not write to DFS at all  (was: JobTracker should not try to promote a (map) task if it dis not write to DFS at all)

I think that the tasks should include a boolean in the done message to the task tracker that says if they have output to promote. (And it should delete everything in the case of failure, locally.) This is just an optimization. The framework (TaskTracker.Child.main) would look in the work output directory and set true if there is anything to promote. The TT would then set the state to commit-pending or success according to the flag value.
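
A minimal sketch of the flag-based flow proposed here, under stated assumptions: the State enum, the entry count parameter, and both method names are hypothetical and only model the decision, not the actual TaskTracker or TaskUmbilicalProtocol code.

```java
// Hypothetical model of the proposal; not the real Hadoop classes.
class DoneFlagSketch {
    enum State { COMMIT_PENDING, SUCCEEDED }

    // Task side (framework code in the child): the flag is true when the
    // work output directory contains anything to promote.
    static boolean hasOutputToPromote(int entriesInWorkOutputDir) {
        return entriesInWorkOutputDir > 0;
    }

    // TaskTracker side: choose the reported state from the flag carried
    // in the done message.
    static State stateOnDone(boolean hasOutputToPromote) {
        return hasOutputToPromote ? State.COMMIT_PENDING : State.SUCCEEDED;
    }

    public static void main(String[] args) {
        System.out.println(stateOnDone(hasOutputToPromote(0))); // SUCCEEDED
        System.out.println(stateOnDone(hasOutputToPromote(3))); // COMMIT_PENDING
    }
}
```

With this split, a task that wrote nothing never enters the commit queue, which is the optimization the issue asks for.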



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it dis not write to DFS at all

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583888#action_12583888 ] 

Arun C Murthy commented on HADOOP-3140:
---------------------------------------

I agree, in principle.

However, there is currently no way to check whether the maps wrote side-files to HDFS. So we either need a new API for tasks (or jobs) to declare that they write side-files and hence need promotion or, worse, we need to look into the _${taskid} directories and try to guess. Both seem unsatisfactory ...



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585398#action_12585398 ] 

Owen O'Malley commented on HADOOP-3140:
---------------------------------------

I'm very strongly against using exceptions as part of the nominal flow of the program.

I much prefer the exists check.



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585199#action_12585199 ] 

dhruba borthakur commented on HADOOP-3140:
------------------------------------------

As Amar mentioned, it would be nice if we could eliminate the call to fs.exists() in the previous code snippet, especially if this code snippet is executed frequently. fs.getContentSummary() probably throws an exception if the file does not exist.



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584433#action_12584433 ] 

Amar Kamat commented on HADOOP-3140:
------------------------------------

But for now leaving the garbage as it is and reclaiming it once the job finishes seems to be a simple/better solution.



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585060#action_12585060 ] 

Amar Kamat commented on HADOOP-3140:
------------------------------------

Looks like we can optimize it further. To check whether the task output dir is empty, we can do the following:
{code}
if (taskOutputPath != null) {
  // Get the file-system for the task output directory
  FileSystem fs = taskOutputPath.getFileSystem(conf);
  // Check if it exists
  if (fs.exists(taskOutputPath)) {
    // Get the summary for the folder
    ContentSummary summary = fs.getContentSummary(taskOutputPath);
    // Check if the directory contains some data,
    // i.e. total-files + total-folders - 1 (itself)
    if ((summary.getFileCount() + summary.getDirectoryCount() - 1) > 0) {
      shouldBePromoted = true;
    }
  }
}
{code}
I have tested {{fs.getContentSummary()}} via the DFSClient and it works as expected. Comments?

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Attachment: HADOOP-3140-v3.patch

One unnecessary import statement slipped in. This patch just removes that.

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch, HADOOP-3140-v2.patch, HADOOP-3140-v3.patch, HADOOP-3140-v3.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584711#action_12584711 ] 

Amar Kamat commented on HADOOP-3140:
------------------------------------

Arun, two things:
1) If the status is replayed by the TaskTracker, the JobTracker will take care of that. {{JobTracker.heartbeat()}} will simply discard it there and then.
2) If the status does get replayed (in {{JobInProgress.updateTaskStatus()}}), it will be taken care of as follows:
a) Task t comes in as {{SUCCEEDED}} for a TIP that is already completed.
b) It will be marked (locally) as {{KILLED}} and the task's status will be updated in the JT.
c) If the status is resent, it will again be marked locally as {{KILLED}}. The *change* in the status will now evaluate to _false_ and nothing will happen.
The reason for marking the task as {{KILLED}} (locally) is to make sure that the semantics of the trunk are retained. If the state is updated first and only later marked as {{KILLED}}, then the task status will be temporarily marked as {{SUCCEEDED}}.
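
The replay handling described above can be illustrated with a small stand-alone model. This is only a sketch and not the JobTracker code: the class {{ReplayModel}}, its fields and the attempt ids are hypothetical, and tracking which attempt completed the TIP is an assumption made so that a replay from the winning attempt is not rewritten.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one TIP and the status reports of its attempts. */
final class ReplayModel {
    enum State { RUNNING, SUCCEEDED, KILLED }

    private final Map<String, State> recorded = new HashMap<>();
    private String completedBy = null; // attempt that completed the TIP (assumption)

    /** Returns true only if the recorded state of the attempt changed. */
    boolean updateTaskStatus(String attemptId, State reported) {
        State effective = reported;
        // A SUCCEEDED report for a TIP completed by another attempt is rewritten
        // to KILLED before it is recorded, so SUCCEEDED is never stored for it,
        // not even temporarily.
        if (reported == State.SUCCEEDED
                && completedBy != null
                && !completedBy.equals(attemptId)) {
            effective = State.KILLED;
        }
        State previous = recorded.put(attemptId, effective);
        boolean changed = previous != effective;
        if (changed && effective == State.SUCCEEDED) {
            completedBy = attemptId;
        }
        return changed;
    }

    State stateOf(String attemptId) {
        return recorded.get(attemptId);
    }
}
```

In this model a second {{SUCCEEDED}} report for the same late attempt is again rewritten to {{KILLED}}, compares equal to the recorded state, and is therefore reported as no change.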

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Status: Patch Available  (was: Open)

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch, HADOOP-3140-v2.patch, HADOOP-3140-v3.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584436#action_12584436 ] 

Arun C Murthy commented on HADOOP-3140:
---------------------------------------

Right, I missed that...

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584419#action_12584419 ] 

Arun C Murthy commented on HADOOP-3140:
---------------------------------------

{quote}
1) Task.done() method checks if the task has data to be promoted and passes this info to the TaskTracker via the TaskTracker.done() api.
2) If there is no data to promote, the TaskTracker sets the task status as SUCCEEDED or FAILED depending on whether the task succeeds or fails.
{quote}

+1

In addition, we should discard outputs of failed tasks in the 'finally' clause of TaskTracker.Child.main, if feasible. Then we could just set the status to 'FAILED/KILLED' and remove the need to discard outputs in a lot of cases. We could go further and do the same in the TT too, to ensure that the JT only needs to promote outputs of successful tasks... clearly it needs some careful thought.



> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584183#action_12584183 ] 

Amar Kamat commented on HADOOP-3140:
------------------------------------

How about this:
1) {{Task.done()}} method checks if the task has data to be promoted and passes this info to the TaskTracker via the {{TaskTracker.done()}} api.
2) If there is no data to promote, the TaskTracker sets the task status as {{SUCCEEDED}} or {{FAILED}} depending on whether the task succeeds or fails.
3) JobInProgress adds only {{COMMIT_PENDING}} tasks to the commit-pending queue. The commit-pending queue deals with {{KILLED/FAILED}} tasks only if the commit-pending thread fails to save the task output or if the TaskTracker is lost.
4) Temporary data from {{FAILED/KILLED}} tasks will be deleted once the job completes (see HADOOP-2391).
5) {{JobInProgress.updateTaskStatus()}} can now be called with {{SUCCEEDED}} state from TaskTracker (via heartbeat) or from the commit-pending queue.
6) If {{JobInProgress.updateTaskStatus()}} is called with {{SUCCEEDED}} state for a completed TIP, the task will be marked as {{KILLED}}.
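
The first three steps above boil down to a small decision on the TaskTracker side. The following is a minimal sketch of that decision only; {{CompletionRouting}} and its method names are hypothetical, and reporting a failed task as {{FAILED}} even when it left data behind is my reading of steps 3 and 4 rather than something the proposal states outright.

```java
/** Hypothetical sketch of how a finished task's state could be chosen. */
final class CompletionRouting {
    enum State { COMMIT_PENDING, SUCCEEDED, FAILED }

    /** State the TaskTracker would report for a finished task. */
    static State reportedState(boolean taskSucceeded, boolean hasDataToPromote) {
        if (!taskSucceeded) {
            // Assumption: failed tasks are never queued for promotion; their
            // temporary data is cleaned up when the job completes.
            return State.FAILED;
        }
        // Only successful tasks that actually wrote output need promotion.
        return hasDataToPromote ? State.COMMIT_PENDING : State.SUCCEEDED;
    }

    /** Only COMMIT_PENDING tasks are added to the commit-pending queue. */
    static boolean goesToCommitPendingQueue(State state) {
        return state == State.COMMIT_PENDING;
    }
}
```

With this routing, a typical map task that writes nothing to DFS is reported as {{SUCCEEDED}} directly and never enters the commit-pending queue.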


> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585607#action_12585607 ] 

Hadoop QA commented on HADOOP-3140:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12379382/HADOOP-3140-v3.patch
against trunk revision 643282.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2162/console

This message is automatically generated.

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch, HADOOP-3140-v2.patch, HADOOP-3140-v3.patch, HADOOP-3140-v3.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Attachment: HADOOP-3140-v3.patch

Attaching a patch with the following changes:
1) A _not null_ check for the summary.
2) In case of an exception, the task is treated as needing promotion.
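
The two changes listed above could look roughly like the following. This is only a sketch and not the contents of the patch: {{OutputSummary}} and {{SummarySource}} are hypothetical stand-ins for {{ContentSummary}} and {{FileSystem.getContentSummary()}} so that the example runs without Hadoop on the classpath, and treating a null summary as "nothing to promote" is an assumption.

```java
import java.io.IOException;

/** Hypothetical stand-in for the two ContentSummary counters used by the check. */
final class OutputSummary {
    final long fileCount;
    final long directoryCount;

    OutputSummary(long fileCount, long directoryCount) {
        this.fileCount = fileCount;
        this.directoryCount = directoryCount;
    }
}

/** Hypothetical lookup that may return null or throw, the two guarded cases. */
interface SummarySource {
    OutputSummary summarize(String path) throws IOException;
}

final class PromotionCheck {
    static boolean shouldBePromoted(String taskOutputPath, SummarySource source) {
        if (taskOutputPath == null) {
            return false; // the task has no output directory at all
        }
        try {
            OutputSummary summary = source.summarize(taskOutputPath);
            if (summary == null) {
                return false; // assumption: no summary means nothing to promote
            }
            // Files plus directories, minus one for the output directory itself.
            return (summary.fileCount + summary.directoryCount - 1) > 0;
        } catch (IOException e) {
            // When the check itself fails, stay on the safe side and promote.
            return true;
        }
    }
}
```

Falling back to promotion on an exception keeps the old behaviour for the uncertain case, so a failed check can cost time but cannot drop task output.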


> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch, HADOOP-3140-v2.patch, HADOOP-3140-v3.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Updated: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3140:
-------------------------------

    Status: Patch Available  (was: In Progress)

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch, HADOOP-3140-v2.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  



[jira] Commented: (HADOOP-3140) JobTracker should not try to promote a (map) task if it does not write to DFS at all

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585966#action_12585966 ] 

Hudson commented on HADOOP-3140:
--------------------------------

Integrated in Hadoop-trunk #451 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/451/])

> JobTracker should not try to promote a (map) task if it does not write to DFS at all
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3140
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3140-v1.patch, HADOOP-3140-v2.patch, HADOOP-3140-v3.patch, HADOOP-3140-v3.patch
>
>
> In most cases, map tasks do not write to dfs.
> Thus, when they complete, they should not be put into commit_pending queue at all.
> This will improve the task promotion significantly.
>  
