Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/04/21 18:05:05 UTC

[jira] Created: (HADOOP-157) job fails because pendingCreates is not cleaned up after a task fails

job fails because pendingCreates is not cleaned up after a task fails
---------------------------------------------------------------------

         Key: HADOOP-157
         URL: http://issues.apache.org/jira/browse/HADOOP-157
     Project: Hadoop
        Type: Bug

  Components: dfs  
    Versions: 0.1.1    
    Reporter: Owen O'Malley
 Assigned to: Owen O'Malley 
     Fix For: 0.2


When a task fails under map/reduce and the client doesn't abandon its files in progress (usually because the client was killed), the lease on the name node lasts 1 minute. During that minute, I see 3 backup copies of the task fail because pendingCreates is non-null.
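
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the name node's actual code: the class PendingCreatesSketch, its per-path map, and the startCreate/abandon methods are all hypothetical names chosen for illustration. Only the 1-minute lease and the pendingCreates name come from the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical) of a name node that tracks files under construction. */
class PendingCreatesSketch {
    // The 1-minute lease duration mentioned in the report.
    static final long LEASE_MILLIS = 60_000L;

    // Assumption: path -> time at which the creator's lease expires.
    private final Map<String, Long> pendingCreates = new HashMap<>();

    /** Returns true if the create may start, false if an unexpired entry blocks it. */
    boolean startCreate(String path, long nowMillis) {
        Long expiry = pendingCreates.get(path);
        if (expiry != null && nowMillis < expiry) {
            return false; // an earlier writer still holds the file
        }
        pendingCreates.put(path, nowMillis + LEASE_MILLIS);
        return true;
    }

    /** What a well-behaved client does on failure; a killed client never gets here. */
    void abandon(String path) {
        pendingCreates.remove(path);
    }

    public static void main(String[] args) {
        PendingCreatesSketch nameNode = new PendingCreatesSketch();
        String path = "/out/part-00000";
        // Original task starts writing, then is killed without calling abandon().
        System.out.println(nameNode.startCreate(path, 0L));
        // A backup copy of the task tries 10 seconds later and is rejected.
        System.out.println(nameNode.startCreate(path, 10_000L));
        // Once the lease has run out, a new attempt goes through.
        System.out.println(nameNode.startCreate(path, 60_000L));
    }
}
```

In this model, any backup task scheduled within the lease window is rejected, which matches the repeated failures seen during that minute.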

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (HADOOP-157) job fails because pendingCreates is not cleaned up after a task fails

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-157?page=all ]
     
Doug Cutting resolved HADOOP-157:
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Owen!



[jira] Updated: (HADOOP-157) job fails because pendingCreates is not cleaned up after a task fails

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-157?page=all ]

Owen O'Malley updated HADOOP-157:
---------------------------------

    Attachment: pending-creates-wait.patch

This patch improves failure reporting.

1. I created an org.apache.hadoop.ipc.RemoteException class that includes the class name of the exception that caused it.
2. The ipc client now throws this RemoteException rather than java.rmi.RemoteException.
3. DFSClient.create now waits and retries if the file is already being created.
4. Killed tasks no longer complain when their process has a non-zero exit code.
5. Improved the error message shown when tasks are killed for not updating their progress.
6. The dfs ClientProtocol.addBlock now takes the client name rather than the client machine.
7. Problems renewing dfs leases are now logged.
8. The exception messages include more detail when DFSClient.create fails.
9. addBlock now checks that the client adding to the file is the one that owns the lease.
10. FileUnderConstruction now records who is creating the file.
11. Defined some new exception classes for problems that DFSClient wants to catch.
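
Items 1 to 3 above can be sketched roughly as follows. This is a minimal illustration and not the contents of pending-creates-wait.patch: RemoteExceptionSketch, NameNodeStub, createWithRetry, the "AlreadyBeingCreatedException" class-name string, and the attempt and sleep parameters are all hypothetical stand-ins. The idea shown is that an exception carrying the remote cause's class name lets the client decide whether a failed create is worth retrying.

```java
import java.io.IOException;

/** Hypothetical stand-in for an ipc exception that records the remote cause's class name. */
class RemoteExceptionSketch extends IOException {
    private final String className;

    RemoteExceptionSketch(String className, String message) {
        super(message);
        this.className = className;
    }

    String getClassName() {
        return className;
    }
}

class CreateRetrySketch {
    /** Hypothetical view of the name node call made by the client. */
    interface NameNodeStub {
        void create(String path, String clientName) throws IOException;
    }

    // Assumed class name for the "file is already being created" condition.
    static final String ALREADY_BEING_CREATED = "AlreadyBeingCreatedException";

    /** Retries create while the file is already being created; returns the attempt that succeeded. */
    static int createWithRetry(NameNodeStub nameNode, String path, String clientName,
                               int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                nameNode.create(path, clientName);
                return attempt;
            } catch (RemoteExceptionSketch e) {
                boolean retryable = ALREADY_BEING_CREATED.equals(e.getClassName());
                if (!retryable || attempt >= maxAttempts) {
                    throw e; // other failures, or too many attempts, go to the caller
                }
                Thread.sleep(sleepMillis); // wait for the stale lease to expire
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stub name node that rejects the first two creates, as a stale lease would.
        NameNodeStub nameNode = (path, client) -> {
            if (++calls[0] < 3) {
                throw new RemoteExceptionSketch(ALREADY_BEING_CREATED,
                        path + " is already being created");
            }
        };
        int attempt = createWithRetry(nameNode, "/out/part-00000", "client-1", 5, 10L);
        System.out.println("succeeded on attempt " + attempt); // prints "succeeded on attempt 3"
    }
}
```

Matching on the class name rather than the message text keeps the retry decision independent of how the error message is worded, which fits with item 8 changing those messages.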
