Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/05/01 20:42:46 UTC

[jira] Created: (HADOOP-180) task tracker times out cleaning big job

task tracker times out cleaning big job
---------------------------------------

         Key: HADOOP-180
         URL: http://issues.apache.org/jira/browse/HADOOP-180
     Project: Hadoop
        Type: Bug

  Components: mapred  
    Versions: 0.1.1    
    Reporter: Owen O'Malley
 Assigned to: Owen O'Malley 


After completing a big job (63,920 maps, 1880 reduces, 188 nodes), lots of the TaskTrackers timed out because the task cleanup is handled by the same thread as the heartbeats.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Closed: (HADOOP-180) task tracker times out cleaning big job

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-180?page=all ]
     
Doug Cutting closed HADOOP-180:
-------------------------------


> task tracker times out cleaning big job
> ---------------------------------------
>
>          Key: HADOOP-180
>          URL: http://issues.apache.org/jira/browse/HADOOP-180
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3.0
>  Attachments: task-cleanup-thread.patch
>
> After completing a big job (63,920 maps, 1880 reduces, 188 nodes), lots of the TaskTrackers timed out because the task cleanup is handled by the same thread as the heartbeats.


[jira] Updated: (HADOOP-180) task tracker times out cleaning big job

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-180?page=all ]

Owen O'Malley updated HADOOP-180:
---------------------------------

    Attachment: task-cleanup-thread.patch

This patch fixes the timeouts by creating a synchronized queue (we really should go to Java 1.5 soon) of tasks that need to be cleaned up and a daemon thread that does the cleanup in the background.
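
For readers without the attachment at hand, here is a minimal sketch of that kind of queue plus background thread. It is not the code from task-cleanup-thread.patch: the class name CleanupQueueSketch, the CleanupTask interface, and the thread name are hypothetical, and it sticks to wait/notify so it does not lean on the Java 1.5 concurrency classes.

```java
import java.util.LinkedList;

/** Sketch only: a hand-rolled synchronized queue drained by a daemon thread. */
class CleanupQueueSketch {
    /** Hypothetical stand-in for a finished task whose files must be removed. */
    interface CleanupTask {
        void cleanup();
    }

    private final LinkedList<CleanupTask> queue = new LinkedList<CleanupTask>();

    /** Called from the heartbeat thread: enqueue the task and return at once. */
    void add(CleanupTask task) {
        synchronized (queue) {
            queue.addLast(task);
            queue.notify(); // wake the cleaner if it is waiting
        }
    }

    /** Blocks until a task is available, then removes and returns it. */
    private CleanupTask take() throws InterruptedException {
        synchronized (queue) {
            while (queue.isEmpty()) {
                queue.wait();
            }
            return queue.removeFirst();
        }
    }

    /** Starts the background cleaner; as a daemon it does not block JVM exit. */
    Thread startCleaner() {
        Thread cleaner = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        // The slow work runs here, outside the queue lock.
                        take().cleanup();
                    }
                } catch (InterruptedException e) {
                    // Interrupted: stop cleaning and let the thread end.
                }
            }
        }, "taskCleanup");
        cleaner.setDaemon(true);
        cleaner.start();
        return cleaner;
    }
}
```

The point of the design is that add() only holds the lock long enough to append, so the thread sending heartbeats is never held up by a slow cleanup.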

It also fixes some race conditions in the TaskTracker on the tasks variable. (Some references were locking the TaskTracker and others were locking the TaskTracker.TaskInProgress.)
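
As an illustration of the locking point, the sketch below guards a shared map with a single monitor for every access. The class TrackerLockingSketch and its methods are hypothetical, and the choice of the enclosing object as the one lock is an assumption; the message above does not say which lock the patch settled on.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: every access to the shared map uses the same monitor. */
class TrackerLockingSketch {
    // Guarded by "this". A path that locked some other object (for example
    // an individual task) would not exclude the methods below.
    private final Map<String, String> tasks = new HashMap<String, String>();

    synchronized void addTask(String id, String state) {
        tasks.put(id, state);
    }

    synchronized String removeTask(String id) {
        return tasks.remove(id);
    }

    synchronized int taskCount() {
        return tasks.size();
    }
}
```

Two threads that synchronize on different objects do not exclude each other, so a map touched under both locks is effectively unprotected.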

I also changed the RPC logging a little to include both client and server time measurements.

> task tracker times out cleaning big job
> ---------------------------------------
>
>          Key: HADOOP-180
>          URL: http://issues.apache.org/jira/browse/HADOOP-180
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: task-cleanup-thread.patch
>
> After completing a big job (63,920 maps, 1880 reduces, 188 nodes), lots of the TaskTrackers timed out because the task cleanup is handled by the same thread as the heartbeats.


[jira] Updated: (HADOOP-180) task tracker times out cleaning big job

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-180?page=all ]

Sameer Paranjpye updated HADOOP-180:
------------------------------------

    Fix Version: 0.3

> task tracker times out cleaning big job
> ---------------------------------------
>
>          Key: HADOOP-180
>          URL: http://issues.apache.org/jira/browse/HADOOP-180
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3

>
> After completing a big job (63,920 maps, 1880 reduces, 188 nodes), lots of the TaskTrackers timed out because the task cleanup is handled by the same thread as the heartbeats.


[jira] Resolved: (HADOOP-180) task tracker times out cleaning big job

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-180?page=all ]
     
Doug Cutting resolved HADOOP-180:
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Owen!

> task tracker times out cleaning big job
> ---------------------------------------
>
>          Key: HADOOP-180
>          URL: http://issues.apache.org/jira/browse/HADOOP-180
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: task-cleanup-thread.patch
>
> After completing a big job (63,920 maps, 1880 reduces, 188 nodes), lots of the TaskTrackers timed out because the task cleanup is handled by the same thread as the heartbeats.
