Posted to dev@mesos.apache.org by "Qinghe Jin (JIRA)" <ji...@apache.org> on 2012/10/24 05:06:14 UTC

[jira] [Commented] (MESOS-290) Jobtracker can't get TaskTrackerInfo when the JobTracker log file is deleted

    [ https://issues.apache.org/jira/browse/MESOS-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482923#comment-13482923 ] 

Qinghe Jin commented on MESOS-290:
----------------------------------

This is caused by a mesos-slave startup failure. When assigning tasks, the JobTracker needs a TaskTrackerInfo for the host, but if the mesos-slave on the TaskTracker node fails to start, the JobTracker repeatedly reports exceptions, each of which is also written to the JobTracker log file. This can be avoided by checking the mesos-slave's health before assigning tasks to the slave node.
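
A minimal sketch of that check, in case it helps the discussion. This is not the actual FrameworkScheduler code: the class SlaveHealthGuard, its methods, and the use of plain strings for hosts and tasks are all hypothetical and only illustrate the idea of returning no tasks for a host whose mesos-slave has not been confirmed running, instead of throwing on every heartbeat.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard; not part of the Mesos or Hadoop code base. */
final class SlaveHealthGuard {
    // Hosts whose mesos-slave is currently known to be running (assumed registry).
    private final Set<String> healthyHosts = ConcurrentHashMap.newKeySet();

    /** Record that the mesos-slave on this host started successfully. */
    void markHealthy(String host) {
        healthyHosts.add(host);
    }

    /** Record that the mesos-slave on this host was lost or failed to start. */
    void markLost(String host) {
        healthyHosts.remove(host);
    }

    boolean isHealthy(String host) {
        return healthyHosts.contains(host);
    }

    /**
     * Returns the pending tasks only if the host's slave is healthy.
     * For an unhealthy host it returns an empty list rather than throwing,
     * so a heartbeat from that host does not produce a stack trace in the log.
     */
    List<String> assignTasks(String host, List<String> pendingTasks) {
        if (!isHealthy(host)) {
            return Collections.emptyList();
        }
        return pendingTasks;
    }
}
```

With something like this in place, a heartbeat from blade17 while its mesos-slave is down would simply get no tasks assigned. Whether silently skipping the host or logging a single warning is the better behaviour is open for discussion.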
                
> Jobtracker can't get TaskTrackerInfo when the JobTracker log file is deleted
> ----------------------------------------------------------------------------
>
>                 Key: MESOS-290
>                 URL: https://issues.apache.org/jira/browse/MESOS-290
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework, java-api
>    Affects Versions: 0.9.0
>         Environment: SUSE Linux Enterprise Server 11
>            Reporter: Qinghe Jin
>            Priority: Blocker
>
> For some reason, the JobTracker log file grew to over 20 GB and filled up my disk partition. I deleted the JobTracker log file in logs/ and restarted the Hadoop system, but then my MapReduce jobs would not run. The JobTracker is hitting IOExceptions; the log looks like this:
> 2012-10-10 09:19:31,838 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_blade17:localhost.localdomain/127.0.0.1:44216 to host blade17 
> 2012-10-10 09:19:31,839 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'tracker_blade19:localhost.localdomain/127.0.0.1:40465'
> 2012-10-10 09:19:31,839 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@7be536d6, true, true, true, -1) from 10.10.129.17:57073: error: java.io.IOException: java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade17
> java.io.IOException: java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade17
>   at org.apache.hadoop.mapred.FrameworkScheduler.assignTasks(FrameworkScheduler.java:518)
>   at org.apache.hadoop.mapred.MesosScheduler.assignTasks(MesosScheduler.java:76)
>   at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> 2012-10-10 09:19:31,839 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_blade19:localhost.localdomain/127.0.0.1:40465 to host blade19
> 2012-10-10 09:19:31,839 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@58651e95, true, true, true, -1) from 10.10.129.19:46705: error: java.io.IOException: java.lang.RuntimeException: Expecting TaskTrackerInfo for host blade19
> On the TaskTracker side, it keeps resending its status to the JobTracker, but with responseId -1, as shown below:
> 2012-10-10 09:31:24,463 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'blade20' with reponseId '-1
> 2012-10-10 09:31:24,466 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'blade20' with reponseId '-1
> 2012-10-10 09:31:24,468 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'blade20' with reponseId '-1
> 2012-10-10 09:31:24,471 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'blade20' with reponseId '-1
> 2012-10-10 09:31:24,473 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'blade20' with reponseId '-1
> 2012-10-10 09:31:24,476 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'blade20' with reponseId '-1
> Is there any quick answer for this situation?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira