Posted to user@mesos.apache.org by Paul Mackles <pa...@loopr.com> on 2013/10/03 19:30:18 UTC

Re: hadoop task-trackers sticking around

Hi, I went back and started from scratch with mesos-0.14.0-rc4 and
mesos-hadoop from the trunk of https://github.com/mesos/hadoop. While the
whole setup was definitely a lot smoother, the tasktrackers are still
sticking around. I traced through the code and it's definitely due to this
check in JobInProgressListener.jobUpdated() from MesosScheduler.java:

if (mesosTracker.jobs.isEmpty() && mesosTracker.active)

Specifically, the tracker processes never seem to enter the "active" state.
If I remove the check for the active flag, the TaskTrackers shut down as
expected when the job completes.
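
To make the effect of that condition concrete, here is a minimal standalone
sketch. It is not the actual MesosScheduler code: the class TrackerIdleCheck
and both method names are hypothetical, and only the field names `jobs` and
`active` are taken from the check above. It assumes `active` starts out false
and is flipped by some activation step that never happens in my setup.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the scheduler's per-tracker bookkeeping.
// Only the field names `jobs` and `active` come from the check quoted above.
public class TrackerIdleCheck {
    public static class MesosTracker {
        // IDs of jobs that still have work on this tracker (assumed representation).
        public final Set<String> jobs = new HashSet<>();
        // Assumed to start false and be set once the tracker is considered live.
        public boolean active = false;
    }

    // Mirrors the quoted condition: only an idle AND active tracker is killed.
    public static boolean shouldKill(MesosTracker t) {
        return t.jobs.isEmpty() && t.active;
    }

    // The modified condition with the check for the active flag removed.
    public static boolean shouldKillIgnoringActive(MesosTracker t) {
        return t.jobs.isEmpty();
    }

    public static void main(String[] args) {
        // A tracker whose job has finished but which was never marked active.
        MesosTracker t = new MesosTracker();
        System.out.println("original check kills tracker: " + shouldKill(t));
        System.out.println("check without active flag kills tracker: "
                + shouldKillIgnoringActive(t));
    }
}
```

With a tracker that is idle but never activated, the original condition never
fires, which matches the trackers lingering after the job completes.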

How does the TaskTracker get activated?

Thanks,
Paul


On Fri, Sep 27, 2013 at 6:58 AM, Paul Mackles <pa...@loopr.com> wrote:

> I see the following messages in the job-tracker logs which probably
> explain why the task-trackers are sticking around:
>
> 2013-09-27 03:39:58,808 WARN org.apache.hadoop.mapred.MesosScheduler:
> Ignoring TaskTracker: http://vm282.dev.xxx:31001 because it might not
> have sent a hearbeat
> 2013-09-27 03:39:58,808 WARN org.apache.hadoop.mapred.MesosScheduler:
> Ignoring TaskTracker: http://vm282.dev.xxx:31000 because it might not
> have sent a hearbeat
> 2013-09-27 03:39:58,809 WARN org.apache.hadoop.mapred.MesosScheduler:
> Ignoring TaskTracker: http://vm282.dev.xxx:31001 because it might not
> have sent a hearbeat
> 2013-09-27 03:39:58,809 WARN org.apache.hadoop.mapred.MesosScheduler:
> Ignoring TaskTracker: http://vm282.dev.xxx:31000 because it might not
> have sent a hearbeat
>
> The source for MesosScheduler.java that is bundled with 0.13 looks quite a
> bit different than the version that is currently on git.
>
>
>
> On Thu, Sep 26, 2013 at 11:08 PM, Paul Mackles <pa...@loopr.com> wrote:
>
>> I will dig a little further as the behavior is inconsistent. On
>> subsequent attempts I have seen the task-trackers go away with the job.
>> They always go away when I shut down the corresponding job-tracker.
>>
>> The hadoop code I am using was included in the 0.13 tarball that I
>> downloaded from here:
>>
>> http://mirror.nexcess.net/apache/mesos/0.13.0/
>>
>> I built the jar by running hadoop/TUTORIAL.sh. I wound up integrating
>> with hadoop manually since the tutorial script didn't work correctly for
>> me. I mostly followed the instructions here:
>> https://github.com/mesos/hadoop
>>
>> At one point I tried building it from https://github.com/mesos/hadoop but
>> I had trouble getting it to build with 0.13.
>>
>> Should I be working off of a different version?
>>
>> Thanks,
>> Paul
>>
>>
>>
>> On Thu, Sep 26, 2013 at 10:34 PM, Dan Colish <dc...@urbanairship.com> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Sep 26, 2013 at 6:27 PM, Paul Mackles <pa...@loopr.com> wrote:
>>>
>>>> Hi - I am using mesos 0.13 with cdh4.2.0 in pseudo-distributed mode.
>>>> While I am able to launch and run hadoop jobs through mesos successfully, I
>>>> noticed in the Mesos UI (and through 'ps') that the task-trackers launched
>>>> by mesos are sticking around long after my job is complete. Is that
>>>> expected behavior? I am thinking the answer is no since they are tying up
>>>> resources that could be used by other frameworks. On the other hand, mesos
>>>> seems to know enough to reuse them when running subsequent hadoop jobs.
>>>> Maybe they are using reservations or something by default?
>>>>
>>>>
>>> Are you using the mesos-hadoop project found here,
>>> https://github.com/mesos/hadoop? If so, you are correct that idle
>>> tasktrackers should be torn down at the end. I wonder what the cluster
>>> state is when the JobInProgressListener is called upon your job's
>>> completion. Specifically, I would look into tracing this section [1] of
>>> code, where the task tracker's job queue is checked for emptiness and the
>>> tracker is checked for being active. If the tracker was never activated, I
>>> think it would still be running but not killed.
>>>
>>>
>>> [1]
>>> https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosScheduler.java#L105
>>>
>>>
>>
>>
>> --
>> Thanks,
>> Paul
>>
>
>
>
> --
> Thanks,
> Paul
>



-- 
Thanks,
Paul