Posted to dev@mesos.apache.org by Brenden Matthews <br...@diddyinc.com> on 2013/07/25 01:00:19 UTC

Re: Review Request 11124: Kill tasks that never properly launch.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11124/
-----------------------------------------------------------

(Updated July 24, 2013, 11 p.m.)


Review request for mesos.


Changes
-------

Rebasing on master, updating as per Ben's comments.


Repository: mesos


Description
-------

Kill tasks that never properly launch.

After trying to launch a task tracker, we'll wait up to 5 minutes before
giving up and killing the task.
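
As a rough illustration of the timeout described above (not the code in
the diff), the bookkeeping could be sketched as follows. The class name
LaunchTimeoutTracker, its method names, and the use of plain string task
IDs are all hypothetical; only the 5 minute limit comes from the
description. A real scheduler would presumably pass each expired ID to
something like SchedulerDriver.killTask().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not taken from MesosScheduler.java.
class LaunchTimeoutTracker {
    // Launch deadline of 5 minutes, as stated in the description.
    static final long LAUNCH_TIMEOUT_MS = 5 * 60 * 1000L;

    // Task ID -> time (in ms) at which the launch was requested.
    private final Map<String, Long> pendingLaunches = new HashMap<>();

    // Record that a launch was requested for this task at time nowMs.
    void launched(String taskId, long nowMs) {
        pendingLaunches.put(taskId, nowMs);
    }

    // The task tracker came up, so the task no longer needs a deadline.
    void started(String taskId) {
        pendingLaunches.remove(taskId);
    }

    // Return (and stop tracking) every task whose launch has been
    // pending for longer than the timeout. The caller decides how to
    // kill them.
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it =
            pendingLaunches.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > LAUNCH_TIMEOUT_MS) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

Passing the current time in as a parameter, rather than reading the
system clock inside the helper, keeps the sketch easy to test; calling
expire() periodically from a timer would be one way to drive it.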

Review: https://reviews.apache.org/r/11124


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 279f84e0f0c43ad3cfd9e4442010e706ee3565d9 

Diff: https://reviews.apache.org/r/11124/diff/


Testing
-------

Used in production at Airbnb.

make -j10 check && cd hadoop && make hadoop-2.0.0-mr1-cdh4.2.1 && make hadoop-0.20.205.0 && make hadoop-0.20.2-cdh3u3


Thanks,

Brenden Matthews


Re: Review Request 11124: Kill tasks that never properly launch.

Posted by Brenden Matthews <br...@diddyinc.com>.

> On July 25, 2013, 1:53 a.m., Vinod Kone wrote:
> > hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java, lines 76-79
> > <https://reviews.apache.org/r/11124/diff/8/?file=327717#file327717line76>
> >
> >     Just curious: why would the task trackers never launch? The Mesos slave kills the executor if it doesn't register within a timeout; would that not be sufficient here?
> >     Did you see hung task trackers?

It seems to happen occasionally on slaves that are busy.  Sometimes the launch gets 'stuck' somewhere between grabbing the executor from HDFS, extracting it, and then launching the TaskTracker.


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11124/#review23810
-----------------------------------------------------------


On July 25, 2013, 1:28 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11124/
> -----------------------------------------------------------
> 
> (Updated July 25, 2013, 1:28 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Kill tasks that never properly launch.
> 
> After trying to launch a task tracker, we'll wait up to 5 minutes before
> giving up and killing the task.
> 
> Review: https://reviews.apache.org/r/11124
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 279f84e0f0c43ad3cfd9e4442010e706ee3565d9 
> 
> Diff: https://reviews.apache.org/r/11124/diff/
> 
> 
> Testing
> -------
> 
> Used in production at Airbnb.
> 
> make -j10 check && cd hadoop && make hadoop-2.0.0-mr1-cdh4.2.1 && make hadoop-0.20.205.0 && make hadoop-0.20.2-cdh3u3
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request 11124: Kill tasks that never properly launch.

Posted by Vinod Kone <vi...@gmail.com>.

> On July 25, 2013, 1:53 a.m., Vinod Kone wrote:
> > hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java, lines 76-79
> > <https://reviews.apache.org/r/11124/diff/8/?file=327717#file327717line76>
> >
> >     Just curious: why would the task trackers never launch? The Mesos slave kills the executor if it doesn't register within a timeout; would that not be sufficient here?
> >     Did you see hung task trackers?
> 
> Brenden Matthews wrote:
>     It seems to happen occasionally on slaves that are busy.  Sometimes the launch gets 'stuck' somewhere between grabbing the executor from HDFS, extracting it, and then launching the TaskTracker.

The first two cases should be handled by the slave; the last one probably not.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11124/#review23810
-----------------------------------------------------------


Re: Review Request 11124: Kill tasks that never properly launch.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11124/#review23810
-----------------------------------------------------------



hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java
<https://reviews.apache.org/r/11124/#comment47650>

    Just curious: why would the task trackers never launch? The Mesos slave kills the executor if it doesn't register within a timeout; would that not be sufficient here?
    Did you see hung task trackers?


- Vinod Kone


Re: Review Request 11124: Kill tasks that never properly launch.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11124/#review23819
-----------------------------------------------------------

Ship it!

- Vinod Kone


Re: Review Request 11124: Kill tasks that never properly launch.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11124/#review23997
-----------------------------------------------------------

Ship it!

- Ben Mahler


Re: Review Request 11124: Kill tasks that never properly launch.

Posted by Brenden Matthews <br...@diddyinc.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11124/
-----------------------------------------------------------

(Updated July 25, 2013, 1:28 a.m.)


Review request for mesos.


Changes
-------

Rebase on master.


Repository: mesos


Description
-------

Kill tasks that never properly launch.

After trying to launch a task tracker, we'll wait up to 5 minutes before
giving up and killing the task.

Review: https://reviews.apache.org/r/11124


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 279f84e0f0c43ad3cfd9e4442010e706ee3565d9 

Diff: https://reviews.apache.org/r/11124/diff/


Testing
-------

Used in production at Airbnb.

make -j10 check && cd hadoop && make hadoop-2.0.0-mr1-cdh4.2.1 && make hadoop-0.20.205.0 && make hadoop-0.20.2-cdh3u3


Thanks,

Brenden Matthews