You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/04/03 01:49:03 UTC

Review Request: Fix for properly sending TASK LOST status updates

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and John Sirois.


Summary
-------

This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.


Diffs
-----

  src/master/master.cpp 4dc9ee0 
  src/slave/slave.hpp 279bc7b 
  src/slave/slave.cpp 3358ec4 

Diff: https://reviews.apache.org/r/4619/diff


Testing
-------

make check

(All tests except external python test succeed)

/src/examples/python/test-framework local
./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
Traceback (most recent call last):
  File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
    import mesos
ImportError: No module named mesos


Thanks,

Vinod


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Vinod Kone <vi...@gmail.com>.

> On 2012-04-06 21:36:13, Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 101
> > <https://reviews.apache.org/r/4619/diff/3/?file=100676#file100676line101>
> >
> >     While some of this patch might be getting thrown away, reusing something like this would be swell. Maybe sticking it in a "utils" like namespace for dealing with protobuf objects (i.e., mesos::protos::create<StatusUpdate>(...)).

talked offline. will do this in the slave restart branch.


> On 2012-04-06 21:36:13, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 75
> > <https://reviews.apache.org/r/4619/diff/3/?file=100677#file100677line75>
> >
> >     Could also be in a generic "protos" namespace.

see above.


> On 2012-04-06 21:36:13, Benjamin Hindman wrote:
> > src/slave/slave.cpp, lines 693-699
> > <https://reviews.apache.org/r/4619/diff/3/?file=100677#file100677line693>
> >
> >     You should kill this and put executor->removeTask back in 'statusUpdate'. Since transitionLiveTask calls 'statusUpdate' the task will get removed. We only need the framework->updates.empty() check below to make sure we send all the updates.

fixed


> On 2012-04-06 21:36:13, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1002
> > <https://reviews.apache.org/r/4619/diff/3/?file=100677#file100677line1002>
> >
> >     Put this back (see comment above).

fixed


> On 2012-04-06 21:36:13, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1536
> > <https://reviews.apache.org/r/4619/diff/3/?file=100677#file100677line1536>
> >
> >     Just throw a comment here saying something along the lines of why calling 'executorExited' is the right thing to do (since before now 'executorExited' was only called from the isolation module when an executor actually exited).

done


> On 2012-04-06 21:36:13, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1447
> > <https://reviews.apache.org/r/4619/diff/3/?file=100677#file100677line1447>
> >
> >     This would be *even* more clear if we did the logic of transitionLiveTask right here (I think you could do it without another if statement using ternary if), but since this is just a band-aide I'm not too picky.

cool :)


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/#review6752
-----------------------------------------------------------


On 2012-04-06 19:02:24, Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4619/
> -----------------------------------------------------------
> 
> (Updated 2012-04-06 19:02:24)
> 
> 
> Review request for mesos, Benjamin Hindman and John Sirois.
> 
> 
> Summary
> -------
> 
> This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 4dc9ee0 
>   src/slave/slave.hpp 279bc7b 
>   src/slave/slave.cpp 3358ec4 
> 
> Diff: https://reviews.apache.org/r/4619/diff
> 
> 
> Testing
> -------
> 
> make check
> 
> (All tests except external python test succeed)
> 
> /src/examples/python/test-framework local
> ./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
> Traceback (most recent call last):
>   File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
>     import mesos
> ImportError: No module named mesos
> 
> 
> Thanks,
> 
> Vinod
> 
>


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/#review6752
-----------------------------------------------------------

Ship it!



src/slave/slave.hpp
<https://reviews.apache.org/r/4619/#comment14814>

    While some of this patch might be getting thrown away, reusing something like this would be swell. Maybe sticking it in a "utils" like namespace for dealing with protobuf objects (i.e., mesos::protos::create<StatusUpdate>(...)).



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14815>

    Could also be in a generic "protos" namespace.



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14816>

    You should kill this and put executor->removeTask back in 'statusUpdate'. Since transitionLiveTask calls 'statusUpdate' the task will get removed. We only need the framework->updates.empty() check below to make sure we send all the updates.



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14817>

    Put this back (see comment above).



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14819>

    This would be *even* more clear if we did the logic of transitionLiveTask right here (I think you could do it without another if statement using ternary if), but since this is just a band-aide I'm not too picky.



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14820>

    Just throw a comment here saying something along the lines of why calling 'executorExited' is the right thing to do (since before now 'executorExited' was only called from the isolation module when an executor actually exited).


- Benjamin


On 2012-04-06 19:02:24, Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4619/
> -----------------------------------------------------------
> 
> (Updated 2012-04-06 19:02:24)
> 
> 
> Review request for mesos, Benjamin Hindman and John Sirois.
> 
> 
> Summary
> -------
> 
> This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 4dc9ee0 
>   src/slave/slave.hpp 279bc7b 
>   src/slave/slave.cpp 3358ec4 
> 
> Diff: https://reviews.apache.org/r/4619/diff
> 
> 
> Testing
> -------
> 
> make check
> 
> (All tests except external python test succeed)
> 
> /src/examples/python/test-framework local
> ./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
> Traceback (most recent call last):
>   File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
>     import mesos
> ImportError: No module named mesos
> 
> 
> Thanks,
> 
> Vinod
> 
>


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/
-----------------------------------------------------------

(Updated 2012-04-09 16:44:50.819389)


Review request for mesos, Benjamin Hindman and John Sirois.


Changes
-------

ben's comments


Summary
-------

This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.


Diffs (updated)
-----

  src/master/master.cpp 4dc9ee0 
  src/slave/slave.hpp 279bc7b 
  src/slave/slave.cpp 3358ec4 

Diff: https://reviews.apache.org/r/4619/diff


Testing
-------

make check

(All tests except external python test succeed)

/src/examples/python/test-framework local
./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
Traceback (most recent call last):
  File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
    import mesos
ImportError: No module named mesos


Thanks,

Vinod


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/
-----------------------------------------------------------

(Updated 2012-04-06 19:02:24.780345)


Review request for mesos, Benjamin Hindman and John Sirois.


Changes
-------

ben's comments


Summary
-------

This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.


Diffs (updated)
-----

  src/master/master.cpp 4dc9ee0 
  src/slave/slave.hpp 279bc7b 
  src/slave/slave.cpp 3358ec4 

Diff: https://reviews.apache.org/r/4619/diff


Testing
-------

make check

(All tests except external python test succeed)

/src/examples/python/test-framework local
./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
Traceback (most recent call last):
  File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
    import mesos
ImportError: No module named mesos


Thanks,

Vinod


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Vinod Kone <vi...@gmail.com>.

> On 2012-04-06 17:25:43, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1327
> > <https://reviews.apache.org/r/4619/diff/2/?file=98738#file98738line1327>
> >
> >     Rather than introduce a new function 'sendStatusUpdate' it seems like we should really just leverage the existing 'statusUpdate' function ... there is no reason why you can't call that places you'd otherwise be calling 'sendStatusUpdate', and then the logic for dealing with status updates is all in one place.

agreed. i wanted to pull the creation of status updates into a helper. this helper turned out to do more than creating status updates. fixed.


> On 2012-04-06 17:25:43, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1344
> > <https://reviews.apache.org/r/4619/diff/2/?file=98738#file98738line1344>
> >
> >     For example, this code is confusing here. It seems like the invariant when calling this function is that 'framework' and 'executor' exist. And yet, we still look them up, and never end up using 'executor'.

fixed


> On 2012-04-06 17:25:43, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1432
> > <https://reviews.apache.org/r/4619/diff/2/?file=98738#file98738line1432>
> >
> >     In general putting something like this in this "helper" function gives it a nasty side-effect, and in particular, this means that if an executor was running 3 tasks we'll send ExitedExecutorMessage to the master 3 times. Even though this patch is just a band-aid *and* the master should be able to handle those "extra" messages, this needs to be fixed.

fixed


> On 2012-04-06 17:25:43, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1477
> > <https://reviews.apache.org/r/4619/diff/2/?file=98738#file98738line1477>
> >
> >     I don't think 'transitionLiveTasks' actually transitions the state of the task because 'sendStatusUpdate' doesn't. But 'statusUpdate' does! Again, it seems much cleaner just to create StatusUpdate objects here and simply call 'statusUpdate' as though it had come from the executor.

fixed


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/#review6725
-----------------------------------------------------------


On 2012-04-05 18:19:14, Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4619/
> -----------------------------------------------------------
> 
> (Updated 2012-04-05 18:19:14)
> 
> 
> Review request for mesos, Benjamin Hindman and John Sirois.
> 
> 
> Summary
> -------
> 
> This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 4dc9ee0 
>   src/slave/slave.hpp 279bc7b 
>   src/slave/slave.cpp 3358ec4 
> 
> Diff: https://reviews.apache.org/r/4619/diff
> 
> 
> Testing
> -------
> 
> make check
> 
> (All tests except external python test succeed)
> 
> /src/examples/python/test-framework local
> ./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
> Traceback (most recent call last):
>   File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
>     import mesos
> ImportError: No module named mesos
> 
> 
> Thanks,
> 
> Vinod
> 
>


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/#review6725
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14756>

    Rather than introduce a new function 'sendStatusUpdate' it seems like we should really just leverage the existing 'statusUpdate' function ... there is no reason why you can't call that places you'd otherwise be calling 'sendStatusUpdate', and then the logic for dealing with status updates is all in one place.



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14644>

    For example, this code is confusing here. It seems like the invariant when calling this function is that 'framework' and 'executor' exist. And yet, we still look them up, and never end up using 'executor'.



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14590>

    In general putting something like this in this "helper" function gives it a nasty side-effect, and in particular, this means that if an executor was running 3 tasks we'll send ExitedExecutorMessage to the master 3 times. Even though this patch is just a band-aid *and* the master should be able to handle those "extra" messages, this needs to be fixed.



src/slave/slave.cpp
<https://reviews.apache.org/r/4619/#comment14757>

    I don't think 'transitionLiveTasks' actually transitions the state of the task because 'sendStatusUpdate' doesn't. But 'statusUpdate' does! Again, it seems much cleaner just to create StatusUpdate objects here and simply call 'statusUpdate' as though it had come from the executor.


- Benjamin


On 2012-04-05 18:19:14, Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4619/
> -----------------------------------------------------------
> 
> (Updated 2012-04-05 18:19:14)
> 
> 
> Review request for mesos, Benjamin Hindman and John Sirois.
> 
> 
> Summary
> -------
> 
> This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 4dc9ee0 
>   src/slave/slave.hpp 279bc7b 
>   src/slave/slave.cpp 3358ec4 
> 
> Diff: https://reviews.apache.org/r/4619/diff
> 
> 
> Testing
> -------
> 
> make check
> 
> (All tests except external python test succeed)
> 
> /src/examples/python/test-framework local
> ./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
> Traceback (most recent call last):
>   File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
>     import mesos
> ImportError: No module named mesos
> 
> 
> Thanks,
> 
> Vinod
> 
>


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/
-----------------------------------------------------------

(Updated 2012-04-05 18:19:14.475674)


Review request for mesos, Benjamin Hindman and John Sirois.


Changes
-------

pinging for review so that this can go in with the deploy of mesos 0.9 next week


Summary
-------

This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.


Diffs
-----

  src/master/master.cpp 4dc9ee0 
  src/slave/slave.hpp 279bc7b 
  src/slave/slave.cpp 3358ec4 

Diff: https://reviews.apache.org/r/4619/diff


Testing
-------

make check

(All tests except external python test succeed)

/src/examples/python/test-framework local
./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
Traceback (most recent call last):
  File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
    import mesos
ImportError: No module named mesos


Thanks,

Vinod


Re: Review Request: Fix for properly sending TASK LOST status updates

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4619/
-----------------------------------------------------------

(Updated 2012-04-03 17:43:06.263672)


Review request for mesos, Benjamin Hindman and John Sirois.


Changes
-------

minor fixes


Summary
-------

This is a short-term fix, because this fix will be subsumed when with we patch with the slave restart code.


Diffs (updated)
-----

  src/master/master.cpp 4dc9ee0 
  src/slave/slave.hpp 279bc7b 
  src/slave/slave.cpp 3358ec4 

Diff: https://reviews.apache.org/r/4619/diff


Testing
-------

make check

(All tests except external python test succeed)

/src/examples/python/test-framework local
./src/examples/python/test-framework: line 32: test: /Users/vinod/workspace/apache/mesos/build/src/python/dist/mesos-0.0.1-py2.6-macosx-10.6-universal.egg: binary operator expected
Traceback (most recent call last):
  File "/Users/vinod/workspace/apache/mesos/build/../src/examples/python/test_framework.py", line 23, in <module>
    import mesos
ImportError: No module named mesos


Thanks,

Vinod