You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Joseph Wu <jo...@mesosphere.io> on 2017/01/26 22:28:34 UTC

Review Request 56001: Fixed a few executor segfaults during cleanup.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56001/
-----------------------------------------------------------

Review request for mesos, Anand Mazumdar, Gilbert Song, and Jie Yu.


Bugs: MESOS-6989
    https://issues.apache.org/jira/browse/MESOS-6989


Repository: mesos


Description
-------

The addition of `process::finalize` at the end of some binaries
led to some segfaulting at the end of the binaries' lifetimes.
This is mostly due calling destructors of libprocess actor
wrappers, which require an initialized libprocess to function.

This commit explicitly calls the destructors on the default
and docker executor actors prior to calling `process::finalize`.


Diffs
-----

  src/docker/executor.cpp f353341c9509532653e2ef8802f20259ce67930c 
  src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 

Diff: https://reviews.apache.org/r/56001/diff/


Testing
-------

sudo src/mesos-tests --gtest_filter="*ROOT*DOCKER*" --verbose

Checked that all executors exited with status code 0, rather than 11.


Thanks,

Joseph Wu


Re: Review Request 56001: Fixed a few executor segfaults during cleanup.

Posted by Anand Mazumdar <an...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56001/#review163211
-----------------------------------------------------------


Fix it, then Ship it!




LGTM!


src/tests/default_executor_tests.cpp (line 996)
<https://reviews.apache.org/r/56001/#comment234674>

    s/dies/failed


- Anand Mazumdar


On Jan. 27, 2017, 12:01 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56001/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2017, 12:01 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-6989
>     https://issues.apache.org/jira/browse/MESOS-6989
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The addition of `process::finalize` at the end of some binaries
> led to some segfaulting at the end of the binaries' lifetimes.
> This is mostly due calling destructors of libprocess actor
> wrappers, which require an initialized libprocess to function.
> 
> This commit explicitly calls the destructors on the default
> and docker executor actors prior to calling `process::finalize`.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp f353341c9509532653e2ef8802f20259ce67930c 
>   src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
>   src/tests/containerizer/docker_containerizer_tests.cpp fa734e6b4b5d61e944cd70f1378f2d2eb534fe22 
>   src/tests/default_executor_tests.cpp 8fc7fbeedfab8786c672af1b21cb7fa9a3347266 
> 
> Diff: https://reviews.apache.org/r/56001/diff/
> 
> 
> Testing
> -------
> 
> sudo src/mesos-tests --gtest_filter="*ROOT*DOCKER*" --verbose
> 
> Checked that all executors exited with status code 0, rather than 11.
> 
> ---
> 
> Modified two tests to check on exit status code explicitly.
> 
> sudo src/mesos-tests --gtest_filter="*CommitSuicideOnTaskFailure*:*ROOT_DOCKER_Kill" --verbose
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 56001: Fixed a few executor segfaults during cleanup.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56001/
-----------------------------------------------------------

(Updated Jan. 26, 2017, 4:01 p.m.)


Review request for mesos, Anand Mazumdar, Gilbert Song, and Jie Yu.


Changes
-------

Tweaked two tests to check for exit codes.  Tweaked/added comments.


Bugs: MESOS-6989
    https://issues.apache.org/jira/browse/MESOS-6989


Repository: mesos


Description
-------

The addition of `process::finalize` at the end of some binaries
led to some segfaulting at the end of the binaries' lifetimes.
This is mostly due calling destructors of libprocess actor
wrappers, which require an initialized libprocess to function.

This commit explicitly calls the destructors on the default
and docker executor actors prior to calling `process::finalize`.


Diffs (updated)
-----

  src/docker/executor.cpp f353341c9509532653e2ef8802f20259ce67930c 
  src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
  src/tests/containerizer/docker_containerizer_tests.cpp fa734e6b4b5d61e944cd70f1378f2d2eb534fe22 
  src/tests/default_executor_tests.cpp 8fc7fbeedfab8786c672af1b21cb7fa9a3347266 

Diff: https://reviews.apache.org/r/56001/diff/


Testing (updated)
-------

sudo src/mesos-tests --gtest_filter="*ROOT*DOCKER*" --verbose

Checked that all executors exited with status code 0, rather than 11.

---

Modified two tests to check on exit status code explicitly.

sudo src/mesos-tests --gtest_filter="*CommitSuicideOnTaskFailure*:*ROOT_DOCKER_Kill" --verbose


Thanks,

Joseph Wu


Re: Review Request 56001: Fixed a few executor segfaults during cleanup.

Posted by Joseph Wu <jo...@mesosphere.io>.

> On Jan. 26, 2017, 2:47 p.m., Anand Mazumdar wrote:
> > src/docker/executor.cpp, line 817
> > <https://reviews.apache.org/r/56001/diff/1/?file=1616964#file1616964line817>
> >
> >     Not yours, but we should have a comment here too as to why we need an explicit `finalize()` here in a separate patch.

This is mine :)  I added the `finalize`s in the first place.  I'll add the comments (in 4 places, including 2 of the files in this review) in a separate commit.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56001/#review163190
-----------------------------------------------------------


On Jan. 26, 2017, 4:01 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56001/
> -----------------------------------------------------------
> 
> (Updated Jan. 26, 2017, 4:01 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-6989
>     https://issues.apache.org/jira/browse/MESOS-6989
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The addition of `process::finalize` at the end of some binaries
> led to some segfaulting at the end of the binaries' lifetimes.
> This is mostly due calling destructors of libprocess actor
> wrappers, which require an initialized libprocess to function.
> 
> This commit explicitly calls the destructors on the default
> and docker executor actors prior to calling `process::finalize`.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp f353341c9509532653e2ef8802f20259ce67930c 
>   src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
>   src/tests/containerizer/docker_containerizer_tests.cpp fa734e6b4b5d61e944cd70f1378f2d2eb534fe22 
>   src/tests/default_executor_tests.cpp 8fc7fbeedfab8786c672af1b21cb7fa9a3347266 
> 
> Diff: https://reviews.apache.org/r/56001/diff/
> 
> 
> Testing
> -------
> 
> sudo src/mesos-tests --gtest_filter="*ROOT*DOCKER*" --verbose
> 
> Checked that all executors exited with status code 0, rather than 11.
> 
> ---
> 
> Modified two tests to check on exit status code explicitly.
> 
> sudo src/mesos-tests --gtest_filter="*CommitSuicideOnTaskFailure*:*ROOT_DOCKER_Kill" --verbose
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 56001: Fixed a few executor segfaults during cleanup.

Posted by Anand Mazumdar <an...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56001/#review163190
-----------------------------------------------------------



Nice catch!

As per our offline discussion, can you modify an existing test for both the default/docker executor to also check if the `status` field is set, it shouldn't have a non-zero value.


src/docker/executor.cpp (lines 814 - 815)
<https://reviews.apache.org/r/56001/#comment234651>

    Can you add a comment here for posterity as to why this is needed?



src/docker/executor.cpp (line 817)
<https://reviews.apache.org/r/56001/#comment234653>

    Not yours, but we should have a comment here too as to why we need an explicit `finalize()` here in a separate patch.



src/launcher/default_executor.cpp (line 1142)
<https://reviews.apache.org/r/56001/#comment234652>

    Ditto.


- Anand Mazumdar


On Jan. 26, 2017, 10:28 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56001/
> -----------------------------------------------------------
> 
> (Updated Jan. 26, 2017, 10:28 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-6989
>     https://issues.apache.org/jira/browse/MESOS-6989
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The addition of `process::finalize` at the end of some binaries
> led to some segfaulting at the end of the binaries' lifetimes.
> This is mostly due calling destructors of libprocess actor
> wrappers, which require an initialized libprocess to function.
> 
> This commit explicitly calls the destructors on the default
> and docker executor actors prior to calling `process::finalize`.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp f353341c9509532653e2ef8802f20259ce67930c 
>   src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
> 
> Diff: https://reviews.apache.org/r/56001/diff/
> 
> 
> Testing
> -------
> 
> sudo src/mesos-tests --gtest_filter="*ROOT*DOCKER*" --verbose
> 
> Checked that all executors exited with status code 0, rather than 11.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>