Posted to reviews@mesos.apache.org by Joseph Wu <jo...@mesosphere.io> on 2019/02/22 00:31:17 UTC

Review Request 70040: Added test for terminal operation updates after master failover.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70040/
-----------------------------------------------------------

Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

This test covers a corner case where an agent reregisters with the
master with a pending operation, but the operation's originating
framework is unknown.  This can occur in a variety of situations like:
  * the master fails over and a framework never reregisters,
  * a completed framework is rotated out of the master's memory with
    pending operations, or
  * an agent with pending operations is migrated from one cluster to
    another.

In this case, the master should "adopt" the orphan operation only
after a delay.  This gives the framework some time to reregister.
But if the framework does not reregister in time, the master will
be in charge of acknowledging operation status updates.
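
The delayed-adoption rule described above can be illustrated with a small
standalone model. This is only a sketch under stated assumptions, not the
master's actual code: `OrphanOperation`, `OrphanTracker`, the grace period,
and the use of plain seconds for time are all hypothetical names and
simplifications introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Hypothetical model; none of these names come from the Mesos code base.
struct OrphanOperation {
  std::string operationId;
  std::string frameworkId;
  std::chrono::seconds reportedAt;  // When the agent reregistered with it.
};

class OrphanTracker {
public:
  explicit OrphanTracker(std::chrono::seconds gracePeriod)
    : gracePeriod_(gracePeriod) {
    assert(gracePeriod.count() >= 0);
  }

  // An agent reregistered with an operation whose framework is unknown.
  void addOrphan(const OrphanOperation& operation) {
    orphans_[operation.operationId] = operation;
  }

  // The framework came back in time, so its operations stop being orphans.
  void frameworkReregistered(const std::string& frameworkId) {
    for (auto it = orphans_.begin(); it != orphans_.end();) {
      if (it->second.frameworkId == frameworkId) {
        it = orphans_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Returns the IDs of the operations whose grace period has elapsed at
  // `now`. In this model, "adopting" means the caller becomes responsible
  // for acknowledging their status updates.
  std::vector<std::string> adoptExpired(std::chrono::seconds now) {
    std::vector<std::string> adopted;
    for (auto it = orphans_.begin(); it != orphans_.end();) {
      if (now - it->second.reportedAt >= gracePeriod_) {
        adopted.push_back(it->first);
        it = orphans_.erase(it);
      } else {
        ++it;
      }
    }
    return adopted;
  }

private:
  std::chrono::seconds gracePeriod_;
  std::map<std::string, OrphanOperation> orphans_;
};
```

In this model, an operation whose framework reregisters before the grace
period elapses is never adopted; any other orphan is handed over the first
time `adoptExpired` is called at or after the deadline.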


Diffs
-----

  src/tests/storage_local_resource_provider_tests.cpp a661951a0a326cc342aa0c45dd0967692ae70941 


Diff: https://reviews.apache.org/r/70040/diff/1/


Testing
-------

```
make check
src/mesos-tests --gtest_filter="*TerminalOrphanOperationAfterMasterFailover*" --verbose
src/mesos-tests --gtest_filter="*Operation*" --verbose
```
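
The two `--gtest_filter` patterns above overlap: any test name matched by
the first is also matched by `*Operation*`. The sketch below shows the
wildcard semantics involved (`*` matches any run of characters, `?` matches
one character). `globMatch` is a hypothetical helper written for
illustration, not GoogleTest's own matcher, and the suite name in the usage
note is made up.

```cpp
// Returns true if `text` matches `pattern`, where '*' matches any
// (possibly empty) run of characters and '?' matches exactly one.
bool globMatch(const char* pattern, const char* text) {
  if (*pattern == '\0') {
    return *text == '\0';
  }
  if (*pattern == '*') {
    // Either '*' matches nothing, or it consumes one more character.
    return globMatch(pattern + 1, text) ||
           (*text != '\0' && globMatch(pattern, text + 1));
  }
  if (*text == '\0') {
    return false;
  }
  return (*pattern == '?' || *pattern == *text) &&
         globMatch(pattern + 1, text + 1);
}
```

For example, with a hypothetical full name such as
`SomeSuite.TerminalOrphanOperationAfterMasterFailover`, both patterns used
above match, while `*Operation*` does not match `SomeSuite.Unrelated`.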

(Internal CI run pending)


Thanks,

Joseph Wu


Re: Review Request 70040: Added test for terminal operation updates after master failover.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70040/#review213117
-----------------------------------------------------------



Patch looks great!

Reviews applied: [69968, 69960, 69961, 69962, 69963, 69967, 69980, 70014, 69872, 69869, 70040]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot




Re: Review Request 70040: Added test for terminal operation updates after master failover.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70040/#review213071
-----------------------------------------------------------



PASS: Mesos patch 70040 was successfully built and tested.

Reviews applied: `['69960', '69961', '69962', '69963', '69967', '69980', '70014', '69872', '69869', '70040']`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2920/mesos-review-70040

- Mesos Reviewbot Windows




Re: Review Request 70040: Added test for terminal operation updates after master failover.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70040/#review213189
-----------------------------------------------------------


Fix it, then Ship it!





src/tests/storage_local_resource_provider_tests.cpp
Lines 5388-5389 (patched)
<https://reviews.apache.org/r/70040/#comment299053>

    Could you note in this comment why we do this?



src/tests/storage_local_resource_provider_tests.cpp
Line 5430 (patched)
<https://reviews.apache.org/r/70040/#comment299055>

    s/completes/completed/



src/tests/storage_local_resource_provider_tests.cpp
Line 5459 (patched)
<https://reviews.apache.org/r/70040/#comment299056>

    Nit: indent two more spaces.


- Greg Mann




Re: Review Request 70040: Added test for terminal operation updates after master failover.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70040/#review213190
-----------------------------------------------------------



Patch looks great!

Reviews applied: [69968, 69960, 69961, 69962, 69963, 69967, 69980, 70014, 69872, 69869, 70040]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot




Re: Review Request 70040: Added test for terminal operation updates after master failover.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70040/
-----------------------------------------------------------

(Updated Feb. 26, 2019, 11:25 a.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
-------

Comment tweak!


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

This test covers a corner case where an agent reregisters with the
master with a pending operation, but the operation's originating
framework is unknown.  This can occur in a variety of situations like:
  * the master fails over and a framework never reregisters,
  * a completed framework is rotated out of the master's memory with
    pending operations, or
  * an agent with pending operations is migrated from one cluster to
    another.

In this case, the master should "adopt" the orphan operation only
after a delay.  This gives the framework some time to reregister.
But if the framework does not reregister in time, the master will
be in charge of acknowledging operation status updates.


Diffs (updated)
-----

  src/tests/storage_local_resource_provider_tests.cpp a661951a0a326cc342aa0c45dd0967692ae70941 


Diff: https://reviews.apache.org/r/70040/diff/3/

Changes: https://reviews.apache.org/r/70040/diff/2-3/


Testing
-------

```
make check
src/mesos-tests --gtest_filter="*TerminalOrphanOperationAfterMasterFailover*" --verbose
src/mesos-tests --gtest_filter="*Operation*" --verbose
```

(Internal CI run pending)


Thanks,

Joseph Wu


Re: Review Request 70040: Added test for terminal operation updates after master failover.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70040/
-----------------------------------------------------------

(Updated Feb. 25, 2019, 1 p.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
-------

Addressed comments left in previous review, copy-pasted into this one.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

This test covers a corner case where an agent reregisters with the
master with a pending operation, but the operation's originating
framework is unknown.  This can occur in a variety of situations like:
  * the master fails over and a framework never reregisters,
  * a completed framework is rotated out of the master's memory with
    pending operations, or
  * an agent with pending operations is migrated from one cluster to
    another.

In this case, the master should "adopt" the orphan operation only
after a delay.  This gives the framework some time to reregister.
But if the framework does not reregister in time, the master will
be in charge of acknowledging operation status updates.


Diffs (updated)
-----

  src/tests/storage_local_resource_provider_tests.cpp a661951a0a326cc342aa0c45dd0967692ae70941 


Diff: https://reviews.apache.org/r/70040/diff/2/

Changes: https://reviews.apache.org/r/70040/diff/1-2/


Testing
-------

```
make check
src/mesos-tests --gtest_filter="*TerminalOrphanOperationAfterMasterFailover*" --verbose
src/mesos-tests --gtest_filter="*Operation*" --verbose
```

(Internal CI run pending)


Thanks,

Joseph Wu