You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jan Schlicht <ja...@mesosphere.io> on 2017/12/19 15:14:14 UTC

Review Request 64713: Fixed a crash when resubscribing resource providers.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64713/
-----------------------------------------------------------

Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Jie Yu.


Bugs: MESOS-8346
    https://issues.apache.org/jira/browse/MESOS-8346


Repository: mesos


Description
-------

If a resource provider resubscribed while its old HTTP connection was
still open, the agent would crash, as a continuation would be called
erroneously. This continuation is now only called when a HTTP connection
is closed by a remote side (i.e. the resource provider) and not when
the resource provider manager closes the connection.


Diffs
-----

  src/resource_provider/manager.cpp 2a167aa5c50598758cab4ab6c13f1b1e33e692dd 
  src/tests/resource_provider_manager_tests.cpp 4e86016a450c30fea90e9c6e5c1f2f1aab5f42c0 


Diff: https://reviews.apache.org/r/64713/diff/1/


Testing
-------

make check


Thanks,

Jan Schlicht


Re: Review Request 64713: Fixed a crash when resubscribing resource providers.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64713/#review194148
-----------------------------------------------------------



PASS: Mesos patch 64713 was successfully built and tested.

Reviews applied: `['64713']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64713

- Mesos Reviewbot Windows


On Dec. 19, 2017, 3:14 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64713/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2017, 3:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Jie Yu.
> 
> 
> Bugs: MESOS-8346
>     https://issues.apache.org/jira/browse/MESOS-8346
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a resource provider resubscribed while its old HTTP connection was
> still open, the agent would crash, as a continuation would be called
> erroneously. This continuation is now only called when a HTTP connection
> is closed by a remote side (i.e. the resource provider) and not when
> the resource provider manager closes the connection.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp 2a167aa5c50598758cab4ab6c13f1b1e33e692dd 
>   src/tests/resource_provider_manager_tests.cpp 4e86016a450c30fea90e9c6e5c1f2f1aab5f42c0 
> 
> 
> Diff: https://reviews.apache.org/r/64713/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>


Re: Review Request 64713: Fixed a crash when resubscribing resource providers.

Posted by Jan Schlicht <ja...@mesosphere.io>.

> On Dec. 19, 2017, 4:28 p.m., Benjamin Bannier wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Line 1150 (original), 1150 (patched)
> > <https://reviews.apache.org/r/64713/diff/1/?file=1923996#file1923996line1150>
> >
> >     Let's add a comment here outlining that we start a second RP with the same ID which will take `resourceProvider1`'s place with the connection being closed by the manager.

Test case is no longer changed. Dropping.


> On Dec. 19, 2017, 4:28 p.m., Benjamin Bannier wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Line 1157 (original), 1157 (patched)
> > <https://reviews.apache.org/r/64713/diff/1/?file=1923996#file1923996line1157>
> >
> >     It would be great to check here that `resourceProvider1` actually got disconnected.

I ran into problems doing that. Seems like the `disconnected` callback is never called. And if it would, the changes in this test case wouldn't work, because the old RP instance would immediately try to resubscribe. This would result in both instances always trying to resubscribe, which is not what we want. I've removed the test for now, will investigate, why `disconnected` isn't called and come up with a separate test case for that. I've created https://issues.apache.org/jira/browse/MESOS-8349 for that.


> On Dec. 19, 2017, 4:28 p.m., Benjamin Bannier wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Line 1184 (original), 1184 (patched)
> > <https://reviews.apache.org/r/64713/diff/1/?file=1923996#file1923996line1184>
> >
> >     Let's add a comment here that this closes the connection on the RP side.

Test case is no longer changed. Dropping.


- Jan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64713/#review194146
-----------------------------------------------------------


On Dec. 20, 2017, 2:24 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64713/
> -----------------------------------------------------------
> 
> (Updated Dec. 20, 2017, 2:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Jie Yu.
> 
> 
> Bugs: MESOS-8346
>     https://issues.apache.org/jira/browse/MESOS-8346
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a resource provider resubscribed while its old HTTP connection was
> still open, the agent would crash, as a continuation would be called
> erroneously. This continuation is now only called when a HTTP connection
> is closed by a remote side (i.e. the resource provider) and not when
> the resource provider manager closes the connection.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp e3fcb64b630924e1bb497625708cad3f0fdc064a 
> 
> 
> Diff: https://reviews.apache.org/r/64713/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>


Re: Review Request 64713: Fixed a crash when resubscribing resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64713/#review194146
-----------------------------------------------------------




src/resource_provider/manager.cpp
Lines 602-607 (patched)
<https://reviews.apache.org/r/64713/#comment272884>

    I am not sure this is the semantics we want.
    
    The protocol to add a RP is to first subscribe, then update state, potentially with resources. If we do not explicitly propagate the disconnected event to the master we might continue to perform operations on the previous resources which the resource provider might accept or not, depending on e.g., how it manages updating its resource version. This sounds pretty complicated and with potential for mistakes, and I'd prefer to just propagate the state change to the master which would only make the previous resources available after the RP has updated it state to the outside.



src/tests/resource_provider_manager_tests.cpp
Line 1150 (original), 1150 (patched)
<https://reviews.apache.org/r/64713/#comment272885>

    Let's add a comment here outlining that we start a second RP with the same ID which will take `resourceProvider1`'s place with the connection being closed by the manager.



src/tests/resource_provider_manager_tests.cpp
Line 1157 (original), 1157 (patched)
<https://reviews.apache.org/r/64713/#comment272882>

    It would be great to check here that `resourceProvider1` actually got disconnected.



src/tests/resource_provider_manager_tests.cpp
Line 1184 (original), 1184 (patched)
<https://reviews.apache.org/r/64713/#comment272886>

    Let's add a comment here that this closes the connection on the RP side.


- Benjamin Bannier


On Dec. 19, 2017, 4:14 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64713/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2017, 4:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Jie Yu.
> 
> 
> Bugs: MESOS-8346
>     https://issues.apache.org/jira/browse/MESOS-8346
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a resource provider resubscribed while its old HTTP connection was
> still open, the agent would crash, as a continuation would be called
> erroneously. This continuation is now only called when a HTTP connection
> is closed by a remote side (i.e. the resource provider) and not when
> the resource provider manager closes the connection.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp 2a167aa5c50598758cab4ab6c13f1b1e33e692dd 
>   src/tests/resource_provider_manager_tests.cpp 4e86016a450c30fea90e9c6e5c1f2f1aab5f42c0 
> 
> 
> Diff: https://reviews.apache.org/r/64713/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>


Re: Review Request 64713: Fixed a crash when resubscribing resource providers.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64713/#review194242
-----------------------------------------------------------



PASS: Mesos patch 64713 was successfully built and tested.

Reviews applied: `['64713']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64713

- Mesos Reviewbot Windows


On Dec. 20, 2017, 1:24 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64713/
> -----------------------------------------------------------
> 
> (Updated Dec. 20, 2017, 1:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Jie Yu.
> 
> 
> Bugs: MESOS-8346
>     https://issues.apache.org/jira/browse/MESOS-8346
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a resource provider resubscribed while its old HTTP connection was
> still open, the agent would crash, as a continuation would be called
> erroneously. This continuation is now only called when a HTTP connection
> is closed by a remote side (i.e. the resource provider) and not when
> the resource provider manager closes the connection.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp e3fcb64b630924e1bb497625708cad3f0fdc064a 
> 
> 
> Diff: https://reviews.apache.org/r/64713/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>


Re: Review Request 64713: Fixed a crash when resubscribing resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64713/#review194320
-----------------------------------------------------------


Ship it!





src/resource_provider/manager.cpp
Lines 605-606 (patched)
<https://reviews.apache.org/r/64713/#comment273086>

    nit: Fits on one line.


- Benjamin Bannier


On Dec. 20, 2017, 2:24 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64713/
> -----------------------------------------------------------
> 
> (Updated Dec. 20, 2017, 2:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Jie Yu.
> 
> 
> Bugs: MESOS-8346
>     https://issues.apache.org/jira/browse/MESOS-8346
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a resource provider resubscribed while its old HTTP connection was
> still open, the agent would crash, as a continuation would be called
> erroneously. This continuation is now only called when a HTTP connection
> is closed by a remote side (i.e. the resource provider) and not when
> the resource provider manager closes the connection.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp e3fcb64b630924e1bb497625708cad3f0fdc064a 
> 
> 
> Diff: https://reviews.apache.org/r/64713/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>


Re: Review Request 64713: Fixed a crash when resubscribing resource providers.

Posted by Jan Schlicht <ja...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64713/
-----------------------------------------------------------

(Updated Dec. 20, 2017, 2:24 p.m.)


Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Jie Yu.


Changes
-------

Addressed issues and removed test case (because it has some issues that need further investigation).


Bugs: MESOS-8346
    https://issues.apache.org/jira/browse/MESOS-8346


Repository: mesos


Description
-------

If a resource provider resubscribed while its old HTTP connection was
still open, the agent would crash, as a continuation would be called
erroneously. This continuation is now only called when a HTTP connection
is closed by a remote side (i.e. the resource provider) and not when
the resource provider manager closes the connection.


Diffs (updated)
-----

  src/resource_provider/manager.cpp e3fcb64b630924e1bb497625708cad3f0fdc064a 


Diff: https://reviews.apache.org/r/64713/diff/2/

Changes: https://reviews.apache.org/r/64713/diff/1-2/


Testing
-------

make check


Thanks,

Jan Schlicht