Posted to reviews@aurora.apache.org by Zameer Manji <zm...@apache.org> on 2017/03/29 23:52:55 UTC

Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/
-----------------------------------------------------------

Review request for Aurora and Stephan Erb.


Bugs: AURORA-1911
    https://issues.apache.org/jira/browse/AURORA-1911


Repository: aurora


Description
-------

As noted in AURORA-1911, the `V1Mesos` driver doesn't retry `SUBSCRIBE` calls if they fail. This means that after a leader subscribes and disconnects, it may never resubscribe if the Mesos Master is unhealthy.

To fix this, I have moved the subscription into the dedicated `SchedulerExecutor`, where it continues to attempt to subscribe using truncated binary backoff. It only stops if we are disconnected or if we successfully connect.
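
For readers without the diff at hand, a minimal sketch of such a retry loop might look like the following. The patch itself is not reproduced in this thread, so every name here (`SubscribeRetrier`, `sendSubscribe`, `nextDelayMs`) and the delay parameters are assumptions made for illustration, not the actual code in `VersionedMesosSchedulerImpl`.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative only: these names are assumptions, not taken from the Aurora patch. */
final class SubscribeRetrier {
  private final ScheduledExecutorService executor;
  private final Runnable sendSubscribe; // stands in for issuing the SUBSCRIBE call
  private final long initialDelayMs;
  private final long maxDelayMs;
  private final AtomicBoolean connected = new AtomicBoolean(false);
  private final AtomicBoolean subscribed = new AtomicBoolean(false);

  SubscribeRetrier(
      ScheduledExecutorService executor,
      Runnable sendSubscribe,
      long initialDelayMs,
      long maxDelayMs) {
    this.executor = executor;
    this.sendSubscribe = sendSubscribe;
    this.initialDelayMs = initialDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  /** Truncated binary backoff: double the delay, but never exceed the cap. */
  static long nextDelayMs(long currentMs, long maxMs) {
    return Math.min(maxMs, currentMs * 2);
  }

  /** Connection established: start attempting to subscribe. */
  void onConnected() {
    connected.set(true);
    subscribed.set(false);
    attempt(initialDelayMs);
  }

  /** Subscription confirmed: the next scheduled attempt becomes a no-op. */
  void onSubscribed() {
    subscribed.set(true);
  }

  /** Connection lost: stop retrying until the next onConnected(). */
  void onDisconnected() {
    connected.set(false);
    subscribed.set(false);
  }

  private void attempt(long delayMs) {
    // Stop once we are disconnected or subscribed, as the description states.
    if (!connected.get() || subscribed.get()) {
      return;
    }
    sendSubscribe.run();
    // Re-check after delayMs; the following attempt waits twice as long, up to the cap.
    executor.schedule(
        () -> attempt(nextDelayMs(delayMs, maxDelayMs)), delayMs, TimeUnit.MILLISECONDS);
  }
}
```

The sketch sends the first `SUBSCRIBE` immediately and never gives up on its own. A real implementation would probably also need to make sure only one retry chain is active per connection, which is left out here for brevity.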


Diffs
-----

  src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 206b11458da2b0f938f0fcab5e5d3259a88ac9ee 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 5bf1e4e8c46044cb69b266cd203b5ec2f8b9ab61 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java 10d4f1b515b91d85b283cb7c655275c22fb133f9 
  src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java 67d356ab66c926a3b56860b906a453d57d6b694d 
  src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java 756d0d9e30a447f9fba75c1c60f2f2f3c610399b 


Diff: https://reviews.apache.org/r/58053/diff/1/


Testing
-------


Thanks,

Zameer Manji


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Zameer Manji <zm...@apache.org>.

> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 125-129 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line125>
> >
> >     Do the Mesos docs say anything about simultaneous `SUBSCRIBE` calls?
> >     
> >     If the backoff time is still pretty low we might end up sending another subscribe before we have received an answer for the previous one.
> 
> Zameer Manji wrote:
>     From what I understand, multiple subscriptions per framework are not allowed, and subsequent subscribe attempts will fail if a connection was already established. The underlying driver ignores those failures, so we should be fine.

The `V1Mesos` driver drops them silently so we are fine if we send extra calls. https://github.com/apache/mesos/blob/1a1fa95d0de179d7efab002a99a0e6261ce307f9/src/scheduler/scheduler.cpp#L230


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170580
-----------------------------------------------------------


On March 30, 2017, 11:37 a.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/2/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Zameer Manji <zm...@apache.org>.

> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 120 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line120>
> >
> >     Do we have to give up eventually? (I suppose not...)

I don't think so. If we give up, I assume the scheduler is going to shut down. Suppose Mesos is down: a scheduler shutdown means we will elect a new leader, and a new leader (by default) has a one minute timeout to register with Mesos. If we give up, we will just flap between leaders until the system heals. I think that's pretty undesirable.


> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 125-129 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line125>
> >
> >     Do the Mesos docs say anything about simultaneous `SUBSCRIBE` calls?
> >     
> >     If the backoff time is still pretty low we might end up sending another subscribe before we have received an answer for the previous one.

From what I understand, multiple subscriptions per framework are not allowed, and subsequent subscribe attempts will fail if a connection was already established. The underlying driver ignores those failures, so we should be fine.


> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 128-130 (original), 165-167 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line165>
> >
> >     You are unsetting `isSubscribed` in the `disconnected` handler. Doesn't this imply we will never run the reregistration code here?

Good catch, fixed.
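
To illustrate the issue raised above: if a single flag is cleared on disconnect and is also used to decide between first registration and re-registration, the re-registration branch can never be reached. A rough sketch of one way to separate the two concerns follows. The class, the second flag `everRegistered`, and the recorded callback names are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: names are assumptions, not the code under review. */
final class SubscriptionState {
  private boolean isSubscribed = false;   // cleared on every disconnect
  private boolean everRegistered = false; // survives disconnects
  final List<String> callbacks = new ArrayList<>();

  /** Handles a SUBSCRIBED event from the master. */
  synchronized void onSubscribed() {
    if (isSubscribed) {
      return; // duplicate event for the current connection
    }
    isSubscribed = true;
    if (everRegistered) {
      // Reachable only because this flag is not reset in onDisconnected().
      callbacks.add("reregistered");
    } else {
      everRegistered = true;
      callbacks.add("registered");
    }
  }

  /** Handles loss of the connection to the master. */
  synchronized void onDisconnected() {
    isSubscribed = false;
    callbacks.add("disconnected");
  }
}
```

With this split, a subscribe, disconnect, subscribe sequence records "registered", "disconnected", "reregistered" in that order.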


> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 137-138 (original), 174-175 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line175>
> >
> >     I am wondering why we need this here for `OFFERS` but not for `RESCIND`, `INVERSE_OFFERS`, etc.

I put it here to handle the same kind of errors as the unversioned driver. Technically we could put it everywhere. I'm not opposed if you think we should do it.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170580
-----------------------------------------------------------


On March 29, 2017, 4:52 p.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/1/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170580
-----------------------------------------------------------



Not sure if I got everything :) A couple of questions below.


src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
Lines 120 (patched)
<https://reviews.apache.org/r/58053/#comment243439>

    Do we have to give up eventually? (I suppose not...)



src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
Lines 125-129 (patched)
<https://reviews.apache.org/r/58053/#comment243440>

    Do the Mesos docs say anything about simultaneous `SUBSCRIBE` calls?
    
    If the backoff time is still pretty low we might end up sending another subscribe before we have received an answer for the previous one.



src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
Lines 128-130 (original), 165-167 (patched)
<https://reviews.apache.org/r/58053/#comment243438>

    You are unsetting `isSubscribed` in the `disconnected` handler. Doesn't this imply we will never run the reregistration code here?



src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
Lines 137-138 (original), 174-175 (patched)
<https://reviews.apache.org/r/58053/#comment243441>

    I am wondering why we need this here for `OFFERS` but not for `RESCIND`, `INVERSE_OFFERS`, etc.


- Stephan Erb


On March 30, 2017, 1:52 a.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/1/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170504
-----------------------------------------------------------


Ship it!




Master (3a9aabd) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On March 29, 2017, 11:52 p.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/1/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170627
-----------------------------------------------------------


Ship it!




Master (076d917) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On March 30, 2017, 6:37 p.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/2/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170662
-----------------------------------------------------------



Master (076d917) is red with this patch.
  ./build-support/jenkins/build.sh

:generateBuildProperties
:processResources
:classes
:jar
:startScripts
:distTar
:distZip
:assemble
:compileJmhJava
Note: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources UP-TO-DATE
:jmhClasses
:checkstyleJmh
:jsHint
:checkstyleMain
:compileTestJava
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java:38: Note: Wrote forwarder org.apache.aurora.scheduler.thrift.aop.MockDecoratedThriftForwarder
@Forward(AnnotatedAuroraAdmin.class)
^
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processTestResources
:testClasses
:checkstyleTest
[ant:checkstyle] [ERROR] /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java:379: Line is longer than 100 characters (found 101). [LineLength]
[ant:checkstyle] [ERROR] /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java:405: Line is longer than 100 characters (found 106). [LineLength]
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':checkstyleTest'.
> Checkstyle rule violations were found. See the report at: file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/test.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 1 mins 45.207 secs


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On March 31, 2017, 12:22 a.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/3/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170663
-----------------------------------------------------------


Ship it!




Master (076d917) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On March 31, 2017, 12:45 a.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/4/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170932
-----------------------------------------------------------


Ship it!




Ship It!

- Stephan Erb


On March 31, 2017, 2:45 a.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/4/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170720
-----------------------------------------------------------


Ship it!




Ship It!

- David McLaughlin


On March 31, 2017, 12:45 a.m., Zameer Manji wrote:
> 
> Diff: https://reviews.apache.org/r/58053/diff/4/


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Zameer Manji <zm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/
-----------------------------------------------------------

(Updated March 30, 2017, 5:45 p.m.)


Review request for Aurora and Stephan Erb.


Changes
-------

feedback.


Bugs: AURORA-1911
    https://issues.apache.org/jira/browse/AURORA-1911


Repository: aurora


Description
-------

As noted in AURORA-1911, the `V1Mesos` driver doesn't retry `SUBSCRIBE` calls if they fail. This means that after a leader subscribes and disconnects, it may never resubscribe if the Mesos Master is unhealthy.

To fix this, I have moved the subscription into the dedicated `SchedulerExecutor`, where it continues to attempt to subscribe using truncated binary backoff. It only stops if we are disconnected or if we successfully connect.


Diffs (updated)
-----

  src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 206b11458da2b0f938f0fcab5e5d3259a88ac9ee 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 5bf1e4e8c46044cb69b266cd203b5ec2f8b9ab61 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java 10d4f1b515b91d85b283cb7c655275c22fb133f9 
  src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java 67d356ab66c926a3b56860b906a453d57d6b694d 
  src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java 756d0d9e30a447f9fba75c1c60f2f2f3c610399b 


Diff: https://reviews.apache.org/r/58053/diff/4/

Changes: https://reviews.apache.org/r/58053/diff/3-4/


Testing
-------


Thanks,

Zameer Manji


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Zameer Manji <zm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/
-----------------------------------------------------------

(Updated March 30, 2017, 5:22 p.m.)


Review request for Aurora and Stephan Erb.


Changes
-------

Feedback.


Bugs: AURORA-1911
    https://issues.apache.org/jira/browse/AURORA-1911


Repository: aurora


Description
-------

As noted in AURORA-1911, the `V1Mesos` driver doesn't retry `SUBSCRIBE` calls if they fail. This means that after a leader subscribes and disconnects, it may never resubscribe if the Mesos Master is unhealthy.

To fix this, I have moved the subscription into the dedicated `SchedulerExecutor`, where it continues to attempt to subscribe using truncated binary backoff. It only stops if we are disconnected or if we successfully connect.


Diffs (updated)
-----

  src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 206b11458da2b0f938f0fcab5e5d3259a88ac9ee 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 5bf1e4e8c46044cb69b266cd203b5ec2f8b9ab61 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java 10d4f1b515b91d85b283cb7c655275c22fb133f9 
  src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java 67d356ab66c926a3b56860b906a453d57d6b694d 
  src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java 756d0d9e30a447f9fba75c1c60f2f2f3c610399b 


Diff: https://reviews.apache.org/r/58053/diff/3/

Changes: https://reviews.apache.org/r/58053/diff/2-3/


Testing
-------


Thanks,

Zameer Manji


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Zameer Manji <zm...@apache.org>.

> On March 30, 2017, 4:08 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 168 (patched)
> > <https://reviews.apache.org/r/58053/diff/2/?file=1681136#file1681136line168>
> >
> >     Since it has to be true to execute this code block, isn't this redundant?

Yes, this is an error.

I have fixed it and added a full test case for this scenario.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170655
-----------------------------------------------------------


On March 30, 2017, 11:37 a.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58053/
> -----------------------------------------------------------
> 
> (Updated March 30, 2017, 11:37 a.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1911
>     https://issues.apache.org/jira/browse/AURORA-1911
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> As noted in AURORA-1911, the `V1Mesos` driver doesn't retry `SUBSCRIBE` calls if they fail. This means that after a leader subscribes and disconnects, it may never resubscribe if the Mesos Master is unhealthy.
> 
> To fix this, I have moved the subscription into the dedicated `SchedulerExecutor`, where it continues to attempt to subscribe using truncated binary backoff. It only stops if we are disconnected or if we successfully connect.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 206b11458da2b0f938f0fcab5e5d3259a88ac9ee 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 5bf1e4e8c46044cb69b266cd203b5ec2f8b9ab61 
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java 10d4f1b515b91d85b283cb7c655275c22fb133f9 
>   src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java 67d356ab66c926a3b56860b906a453d57d6b694d 
>   src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java 756d0d9e30a447f9fba75c1c60f2f2f3c610399b 
> 
> 
> Diff: https://reviews.apache.org/r/58053/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170655
-----------------------------------------------------------




src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
Lines 168 (patched)
<https://reviews.apache.org/r/58053/#comment243546>

    Since it has to be true to execute this code block, isn't this redundant?


- David McLaughlin


On March 30, 2017, 6:37 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58053/
> -----------------------------------------------------------
> 
> (Updated March 30, 2017, 6:37 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1911
>     https://issues.apache.org/jira/browse/AURORA-1911
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> As noted in AURORA-1911, the `V1Mesos` driver doesn't retry `SUBSCRIBE` calls if they fail. This means that after a leader subscribes and disconnects, it may never resubscribe if the Mesos Master is unhealthy.
> 
> To fix this, I have moved the subscription into the dedicated `SchedulerExecutor`, where it continues to attempt to subscribe using truncated binary backoff. It only stops if we are disconnected or if we successfully connect.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 206b11458da2b0f938f0fcab5e5d3259a88ac9ee 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 5bf1e4e8c46044cb69b266cd203b5ec2f8b9ab61 
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java 10d4f1b515b91d85b283cb7c655275c22fb133f9 
>   src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java 67d356ab66c926a3b56860b906a453d57d6b694d 
>   src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java 756d0d9e30a447f9fba75c1c60f2f2f3c610399b 
> 
> 
> Diff: https://reviews.apache.org/r/58053/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 58053: Reliably subscribe to Mesos in the HTTP Driver.

Posted by Zameer Manji <zm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/
-----------------------------------------------------------

(Updated March 30, 2017, 11:37 a.m.)


Review request for Aurora and Stephan Erb.


Changes
-------

Feedback.


Bugs: AURORA-1911
    https://issues.apache.org/jira/browse/AURORA-1911


Repository: aurora


Description
-------

As noted in AURORA-1911, the `V1Mesos` driver doesn't retry `SUBSCRIBE` calls if they fail. This means that after a leader subscribes and disconnects, it may never resubscribe if the Mesos Master is unhealthy.

To fix this, I have moved the subscription into the dedicated `SchedulerExecutor`, where it continues to attempt to subscribe using truncated binary backoff. It only stops if we are disconnected or if we successfully connect.


Diffs (updated)
-----

  src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 206b11458da2b0f938f0fcab5e5d3259a88ac9ee 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 5bf1e4e8c46044cb69b266cd203b5ec2f8b9ab61 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java 10d4f1b515b91d85b283cb7c655275c22fb133f9 
  src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java 67d356ab66c926a3b56860b906a453d57d6b694d 
  src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java 756d0d9e30a447f9fba75c1c60f2f2f3c610399b 


Diff: https://reviews.apache.org/r/58053/diff/2/

Changes: https://reviews.apache.org/r/58053/diff/1-2/


Testing
-------


Thanks,

Zameer Manji