Posted to reviews@aurora.apache.org by Aurora ReviewBot <wf...@apache.org> on 2017/09/19 05:43:55 UTC

Re: Review Request 62397: Replica Hot Standby

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62397/#review185645
-----------------------------------------------------------



Master (96a5287) is red with this patch.
  ./build-support/jenkins/build.sh

:commons-args:jar
:commons:compileJavaNote: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/commons/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.1
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/commons/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

:commons:generateThriftResources
:commons:processResources
:commons:classes
:commons:jar
:compileJava
Download https://repo1.maven.org/maven2/com/h2database/h2/1.4.196/h2-1.4.196.pom
Download https://repo1.maven.org/maven2/com/h2database/h2/1.4.196/h2-1.4.196.jar
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java:74: Note: Wrote forwarder org.apache.aurora.scheduler.storage.log.WriteAheadStorageForwarder
@Forward({
^
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.2
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java:220: error: cannot find symbol
        return reader.catchup(timeout, unit);
                     ^
  symbol:   method catchup(long,TimeUnit)
  location: variable reader of type Reader
1 error
:compileJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 2 mins 44.87 secs
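
The `cannot find symbol` error above means the `Reader` type used in MesosLogStreamModule.java (the log reader from the Mesos jar on the compile classpath) declares no `catchup(long, TimeUnit)` method. That method only exists with the unreleased Mesos change linked in the request (https://reviews.apache.org/r/62288/). The Java sketch below mimics the compiler's lookup at runtime with reflection. `LegacyReader` and `PatchedReader` are hypothetical stand-ins, not Mesos classes, and the check only illustrates which signature was missing. It is not part of the patch.

```java
import java.lang.reflect.Method;
import java.util.concurrent.TimeUnit;

public class CatchupCheck {
  // Hypothetical stand-in for a reader from a Mesos release without catchup.
  static class LegacyReader {
    public void read() {}
  }

  // Hypothetical stand-in for a reader that includes the proposed catchup API.
  static class PatchedReader {
    public void catchup(long timeout, TimeUnit unit) {}
  }

  /** Returns true if the class exposes a public catchup(long, TimeUnit) method. */
  static boolean hasCatchup(Class<?> readerClass) {
    try {
      // Same name and parameter types the compiler reported: catchup(long,TimeUnit).
      Method m = readerClass.getMethod("catchup", long.class, TimeUnit.class);
      return m != null;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("LegacyReader has catchup: " + hasCatchup(LegacyReader.class));   // false
    System.out.println("PatchedReader has catchup: " + hasCatchup(PatchedReader.class)); // true
  }
}
```

A compile-time reference like `reader.catchup(timeout, unit)` fails outright against the old jar. A reflective check like this one would only move the failure to runtime, so the straightforward fix remains building against a Mesos version that contains the API change.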


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Sept. 19, 2017, 5:32 a.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62397/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2017, 5:32 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This patch implements an option to turn on 'Replica Hot Standby', in which Scheduler replicas keep their volatile storage up to date throughout normal operation. The main motivation behind this change is to reduce failover time: if the leader fails over, the newly elected replica can serve production traffic much more quickly because it has less state to rebuild. This change also enables future work such as snapshots on replicas and serving traffic from replicas.
> 
> There have been several discussions around this feature: https://lists.apache.org/thread.html/e31d7dbcb054ed570f969ae2043eadfc090383edfe0751cec59b29d3@%3Cdev.aurora.apache.org%3E
> 
> Culminating in a design doc:
> https://docs.google.com/document/d/1DOtKA4-vrtxat1MaUYMQ6Y1iXhA8ob6Mfztzt-R1Oss/edit#heading=h.gjdgxs
> 
> The related Mesos patch can be found here:
> https://reviews.apache.org/r/62288/
> 
> 
> Diffs
> -----
> 
>   config/legacy_untested_classes.txt ec3e934b2e0510b9339ac71182b78546cac0e7eb 
>   src/main/java/org/apache/aurora/scheduler/log/Log.java dc77eb435e5f8fdce56727a2f679e8e1907e977c 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/LogInterface.java b0a7939131e1a3dceaf9635aec6746a5cd7ad394 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java 21855e184fe20dc339713978b32344b6950701ec 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 6704a328a4023a178ed8f86ae4772cb04eb2fa8e 
>   src/main/java/org/apache/aurora/scheduler/storage/CallOrderEnforcingStorage.java 2a5ec9c912979811c4badeee9362c22184d9cbbf 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb 
>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java ea147c0ba6aaa6d113144be0a8330bd2f73d2453 
>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java baf2647c54f1f9918139584e5151873a3853e83c 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java a7c9c83eebbbea7ae8a6c807f98d3ce8bd050137 
>   src/test/java/org/apache/aurora/scheduler/log/mesos/MesosLogTest.java f142f545799d64f9352b0ac6c51942eedf5e9ced 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 3f445595a81a5655c6c486791a9b55d8dc7f2f76 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48 
> 
> 
> Diff: https://reviews.apache.org/r/62397/diff/1/
> 
> 
> Testing
> -------
> 
> Added unit tests.
> 
> The current version of Mesos does not have the `catchup` function, so this patch will fail CI. However, the tests pass on my box if I manually build Mesos with the API change and add it to the dependencies.
> 
> I have had this patch running on a test cluster for the past couple of weeks. There are a few small issues to work out around catchup failure, but it is generally stable.
> 
> Initial (ad-hoc) observations show an improvement in 'scheduler_storage_start' from 70-200 seconds to 5-80 seconds, depending on whether the failover occurred immediately after a snapshot. I will compile more comprehensive statistics on the results later (e.g., the time from scheduler disconnect until a new scheduler is elected and serving traffic).
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>
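
The hot standby idea in the description above (replicas keep their volatile storage current during normal operation so a newly elected leader has less state to rebuild) can be pictured as a polling loop. Each round catches up the local log replica, then applies only the entries past the last applied position. The Java below is a minimal sketch under stated assumptions. `ReplicatedLog`, `InMemoryLog`, `HotStandbyReplica`, `Entry`, and `readAfter` are hypothetical names invented for illustration, not the Mesos `Log` API or Aurora's `LogStorage`/`StreamManager` classes. The `catchup(long, TimeUnit)` shape only mirrors the signature seen in the build error, and returning `false` on timeout is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class HotStandbySketch {
  /** Hypothetical log entry: a key/value write recorded at a log position. */
  static final class Entry {
    final long position;
    final String key;
    final String value;

    Entry(long position, String key, String value) {
      this.position = position;
      this.key = key;
      this.value = value;
    }
  }

  /** Hypothetical replicated-log view; not the Mesos or Aurora API. */
  interface ReplicatedLog {
    /** Tries to bring the local replica up to date; returns false on timeout. */
    boolean catchup(long timeout, TimeUnit unit);

    /** Returns all entries with position greater than afterPosition, in log order. */
    List<Entry> readAfter(long afterPosition);
  }

  /** In-memory fake log used only for this demo; positions start at 1. */
  static class InMemoryLog implements ReplicatedLog {
    private final List<Entry> entries = new ArrayList<>();

    void append(String key, String value) {
      entries.add(new Entry(entries.size() + 1, key, value));
    }

    @Override
    public boolean catchup(long timeout, TimeUnit unit) {
      return true; // A real log would sync with its peers here; the fake is always current.
    }

    @Override
    public List<Entry> readAfter(long afterPosition) {
      List<Entry> result = new ArrayList<>();
      for (Entry e : entries) {
        if (e.position > afterPosition) {
          result.add(e);
        }
      }
      return result;
    }
  }

  /** A replica that keeps its volatile store current during normal operation. */
  static class HotStandbyReplica {
    private final ReplicatedLog log;
    private final Map<String, String> volatileStore = new HashMap<>();
    private long lastApplied = 0;

    HotStandbyReplica(ReplicatedLog log) {
      this.log = log;
    }

    /** One round: catch up, then apply only unseen entries. Returns the number applied. */
    int poll(long timeout, TimeUnit unit) {
      if (!log.catchup(timeout, unit)) {
        return 0; // Catch-up timed out; retry on the next round.
      }
      List<Entry> fresh = log.readAfter(lastApplied);
      for (Entry e : fresh) {
        volatileStore.put(e.key, e.value);
        lastApplied = e.position;
      }
      return fresh.size();
    }

    long lastApplied() {
      return lastApplied;
    }

    Map<String, String> store() {
      return volatileStore;
    }
  }

  public static void main(String[] args) {
    InMemoryLog log = new InMemoryLog();
    HotStandbyReplica replica = new HotStandbyReplica(log);

    log.append("job/a", "RUNNING");
    log.append("job/b", "PENDING");
    System.out.println("applied: " + replica.poll(1, TimeUnit.SECONDS)); // applied: 2

    log.append("job/b", "RUNNING");
    System.out.println("applied: " + replica.poll(1, TimeUnit.SECONDS)); // applied: 1
    System.out.println("job/b = " + replica.store().get("job/b"));       // job/b = RUNNING
  }
}
```

In this sketch, tracking `lastApplied` is what makes promotion cheap. A replica elected leader only needs to apply the entries written since its last poll instead of rebuilding its whole volatile store. That matches the motivation given in the request, but it does not describe how the patch itself is structured.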