You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by pmackles <gi...@git.apache.org> on 2018/04/10 14:38:31 UTC
[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...
GitHub user pmackles opened a pull request:
https://github.com/apache/spark/pull/21027
[SPARK-23943][MESOS][DEPLOY] Improve observability of MesosRestServer/MesosClusterDi…
See https://issues.apache.org/jira/browse/SPARK-23943 for details on proposed changes
Tested manually on branch-2.3
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pmackles/spark new-SPARK-23943
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21027.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21027
----
commit dc06283885aed247391280a12e2cca1f6c6c22ff
Author: Paul Mackles <pm...@...>
Date: 2018-04-09T15:09:34Z
[SPARK-23943] Improve observability of MesosRestServer/MesosClusterDispatcher
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21027
**[Test build #94598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94598/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...
Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21027#discussion_r208467704
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
@@ -160,7 +161,10 @@ trait MesosSchedulerUtils extends Logging {
logError("driver.run() failed", e)
error = Some(e)
markErr()
- }
+ } finally {
+ logWarning("schedulerDriver stopped")
+ schedulerDriverStopped.set(true)
+ }
--- End diff --
Fix indent
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21027
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...
Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21027#discussion_r208468846
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
@@ -63,6 +63,8 @@ private[spark] abstract class RestSubmissionServer(
s"$baseContext/create/*" -> submitRequestServlet,
s"$baseContext/kill/*" -> killRequestServlet,
s"$baseContext/status/*" -> statusRequestServlet,
+ "/health" -> new ServerStatusServlet(this),
+ "/status" -> new ServerStatusServlet(this),
--- End diff --
Also, what is the intended user for this information?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21027
**[Test build #92794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92794/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21027
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...
Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21027#discussion_r208468697
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/rest/mesos/MesosRestServer.scala ---
@@ -50,6 +50,24 @@ private[spark] class MesosRestServer(
new MesosKillRequestServlet(scheduler, masterConf)
protected override val statusRequestServlet =
new MesosStatusRequestServlet(scheduler, masterConf)
+
+ override def isServerHealthy(): Boolean = !scheduler.isSchedulerDriverStopped()
+
+ override def serverStatus(): ServerStatusResponse = {
+ val s = new ServerStatusResponse
+ s.schedulerDriverStopped = scheduler.isSchedulerDriverStopped()
+ s.queuedDrivers = scheduler.getQueuedDriversSize
+ s.launchedDrivers = scheduler.getLaunchedDriversSize
+ s.pendingRetryDrivers = scheduler.getPendingRetryDriversSize
+ s.success = true
+ s.message = "iamok"
--- End diff --
How about leaving this blank?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...
Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21027#discussion_r208468305
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
@@ -63,6 +63,8 @@ private[spark] abstract class RestSubmissionServer(
s"$baseContext/create/*" -> submitRequestServlet,
s"$baseContext/kill/*" -> killRequestServlet,
s"$baseContext/status/*" -> statusRequestServlet,
+ "/health" -> new ServerStatusServlet(this),
--- End diff --
This impacts the Rest submission server in general too, I do like the idea to provide an endpoint to get status but I'm not sure this is a paradigm that Spark is going for. I know the common pattern is to poll Spark metrics to understand status of the components. @felixcheung do you have thoughts around this?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21027
@tnachen
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92794/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21027
**[Test build #94598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94598/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94598/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21027
**[Test build #92794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92794/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21027
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...
Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21027#discussion_r208468594
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
@@ -331,3 +345,15 @@ private class ErrorServlet extends RestServlet {
sendResponse(error, response)
}
}
+
+private class ServerStatusServlet(server: RestSubmissionServer) extends RestServlet {
+ override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+ val path = req.getRequestURI
+ if (!server.isServerHealthy() && path == "/health") {
--- End diff --
I would switch the order (check path first). However, from this logic, if server is healthy and request is asking for /health, it will return status instead?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org