You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by pmackles <gi...@git.apache.org> on 2018/04/10 14:38:31 UTC

[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

GitHub user pmackles opened a pull request:

    https://github.com/apache/spark/pull/21027

    [SPARK-23943][MESOS][DEPLOY] Improve observability of MesosRestServer/MesosClusterDi…

    See https://issues.apache.org/jira/browse/SPARK-23943 for details on proposed changes
    
    Tested manually on branch-2.3


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pmackles/spark new-SPARK-23943

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21027.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21027
    
----
commit dc06283885aed247391280a12e2cca1f6c6c22ff
Author: Paul Mackles <pm...@...>
Date:   2018-04-09T15:09:34Z

    [SPARK-23943] Improve observability of MesosRestServer/MesosClusterDispatcher

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    **[Test build #94598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94598/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21027#discussion_r208467704
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
    @@ -160,7 +161,10 @@ trait MesosSchedulerUtils extends Logging {
                   logError("driver.run() failed", e)
                   error = Some(e)
                   markErr()
    -          }
    +          } finally {
    +            logWarning("schedulerDriver stopped")
    +            schedulerDriverStopped.set(true)
    +        }
    --- End diff --
    
    Fix indent


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21027#discussion_r208468846
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
    @@ -63,6 +63,8 @@ private[spark] abstract class RestSubmissionServer(
         s"$baseContext/create/*" -> submitRequestServlet,
         s"$baseContext/kill/*" -> killRequestServlet,
         s"$baseContext/status/*" -> statusRequestServlet,
    +    "/health" -> new ServerStatusServlet(this),
    +    "/status" -> new ServerStatusServlet(this),
    --- End diff --
    
    Also, what is the intended user for this information? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    **[Test build #92794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92794/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21027#discussion_r208468697
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/rest/mesos/MesosRestServer.scala ---
    @@ -50,6 +50,24 @@ private[spark] class MesosRestServer(
         new MesosKillRequestServlet(scheduler, masterConf)
       protected override val statusRequestServlet =
         new MesosStatusRequestServlet(scheduler, masterConf)
    +
    +  override def isServerHealthy(): Boolean = !scheduler.isSchedulerDriverStopped()
    +
    +  override def serverStatus(): ServerStatusResponse = {
    +    val s = new ServerStatusResponse
    +    s.schedulerDriverStopped = scheduler.isSchedulerDriverStopped()
    +    s.queuedDrivers = scheduler.getQueuedDriversSize
    +    s.launchedDrivers = scheduler.getLaunchedDriversSize
    +    s.pendingRetryDrivers = scheduler.getPendingRetryDriversSize
    +    s.success = true
    +    s.message = "iamok"
    --- End diff --
    
    How about leaving this blank?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21027#discussion_r208468305
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
    @@ -63,6 +63,8 @@ private[spark] abstract class RestSubmissionServer(
         s"$baseContext/create/*" -> submitRequestServlet,
         s"$baseContext/kill/*" -> killRequestServlet,
         s"$baseContext/status/*" -> statusRequestServlet,
    +    "/health" -> new ServerStatusServlet(this),
    --- End diff --
    
    This impacts the Rest submission server in general too, I do like the idea to provide an endpoint to get status but I'm not sure this is a paradigm that Spark is going for. I know the common pattern is to poll Spark metrics to understand status of the components. @felixcheung do you have thoughts around this?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    @tnachen 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92794/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    **[Test build #94598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94598/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94598/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    **[Test build #92794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92794/testReport)** for PR 21027 at commit [`dc06283`](https://github.com/apache/spark/commit/dc06283885aed247391280a12e2cca1f6c6c22ff).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21027
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21027#discussion_r208468594
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
    @@ -331,3 +345,15 @@ private class ErrorServlet extends RestServlet {
         sendResponse(error, response)
       }
     }
    +
    +private class ServerStatusServlet(server: RestSubmissionServer) extends RestServlet {
    +  override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
    +    val path = req.getRequestURI
    +    if (!server.isServerHealthy() && path == "/health") {
    --- End diff --
    
    I would switch the order (check path first). However, from this logic, if server is healthy and request is asking for /health, it will return status instead? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org