Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2017/11/08 19:03:35 UTC

[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/19698

    [SPARK-20648][core] Port JobsTab and StageTab to the new UI backend.

    This change is a little larger because there's a whole lot of logic
    behind these pages, all tightly tied to internal types and listeners;
    some of that logic had to be reimplemented in the new listener, and
    the data it needs exposed through the API types.
    
    - Added missing StageData and ExecutorStageSummary fields that are
      used by the UI. Some JSON golden files needed to be updated to account
      for the new fields.
    
    - Save RDD graph data in the store. This reuses existing types as much as
      possible so that the graph code doesn't need to be rewritten, which means
      it's probably not optimal.
    
    - Some old classes (e.g. JobProgressListener) still remain, since they're used
      in other parts of the code; they're no longer used by the UI, though, and
      will be cleaned up in a separate change.
    
    - Save information about active pools in the store. This data is not really
      used in the SHS, but since it's small it's still recorded when replaying
      applications.
    
    - Because the new store sorts things slightly differently from the previous
      code, some elements within the JSON golden files were reordered.
    
    - The retention unit test in UISeleniumSuite was disabled because the code
      to throw away old stages / tasks hasn't been added yet.
    
    - The job description field in the API tries to follow the old behavior, which
      leaves it empty most of the time even though there's information available to
      fill it in. For stages, a new field was added to hold the description (which is
      basically the job description), so that the UI can be rendered the old way.
    
    - A new stage status ("SKIPPED") was added because the API couldn't represent
      that state before. Without it, such stages would show up as "PENDING" in the
      UI, which is now based on API types.
    
    - The API used to expose "executorRunTime" as the task's duration, which
      wasn't really correct (that value is also easily available from the metrics
      object); this change fixes that by storing the correct duration, which means
      a few expectation files needed to be updated to account for the new durations
      and for sorting differences caused by the changed values.
    
    - Added changes to implement SPARK-20713 and SPARK-21922 in the new code.
    
    Tested with existing unit tests (and by using the UI a lot).
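
    To make the shape of the port concrete, here is a minimal sketch of the pattern
    the pages move to (illustrative only, not code from this patch; it borrows the
    AppStatusStore, v1.JobData and JobExecutionStatus names that appear in the diffs
    quoted later in this thread):

        import org.apache.spark.JobExecutionStatus
        import org.apache.spark.status.AppStatusStore

        // The old pages walked mutable JobProgressListener state under a lock;
        // the ported pages query immutable v1 API objects read back from the
        // KVStore-backed AppStatusStore.
        def summarizeJobs(store: AppStatusStore): (Int, Int, Int) = {
          val jobs = store.jobsList(null)
          val completed = jobs.count(_.status == JobExecutionStatus.SUCCEEDED)
          val failed = jobs.count(_.status == JobExecutionStatus.FAILED)
          val active = jobs.size - completed - failed
          (active, completed, failed)
        }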

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-20648

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19698
    
----
commit a22c45889d8fc0982caf4325eb729048537872bb
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2017-01-31T21:31:55Z

    [SPARK-20648][core] Port JobsTab and StageTab to the new UI backend.
    

----


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150573812
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
    @@ -285,106 +272,120 @@ private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
       }
     
       def render(request: HttpServletRequest): Seq[Node] = {
    -    val listener = parent.jobProgresslistener
    -    listener.synchronized {
    -      val startTime = listener.startTime
    -      val endTime = listener.endTime
    -      val activeJobs = listener.activeJobs.values.toSeq
    -      val completedJobs = listener.completedJobs.reverse
    -      val failedJobs = listener.failedJobs.reverse
    -
    -      val activeJobsTable =
    -        jobsTable(request, "active", "activeJob", activeJobs, killEnabled = parent.killEnabled)
    -      val completedJobsTable =
    -        jobsTable(request, "completed", "completedJob", completedJobs, killEnabled = false)
    -      val failedJobsTable =
    -        jobsTable(request, "failed", "failedJob", failedJobs, killEnabled = false)
    -
    -      val shouldShowActiveJobs = activeJobs.nonEmpty
    -      val shouldShowCompletedJobs = completedJobs.nonEmpty
    -      val shouldShowFailedJobs = failedJobs.nonEmpty
    -
    -      val completedJobNumStr = if (completedJobs.size == listener.numCompletedJobs) {
    -        s"${completedJobs.size}"
    -      } else {
    -        s"${listener.numCompletedJobs}, only showing ${completedJobs.size}"
    +    val appInfo = store.applicationInfo()
    +    val startTime = appInfo.attempts.head.startTime.getTime()
    +    val endTime = appInfo.attempts.head.endTime.getTime()
    +
    +    val activeJobs = new ListBuffer[v1.JobData]()
    +    val _completedJobs = new ListBuffer[v1.JobData]()
    +    val _failedJobs = new ListBuffer[v1.JobData]()
    +
    +    store.jobsList(null).foreach { job =>
    +      job.status match {
    +        case JobExecutionStatus.SUCCEEDED =>
    +          _completedJobs += job
    +        case JobExecutionStatus.FAILED =>
    +          _failedJobs += job
    +        case _ =>
    +          activeJobs += job
           }
    +    }
     
    -      val summary: NodeSeq =
    -        <div>
    -          <ul class="unstyled">
    -            <li>
    -              <strong>User:</strong>
    -              {parent.getSparkUser}
    -            </li>
    -            <li>
    -              <strong>Total Uptime:</strong>
    -              {
    -                if (endTime < 0 && parent.sc.isDefined) {
    -                  UIUtils.formatDuration(System.currentTimeMillis() - startTime)
    -                } else if (endTime > 0) {
    -                  UIUtils.formatDuration(endTime - startTime)
    -                }
    -              }
    -            </li>
    -            <li>
    -              <strong>Scheduling Mode: </strong>
    -              {listener.schedulingMode.map(_.toString).getOrElse("Unknown")}
    -            </li>
    +    val completedJobs = _completedJobs.toSeq.reverse
    +    val failedJobs = _failedJobs.toSeq.reverse
    --- End diff --
    
    Actually, is the `reverse` necessary at all? It seems that if you trace through,
    this only goes to `JobsDataSource`, where it's sorted anyway.


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150648236
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -1036,8 +1006,13 @@ private[ui] class TaskDataSource(
           shuffleRead,
           shuffleWrite,
           bytesSpilled,
    -      taskData.errorMessage.getOrElse(""),
    -      logs)
    +      info.errorMessage.getOrElse(""),
    +      executorLogs(info.executorId))
    +  }
    +
    +  private def executorLogs(id: String): Map[String, String] = {
    +    executors.getOrElseUpdate(id,
    +      store.executorSummary(id).map(_.executorLogs).getOrElse(Map.empty))
    --- End diff --
    
    Pretty sure there's a separate instance of this class per request.


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19698


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83606/
    Test PASSed.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83659/testReport)** for PR 19698 at commit [`1d7242b`](https://github.com/apache/spark/commit/1d7242b340b9525feab941c7d61a6dccb8ccc14c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83606/testReport)** for PR 19698 at commit [`a22c458`](https://github.com/apache/spark/commit/a22c45889d8fc0982caf4325eb729048537872bb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83606/testReport)** for PR 19698 at commit [`a22c458`](https://github.com/apache/spark/commit/a22c45889d8fc0982caf4325eb729048537872bb).


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150622874
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ---
    @@ -182,173 +185,183 @@ private[ui] class JobPage(parent: JobsTab) extends WebUIPage("job") {
       }
     
       def render(request: HttpServletRequest): Seq[Node] = {
    -    val listener = parent.jobProgresslistener
    +    // stripXSS is called first to remove suspicious characters used in XSS attacks
    +    val parameterId = UIUtils.stripXSS(request.getParameter("id"))
    +    require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")
     
    -    listener.synchronized {
    -      // stripXSS is called first to remove suspicious characters used in XSS attacks
    -      val parameterId = UIUtils.stripXSS(request.getParameter("id"))
    -      require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")
    -
    -      val jobId = parameterId.toInt
    -      val jobDataOption = listener.jobIdToData.get(jobId)
    -      if (jobDataOption.isEmpty) {
    -        val content =
    -          <div id="no-info">
    -            <p>No information to display for job {jobId}</p>
    -          </div>
    -        return UIUtils.headerSparkPage(
    -          s"Details for Job $jobId", content, parent)
    -      }
    -      val jobData = jobDataOption.get
    -      val isComplete = jobData.status != JobExecutionStatus.RUNNING
    -      val stages = jobData.stageIds.map { stageId =>
    -        // This could be empty if the JobProgressListener hasn't received information about the
    -        // stage or if the stage information has been garbage collected
    -        listener.stageIdToInfo.getOrElse(stageId,
    -          new StageInfo(stageId, 0, "Unknown", 0, Seq.empty, Seq.empty, "Unknown"))
    +    val jobId = parameterId.toInt
    +    val jobDataOption = Try(store.job(jobId)).toOption
    --- End diff --
    
    In most places you do an explicit try/catch, I assume because you only want to
    convert `NoSuchElementException` to `None`. Does that concern not apply here?
    Also, since you do that so often, maybe it's worth a helper, so that you could
    use it like this?
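
    For illustration, a hypothetical sketch of such a helper (the names here are
    made up; the shared method actually added to the PR may look different):

        // (assumes java.util.NoSuchElementException is imported, as elsewhere in the PR)
        private def asOption[T](fn: => T): Option[T] = {
          try {
            Some(fn)
          } catch {
            case _: NoSuchElementException => None
          }
        }

        // which would let the call site above read:
        val jobDataOption = asOption(store.job(jobId))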


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150655624
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
    @@ -66,13 +66,8 @@ private[spark] class AppStatusStore(store: KVStore) {
         filtered.asScala.map(_.info).toSeq
       }
     
    -  def executorSummary(executorId: String): Option[v1.ExecutorSummary] = {
    -    try {
    -      Some(store.read(classOf[ExecutorSummaryWrapper], executorId).info)
    -    } catch {
    -      case _: NoSuchElementException =>
    -        None
    -    }
    +  def executorSummary(executorId: String): v1.ExecutorSummary = {
    +      store.read(classOf[ExecutorSummaryWrapper], executorId).info
    --- End diff --
    
    nit: indentation


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83812/testReport)** for PR 19698 at commit [`0454ed1`](https://github.com/apache/spark/commit/0454ed1ad5bf36cf6c6f5ebdf899298daf2a50eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83659/
    Test FAILed.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83812/
    Test PASSed.


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150622074
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ---
    @@ -182,173 +185,183 @@ private[ui] class JobPage(parent: JobsTab) extends WebUIPage("job") {
       }
     
       def render(request: HttpServletRequest): Seq[Node] = {
    -    val listener = parent.jobProgresslistener
    +    // stripXSS is called first to remove suspicious characters used in XSS attacks
    +    val parameterId = UIUtils.stripXSS(request.getParameter("id"))
    +    require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")
     
    -    listener.synchronized {
    -      // stripXSS is called first to remove suspicious characters used in XSS attacks
    -      val parameterId = UIUtils.stripXSS(request.getParameter("id"))
    -      require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")
    -
    -      val jobId = parameterId.toInt
    -      val jobDataOption = listener.jobIdToData.get(jobId)
    -      if (jobDataOption.isEmpty) {
    -        val content =
    -          <div id="no-info">
    -            <p>No information to display for job {jobId}</p>
    -          </div>
    -        return UIUtils.headerSparkPage(
    -          s"Details for Job $jobId", content, parent)
    -      }
    -      val jobData = jobDataOption.get
    -      val isComplete = jobData.status != JobExecutionStatus.RUNNING
    -      val stages = jobData.stageIds.map { stageId =>
    -        // This could be empty if the JobProgressListener hasn't received information about the
    -        // stage or if the stage information has been garbage collected
    -        listener.stageIdToInfo.getOrElse(stageId,
    -          new StageInfo(stageId, 0, "Unknown", 0, Seq.empty, Seq.empty, "Unknown"))
    +    val jobId = parameterId.toInt
    +    val jobDataOption = Try(store.job(jobId)).toOption
    +    if (jobDataOption.isEmpty) {
    +      val content =
    +        <div id="no-info">
    +          <p>No information to display for job {jobId}</p>
    +        </div>
    +      return UIUtils.headerSparkPage(
    +        s"Details for Job $jobId", content, parent)
    +    }
    +    val jobData = jobDataOption.get
    +    val isComplete = jobData.status != JobExecutionStatus.RUNNING
    +    val stages = jobData.stageIds.map { stageId =>
    +      // This could be empty if the listener hasn't received information about the
    +      // stage or if the stage information has been garbage collected
    +      store.stageData(stageId).lastOption.getOrElse {
    +        new v1.StageData(
    +          v1.StageStatus.PENDING,
    +          stageId,
    +          0, 0, 0, 0, 0, 0, 0,
    +          0L, 0L, None, None, None, None,
    +          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    +          "Unknown",
    +          None,
    +          "Unknown",
    +          null,
    +          Nil,
    +          Nil,
    +          None,
    +          None,
    +          Map())
           }
    +    }
     
    -      val activeStages = Buffer[StageInfo]()
    -      val completedStages = Buffer[StageInfo]()
    -      // If the job is completed, then any pending stages are displayed as "skipped":
    -      val pendingOrSkippedStages = Buffer[StageInfo]()
    -      val failedStages = Buffer[StageInfo]()
    -      for (stage <- stages) {
    -        if (stage.submissionTime.isEmpty) {
    -          pendingOrSkippedStages += stage
    -        } else if (stage.completionTime.isDefined) {
    -          if (stage.failureReason.isDefined) {
    -            failedStages += stage
    -          } else {
    -            completedStages += stage
    -          }
    +    val activeStages = Buffer[v1.StageData]()
    +    val completedStages = Buffer[v1.StageData]()
    +    // If the job is completed, then any pending stages are displayed as "skipped":
    +    val pendingOrSkippedStages = Buffer[v1.StageData]()
    +    val failedStages = Buffer[v1.StageData]()
    +    for (stage <- stages) {
    +      if (stage.submissionTime.isEmpty) {
    +        pendingOrSkippedStages += stage
    +      } else if (stage.completionTime.isDefined) {
    +        if (stage.status == v1.StageStatus.FAILED) {
    +          failedStages += stage
             } else {
    -          activeStages += stage
    +          completedStages += stage
             }
    +      } else {
    +        activeStages += stage
           }
    +    }
     
    -      val basePath = "jobs/job"
    +    val basePath = "jobs/job"
     
    -      val pendingOrSkippedTableId =
    -        if (isComplete) {
    -          "pending"
    -        } else {
    -          "skipped"
    -        }
    +    val pendingOrSkippedTableId =
    +      if (isComplete) {
    +        "pending"
    +      } else {
    +        "skipped"
    +      }
     
    -      val activeStagesTable =
    -        new StageTableBase(request, activeStages, "active", "activeStage", parent.basePath,
    -          basePath, parent.jobProgresslistener, parent.isFairScheduler,
    -          killEnabled = parent.killEnabled, isFailedStage = false)
    -      val pendingOrSkippedStagesTable =
    -        new StageTableBase(request, pendingOrSkippedStages, pendingOrSkippedTableId, "pendingStage",
    -          parent.basePath, basePath, parent.jobProgresslistener, parent.isFairScheduler,
    -          killEnabled = false, isFailedStage = false)
    -      val completedStagesTable =
    -        new StageTableBase(request, completedStages, "completed", "completedStage", parent.basePath,
    -          basePath, parent.jobProgresslistener, parent.isFairScheduler,
    -          killEnabled = false, isFailedStage = false)
    -      val failedStagesTable =
    -        new StageTableBase(request, failedStages, "failed", "failedStage", parent.basePath,
    -          basePath, parent.jobProgresslistener, parent.isFairScheduler,
    -          killEnabled = false, isFailedStage = true)
    +    val activeStagesTable =
    +      new StageTableBase(store, request, activeStages, "active", "activeStage", parent.basePath,
    +        basePath, parent.isFairScheduler,
    +        killEnabled = parent.killEnabled, isFailedStage = false)
    +    val pendingOrSkippedStagesTable =
    +      new StageTableBase(store, request, pendingOrSkippedStages, pendingOrSkippedTableId,
    +        "pendingStage", parent.basePath, basePath, parent.isFairScheduler,
    +        killEnabled = false, isFailedStage = false)
    +    val completedStagesTable =
    +      new StageTableBase(store, request, completedStages, "completed", "completedStage",
    +        parent.basePath, basePath, parent.isFairScheduler,
    +        killEnabled = false, isFailedStage = false)
    +    val failedStagesTable =
    +      new StageTableBase(store, request, failedStages, "failed", "failedStage", parent.basePath,
    +        basePath, parent.isFairScheduler,
    +        killEnabled = false, isFailedStage = true)
     
    -      val shouldShowActiveStages = activeStages.nonEmpty
    -      val shouldShowPendingStages = !isComplete && pendingOrSkippedStages.nonEmpty
    -      val shouldShowCompletedStages = completedStages.nonEmpty
    -      val shouldShowSkippedStages = isComplete && pendingOrSkippedStages.nonEmpty
    -      val shouldShowFailedStages = failedStages.nonEmpty
    +    val shouldShowActiveStages = activeStages.nonEmpty
    +    val shouldShowPendingStages = !isComplete && pendingOrSkippedStages.nonEmpty
    +    val shouldShowCompletedStages = completedStages.nonEmpty
    +    val shouldShowSkippedStages = isComplete && pendingOrSkippedStages.nonEmpty
    +    val shouldShowFailedStages = failedStages.nonEmpty
     
    -      val summary: NodeSeq =
    -        <div>
    -          <ul class="unstyled">
    -            <li>
    -              <Strong>Status:</Strong>
    -              {jobData.status}
    -            </li>
    -            {
    -              if (jobData.jobGroup.isDefined) {
    -                <li>
    -                  <strong>Job Group:</strong>
    -                  {jobData.jobGroup.get}
    -                </li>
    -              }
    -            }
    -            {
    -              if (shouldShowActiveStages) {
    -                <li>
    -                  <a href="#active"><strong>Active Stages:</strong></a>
    -                  {activeStages.size}
    -                </li>
    -              }
    -            }
    -            {
    -              if (shouldShowPendingStages) {
    -                <li>
    -                  <a href="#pending">
    -                    <strong>Pending Stages:</strong>
    -                  </a>{pendingOrSkippedStages.size}
    -                </li>
    -              }
    +    val summary: NodeSeq =
    +      <div>
    +        <ul class="unstyled">
    +          <li>
    +            <Strong>Status:</Strong>
    +            {jobData.status}
    +          </li>
    +          {
    +            if (jobData.jobGroup.isDefined) {
    +              <li>
    +                <strong>Job Group:</strong>
    +                {jobData.jobGroup.get}
    +              </li>
                 }
    -            {
    -              if (shouldShowCompletedStages) {
    -                <li>
    -                  <a href="#completed"><strong>Completed Stages:</strong></a>
    -                  {completedStages.size}
    -                </li>
    -              }
    +          }
    +          {
    +            if (shouldShowActiveStages) {
    +              <li>
    +                <a href="#active"><strong>Active Stages:</strong></a>
    +                {activeStages.size}
    +              </li>
                 }
    -            {
    -              if (shouldShowSkippedStages) {
    +          }
    +          {
    +            if (shouldShowPendingStages) {
                   <li>
    -                <a href="#skipped"><strong>Skipped Stages:</strong></a>
    -                {pendingOrSkippedStages.size}
    +                <a href="#pending">
    +                  <strong>Pending Stages:</strong>
    +                </a>{pendingOrSkippedStages.size}
                   </li>
                 }
    +          }
    +          {
    +            if (shouldShowCompletedStages) {
    +              <li>
    +                <a href="#completed"><strong>Completed Stages:</strong></a>
    +                {completedStages.size}
    +              </li>
                 }
    -            {
    -              if (shouldShowFailedStages) {
    -                <li>
    -                  <a href="#failed"><strong>Failed Stages:</strong></a>
    -                  {failedStages.size}
    -                </li>
    -              }
    +          }
    +          {
    +            if (shouldShowSkippedStages) {
    +            <li>
    +              <a href="#skipped"><strong>Skipped Stages:</strong></a>
    +              {pendingOrSkippedStages.size}
    +            </li>
    +          }
    +          }
    +          {
    +            if (shouldShowFailedStages) {
    +              <li>
    +                <a href="#failed"><strong>Failed Stages:</strong></a>
    +                {failedStages.size}
    +              </li>
                 }
    -          </ul>
    -        </div>
    +          }
    +        </ul>
    +      </div>
     
    -      var content = summary
    -      val appStartTime = listener.startTime
    -      val operationGraphListener = parent.operationGraphListener
    +    var content = summary
    +    val appStartTime = store.applicationInfo().attempts.head.startTime.getTime()
     
    -      content ++= makeTimeline(activeStages ++ completedStages ++ failedStages,
    -          parent.parent.store.executorList(false), appStartTime)
    +    content ++= makeTimeline(activeStages ++ completedStages ++ failedStages,
    +        store.executorList(false), appStartTime)
    --- End diff --
    
    nit: indentation


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150654983
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -1036,8 +1006,13 @@ private[ui] class TaskDataSource(
           shuffleRead,
           shuffleWrite,
           bytesSpilled,
    -      taskData.errorMessage.getOrElse(""),
    -      logs)
    +      info.errorMessage.getOrElse(""),
    +      executorLogs(info.executorId))
    +  }
    +
    +  private def executorLogs(id: String): Map[String, String] = {
    +    executors.getOrElseUpdate(id,
    +      store.executorSummary(id).map(_.executorLogs).getOrElse(Map.empty))
    --- End diff --
    
    ah right, good point


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83812/testReport)** for PR 19698 at commit [`0454ed1`](https://github.com/apache/spark/commit/0454ed1ad5bf36cf6c6f5ebdf899298daf2a50eb).


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    I merged this to master, since it's holding up a bunch more of the history server
    stuff, but I'd still encourage other reviewers to take a look @cloud-fan @ajbozarth @jerryshao 


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150632556
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -1036,8 +1006,13 @@ private[ui] class TaskDataSource(
           shuffleRead,
           shuffleWrite,
           bytesSpilled,
    -      taskData.errorMessage.getOrElse(""),
    -      logs)
    +      info.errorMessage.getOrElse(""),
    +      executorLogs(info.executorId))
    +  }
    +
    +  private def executorLogs(id: String): Map[String, String] = {
    +    executors.getOrElseUpdate(id,
    +      store.executorSummary(id).map(_.executorLogs).getOrElse(Map.empty))
    --- End diff --
    
    I think you need to protect `executors` from a race if two UI threads both call this.
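
    For what it's worth, a hypothetical sketch of what guarding the cache could look
    like if the map were shared between request threads (elsewhere in this thread
    it's noted that each request gets its own TaskDataSource instance, so no lock
    turns out to be needed; the Option-returning executorSummary mirrors the diff
    above):

        import scala.collection.mutable.HashMap

        private val executorIdToLogs = new HashMap[String, Map[String, String]]()

        private def executorLogs(id: String): Map[String, String] =
          // getOrElseUpdate on a plain HashMap is not atomic, so take a lock
          // (or switch to a ConcurrentHashMap) when several threads can reach it.
          executorIdToLogs.synchronized {
            executorIdToLogs.getOrElseUpdate(id,
              store.executorSummary(id).map(_.executorLogs).getOrElse(Map.empty))
          }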


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150626334
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/PoolPage.scala ---
    @@ -17,50 +17,49 @@
     
     package org.apache.spark.ui.jobs
     
    +import java.util.NoSuchElementException
     import javax.servlet.http.HttpServletRequest
     
     import scala.xml.Node
     
    -import org.apache.spark.scheduler.StageInfo
    +import org.apache.spark.status.PoolData
    +import org.apache.spark.status.api.v1._
     import org.apache.spark.ui.{UIUtils, WebUIPage}
     
     /** Page showing specific pool details */
     private[ui] class PoolPage(parent: StagesTab) extends WebUIPage("pool") {
    -  private val sc = parent.sc
    -  private val listener = parent.progressListener
     
       def render(request: HttpServletRequest): Seq[Node] = {
    -    listener.synchronized {
    -      // stripXSS is called first to remove suspicious characters used in XSS attacks
    -      val poolName = Option(UIUtils.stripXSS(request.getParameter("poolname"))).map { poolname =>
    -        UIUtils.decodeURLParameter(poolname)
    -      }.getOrElse {
    -        throw new IllegalArgumentException(s"Missing poolname parameter")
    -      }
    +    // stripXSS is called first to remove suspicious characters used in XSS attacks
    +    val poolName = Option(UIUtils.stripXSS(request.getParameter("poolname"))).map { poolname =>
    +      UIUtils.decodeURLParameter(poolname)
    +    }.getOrElse {
    +      throw new IllegalArgumentException(s"Missing poolname parameter")
    +    }
     
    -      val poolToActiveStages = listener.poolToActiveStages
    -      val activeStages = poolToActiveStages.get(poolName) match {
    -        case Some(s) => s.values.toSeq
    -        case None => Seq.empty[StageInfo]
    -      }
    -      val shouldShowActiveStages = activeStages.nonEmpty
    -      val activeStagesTable =
    -        new StageTableBase(request, activeStages, "", "activeStage", parent.basePath, "stages/pool",
    -          parent.progressListener, parent.isFairScheduler, parent.killEnabled,
    -          isFailedStage = false)
    +    // For now, pool information is only accessible in live UIs
    --- End diff --
    
    It's weird that the PoolPage is even hooked up when there isn't a live UI
    (but I think you have the right change here; I wouldn't change that behavior as
    part of this PR).


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83808/
    Test PASSed.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83659/testReport)** for PR 19698 at commit [`1d7242b`](https://github.com/apache/spark/commit/1d7242b340b9525feab941c7d61a6dccb8ccc14c).


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150564678
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
    @@ -23,19 +23,21 @@ import javax.servlet.http.HttpServletRequest
     
     import scala.collection.JavaConverters._
     import scala.collection.mutable.{HashMap, ListBuffer}
    +import scala.util.Try
    --- End diff --
    
    unused (and more imports here can be cleaned up)


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83808/testReport)** for PR 19698 at commit [`1fef09d`](https://github.com/apache/spark/commit/1fef09da600141593942154b312a8bf100d6ea5c).


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150631869
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -856,22 +832,24 @@ private[ui] class TaskTableRowData(
         val logs: Map[String, String])
     
     private[ui] class TaskDataSource(
    -    tasks: Seq[TaskUIData],
    +    tasks: Seq[TaskData],
         hasAccumulators: Boolean,
         hasInput: Boolean,
         hasOutput: Boolean,
         hasShuffleRead: Boolean,
         hasShuffleWrite: Boolean,
         hasBytesSpilled: Boolean,
    -    lastUpdateTime: Option[Long],
         currentTime: Long,
         pageSize: Int,
         sortColumn: String,
         desc: Boolean,
         store: AppStatusStore) extends PagedDataSource[TaskTableRowData](pageSize) {
       import StagePage._
     
    -  // Convert TaskUIData to TaskTableRowData which contains the final contents to show in the table
    +  // Keep an internal cache of executor log maps so that long task lists render faster.
    +  private val executors = new HashMap[String, Map[String, String]]()
    --- End diff --
    
    nit: rename to something like `executorIdToLogs`


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    **[Test build #83808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83808/testReport)** for PR 19698 at commit [`1fef09d`](https://github.com/apache/spark/commit/1fef09da600141593942154b312a8bf100d6ea5c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150648166
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ---
    @@ -182,173 +185,183 @@ private[ui] class JobPage(parent: JobsTab) extends WebUIPage("job") {
       }
     
       def render(request: HttpServletRequest): Seq[Node] = {
    -    val listener = parent.jobProgresslistener
    +    // stripXSS is called first to remove suspicious characters used in XSS attacks
    +    val parameterId = UIUtils.stripXSS(request.getParameter("id"))
    +    require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")
     
    -    listener.synchronized {
    -      // stripXSS is called first to remove suspicious characters used in XSS attacks
    -      val parameterId = UIUtils.stripXSS(request.getParameter("id"))
    -      require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")
    -
    -      val jobId = parameterId.toInt
    -      val jobDataOption = listener.jobIdToData.get(jobId)
    -      if (jobDataOption.isEmpty) {
    -        val content =
    -          <div id="no-info">
    -            <p>No information to display for job {jobId}</p>
    -          </div>
    -        return UIUtils.headerSparkPage(
    -          s"Details for Job $jobId", content, parent)
    -      }
    -      val jobData = jobDataOption.get
    -      val isComplete = jobData.status != JobExecutionStatus.RUNNING
    -      val stages = jobData.stageIds.map { stageId =>
    -        // This could be empty if the JobProgressListener hasn't received information about the
    -        // stage or if the stage information has been garbage collected
    -        listener.stageIdToInfo.getOrElse(stageId,
    -          new StageInfo(stageId, 0, "Unknown", 0, Seq.empty, Seq.empty, "Unknown"))
    +    val jobId = parameterId.toInt
    +    val jobDataOption = Try(store.job(jobId)).toOption
    --- End diff --
    
    I added a shared method; I'm not too happy with it, but this code needs to be
    resilient to data disappearing (either because events don't arrive or because
    data is cleaned up to save memory), so there's not really a good way around it...


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    lgtm


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19698
  
    For context:
    
    * Project link: https://issues.apache.org/jira/browse/SPARK-18085
    * Upcoming PRs that build on this code: https://github.com/vanzin/spark/pulls
    * PR with more comments: https://github.com/vanzin/spark/pull/47
    
    This PR is missing some code from the original PR that cleans up data kept in the UI that is no longer needed. That's because SPARK-20647 is not yet committed. When both that PR and this one are in, I'll do the cleanup, probably as part of SPARK-20650.
    
    This change is kinda large, but it gets a bit smaller if you ignore whitespace changes:
    https://github.com/apache/spark/pull/19698/files?w=1
    
    Unfortunately you can't comment in that view.


---



[GitHub] spark pull request #19698: [SPARK-20648][core] Port JobsTab and StageTab to ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19698#discussion_r150637980
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala ---
    @@ -19,35 +19,52 @@ package org.apache.spark.ui.jobs
     
     import javax.servlet.http.HttpServletRequest
     
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.JobExecutionStatus
     import org.apache.spark.scheduler.SchedulingMode
    -import org.apache.spark.ui.{SparkUI, SparkUITab, UIUtils}
    +import org.apache.spark.status.AppStatusStore
    +import org.apache.spark.ui._
     
     /** Web UI showing progress status of all jobs in the given SparkContext. */
    -private[ui] class JobsTab(val parent: SparkUI) extends SparkUITab(parent, "jobs") {
    +private[ui] class JobsTab(parent: SparkUI, store: AppStatusStore)
    +  extends SparkUITab(parent, "jobs") {
    +
       val sc = parent.sc
       val killEnabled = parent.killEnabled
    -  val jobProgresslistener = parent.jobProgressListener
    -  val operationGraphListener = parent.operationGraphListener
     
    -  def isFairScheduler: Boolean =
    -    jobProgresslistener.schedulingMode == Some(SchedulingMode.FAIR)
    +  def isFairScheduler: Boolean = {
    +    val configName = "spark.scheduler.mode"
    +    val config = sc match {
    +      case Some(_sc) =>
    +        _sc.conf.getOption(configName)
    --- End diff --
    
    How come this needs to check sc.conf, but StagesTab doesn't? It also doesn't seem like the old code checked this either.


---
