Posted to reviews@spark.apache.org by attilapiros <gi...@git.apache.org> on 2018/02/13 18:23:21 UTC

[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

GitHub user attilapiros opened a pull request:

    https://github.com/apache/spark/pull/20601

    [SPARK-23413][UI] Fix sorting tasks by Host / Executor ID at the Stage page

    ## What changes were proposed in this pull request?
    
    Fixing the exception thrown when sorting tasks by Host / Executor ID:
    ```
            java.lang.IllegalArgumentException: Invalid sort column: Host
    	at org.apache.spark.ui.jobs.ApiHelper$.indexName(StagePage.scala:1017)
    	at org.apache.spark.ui.jobs.TaskDataSource.sliceData(StagePage.scala:694)
    	at org.apache.spark.ui.PagedDataSource.pageData(PagedTable.scala:61)
    	at org.apache.spark.ui.PagedTable$class.table(PagedTable.scala:96)
    	at org.apache.spark.ui.jobs.TaskPagedTable.table(StagePage.scala:708)
    	at org.apache.spark.ui.jobs.StagePage.liftedTree1$1(StagePage.scala:293)
    	at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:282)
    	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
    	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
    	at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
    	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
    	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    	at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
    	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
    ```
    
    In addition, some refactoring guards against similar problems in the future: a constant is introduced for each header name and reused when identifying the corresponding sorting index.
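    The refactoring pattern can be illustrated with a minimal sketch (this is not the actual StagePage code; the index strings `"exe"` and `"hst"` are placeholders for the real `TaskIndexNames` values):
    
    ```scala
    // Simplified sketch of the pattern introduced by this patch: each column
    // header is a named constant, and the same constant is reused as the key
    // of the header-to-sort-index map, so renaming a header can no longer
    // silently break the lookup and trigger "Invalid sort column".
    object ApiHelperSketch {
      val HEADER_EXECUTOR = "Executor ID"
      val HEADER_HOST = "Host"
    
      private val COLUMN_TO_INDEX = Map(
        HEADER_EXECUTOR -> "exe",
        HEADER_HOST -> "hst")
    
      def indexName(header: String): String =
        COLUMN_TO_INDEX.getOrElse(header,
          throw new IllegalArgumentException(s"Invalid sort column: $header"))
    }
    ```
    
    With this shape, the renderer and the sort-index lookup cannot disagree on the header string, because both reference the same `val`.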
    
    ## How was this patch tested?
    
    Manually:
    
    ![screen shot 2018-02-13 at 18 57 10](https://user-images.githubusercontent.com/2017933/36166532-1cfdf3b8-10f3-11e8-8d32-5fcaad2af214.png)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/attilapiros/spark SPARK-23413

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20601
    
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    ack I merged to master but screwed up on 2.3 -- fixing that here: https://github.com/apache/spark/pull/20623


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87477/testReport)** for PR 20601 at commit [`c8ef968`](https://github.com/apache/spark/commit/c8ef96839d0ec79eb397757736f9a9df0b876a11).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168103464
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_ID = "ID"
    +  val HEADER_TASK_INDEX = "Index"
    +  val HEADER_ATTEMPT = "Attempt"
    +  val HEADER_STATUS = "Status"
    +  val HEADER_LOCALITY = "Locality Level"
    +  val HEADER_EXECUTOR = "Executor ID"
    +  val HEADER_HOST = "Host"
    +  val HEADER_LAUNCH_TIME = "Launch Time"
    +  val HEADER_DURATION = "Duration"
    +  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
    +  val HEADER_DESER_TIME = "Task Deserialization Time"
    +  val HEADER_GC_TIME = "GC Time"
    +  val HEADER_SER_TIME = "Result Serialization Time"
    +  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
    +  val HEADER_PEAK_MEM = "Peak Execution Memory"
    +  val HEADER_ACCUMULATORS = "Accumulators"
    +  val HEADER_INPUT_SIZE = "Input Size / Records"
    +  val HEADER_OUTPUT_SIZE = "Output Size / Records"
    +  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    +  val HEADER_SHUFFLE_REMOTE_READS = "Shuffle Remote Reads"
    +  val HEADER_SHUFFLE_WRITE_TIME = "Write Time"
    +  val HEADER_SHUFFLE_WRITE_SIZE = "Shuffle Write Size / Records"
    +  val HEADER_MEM_SPILL = "Shuffle Spill (Memory)"
    +  val HEADER_DISK_SPILL = "Shuffle Spill (Disk)"
    +  val HEADER_ERROR = "Errors"
     
       private val COLUMN_TO_INDEX = Map(
    -    "ID" -> null.asInstanceOf[String],
    -    "Index" -> TaskIndexNames.TASK_INDEX,
    -    "Attempt" -> TaskIndexNames.ATTEMPT,
    -    "Status" -> TaskIndexNames.STATUS,
    -    "Locality Level" -> TaskIndexNames.LOCALITY,
    -    "Executor ID / Host" -> TaskIndexNames.EXECUTOR,
    -    "Launch Time" -> TaskIndexNames.LAUNCH_TIME,
    -    "Duration" -> TaskIndexNames.DURATION,
    -    "Scheduler Delay" -> TaskIndexNames.SCHEDULER_DELAY,
    -    "Task Deserialization Time" -> TaskIndexNames.DESER_TIME,
    -    "GC Time" -> TaskIndexNames.GC_TIME,
    -    "Result Serialization Time" -> TaskIndexNames.SER_TIME,
    -    "Getting Result Time" -> TaskIndexNames.GETTING_RESULT_TIME,
    -    "Peak Execution Memory" -> TaskIndexNames.PEAK_MEM,
    -    "Accumulators" -> TaskIndexNames.ACCUMULATORS,
    -    "Input Size / Records" -> TaskIndexNames.INPUT_SIZE,
    -    "Output Size / Records" -> TaskIndexNames.OUTPUT_SIZE,
    -    "Shuffle Read Blocked Time" -> TaskIndexNames.SHUFFLE_READ_TIME,
    -    "Shuffle Read Size / Records" -> TaskIndexNames.SHUFFLE_TOTAL_READS,
    -    "Shuffle Remote Reads" -> TaskIndexNames.SHUFFLE_REMOTE_READS,
    -    "Write Time" -> TaskIndexNames.SHUFFLE_WRITE_TIME,
    -    "Shuffle Write Size / Records" -> TaskIndexNames.SHUFFLE_WRITE_SIZE,
    -    "Shuffle Spill (Memory)" -> TaskIndexNames.MEM_SPILL,
    -    "Shuffle Spill (Disk)" -> TaskIndexNames.DISK_SPILL,
    -    "Errors" -> TaskIndexNames.ERROR)
    +    HEADER_ID -> null.asInstanceOf[String],
    +    HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
    +    HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
    +    HEADER_STATUS -> TaskIndexNames.STATUS,
    +    HEADER_LOCALITY -> TaskIndexNames.LOCALITY,
    +    HEADER_EXECUTOR -> TaskIndexNames.EXECUTOR,
    +    HEADER_HOST -> TaskIndexNames.EXECUTOR,
    --- End diff --
    
    Seems we'd better have a new TaskIndexNames for host column.
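    A minimal sketch of this suggestion (hypothetical names and index strings, not the real `TaskIndexNames` API): the Host column gets its own sort index instead of reusing the executor one.
    
    ```scala
    // Hypothetical sketch of the reviewer's suggestion: a dedicated index for
    // the Host column, so sorting by Host and by Executor ID are distinct.
    // The strings "exe" and "hst" are placeholders.
    object TaskIndexNamesSketch {
      val EXECUTOR = "exe"
      val HOST = "hst" // new, dedicated index for the Host column
    }
    
    val columnToIndex = Map(
      "Executor ID" -> TaskIndexNamesSketch.EXECUTOR,
      "Host" -> TaskIndexNamesSketch.HOST)
    ```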


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168211371
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_ID = "ID"
    +  val HEADER_TASK_INDEX = "Index"
    +  val HEADER_ATTEMPT = "Attempt"
    +  val HEADER_STATUS = "Status"
    +  val HEADER_LOCALITY = "Locality Level"
    +  val HEADER_EXECUTOR = "Executor ID"
    +  val HEADER_HOST = "Host"
    +  val HEADER_LAUNCH_TIME = "Launch Time"
    +  val HEADER_DURATION = "Duration"
    +  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
    +  val HEADER_DESER_TIME = "Task Deserialization Time"
    +  val HEADER_GC_TIME = "GC Time"
    +  val HEADER_SER_TIME = "Result Serialization Time"
    +  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
    +  val HEADER_PEAK_MEM = "Peak Execution Memory"
    +  val HEADER_ACCUMULATORS = "Accumulators"
    +  val HEADER_INPUT_SIZE = "Input Size / Records"
    +  val HEADER_OUTPUT_SIZE = "Output Size / Records"
    +  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    +  val HEADER_SHUFFLE_REMOTE_READS = "Shuffle Remote Reads"
    +  val HEADER_SHUFFLE_WRITE_TIME = "Write Time"
    +  val HEADER_SHUFFLE_WRITE_SIZE = "Shuffle Write Size / Records"
    +  val HEADER_MEM_SPILL = "Shuffle Spill (Memory)"
    +  val HEADER_DISK_SPILL = "Shuffle Spill (Disk)"
    +  val HEADER_ERROR = "Errors"
     
       private val COLUMN_TO_INDEX = Map(
    -    "ID" -> null.asInstanceOf[String],
    -    "Index" -> TaskIndexNames.TASK_INDEX,
    -    "Attempt" -> TaskIndexNames.ATTEMPT,
    -    "Status" -> TaskIndexNames.STATUS,
    -    "Locality Level" -> TaskIndexNames.LOCALITY,
    -    "Executor ID / Host" -> TaskIndexNames.EXECUTOR,
    -    "Launch Time" -> TaskIndexNames.LAUNCH_TIME,
    -    "Duration" -> TaskIndexNames.DURATION,
    -    "Scheduler Delay" -> TaskIndexNames.SCHEDULER_DELAY,
    -    "Task Deserialization Time" -> TaskIndexNames.DESER_TIME,
    -    "GC Time" -> TaskIndexNames.GC_TIME,
    -    "Result Serialization Time" -> TaskIndexNames.SER_TIME,
    -    "Getting Result Time" -> TaskIndexNames.GETTING_RESULT_TIME,
    -    "Peak Execution Memory" -> TaskIndexNames.PEAK_MEM,
    -    "Accumulators" -> TaskIndexNames.ACCUMULATORS,
    -    "Input Size / Records" -> TaskIndexNames.INPUT_SIZE,
    -    "Output Size / Records" -> TaskIndexNames.OUTPUT_SIZE,
    -    "Shuffle Read Blocked Time" -> TaskIndexNames.SHUFFLE_READ_TIME,
    -    "Shuffle Read Size / Records" -> TaskIndexNames.SHUFFLE_TOTAL_READS,
    -    "Shuffle Remote Reads" -> TaskIndexNames.SHUFFLE_REMOTE_READS,
    -    "Write Time" -> TaskIndexNames.SHUFFLE_WRITE_TIME,
    -    "Shuffle Write Size / Records" -> TaskIndexNames.SHUFFLE_WRITE_SIZE,
    -    "Shuffle Spill (Memory)" -> TaskIndexNames.MEM_SPILL,
    -    "Shuffle Spill (Disk)" -> TaskIndexNames.DISK_SPILL,
    -    "Errors" -> TaskIndexNames.ERROR)
    +    HEADER_ID -> null.asInstanceOf[String],
    +    HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
    +    HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
    +    HEADER_STATUS -> TaskIndexNames.STATUS,
    +    HEADER_LOCALITY -> TaskIndexNames.LOCALITY,
    +    HEADER_EXECUTOR -> TaskIndexNames.EXECUTOR,
    +    HEADER_HOST -> TaskIndexNames.EXECUTOR,
    --- End diff --
    
    another alternative is to disable sorting by host, and just fix sorting by executor.  That could go into 2.3.1 without breaking compatibility.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87487/
    Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87481/
    Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87487/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168313710
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_ID = "ID"
    +  val HEADER_TASK_INDEX = "Index"
    +  val HEADER_ATTEMPT = "Attempt"
    +  val HEADER_STATUS = "Status"
    +  val HEADER_LOCALITY = "Locality Level"
    +  val HEADER_EXECUTOR = "Executor ID"
    +  val HEADER_HOST = "Host"
    +  val HEADER_LAUNCH_TIME = "Launch Time"
    +  val HEADER_DURATION = "Duration"
    +  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
    +  val HEADER_DESER_TIME = "Task Deserialization Time"
    +  val HEADER_GC_TIME = "GC Time"
    +  val HEADER_SER_TIME = "Result Serialization Time"
    +  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
    +  val HEADER_PEAK_MEM = "Peak Execution Memory"
    +  val HEADER_ACCUMULATORS = "Accumulators"
    +  val HEADER_INPUT_SIZE = "Input Size / Records"
    +  val HEADER_OUTPUT_SIZE = "Output Size / Records"
    +  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    +  val HEADER_SHUFFLE_REMOTE_READS = "Shuffle Remote Reads"
    +  val HEADER_SHUFFLE_WRITE_TIME = "Write Time"
    +  val HEADER_SHUFFLE_WRITE_SIZE = "Shuffle Write Size / Records"
    +  val HEADER_MEM_SPILL = "Shuffle Spill (Memory)"
    +  val HEADER_DISK_SPILL = "Shuffle Spill (Disk)"
    +  val HEADER_ERROR = "Errors"
     
       private val COLUMN_TO_INDEX = Map(
    -    "ID" -> null.asInstanceOf[String],
    -    "Index" -> TaskIndexNames.TASK_INDEX,
    -    "Attempt" -> TaskIndexNames.ATTEMPT,
    -    "Status" -> TaskIndexNames.STATUS,
    -    "Locality Level" -> TaskIndexNames.LOCALITY,
    -    "Executor ID / Host" -> TaskIndexNames.EXECUTOR,
    -    "Launch Time" -> TaskIndexNames.LAUNCH_TIME,
    -    "Duration" -> TaskIndexNames.DURATION,
    -    "Scheduler Delay" -> TaskIndexNames.SCHEDULER_DELAY,
    -    "Task Deserialization Time" -> TaskIndexNames.DESER_TIME,
    -    "GC Time" -> TaskIndexNames.GC_TIME,
    -    "Result Serialization Time" -> TaskIndexNames.SER_TIME,
    -    "Getting Result Time" -> TaskIndexNames.GETTING_RESULT_TIME,
    -    "Peak Execution Memory" -> TaskIndexNames.PEAK_MEM,
    -    "Accumulators" -> TaskIndexNames.ACCUMULATORS,
    -    "Input Size / Records" -> TaskIndexNames.INPUT_SIZE,
    -    "Output Size / Records" -> TaskIndexNames.OUTPUT_SIZE,
    -    "Shuffle Read Blocked Time" -> TaskIndexNames.SHUFFLE_READ_TIME,
    -    "Shuffle Read Size / Records" -> TaskIndexNames.SHUFFLE_TOTAL_READS,
    -    "Shuffle Remote Reads" -> TaskIndexNames.SHUFFLE_REMOTE_READS,
    -    "Write Time" -> TaskIndexNames.SHUFFLE_WRITE_TIME,
    -    "Shuffle Write Size / Records" -> TaskIndexNames.SHUFFLE_WRITE_SIZE,
    -    "Shuffle Spill (Memory)" -> TaskIndexNames.MEM_SPILL,
    -    "Shuffle Spill (Disk)" -> TaskIndexNames.DISK_SPILL,
    -    "Errors" -> TaskIndexNames.ERROR)
    +    HEADER_ID -> null.asInstanceOf[String],
    +    HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
    +    HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
    +    HEADER_STATUS -> TaskIndexNames.STATUS,
    +    HEADER_LOCALITY -> TaskIndexNames.LOCALITY,
    +    HEADER_EXECUTOR -> TaskIndexNames.EXECUTOR,
    +    HEADER_HOST -> TaskIndexNames.EXECUTOR,
    --- End diff --
    
    or even go back to the 2.2 behavior, with executor & host in the same column.
    
    I do think having a separate column for host, and having it be sortable, is actually better ... but just trying to think of simple solutions.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87411/testReport)** for PR 20601 at commit [`d51602f`](https://github.com/apache/spark/commit/d51602feee636e86907339f5771d61ca670676f4).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by attilapiros <gi...@git.apache.org>.
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168458891
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_ID = "ID"
    +  val HEADER_TASK_INDEX = "Index"
    +  val HEADER_ATTEMPT = "Attempt"
    +  val HEADER_STATUS = "Status"
    +  val HEADER_LOCALITY = "Locality Level"
    +  val HEADER_EXECUTOR = "Executor ID"
    +  val HEADER_HOST = "Host"
    +  val HEADER_LAUNCH_TIME = "Launch Time"
    +  val HEADER_DURATION = "Duration"
    +  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
    +  val HEADER_DESER_TIME = "Task Deserialization Time"
    +  val HEADER_GC_TIME = "GC Time"
    +  val HEADER_SER_TIME = "Result Serialization Time"
    +  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
    +  val HEADER_PEAK_MEM = "Peak Execution Memory"
    +  val HEADER_ACCUMULATORS = "Accumulators"
    +  val HEADER_INPUT_SIZE = "Input Size / Records"
    +  val HEADER_OUTPUT_SIZE = "Output Size / Records"
    +  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    --- End diff --
    
    When naming the header constants I followed the existing task index names:
    
    ```scala
      HEADER_SHUFFLE_TOTAL_READS -> TaskIndexNames.SHUFFLE_TOTAL_READS,
    ```


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r167991914
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_ID = "ID"
    +  val HEADER_TASK_INDEX = "Index"
    +  val HEADER_ATTEMPT = "Attempt"
    +  val HEADER_STATUS = "Status"
    +  val HEADER_LOCALITY = "Locality Level"
    +  val HEADER_EXECUTOR = "Executor ID"
    +  val HEADER_HOST = "Host"
    +  val HEADER_LAUNCH_TIME = "Launch Time"
    +  val HEADER_DURATION = "Duration"
    +  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
    +  val HEADER_DESER_TIME = "Task Deserialization Time"
    +  val HEADER_GC_TIME = "GC Time"
    +  val HEADER_SER_TIME = "Result Serialization Time"
    +  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
    +  val HEADER_PEAK_MEM = "Peak Execution Memory"
    +  val HEADER_ACCUMULATORS = "Accumulators"
    +  val HEADER_INPUT_SIZE = "Input Size / Records"
    +  val HEADER_OUTPUT_SIZE = "Output Size / Records"
    +  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    +  val HEADER_SHUFFLE_REMOTE_READS = "Shuffle Remote Reads"
    +  val HEADER_SHUFFLE_WRITE_TIME = "Write Time"
    +  val HEADER_SHUFFLE_WRITE_SIZE = "Shuffle Write Size / Records"
    +  val HEADER_MEM_SPILL = "Shuffle Spill (Memory)"
    +  val HEADER_DISK_SPILL = "Shuffle Spill (Disk)"
    +  val HEADER_ERROR = "Errors"
     
       private val COLUMN_TO_INDEX = Map(
    -    "ID" -> null.asInstanceOf[String],
    -    "Index" -> TaskIndexNames.TASK_INDEX,
    -    "Attempt" -> TaskIndexNames.ATTEMPT,
    -    "Status" -> TaskIndexNames.STATUS,
    -    "Locality Level" -> TaskIndexNames.LOCALITY,
    -    "Executor ID / Host" -> TaskIndexNames.EXECUTOR,
    -    "Launch Time" -> TaskIndexNames.LAUNCH_TIME,
    -    "Duration" -> TaskIndexNames.DURATION,
    -    "Scheduler Delay" -> TaskIndexNames.SCHEDULER_DELAY,
    -    "Task Deserialization Time" -> TaskIndexNames.DESER_TIME,
    -    "GC Time" -> TaskIndexNames.GC_TIME,
    -    "Result Serialization Time" -> TaskIndexNames.SER_TIME,
    -    "Getting Result Time" -> TaskIndexNames.GETTING_RESULT_TIME,
    -    "Peak Execution Memory" -> TaskIndexNames.PEAK_MEM,
    -    "Accumulators" -> TaskIndexNames.ACCUMULATORS,
    -    "Input Size / Records" -> TaskIndexNames.INPUT_SIZE,
    -    "Output Size / Records" -> TaskIndexNames.OUTPUT_SIZE,
    -    "Shuffle Read Blocked Time" -> TaskIndexNames.SHUFFLE_READ_TIME,
    -    "Shuffle Read Size / Records" -> TaskIndexNames.SHUFFLE_TOTAL_READS,
    -    "Shuffle Remote Reads" -> TaskIndexNames.SHUFFLE_REMOTE_READS,
    -    "Write Time" -> TaskIndexNames.SHUFFLE_WRITE_TIME,
    -    "Shuffle Write Size / Records" -> TaskIndexNames.SHUFFLE_WRITE_SIZE,
    -    "Shuffle Spill (Memory)" -> TaskIndexNames.MEM_SPILL,
    -    "Shuffle Spill (Disk)" -> TaskIndexNames.DISK_SPILL,
    -    "Errors" -> TaskIndexNames.ERROR)
    +    HEADER_ID -> null.asInstanceOf[String],
    +    HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
    +    HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
    +    HEADER_STATUS -> TaskIndexNames.STATUS,
    +    HEADER_LOCALITY -> TaskIndexNames.LOCALITY,
    +    HEADER_EXECUTOR -> TaskIndexNames.EXECUTOR,
    +    HEADER_HOST -> TaskIndexNames.EXECUTOR,
    --- End diff --
    
    sorting by host and executor is not the same ... you might have executors 1 & 5 on host A, and execs 2,3,4 on host B. 
    
    The 2.2 UI had both executor and host in the same column: https://github.com/apache/spark/blob/branch-2.2/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L1203
    
    I think we either need to go back to having one column for both, or add an index on host.
    
    thoughts @vanzin ?
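    
    The point can be illustrated with made-up task data (hypothetical, not from the PR): executors 1 and 5 on host A, executors 2, 3 and 4 on host B.
    
    ```scala
    // Sorting by executor interleaves the hosts, so an index built on the
    // executor ID cannot stand in for an index on the host.
    case class Task(executorId: String, host: String)
    
    val tasks = Seq(
      Task("1", "hostA"), Task("5", "hostA"),
      Task("2", "hostB"), Task("3", "hostB"), Task("4", "hostB"))
    
    val byExecutor = tasks.sortBy(_.executorId).map(_.host)
    // hostA, hostB, hostB, hostB, hostA -- hosts are not grouped
    val byHost = tasks.sortBy(_.host).map(_.host)
    // hostA, hostA, hostB, hostB, hostB
    ```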


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20601


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87411/testReport)** for PR 20601 at commit [`d51602f`](https://github.com/apache/spark/commit/d51602feee636e86907339f5771d61ca670676f4).


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87478/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87411/
    Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87477/testReport)** for PR 20601 at commit [`c8ef968`](https://github.com/apache/spark/commit/c8ef96839d0ec79eb397757736f9a9df0b876a11).


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by attilapiros <gi...@git.apache.org>.
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168223637
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_ID = "ID"
    +  val HEADER_TASK_INDEX = "Index"
    +  val HEADER_ATTEMPT = "Attempt"
    +  val HEADER_STATUS = "Status"
    +  val HEADER_LOCALITY = "Locality Level"
    +  val HEADER_EXECUTOR = "Executor ID"
    +  val HEADER_HOST = "Host"
    +  val HEADER_LAUNCH_TIME = "Launch Time"
    +  val HEADER_DURATION = "Duration"
    +  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
    +  val HEADER_DESER_TIME = "Task Deserialization Time"
    +  val HEADER_GC_TIME = "GC Time"
    +  val HEADER_SER_TIME = "Result Serialization Time"
    +  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
    +  val HEADER_PEAK_MEM = "Peak Execution Memory"
    +  val HEADER_ACCUMULATORS = "Accumulators"
    +  val HEADER_INPUT_SIZE = "Input Size / Records"
    +  val HEADER_OUTPUT_SIZE = "Output Size / Records"
    +  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    +  val HEADER_SHUFFLE_REMOTE_READS = "Shuffle Remote Reads"
    +  val HEADER_SHUFFLE_WRITE_TIME = "Write Time"
    +  val HEADER_SHUFFLE_WRITE_SIZE = "Shuffle Write Size / Records"
    +  val HEADER_MEM_SPILL = "Shuffle Spill (Memory)"
    +  val HEADER_DISK_SPILL = "Shuffle Spill (Disk)"
    +  val HEADER_ERROR = "Errors"
     
       private val COLUMN_TO_INDEX = Map(
    -    "ID" -> null.asInstanceOf[String],
    -    "Index" -> TaskIndexNames.TASK_INDEX,
    -    "Attempt" -> TaskIndexNames.ATTEMPT,
    -    "Status" -> TaskIndexNames.STATUS,
    -    "Locality Level" -> TaskIndexNames.LOCALITY,
    -    "Executor ID / Host" -> TaskIndexNames.EXECUTOR,
    -    "Launch Time" -> TaskIndexNames.LAUNCH_TIME,
    -    "Duration" -> TaskIndexNames.DURATION,
    -    "Scheduler Delay" -> TaskIndexNames.SCHEDULER_DELAY,
    -    "Task Deserialization Time" -> TaskIndexNames.DESER_TIME,
    -    "GC Time" -> TaskIndexNames.GC_TIME,
    -    "Result Serialization Time" -> TaskIndexNames.SER_TIME,
    -    "Getting Result Time" -> TaskIndexNames.GETTING_RESULT_TIME,
    -    "Peak Execution Memory" -> TaskIndexNames.PEAK_MEM,
    -    "Accumulators" -> TaskIndexNames.ACCUMULATORS,
    -    "Input Size / Records" -> TaskIndexNames.INPUT_SIZE,
    -    "Output Size / Records" -> TaskIndexNames.OUTPUT_SIZE,
    -    "Shuffle Read Blocked Time" -> TaskIndexNames.SHUFFLE_READ_TIME,
    -    "Shuffle Read Size / Records" -> TaskIndexNames.SHUFFLE_TOTAL_READS,
    -    "Shuffle Remote Reads" -> TaskIndexNames.SHUFFLE_REMOTE_READS,
    -    "Write Time" -> TaskIndexNames.SHUFFLE_WRITE_TIME,
    -    "Shuffle Write Size / Records" -> TaskIndexNames.SHUFFLE_WRITE_SIZE,
    -    "Shuffle Spill (Memory)" -> TaskIndexNames.MEM_SPILL,
    -    "Shuffle Spill (Disk)" -> TaskIndexNames.DISK_SPILL,
    -    "Errors" -> TaskIndexNames.ERROR)
    +    HEADER_ID -> null.asInstanceOf[String],
    +    HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
    +    HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
    +    HEADER_STATUS -> TaskIndexNames.STATUS,
    +    HEADER_LOCALITY -> TaskIndexNames.LOCALITY,
    +    HEADER_EXECUTOR -> TaskIndexNames.EXECUTOR,
    +    HEADER_HOST -> TaskIndexNames.EXECUTOR,
    --- End diff --
    
    I think that is a good idea. I can extend taskHeadersAndCssClasses to store Tuple3 objects, where the additional Boolean property flags whether the column is sortable. For a non-sortable column we would then skip the headerLink.
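    The idea could look roughly like this (a minimal, self-contained Scala sketch with hypothetical names and signatures, not the actual StagePage code):
    
    ```scala
    object SortableColumnsSketch {
      // Hypothetical reworking of taskHeadersAndCssClasses: each entry is
      // (header name, CSS class, sortable flag).
      val taskHeadersAndCssClasses: Seq[(String, String, Boolean)] = Seq(
        ("ID", "", false), // not sortable: no backing index in the store
        ("Index", "", true),
        ("Host", "", true),
        ("Executor ID", "", true))
    
      // For a non-sortable column we skip the sort link entirely and render
      // plain header text, so clicking it can never trigger an invalid sort.
      def headerCell(header: String, cssClass: String, sortable: Boolean): String = {
        if (sortable) {
          s"""<th class="$cssClass"><a href="?sortColumn=$header">$header</a></th>"""
        } else {
          s"""<th class="$cssClass">$header</th>"""
        }
      }
    }
    ```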


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87481/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168408617
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_ID = "ID"
    +  val HEADER_TASK_INDEX = "Index"
    +  val HEADER_ATTEMPT = "Attempt"
    +  val HEADER_STATUS = "Status"
    +  val HEADER_LOCALITY = "Locality Level"
    +  val HEADER_EXECUTOR = "Executor ID"
    +  val HEADER_HOST = "Host"
    +  val HEADER_LAUNCH_TIME = "Launch Time"
    +  val HEADER_DURATION = "Duration"
    +  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
    +  val HEADER_DESER_TIME = "Task Deserialization Time"
    +  val HEADER_GC_TIME = "GC Time"
    +  val HEADER_SER_TIME = "Result Serialization Time"
    +  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
    +  val HEADER_PEAK_MEM = "Peak Execution Memory"
    +  val HEADER_ACCUMULATORS = "Accumulators"
    +  val HEADER_INPUT_SIZE = "Input Size / Records"
    +  val HEADER_OUTPUT_SIZE = "Output Size / Records"
    +  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    +  val HEADER_SHUFFLE_REMOTE_READS = "Shuffle Remote Reads"
    +  val HEADER_SHUFFLE_WRITE_TIME = "Write Time"
    +  val HEADER_SHUFFLE_WRITE_SIZE = "Shuffle Write Size / Records"
    +  val HEADER_MEM_SPILL = "Shuffle Spill (Memory)"
    +  val HEADER_DISK_SPILL = "Shuffle Spill (Disk)"
    +  val HEADER_ERROR = "Errors"
     
       private val COLUMN_TO_INDEX = Map(
    -    "ID" -> null.asInstanceOf[String],
    -    "Index" -> TaskIndexNames.TASK_INDEX,
    -    "Attempt" -> TaskIndexNames.ATTEMPT,
    -    "Status" -> TaskIndexNames.STATUS,
    -    "Locality Level" -> TaskIndexNames.LOCALITY,
    -    "Executor ID / Host" -> TaskIndexNames.EXECUTOR,
    -    "Launch Time" -> TaskIndexNames.LAUNCH_TIME,
    -    "Duration" -> TaskIndexNames.DURATION,
    -    "Scheduler Delay" -> TaskIndexNames.SCHEDULER_DELAY,
    -    "Task Deserialization Time" -> TaskIndexNames.DESER_TIME,
    -    "GC Time" -> TaskIndexNames.GC_TIME,
    -    "Result Serialization Time" -> TaskIndexNames.SER_TIME,
    -    "Getting Result Time" -> TaskIndexNames.GETTING_RESULT_TIME,
    -    "Peak Execution Memory" -> TaskIndexNames.PEAK_MEM,
    -    "Accumulators" -> TaskIndexNames.ACCUMULATORS,
    -    "Input Size / Records" -> TaskIndexNames.INPUT_SIZE,
    -    "Output Size / Records" -> TaskIndexNames.OUTPUT_SIZE,
    -    "Shuffle Read Blocked Time" -> TaskIndexNames.SHUFFLE_READ_TIME,
    -    "Shuffle Read Size / Records" -> TaskIndexNames.SHUFFLE_TOTAL_READS,
    -    "Shuffle Remote Reads" -> TaskIndexNames.SHUFFLE_REMOTE_READS,
    -    "Write Time" -> TaskIndexNames.SHUFFLE_WRITE_TIME,
    -    "Shuffle Write Size / Records" -> TaskIndexNames.SHUFFLE_WRITE_SIZE,
    -    "Shuffle Spill (Memory)" -> TaskIndexNames.MEM_SPILL,
    -    "Shuffle Spill (Disk)" -> TaskIndexNames.DISK_SPILL,
    -    "Errors" -> TaskIndexNames.ERROR)
    +    HEADER_ID -> null.asInstanceOf[String],
    +    HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
    +    HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
    +    HEADER_STATUS -> TaskIndexNames.STATUS,
    +    HEADER_LOCALITY -> TaskIndexNames.LOCALITY,
    +    HEADER_EXECUTOR -> TaskIndexNames.EXECUTOR,
    +    HEADER_HOST -> TaskIndexNames.EXECUTOR,
    --- End diff --
    
    Looks like we'll have a new RC, so I'll jump on the bandwagon and mark this one a blocker too. We can then add the new index in 2.3.0.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    LGTM pending tests.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by attilapiros <gi...@git.apache.org>.
Github user attilapiros commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Yes, of course. The test from @zsxwing is perfect for avoiding similar problems in the future.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87481/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    LGTM


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    I'm trying to think of a way that we can avoid these issues going forward - not that I expect this code to change much. Maybe add a unit test that makes sure every declared header constant is mapped to some index, so it fails if you add a new header constant without a mapping.
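    A minimal, self-contained sketch of such a guard check (names and map values here are hypothetical stand-ins, not the real TaskIndexNames constants):
    
    ```scala
    object HeaderMappingCheck {
      // Every header the task table declares...
      val allHeaders: Seq[String] = Seq(
        "ID", "Index", "Attempt", "Status", "Locality Level",
        "Executor ID", "Host", "Launch Time", "Duration")
    
      // ...and the sort index each one maps to ("ID" sorts by the row key,
      // hence the explicit null). Index names are made-up placeholders.
      val columnToIndex: Map[String, String] = Map(
        "ID" -> null,
        "Index" -> "idx", "Attempt" -> "att", "Status" -> "sts",
        "Locality Level" -> "loc", "Executor ID" -> "exe",
        "Host" -> "exe", // the fix: "Host" now has a mapping too
        "Launch Time" -> "lt", "Duration" -> "dur")
    
      // A header added to allHeaders without a mapping shows up here, so a
      // unit-test assertion on this being empty fails at test time instead
      // of an IllegalArgumentException at page-render time.
      def unmappedHeaders: Seq[String] = allHeaders.filterNot(columnToIndex.contains)
    }
    ```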


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168102644
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
    --- End diff --
    
    nit: `HEADER_SHUFFLE_TOTAL_READS` -> `HEADER_SHUFFLE_READ_SIZE` ?


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87477/
    Test FAILed.


---



[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20601#discussion_r168105632
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
     
     private object ApiHelper {
     
    +    HEADER_HOST -> TaskIndexNames.EXECUTOR,
    --- End diff --
    
    Hmmm... I agree that the correct thing would be to sort by host and have an index on that. The problem is that we'd be changing the data on disk, breaking compatibility with previous versions of the disk store. So unless that change goes into 2.3.0, that means revving the disk version number, which would require re-parsing all logs. And that kinda sucks. (I hope by the next major version I - or someone - get time to better investigate versioning of the disk data.)
    
    Given this affects 2.3 we could potentially consider it a blocker. @sameeragarwal probably won't be very happy though.
    
    2.2 actually sorts by executor id, and doesn't have a separate host column (added in SPARK-21675). That's one of those small changes I missed while merging all the SHS stuff.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    ah, flaky tests. retest this please


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87478/
    Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    @attilapiros could you take a look at the test case Ryan added in #20615 and add something like that to your patch? It'd be nice to catch these things in unit tests.


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87487/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    Everything that might have been affected by this change has passed; the remaining failures are known flaky tests:
    
    https://issues.apache.org/jira/browse/SPARK-23369
    
    https://issues.apache.org/jira/browse/SPARK-23390
    
    merging to master / 2.3


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    **[Test build #87478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87478/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).


---



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

Posted by attilapiros <gi...@git.apache.org>.
Github user attilapiros commented on the issue:

    https://github.com/apache/spark/pull/20601
  
    jenkins retest please



---
