Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2014/10/31 02:06:54 UTC

[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/3029

    [SPARK-4017] show progress bar in console and title

    The progress bar will look like this:
    
    ![1___spark_job__85_250_finished__4_are_running___java_](https://cloud.githubusercontent.com/assets/40902/4854813/a02f44ac-6099-11e4-9060-7c73a73151d6.png)
    
    In the right corner, the numbers are: finished tasks, running tasks, total tasks, and the time elapsed since the start.
    
    After a stage finishes, a one-line summary is shown for it.
    
    The status is also shown in the console title.
    
    The progress bar is only shown if the logging level is WARN or higher (progress in the title is still shown). It can be turned off with spark.driver.showConsoleProgress.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark progress

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3029
    
----
commit 9e42208c9acdfb3854ef7c6ec4dbdb12aed9dbc6
Author: Davies Liu <da...@databricks.com>
Date:   2014-10-31T00:58:41Z

    show progress bar in console and title

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r19702287
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -521,6 +529,51 @@ private[spark] class TaskSetManager(
         sched.dagScheduler.taskGettingResult(info)
       }
     
    +  private def progressBar(curr: Int, total: Int): Unit = {
    +    val now = clock.getTime()
    +    // Only update title once in one second
    +    if (now - lastUpdate < 100 && curr < total) {
    --- End diff --
    
    The comment says 1 second, but this looks like 100 ms.
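
One way to keep the comment and the value from drifting apart is to name the interval once and have the check refer to that name. The sketch below is only an illustration of that idea: `ThrottleSketch`, `UpdateThrottle`, and `MinUpdateIntervalMs` are hypothetical names, not code from the pull request, and the one-second value is taken from the quoted comment rather than from the quoted condition.

```scala
object ThrottleSketch {
  // Single source of truth for the interval; the name carries the unit.
  val MinUpdateIntervalMs = 1000L

  // Hypothetical helper, not code from the pull request.
  final class UpdateThrottle(minIntervalMs: Long = MinUpdateIntervalMs) {
    // Time of the last emitted update; None until the first update.
    private var lastUpdateMs: Option[Long] = None

    /** True if an update should be emitted at `nowMs`.
      * The final update (curr >= total) is never suppressed. */
    def shouldUpdate(nowMs: Long, curr: Int, total: Int): Boolean = {
      val tooSoon = lastUpdateMs.exists(last => nowMs - last < minIntervalMs)
      if (tooSoon && curr < total) {
        false
      } else {
        lastUpdateMs = Some(nowMs)
        true
      }
    }
  }
}
```

Passing the current time in as an argument, instead of reading a clock inside the method, also makes the interval easy to check in a unit test.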




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63257383
  
      [Test build #523 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/523/consoleFull) for   PR 3029 at commit [`a353e85`](https://github.com/apache/spark/commit/a353e8567d6b4fdcdeaabb03d8f8ca1f6b6ddbad).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61199090
  
      [Test build #22584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22584/consoleFull) for   PR 3029 at commit [`9e42208`](https://github.com/apache/spark/commit/9e42208c9acdfb3854ef7c6ec4dbdb12aed9dbc6).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61340878
  
      [Test build #22656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22656/consoleFull) for   PR 3029 at commit [`7e7d4e7`](https://github.com/apache/spark/commit/7e7d4e784864baa4168819b0fc3fc01a01abc1cd).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62480890
  
    @kayousterhout @squito The motivation for this PR is to let users see progress without flushing away their input/output, so a one-line progress bar is the better fit. For example:
    ```scala
    scala> rdd.count()
    [Stage 1] finished in xxx seconds (med/avg/xxxx)
    res0: Int = xxxx
    ```
    If we put the progress bar into the normal logging infrastructure (at level WARN) and do not merge the updates into one line, this would be much easier to implement, but not as good for users as the current approach (it would still be noisy).





[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467790
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    --- End diff --
    
    milliseconds is 1 word




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467516
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    --- End diff --
    
    Minor, but could this just take the `StatusTracker` in the constructor? That would make the dependencies between the components clearer.
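
The suggestion amounts to constructor injection: the progress bar receives only the status source it reads from, not the whole context. A minimal sketch of that shape follows. `StageProgress`, `StageStatusSource`, and `ProgressReporter` are hypothetical, simplified stand-ins and do not mirror Spark's actual `StatusTracker` API.

```scala
// Hypothetical, simplified stand-in for per-stage status information.
final case class StageProgress(stageId: Int, numTasks: Int, numCompletedTasks: Int)

// Hypothetical narrow interface: the only thing the reporter needs.
trait StageStatusSource {
  def activeStages(): Seq[StageProgress]
}

// The reporter depends on the status source alone, so the dependency is
// visible in the constructor signature and easy to replace in tests.
final class ProgressReporter(tracker: StageStatusSource) {
  def summary(): String =
    tracker.activeStages()
      .sortBy(_.stageId)
      .map(s => s"[Stage ${s.stageId}: ${s.numCompletedTasks}/${s.numTasks}]")
      .mkString(" ")
}
```

With this shape a test can pass in a fake source and never has to construct a full context.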




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467643
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    +    sys.env.get("COLUMNS").get.toInt
    +  } else {
    +    80
    +  }
    +
    +  var hasShowed = false
    +  var lastFinishTime = 0L
    +
    +  // Schedule a refresh thread to run in every 200ms
    +  private val timer = new Timer("refresh progress", true)
    +  timer.schedule(new TimerTask{
    +    override def run() {
    +      refresh()
    +    }
    +  }, DELAY_SHOW_UP, UPDATE_PERIOD)
    +
    +  /**
    +   * Try to refresh the progress bar in every cycle
    +   */
    +  private def refresh(): Unit = synchronized {
    +    val now = System.currentTimeMillis()
    +    if (now - lastFinishTime < DELAY_SHOW_UP) {
    +      return
    +    }
    +    val stageIds = sc.statusTracker.getActiveStageIds()
    +    val stages = stageIds.map(sc.statusTracker.getStageInfo).flatten.filter(_.numTasks() > 1)
    +      .filter(now - _.submissionTime() > DELAY_SHOW_UP).sortBy(_.stageId())
    +    if (stages.size > 0) {
    +      show(stages.take(3))  // display at most 3 stages in same time
    +      hasShowed = true
    +    }
    +  }
    +
    +  /**
    +   * Show progress bar in console. The progress bar is displayed in the next line
    +   * after your last output, keeps overwriting itself to hold in one line. The logging will follow
    +   * the progress bar, then progress bar will be showed in next line without overwrite logs.
    +   */
    +  private def show(stages: Seq[SparkStageInfo]) {
    +    System.err.print("\r")
    +    val width = TerminalWidth / stages.size
    +    stages.foreach { s =>
    +      val total = s.numTasks()
    +      val header = s"[Stage ${s.stageId()}:"
    +      val tailer = s"(${s.numCompletedTasks()} + ${s.numActiveTasks()}) / $total]"
    +      val w = width - header.size - tailer.size
    +      val bar = if (w > 0) {
    +        val percent = w * s.numCompletedTasks() / total
    +        (0 until w).map { i =>
    +          if (i < percent) "=" else if (i == percent) ">" else " "
    +        }.mkString("")
    +      } else {
    +        ""
    +      }
    +      System.err.print(header + bar + tailer)
    +    }
    +  }
    +
    +  /**
    +   * Clear the progress bar if showed.
    +   */
    +  private def clear() = {
    +    if (hasShowed) {
    +      System.err.printf("\r" + " " * TerminalWidth + "\r")
    --- End diff --
    
    Does this work on Windows? Is there a constant we should be using instead of `\r`?
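
For reference, the quoted `show` and `clear` methods redraw a single line by returning the cursor to column 0 with a carriage return. The sketch below restates that technique as pure string functions so the control characters are explicit and testable. `OneLineBar`, `render`, and `clearSequence` are hypothetical names, and the `total > 0` guard is an addition that the quoted diff does not have. It does not settle whether every Windows console honors `\r`.

```scala
object OneLineBar {
  /** Renders "[Stage id:====>   (completed + active) / total]" in `width` columns. */
  def render(stageId: Int, completed: Int, active: Int, total: Int, width: Int): String = {
    val header = s"[Stage $stageId:"
    val tailer = s"($completed + $active) / $total]"
    // Columns left for the bar itself once header and tailer are placed.
    val w = width - header.length - tailer.length
    val bar =
      if (w > 0 && total > 0) {
        val filled = w * completed / total
        (0 until w).map { i =>
          if (i < filled) '=' else if (i == filled) '>' else ' '
        }.mkString
      } else {
        ""
      }
    header + bar + tailer
  }

  /** Returns to column 0, overwrites the line with blanks, returns to column 0 again. */
  def clearSequence(width: Int): String = "\r" + " " * width + "\r"
}
```

A caller would write `"\r" + OneLineBar.render(...)` with `System.err.print`. Using `print` rather than `printf` avoids treating the text as a format string.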




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20468037
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    +    sys.env.get("COLUMNS").get.toInt
    +  } else {
    +    80
    +  }
    +
    +  var hasShowed = false
    +  var lastFinishTime = 0L
    +
    +  // Schedule a refresh thread to run in every 200ms
    +  private val timer = new Timer("refresh progress", true)
    +  timer.schedule(new TimerTask{
    +    override def run() {
    +      refresh()
    +    }
    +  }, DELAY_SHOW_UP, UPDATE_PERIOD)
    +
    +  /**
    +   * Try to refresh the progress bar in every cycle
    +   */
    +  private def refresh(): Unit = synchronized {
    +    val now = System.currentTimeMillis()
    +    if (now - lastFinishTime < DELAY_SHOW_UP) {
    +      return
    +    }
    +    val stageIds = sc.statusTracker.getActiveStageIds()
    +    val stages = stageIds.map(sc.statusTracker.getStageInfo).flatten.filter(_.numTasks() > 1)
    +      .filter(now - _.submissionTime() > DELAY_SHOW_UP).sortBy(_.stageId())
    +    if (stages.size > 0) {
    +      show(stages.take(3))  // display at most 3 stages in same time
    +      hasShowed = true
    +    }
    +  }
    +
    +  /**
    +   * Show progress bar in console. The progress bar is displayed in the next line
    +   * after your last output, keeps overwriting itself to hold in one line. The logging will follow
    +   * the progress bar, then progress bar will be showed in next line without overwrite logs.
    +   */
    +  private def show(stages: Seq[SparkStageInfo]) {
    +    System.err.print("\r")
    +    val width = TerminalWidth / stages.size
    +    stages.foreach { s =>
    +      val total = s.numTasks()
    +      val header = s"[Stage ${s.stageId()}:"
    +      val tailer = s"(${s.numCompletedTasks()} + ${s.numActiveTasks()}) / $total]"
    +      val w = width - header.size - tailer.size
    +      val bar = if (w > 0) {
    +        val percent = w * s.numCompletedTasks() / total
    +        (0 until w).map { i =>
    +          if (i < percent) "=" else if (i == percent) ">" else " "
    +        }.mkString("")
    +      } else {
    +        ""
    +      }
    +      System.err.print(header + bar + tailer)
    +    }
    +  }
    +
    +  /**
    +   * Clear the progress bar if showed.
    +   */
    +  private def clear() = {
    +    if (hasShowed) {
    +      System.err.printf("\r" + " " * TerminalWidth + "\r")
    +      hasShowed = false
    +    }
    +  }
    +
    +  /**
    +   * Mark all the stages as finished, clear the progress bar if showed, then the progress will not
    +   * interwave with output of jobs.
    --- End diff --
    
    interleave? interweave? I don't think interwave is a word




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62475673
  
    I was just about to suggest the same thing. I admit it seemed a lot cooler to have the console keep updating, but I agree with their concerns.
    
    As a slight modification of @kayousterhout's proposal, what if instead of logging for *every* update, you log whenever some unit of time has elapsed (e.g., 1 second) *and* some unit of work has been done (that is, both conditions must hold, not just one of them)? That way the logs don't get clobbered with lots of little updates: if you have 1000 tasks but the whole thing finishes in under 1 second, you really don't need to monitor the progress in the logs. And because it uses the normal logging mechanism, it stays controllable through the usual logging configuration and plays nicely with it.
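
The rule proposed here, log only when enough time has passed and enough new work has completed, can be sketched as a small gate. `ThrottledProgressLog` and `Gate` are hypothetical names and the thresholds are illustrative; this is not code from the pull request.

```scala
object ThrottledProgressLog {
  /** Decides whether a progress line should be logged.
    * `startMs` is when the stage started, so a stage that finishes within
    * `minIntervalMs` never produces a progress line. */
  final class Gate(startMs: Long, minIntervalMs: Long, minNewTasks: Int) {
    private var lastLogMs = startMs
    private var lastLoggedTasks = 0

    def shouldLog(nowMs: Long, completedTasks: Int): Boolean = {
      val enoughTime = nowMs - lastLogMs >= minIntervalMs
      val enoughWork = completedTasks - lastLoggedTasks >= minNewTasks
      // Both conditions must hold; either one alone is not sufficient.
      if (enoughTime && enoughWork) {
        lastLogMs = nowMs
        lastLoggedTasks = completedTasks
        true
      } else {
        false
      }
    }
  }
}
```

A caller would hand the resulting line to the normal logger, so the output stays under the usual log-level configuration.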




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61199091
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22584/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204911
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23435/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63208925
  
      [Test build #23438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23438/consoleFull) for   PR 3029 at commit [`38c42f1`](https://github.com/apache/spark/commit/38c42f18ab24c8e3aecce0e39f0f2fa996627ec4).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63244854
  
    By the way, @squito, I don't think this can go through log4j; it needs to access the jline console interface directly. This feature will just be controlled by a flag, and users can decide whether to use it or not. By default we turn it on when the log level is WARN or higher, since at INFO level it's hard to display progress given all the other messages that are interleaved.
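
The default described here combines two conditions: the flag is not switched off, and the log level is WARN or higher. A minimal sketch of that decision follows. `ConsoleProgressGate` and `shouldShowBar` are hypothetical names, the configuration is modeled as a plain `Map`, and the key is the one named in the pull request description. The level ordering is assumed to follow log4j's standard levels.

```scala
object ConsoleProgressGate {
  // log4j's standard levels in increasing severity (assumed ordering).
  private val Levels = Seq("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF")

  /** True if the bar should be drawn: the flag is on (the default) and the
    * root log level is WARN or higher. Unknown level names yield false;
    * a flag value that is not "true"/"false" makes `toBoolean` throw. */
  def shouldShowBar(rootLevel: String, conf: Map[String, String]): Boolean = {
    val enabled = conf.getOrElse("spark.driver.showConsoleProgress", "true").toBoolean
    val quietEnough = Levels.indexOf(rootLevel.toUpperCase) >= Levels.indexOf("WARN")
    enabled && quietEnough
  }
}
```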




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204439
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23434/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61360812
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22685/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63254818
  
      [Test build #23448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23448/consoleFull) for   PR 3029 at commit [`a353e85`](https://github.com/apache/spark/commit/a353e8567d6b4fdcdeaabb03d8f8ca1f6b6ddbad).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61356828
  
      [Test build #503 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/503/consoleFull) for   PR 3029 at commit [`e6bb189`](https://github.com/apache/spark/commit/e6bb1895c51bef6201296601725821d12d4deb8e).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61394842
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22739/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63257525
  
      [Test build #23454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23454/consoleFull) for   PR 3029 at commit [`2e90f75`](https://github.com/apache/spark/commit/2e90f7599779fe1e51b7b39d0d39e7c77260e47a).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63149772
  
    Sorry for my delay in responding ...
    
    (a) I think this DOES add a lot of value over the std INFO logging.  One log line per task completion is *much* noisier than what I'm proposing here, for a job with hundreds of tasks that completes in a few seconds (at least for me, a very common case).
    
    (b) I think changing the logging configuration to be INFO for this, and leaving at WARN for everything else, is pretty easy.  The other comments above already request this be moved into a `SparkListener`, so you would just add a line:
    
    ```
    log4j.logger.org.apache.spark.reporter.JobProgressConsoleReporter=INFO
    ```
    
    (though I realize now that I actually am not sure where the logging setup for the examples is configured ...)




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63383666
  
    @davies @pwendell The three-stage solution looks reasonable to me!




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62499404
  
    I totally see the appeal of the one-progress bar (hence my initial excitement when I tried this out).  But if it doesn't play nicely with logging & multiple stages, this seems like a very small improvement for the initial user experience, but a big headache for serious users.
    
    I don't think it's really that much worse if your example changes to
    
    ```
    scala> rdd.count()
    [INFO] Stage 1 [=>                                         ]
    [INFO] Stage 1 [==========>                                ]
    [INFO] Stage 1 finished in 2 seconds (med/avg/xxxx)
    res0: Int = xxxx
    ```
    
    In this case it's just a couple more lines.  If the stage took longer, then it would be even more lines, but that seems ok, since it's not that much noise per unit time.
    
    If the code were moved to a separate SparkListener implementation, then it could have its own log level, and even be INFO by default (so we leave everything else as WARN).  INFO for everything in Spark is way too noisy for the average Spark user, but that doesn't mean we can't use INFO for a few select classes.
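
The log-line variant proposed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not code from the PR: `ProgressLine`, `render`, and the 43-column bar width are hypothetical, chosen only to resemble the example output in the comment.

```scala
object ProgressLine {
  // Hypothetical bar width, chosen only to resemble the example output above.
  val BarWidth = 43

  /** Render one log line such as "[INFO] Stage 1 [==>   ]". */
  def render(stageId: Int, completed: Int, total: Int): String = {
    require(total > 0, "total must be positive")
    // Clamp so that a stale or over-counted value cannot overflow the bar.
    val done = math.max(0, math.min(completed, total))
    // Number of columns to fill, including the ">" head.
    val filled = BarWidth * done / total
    val bar =
      if (filled == 0) " " * BarWidth
      else if (filled == BarWidth) "=" * BarWidth
      else "=" * (filled - 1) + ">" + " " * (BarWidth - filled)
    s"[INFO] Stage $stageId [$bar]"
  }
}
```

A listener or polling thread could emit one such line per update through its own logger, which is what would let that logger run at INFO while the rest of Spark stays at WARN.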




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204653
  
      [Test build #521 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/521/consoleFull) for   PR 3029 at commit [`0cee236`](https://github.com/apache/spark/commit/0cee2368b09fb8167e0a992bccea8eb17257ad35).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20112968
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -569,6 +641,11 @@ private[spark] class TaskSetManager(
           logInfo("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id +
             " because task " + index + " has already completed successfully")
         }
    +    if (showProgress) {
    +      showProgressBar(tasksSuccessful, numTasks)
    +    }
    +    sched.dagScheduler.taskEnded(
    --- End diff --
    
    What's the reason for moving this call?




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61588063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22845/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62330615
  
    this is awesome!




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63206812
  
      [Test build #23437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23437/consoleFull) for   PR 3029 at commit [`ab87958`](https://github.com/apache/spark/commit/ab879587d6d230a044afeb3789170220533bb861).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467763
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It polls the
    + * status of active stages from `sc.statusTracker` every 200ms; the progress bar is shown
    + * after a stage has run for at least 500ms. If multiple stages run at the same time, their
    + * statuses are combined and shown in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milliseconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay before showing a progress bar, in milliseconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    +    sys.env.get("COLUMNS").get.toInt
    +  } else {
    +    80
    +  }
    +
    +  var hasShowed = false
    +  var lastFinishTime = 0L
    +
    +  // Schedule a refresh thread to run every 200ms
    +  private val timer = new Timer("refresh progress", true)
    +  timer.schedule(new TimerTask{
    +    override def run() {
    +      refresh()
    +    }
    +  }, DELAY_SHOW_UP, UPDATE_PERIOD)
    +
    +  /**
    +   * Try to refresh the progress bar on every cycle
    +   */
    +  private def refresh(): Unit = synchronized {
    +    val now = System.currentTimeMillis()
    +    if (now - lastFinishTime < DELAY_SHOW_UP) {
    +      return
    +    }
    +    val stageIds = sc.statusTracker.getActiveStageIds()
    +    val stages = stageIds.map(sc.statusTracker.getStageInfo).flatten.filter(_.numTasks() > 1)
    +      .filter(now - _.submissionTime() > DELAY_SHOW_UP).sortBy(_.stageId())
    +    if (stages.size > 0) {
    +      show(stages.take(3))  // display at most 3 stages at the same time
    +      hasShowed = true
    +    }
    +  }
    +
    +  /**
    +   * Show the progress bar in the console. The progress bar is displayed on the line after
    +   * your last output and keeps overwriting itself so that it stays on one line. Log output
    +   * follows the progress bar; the bar is then redrawn on the next line without overwriting logs.
    +   */
    +  private def show(stages: Seq[SparkStageInfo]) {
    +    System.err.print("\r")
    +    val width = TerminalWidth / stages.size
    +    stages.foreach { s =>
    +      val total = s.numTasks()
    +      val header = s"[Stage ${s.stageId()}:"
    +      val tailer = s"(${s.numCompletedTasks()} + ${s.numActiveTasks()}) / $total]"
    +      val w = width - header.size - tailer.size
    +      val bar = if (w > 0) {
    +        val percent = w * s.numCompletedTasks() / total
    +        (0 until w).map { i =>
    +          if (i < percent) "=" else if (i == percent) ">" else " "
    +        }.mkString("")
    +      } else {
    +        ""
    +      }
    +      System.err.print(header + bar + tailer)
    +    }
    +  }
    +
    +  /**
    +   * Clear the progress bar if it has been shown.
    +   */
    +  private def clear() = {
    --- End diff --
    
    Add `: Unit` return type, here and other places
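
The point of the explicit annotation can be illustrated with a small standalone sketch. It is not taken from the PR; `ReturnTypes`, `clearInferred`, and `clearExplicit` are hypothetical names used only to show how type inference behaves.

```scala
object ReturnTypes {
  // Without an annotation, the result type is inferred from the last expression,
  // so this method quietly returns Int rather than Unit.
  def clearInferred() = {
    System.err.print("\r")
    0
  }

  // With an explicit `: Unit`, the intent is stated in the signature and a later
  // edit to the body cannot silently change the method's result type.
  def clearExplicit(): Unit = {
    System.err.print("\r")
  }
}
```

The procedure syntax used elsewhere in the diff (`def show(...) { ... }`) also yields `Unit`, but the explicit form keeps all methods in the class consistent.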




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61353655
  
      [Test build #22669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22669/consoleFull) for   PR 3029 at commit [`e6bb189`](https://github.com/apache/spark/commit/e6bb1895c51bef6201296601725821d12d4deb8e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63261770
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23454/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61360811
  
      [Test build #22685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22685/consoleFull) for   PR 3029 at commit [`bc53d99`](https://github.com/apache/spark/commit/bc53d99d518d6fafd607c617d0915c7a2f9eee85).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204979
  
      [Test build #23437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23437/consoleFull) for   PR 3029 at commit [`ab87958`](https://github.com/apache/spark/commit/ab879587d6d230a044afeb3789170220533bb861).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204768
  
    @JoshRosen @pwendell @kayousterhout @squito I have re-implemented it using the new poll-based progress API. The progress bar is much simpler than the original one: the progress in the title (which did not work well with pyspark) and the stage summary have been removed.
    
    Once the job/stage has finished, the progress bar will disappear.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63244719
  
    Hey @davies - I played with this a bit and I actually found the behavior around concurrent stages might not be great. The reason is that the set of active stages will change as stages complete, and then it will suddenly change the slider significantly once one stage completes. Here is an example workload:
    
    ```
    > ./bin/spark-shell --conf spark.scheduler.mode=FAIR
    scala> val a = sc.makeRDD(1 to 1000, 10000).map(x => (x, x)).reduceByKey(_ + _)
    scala> val b = sc.makeRDD(1 to 1000, 10000).map(x => (x, x)).reduceByKey(_ + _)
    scala> a.union(b).count()
    ```
    
    Probably what we want in the longer term is to have a slider for the entire job rather than stages. But anyways, I'd prefer either the "flip flop" behavior or have multiple stacked progress bars. @kayousterhout didn't like the "flip flop" but I find it more understandable than what is here now. Since this is an opt-in feature I think it's fine to have some version that can go in now and then refine it later.
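
One way to see how a shared line can jump when a stage leaves the active set is the sketch below. It is a self-contained illustration, not the PR's code: `StageStatus` and `CombinedBar` are hypothetical names, and it assumes each active stage gets an equal share of an 80-column line, as in the `show` method of the `ConsoleProgressBar` diff quoted elsewhere in this thread. The version tested above may have combined stages differently.

```scala
// Hypothetical snapshot of one stage's progress.
final case class StageStatus(stageId: Int, completed: Int, active: Int, total: Int)

object CombinedBar {
  /** Render all stages on one line, giving each an equal share of `terminalWidth`. */
  def render(stages: Seq[StageStatus], terminalWidth: Int = 80): String =
    if (stages.isEmpty) ""
    else {
      val width = terminalWidth / stages.size
      stages.map { s =>
        val header = s"[Stage ${s.stageId}:"
        val tailer = s"(${s.completed} + ${s.active}) / ${s.total}]"
        // Columns left for the bar itself once the labels are accounted for.
        val w = width - header.length - tailer.length
        val bar = if (w > 0) {
          val filled = if (s.total > 0) w * s.completed / s.total else 0
          (0 until w).map { i =>
            if (i < filled) "=" else if (i == filled) ">" else " "
          }.mkString
        } else ""
        header + bar + tailer
      }.mkString
    }
}
```

With two active stages each bar occupies 40 columns; as soon as one completes, the remaining stage is redrawn across all 80, so its arrow moves even though its own progress has not changed.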




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20414049
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,143 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +import scala.collection.mutable.HashMap
    +
    +import org.apache.spark._
    +import org.apache.spark.scheduler.{SparkListenerStageSubmitted, SparkListenerStageCompleted, SparkListener}
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It polls the
    + * status of active stages from `sc.statusTracker` every 200ms; the progress bar is shown
    + * after a stage has run for at least 500ms. If multiple stages run at the same time, their
    + * statuses are combined and shown in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milliseconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay before showing a progress bar, in milliseconds
    +  val DELAY_SHOW_UP = 500L
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    +    sys.env.get("COLUMNS").get.toInt
    +  } else {
    +    80
    +  }
    +
    +  @volatile var hasShowed = false
    +
    +  /**
    +   * Track the life cycle of stages
    +   */
    +  val activeStages = new HashMap[Int, Long]()
    +
    +  private class StageProgressListener extends SparkListener {
    +    override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted) = {
    --- End diff --
    
    Instead of building your own listener here, why don't we just add `submissionTime` to the `SparkStageInfo` class?
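
The suggestion can be sketched as follows: if the stage snapshot itself carries a submission time, the "show only after 500ms" rule becomes a plain filter and no separate listener state is needed. This is a standalone illustration under stated assumptions; `StageInfoSketch` is a hypothetical stand-in for `SparkStageInfo`, and `StageFilter` and `DelayShowUpMs` are made-up names.

```scala
// Hypothetical stand-in for SparkStageInfo with the proposed submissionTime field.
final case class StageInfoSketch(stageId: Int, numTasks: Int, submissionTime: Long)

object StageFilter {
  // Delay before a stage becomes eligible for display, in milliseconds.
  val DelayShowUpMs = 500L

  /** Stages with more than one task that have been running longer than the delay. */
  def visibleStages(stages: Seq[StageInfoSketch], now: Long): Seq[StageInfoSketch] =
    stages
      .filter(_.numTasks > 1)
      .filter(s => now - s.submissionTime > DelayShowUpMs)
      .sortBy(_.stageId)
}
```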




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61587948
  
      [Test build #22846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22846/consoleFull) for   PR 3029 at commit [`e1f524d`](https://github.com/apache/spark/commit/e1f524d9239bd94099fdd9c227ac8a4b5dda70ba).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20112906
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -528,7 +601,7 @@ private[spark] class TaskSetManager(
         sched.dagScheduler.taskGettingResult(info)
       }
     
    -  /**
    +  /*
    --- End diff --
    
    Can you revert this style change, which seems unrelated to what you did?




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61211033
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22589/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61581372
  
      [Test build #22845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22845/consoleFull) for   PR 3029 at commit [`a60477c`](https://github.com/apache/spark/commit/a60477c274675a35276a6e924656db45935b6144).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61393375
  
      [Test build #22737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22737/consoleFull) for   PR 3029 at commit [`ea49fe0`](https://github.com/apache/spark/commit/ea49fe07d681d3821110954342f1c17cbbcf7ccc).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63261768
  
      [Test build #23454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23454/consoleFull) for   PR 3029 at commit [`2e90f75`](https://github.com/apache/spark/commit/2e90f7599779fe1e51b7b39d0d39e7c77260e47a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20469995
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    --- End diff --
    
    COLUMNS may be set to "" (an empty string), so sys.env.contains() alone will not work.
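
The point in the comment above can be illustrated with a small sketch. It is not the code from the patch: the object name `TerminalWidthSketch`, the fallback of 80 columns, and the handling of non-numeric values are all assumptions made for illustration, since the quoted diff is cut off before its default branch.

```scala
import scala.util.Try

object TerminalWidthSketch {
  // Assumed fallback width; the quoted diff does not show the real default.
  val DefaultWidth = 80

  // Treats a missing, empty, non-numeric, or non-positive COLUMNS the same way:
  // fall back to the default instead of failing on "".toInt.
  def width(env: Map[String, String]): Int =
    env.get("COLUMNS")
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap(v => Try(v.toInt).toOption)
      .filter(_ > 0)
      .getOrElse(DefaultWidth)
}
```

A caller would pass `sys.env`. Note that `Map("COLUMNS" -> "").contains("COLUMNS")` is true, which is why a plain `contains` check is not a sufficient guard before parsing.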




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63399151
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23506/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63382985
  
    Super cool. I left mostly minor comments. Otherwise it LGTM.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61349718
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22656/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467830
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    --- End diff --
    
    Isn't this more like `SHOW_DELAY`?
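
For readers following the naming discussion, the constant controls how long a stage must have been running before its bar appears. A minimal sketch of that rule, assuming a hypothetical `ActiveStage` record and a `showDelayMs` parameter (neither name is from the patch), might look like this:

```scala
object ShowDelaySketch {
  // Hypothetical stand-in for the stage status the progress bar polls.
  final case class ActiveStage(stageId: Int, submissionTimeMs: Long)

  // Keeps only the stages that have been running for at least showDelayMs,
  // so short-lived stages never draw a bar.
  def stagesToShow(
      stages: Seq[ActiveStage],
      nowMs: Long,
      showDelayMs: Long = 500L): Seq[ActiveStage] =
    stages.filter(s => nowMs - s.submissionTimeMs >= showDelayMs)
}
```

The 500 ms default mirrors the value in the quoted diff; whether the real comparison is strict or inclusive is not visible in this excerpt.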




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63385857
  
    The new thing for multiple stages is really nice. I also think the new architecture is great. I made some minor comments, but overall looks good.
    
    One thing: does this work in the SQL CLI? If not, we can make it work there in a follow-up task.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63251878
  
      [Test build #23448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23448/consoleFull) for   PR 3029 at commit [`a353e85`](https://github.com/apache/spark/commit/a353e8567d6b4fdcdeaabb03d8f8ca1f6b6ddbad).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63261422
  
      [Test build #523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/523/consoleFull) for   PR 3029 at commit [`a353e85`](https://github.com/apache/spark/commit/a353e8567d6b4fdcdeaabb03d8f8ca1f6b6ddbad).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61583786
  
      [Test build #22850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22850/consoleFull) for   PR 3029 at commit [`6fd30ff`](https://github.com/apache/spark/commit/6fd30ff1a1935c26d716271d729a38e26b953e49).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61589372
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22850/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20414811
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -231,6 +231,13 @@ class SparkContext(config: SparkConf) extends Logging {
     
       val statusTracker = new SparkStatusTracker(this)
     
    +  private[spark] val progressBar: Option[ConsoleProgressBar] =
    +    if (conf.getBoolean("spark.ui.showConsoleProgress", true)) {
    --- End diff --
    
    It's disabled inside ConsoleProgressBar, because we may want to keep the progress in the title when the logging level is INFO.
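
The behavior described at this point in the thread (bar only at WARN or higher, title progress kept regardless of level) can be sketched as a small decision function. The integer severity scale, the `Visibility` type, and the function name are hypothetical and only illustrate the rule; the later messages drop "and title" from the subject, so the final patch may differ.

```scala
object ProgressVisibilitySketch {
  // Hypothetical severity scale; a real logger would supply its own levels.
  val Debug = 0
  val Info = 1
  val Warn = 2
  val Error = 3

  final case class Visibility(bar: Boolean, title: Boolean)

  // enabled models the boolean read from spark.ui.showConsoleProgress.
  // The bar is drawn only when the log level is WARN or higher, so it does
  // not interleave with INFO log lines; the title is independent of the level.
  def visibility(enabled: Boolean, logLevel: Int): Visibility =
    Visibility(bar = enabled && logLevel >= Warn, title = enabled)
}
```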




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204747
  
      [Test build #23436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23436/consoleFull) for   PR 3029 at commit [`30ac852`](https://github.com/apache/spark/commit/30ac852e87cbc3d2017567c21e1a895b8828fbe1).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467968
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    --- End diff --
    
    I wouldn't hard-code the time in the comment. Just say "periodically" here




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61393558
  
      [Test build #22739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22739/consoleFull) for   PR 3029 at commit [`5cae3f2`](https://github.com/apache/spark/commit/5cae3f22bd187d56b5bd0067dd9129f22ced4941).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63388317
  
      [Test build #23506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23506/consoleFull) for   PR 3029 at commit [`95336d5`](https://github.com/apache/spark/commit/95336d575f3dc2a6e277a0d8778797c106a6098f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63206407
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23436/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61204861
  
      [Test build #22589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22589/consoleFull) for   PR 3029 at commit [`5df26bb`](https://github.com/apache/spark/commit/5df26bbb6587972e6b02b765d0e831c05e58d0d2).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61353658
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22669/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20414302
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -231,6 +231,13 @@ class SparkContext(config: SparkConf) extends Logging {
     
       val statusTracker = new SparkStatusTracker(this)
     
    +  private[spark] val progressBar: Option[ConsoleProgressBar] =
    +    if (conf.getBoolean("spark.ui.showConsoleProgress", true)) {
    --- End diff --
    
    Should we disable this when the log level is set to INFO?




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63208927
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23438/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63206817
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23437/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62454596
  
    This is a very nifty feature. However, it's not great to have modifications to the TaskSetManager and other scheduler internals for this presentation-related logic. It would be good if this could instead use our new progress reporting api (/cc @JoshRosen) and if we need to modify one or two things about that API we can do it. /cc @kayousterhout who maintains the scheduler code. My suggestion was to move a lot of this logic outside of the scheduler.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63254820
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23448/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63251889
  
    @pwendell  @kayousterhout  It can now show multiple stages (at most 3) in one line at the same time. It looks like this:
    ```
    [Stage 0:====>          (316 + 4) / 1000][Stage 1:>                (0 + 0) / 1000][Stage 2:>                (0 + 0) / 1000]]]
    ```
    ```
    [Stage 2:=============================>                                                                     (294 + 4) / 1000]
    ```
    
    If there are more than three concurrent stages, the first three of them will be shown. Once a stage finishes, it is removed.
    
    Does this work for both of you?
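
The single-line layout described in this comment could be rendered roughly as follows. This is a sketch under stated assumptions, not the patch's code: `StageProgress`, `MaxStages`, and `render` are hypothetical names, and the even split of the terminal width across the shown stages is inferred from the sample output above.

```scala
object ProgressLineSketch {
  // Hypothetical stand-in for the per-stage counters shown in the bar.
  final case class StageProgress(stageId: Int, completed: Int, active: Int, total: Int)

  // At most three stages share the line, as stated in the comment.
  val MaxStages = 3

  def render(stages: Seq[StageProgress], terminalWidth: Int): String = {
    val shown = stages.take(MaxStages)
    if (shown.isEmpty) ""
    else {
      // Each shown stage gets an equal share of the terminal width.
      val width = terminalWidth / shown.size
      shown.map { s =>
        val header = s"[Stage ${s.stageId}:"
        val tailer = s"(${s.completed} + ${s.active}) / ${s.total}]"
        val w = width - header.length - tailer.length
        val bar =
          if (w > 0 && s.total > 0) {
            // Number of '=' cells, proportional to completed tasks.
            val filled = w * s.completed / s.total
            (0 until w).map { i =>
              if (i < filled) "=" else if (i == filled) ">" else " "
            }.mkString
          } else ""
        header + bar + tailer
      }.mkString
    }
  }
}
```

When the share is too narrow to fit a bar, this sketch prints only the header and the counters; how the real implementation handles that case is not shown in the thread.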




[GitHub] spark issue #3029: [SPARK-4017] show progress bar in console

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/3029
  
    `spark.ui.showConsoleProgress=false` works for me. I pass it via `--conf` to `spark-submit`. Try that if you haven't already.
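
    For applications that build their own context (for example when launched from an IDE rather than through `spark-submit`), the same property can presumably be set in code. A minimal sketch, assuming Spark is on the classpath and that the property has to be set before the `SparkContext` is created; the object name, app name, and local master are placeholders.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object NoProgressBarApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("no-progress-bar") // placeholder name
          .setMaster("local[*]")         // placeholder master for a local run
          // Same key as the --conf flag above; assumed to be read at context creation.
          .set("spark.ui.showConsoleProgress", "false")

        val sc = new SparkContext(conf)
        try {
          // Any action works here; the point is that no bar should be drawn.
          println(sc.parallelize(1 to 100).sum())
        } finally {
          sc.stop()
        }
      }
    }
    ```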




[GitHub] spark issue #3029: [SPARK-4017] show progress bar in console

Posted by leonsoft <gi...@git.apache.org>.
Github user leonsoft commented on the issue:

    https://github.com/apache/spark/pull/3029
  
    I run it in Eclipse. At every interval I see 200+ lines of "===>".




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61588061
  
      [Test build #22845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22845/consoleFull) for   PR 3029 at commit [`a60477c`](https://github.com/apache/spark/commit/a60477c274675a35276a6e924656db45935b6144).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ExecutorLostFailure(execId: String) extends TaskFailedReason `
      * `class NullType(PrimitiveType):`
      * `class DecimalType(DataType):`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `  case class ScalaUdfBuilder[T: TypeTag](f: AnyRef) `
      * `case class UnscaledValue(child: Expression) extends UnaryExpression `
      * `case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression `
      * `case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)`
      * `abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging `
      * `case class PrecisionInfo(precision: Int, scale: Int)`
      * `case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType `
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `final class Decimal extends Ordered[Decimal] with Serializable `
      * `  trait DecimalIsConflicted extends Numeric[Decimal] `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `
      * `trait RunnableCommand extends logical.Command `
      * `case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan `
      * `  protected case class Keyword(str: String)`
      * `            sys.error(s"Failed to load class for data source: $provider")`
      * `case class EqualTo(attribute: String, value: Any) extends Filter`
      * `case class GreaterThan(attribute: String, value: Any) extends Filter`
      * `case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter`
      * `case class LessThan(attribute: String, value: Any) extends Filter`
      * `case class LessThanOrEqual(attribute: String, value: Any) extends Filter`
      * `trait RelationProvider `
      * `abstract class BaseRelation `
      * `abstract class TableScan extends BaseRelation `
      * `abstract class PrunedScan extends BaseRelation `
      * `abstract class PrunedFilteredScan extends BaseRelation `





[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20468318
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    --- End diff --
    
    `COLUMNS` is a shell variable that bash only passes to child processes once it has been exported:
    ```
    export COLUMNS
    ```
    It's not exported by default, so we add the export.
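
    A minimal sketch of how the width lookup could tolerate a missing or malformed `COLUMNS`, assuming a fallback of 80 columns. The fallback value, the object name, and the `terminalWidth` helper are assumptions for illustration; the diff above is truncated before the actual default.

    ```scala
    import scala.util.Try

    object TerminalWidthSketch {
      // Assumed fallback; the quoted diff does not show which default the patch uses.
      val DefaultWidth = 80

      /** Reads the width from COLUMNS, falling back when it is unset, empty, or not a positive number. */
      def terminalWidth(env: Map[String, String] = sys.env): Int =
        env.get("COLUMNS")
          .flatMap(v => Try(v.trim.toInt).toOption)
          .filter(_ > 0)
          .getOrElse(DefaultWidth)
    }
    ```

    Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.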




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61581788
  
      [Test build #22846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22846/consoleFull) for   PR 3029 at commit [`e1f524d`](https://github.com/apache/spark/commit/e1f524d9239bd94099fdd9c227ac8a4b5dda70ba).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20468644
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    --- End diff --
    
    Right now, it only depends on statusTracker, but it may show some metrics later, so it's better to have `SparkContext` here.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61211030
  
      [Test build #22589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22589/consoleFull) for   PR 3029 at commit [`5df26bb`](https://github.com/apache/spark/commit/5df26bbb6587972e6b02b765d0e831c05e58d0d2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61349714
  
    **[Test build #22656 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22656/consoleFull)**     for PR 3029 at commit [`7e7d4e7`](https://github.com/apache/spark/commit/7e7d4e784864baa4168819b0fc3fc01a01abc1cd)     after a configured wait of `120m`.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62473806
  
    @pwendell @kayousterhout, thanks for reviewing this.
    
    The console is a character device; all graphics are implemented as a stream of characters. It's very hard to show a near-real-time progress bar without disrupting the logging (the two don't know about each other). For example: 1) if the progress bar is shown after the logging cursor, future log output will overwrite (part of) the progress bar; 2) if the progress bar is shown somewhere above the logging cursor (such as on the top line of the console), old log output (and also printed results) will be overwritten by the progress bar.
    
    So the current approach is to show the progress bar only when the logging level is WARN (or higher). If the logging level is DEBUG or INFO, users can get progress information from the log, and it's hard to manage the two together. The progress bar is shown between the call to an action API and its return, so no other output or logging is expected in this period and the console will not become a mess. If we move to a listener-based implementation, it's hard to clean up the progress bar before the API returns; that is also why I moved `sched.dagScheduler.taskEnded` after showProgressBar().
    
    It does not work properly when a job has multiple concurrent stages; the concurrent progress bars overwrite each other randomly. Each bar begins with its stage id, so it's still kind of readable.
    
    I agree that putting the progress bar code into TaskSetManager is not a good idea; I will move it out after we finalize the other stuff (how to deal with logging, whether to use the listener API or not).
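
    The overwrite problem described above comes down to how a carriage return behaves on a character device. A minimal sketch, assuming the bar is drawn with `\r` and blanked out again before any other output is written; `OneLineBar` and its methods are hypothetical names, not code from this patch.

    ```scala
    import java.io.PrintStream

    final class OneLineBar(out: PrintStream, width: Int) {
      private var lastShown = ""

      /** Redraws the bar in place: '\r' moves the cursor to column 0 without starting a new line. */
      def show(bar: String): Unit = {
        val line = bar.take(width)
        out.print("\r" + line)
        out.flush()
        lastShown = line
      }

      /** Blanks the bar so that the next log line or result starts on a clean line. */
      def clear(): Unit = {
        if (lastShown.nonEmpty) {
          out.print("\r" + " " * lastShown.length + "\r")
          out.flush()
          lastShown = ""
        }
      }
    }
    ```

    Calling `clear()` right before the action returns is what keeps a printed result from landing on top of a half-drawn bar; anything else that writes to the same device in between can still collide with it.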




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62458897
  
    This is definitely a cool/useful feature!  A few things:
    
    (1) As Patrick alluded to, I think it could and should be implemented as a SparkListener, which is definitely preferable to adding, as @aarondav wisely said, "a bunch of random code in the middle of TaskSetManager".  The scheduling code is already quite complex and we should try to avoid adding unnecessary complexity there.
    
    (2) I tried this out and it doesn't seem to work properly when a job has multiple concurrent stages, because of the way lines get overwritten (only one of the stages gets shown).
    
    (3) This bar doesn't show up if any other messages get logged to the console (e.g., if info logging is turned on).  Is there a way to make this so that it's always the bottom thing shown in the console, after other messages?




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r19702293
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -521,6 +529,51 @@ private[spark] class TaskSetManager(
         sched.dagScheduler.taskGettingResult(info)
       }
     
    +  private def progressBar(curr: Int, total: Int): Unit = {
    --- End diff --
    
    Please add method comment describing what this is doing and what the parameters are (it is, after all, a bunch of random code in the middle of TaskSetManager).




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63249917
  
      [Test build #23442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23442/consoleFull) for   PR 3029 at commit [`0081bcc`](https://github.com/apache/spark/commit/0081bcca2d67097c33ecbd0052e72cda8889935b).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467888
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    --- End diff --
    
    Also, is `COLUMNS` a thing you added or something that's already there? If it's something you added maybe we should rename it to `CONSOLE_COLUMN_WIDTH` or something




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20468419
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    --- End diff --
    
    This is the delay before the bar first shows up; I have not figured out the right name for it. `UPDATE_DELAY` may be confused with `UPDATE_PERIOD`.
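
    The difference between the two constants can be sketched as follows, assuming a `java.util.Timer` drives the refresh as the imports in the diff suggest. `RefreshSketch`, `shouldShow`, and `start` are hypothetical names, and using the first-show delay as the timer's initial delay is an assumption made for this example.

    ```scala
    import java.util.{Timer, TimerTask}

    object RefreshSketch {
      val UpdatePeriodMs = 200L   // how often the bar is redrawn
      val FirstShowDelayMs = 500L // how long a stage must run before its bar first appears

      /** A stage is drawn only once it has been running for the initial delay. */
      def shouldShow(stageStartMs: Long, nowMs: Long): Boolean =
        nowMs - stageStartMs >= FirstShowDelayMs

      /** Starts a daemon timer that calls `refresh` with the current time every UpdatePeriodMs. */
      def start(refresh: Long => Unit): Timer = {
        val timer = new Timer("refresh progress", true)
        timer.schedule(new TimerTask {
          override def run(): Unit = refresh(System.currentTimeMillis())
        }, FirstShowDelayMs, UpdatePeriodMs)
        timer
      }
    }
    ```

    The period controls how often the bar is refreshed; the delay only filters out short stages so that they never flash a bar at all.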




[GitHub] spark issue #3029: [SPARK-4017] show progress bar in console

Posted by leonsoft <gi...@git.apache.org>.
Github user leonsoft commented on the issue:

    https://github.com/apache/spark/pull/3029
  
    How on earth do I turn it off? The settings below do not work.
    
    ```
    log4j.rootCategory=ERROR, console
    spark.driver.showConsoleProgress=false
    spark.ui.showConsoleProgress=false
    ```
    
    
    





[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r19709497
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -521,6 +529,51 @@ private[spark] class TaskSetManager(
         sched.dagScheduler.taskGettingResult(info)
       }
     
    +  private def progressBar(curr: Int, total: Int): Unit = {
    +    val now = clock.getTime()
    +    // Only update the title at most once every 100 ms
    +    if (now - lastUpdate < 100 && curr < total) {
    +      return
    +    }
    +    lastUpdate = now
    +
    +    // show progress in title
    +    if (Terminal.getTerminal.isANSISupported) {
    +      val ESC = "\033"
    +      val title = if (curr < total) {
    +        s"Spark Job: $curr/$total Finished, $runningTasks are running"
    +      } else {
    +        s"Spark Job: Finished in ${Utils.msDurationToString(now - startTime)}"
    +      }
    +      console.printf(s"$ESC]0; $title \007")
    +    }
    +
    +    // show one line progress bar
    +    if (!log.isInfoEnabled) {
    +      if (curr < total) {
    +        val header = s"Stage $stageId: ["
    +        val tailer = s"] $curr+$runningTasks/$total - ${Utils.msDurationToString(now - startTime)}"
    +        val width = Terminal.getTerminal.getTerminalWidth - header.size - tailer.size
    +        val percent = curr * width / total;
    +        val bar = (0 until width).map { i =>
    +          if (i < percent) "=" else if (i==percent) ">" else " "
    +        }.mkString("")
    +        console.printf("\r" + header + bar + tailer)
    --- End diff --
    
    Either stdout or stderr could be redirected somewhere, but the console is the real target.
    
    If the last log line was written with print, the bar will overwrite it. In most cases, log lines are written with println.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63389796
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23504/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63206404
  
      [Test build #23436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23436/consoleFull) for   PR 3029 at commit [`30ac852`](https://github.com/apache/spark/commit/30ac852e87cbc3d2017567c21e1a895b8828fbe1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61359713
  
    @JoshRosen I have made some improvements: 1) the bar is finished before the result is printed in the Scala shell; 2) it interleaves with logging better (they will not overwrite each other); 3) it will not show progress in Jenkins (it uses the console instead of stderr).




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61359587
  
      [Test build #22685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22685/consoleFull) for   PR 3029 at commit [`bc53d99`](https://github.com/apache/spark/commit/bc53d99d518d6fafd607c617d0915c7a2f9eee85).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r19697163
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -521,6 +528,47 @@ private[spark] class TaskSetManager(
         sched.dagScheduler.taskGettingResult(info)
       }
     
    +  private def progressBar(curr: Int, total: Int): Unit = {
    +    val now = clock.getTime()
    +    // Only update title once in one second
    +    if (now - lastUpdate < 100 && curr < total) {
    +      return
    +    }
    +    val SetTitle = "\033]0;"
    +    val EndTitle = "\007"
    +    if (curr < total) {
    +      System.err.print(s"${SetTitle} Spark Job: $curr/$total Finished, " +
    +        s"$runningTasks are running ${EndTitle}")
    +
    +      if (!log.isInfoEnabled) {
    +        val used = (now - startTime) / 1000
    +        val header = s"Stage ${stageId}: ["
    +        val tailer = s"] ${curr}+${runningTasks}/${total} - ${used}s"
    +        val width = Terminal.getTerminal.getTerminalWidth - header.size - tailer.size
    +        val percent = curr * width / total;
    +        val bar = (0 until width).map { i =>
    +          if (i < percent) "=" else if (i==percent) ">" else " "
    +        }.mkString("")
    +        System.err.print(header + bar + tailer + s"\n${ANSICodes.up(0)}")
    +      }
    +    } else {
    +      System.err.print(s"${SetTitle} Spark Job: All Finished ${EndTitle}")
    +      if (!log.isInfoEnabled) {
    +        val used = (now  - startTime) / 1000
    +        val finishTimes = taskInfos.map(_._2.finishTime - startTime)
    +        val avg = finishTimes.sum / finishTimes.size / 1000
    +        val min = finishTimes.min / 1000
    +        val max = finishTimes.max / 1000
    +        val med = finishTimes.toSeq.sorted.slice(0, finishTimes.size / 2).last / 1000
    +        // erase current line
    +        System.err.print(" " * Terminal.getTerminal.getTerminalWidth + "\n" + ANSICodes.up(0))
    +        System.err.println(s"Stage ${stageId}: Finished in ${used}s with ${total} tasks " +
    +          s"(${min}/${med}/${avg}/${max}s).")
    --- End diff --
    
    Maybe we could explicitly say `min=*/median=*/avg=*/max=*` to make this clearer to users?
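
    A minimal sketch of that labeled format, assuming finish times are given in milliseconds since the stage started. `StageSummary`, `median`, and `summary` are hypothetical names, not part of the patch. The median here also handles a single task, whereas the `slice(0, size / 2).last` expression in the diff would fail on an empty slice in that case:

    ```scala
    object StageSummary {
      // Median of a non-empty sequence; averages the two middle values for even sizes.
      def median(xs: Seq[Long]): Long = {
        val sorted = xs.sorted
        val mid = sorted.size / 2
        if (sorted.size % 2 == 1) sorted(mid) else (sorted(mid - 1) + sorted(mid)) / 2
      }

      // Formats task finish times (milliseconds) with an explicit label per statistic.
      def summary(stageId: Int, usedSec: Long, finishTimesMs: Seq[Long]): String = {
        require(finishTimesMs.nonEmpty, "need at least one finished task")
        val min = finishTimesMs.min / 1000
        val med = median(finishTimesMs) / 1000
        val avg = finishTimesMs.sum / finishTimesMs.size / 1000
        val max = finishTimesMs.max / 1000
        s"Stage $stageId: Finished in ${usedSec}s with ${finishTimesMs.size} tasks " +
          s"(min=${min}s/median=${med}s/avg=${avg}s/max=${max}s)."
      }
    }
    ```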




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61350386
  
      [Test build #22669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22669/consoleFull) for   PR 3029 at commit [`e6bb189`](https://github.com/apache/spark/commit/e6bb1895c51bef6201296601725821d12d4deb8e).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62474857
  
    I'm not sure if this is a good idea, but what if the progress bar printed a new line each time it advanced, and just used the normal logging infrastructure?  So the log would look something like:
    
    [INFO] Stage 1 [=>                                  
    [INFO] Stage 1 [==>
    [INFO] Stage 1 [===>
    
    and so on?  It's a little more verbose / less pretty to look at, but I think it would more cleanly handle both (a) playing nice with the existing info logging, and (b) showing info about multiple stages.  Thoughts?
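
    The idea above could be sketched roughly as follows, assuming a plain `String => Unit` callback stands in for the real logger. `LoggedProgress`, `render`, and `Reporter` are hypothetical names. A line is emitted only when the bar advances by at least one cell, which keeps the log from filling with identical lines:

    ```scala
    object LoggedProgress {
      // Renders a bar such as "Stage 1 [==>       ]" for the given completion state.
      def render(stageId: Int, finished: Int, total: Int, width: Int = 20): String = {
        val filled = if (total > 0) finished * width / total else 0
        val bar = ("=" * filled + ">").take(width).padTo(width, ' ')
        s"Stage $stageId [$bar]"
      }

      // Emits a new log line only when the bar has advanced by at least one cell.
      class Reporter(stageId: Int, total: Int, log: String => Unit, width: Int = 20) {
        private var lastFilled = -1

        def update(finished: Int): Unit = {
          val filled = if (total > 0) finished * width / total else 0
          if (filled > lastFilled) {
            lastFilled = filled
            log("[INFO] " + render(stageId, finished, total, width))
          }
        }
      }
    }
    ```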




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63598077
  
    merged.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63206787
  
      [Test build #23438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23438/consoleFull) for   PR 3029 at commit [`38c42f1`](https://github.com/apache/spark/commit/38c42f18ab24c8e3aecce0e39f0f2fa996627ec4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204436
  
      [Test build #23434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23434/consoleFull) for   PR 3029 at commit [`0cee236`](https://github.com/apache/spark/commit/0cee2368b09fb8167e0a992bccea8eb17257ad35).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467846
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    --- End diff --
    
    You can just do `sys.env.contains`
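
    One caveat worth noting: `sys.env.contains("COLUMNS")` is also true when the variable is set but empty, and `"".toInt` throws. A minimal sketch that falls back to a default in those cases, with the hypothetical names `TerminalWidthSketch` and `terminalWidth`:

    ```scala
    import scala.util.Try

    object TerminalWidthSketch {
      val DefaultWidth = 80

      // Reads the width from an environment map; falls back to the default when
      // COLUMNS is missing, empty, non-numeric, or not positive.
      def terminalWidth(env: Map[String, String] = sys.env): Int =
        env.get("COLUMNS")
          .flatMap(c => Try(c.trim.toInt).toOption)
          .filter(_ > 0)
          .getOrElse(DefaultWidth)
    }
    ```

    Taking the environment as a parameter is only there to make the function easy to exercise without changing the real process environment.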




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20414069
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,143 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +import scala.collection.mutable.HashMap
    +
    +import org.apache.spark._
    +import org.apache.spark.scheduler.{SparkListenerStageSubmitted, SparkListenerStageCompleted, SparkListener}
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    +    sys.env.get("COLUMNS").get.toInt
    +  } else {
    +    80
    +  }
    +
    +  @volatile var hasShowed = false
    +
    +  /**
    +   * Track the life cycle of stages
    +   */
    +  val activeStages = new HashMap[Int, Long]()
    +
    +  private class StageProgressListener extends SparkListener {
    +    override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted) = {
    +      activeStages.synchronized {
    +        activeStages.put(stageSubmitted.stageInfo.stageId, System.currentTimeMillis())
    +      }
    +    }
    +    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) = {
    +      activeStages.synchronized {
    +        activeStages.remove(stageCompleted.stageInfo.stageId)
    +        if (activeStages.isEmpty) {
    +          clearProgressBar()
    +        }
    +      }
    +    }
    +  }
    +  sc.listenerBus.addListener(new StageProgressListener)
    +
    +  // Schedule a update thread to run in every 200ms
    +  private val timer = new Timer("show progress", true)
    +  timer.schedule(new TimerTask{
    +    override def run() {
    +      var running = 0
    +      var finished = 0
    +      var tasks = 0
    +      var failed = 0
    +      val now = System.currentTimeMillis()
    +      val stageIds = sc.statusTracker.getActiveStageIds()
    +      stageIds.map(sc.statusTracker.getStageInfo).foreach{
    +        case Some(stage) =>
    +          activeStages.synchronized {
    +            // Don't show progress for stage which has only one task (useless),
    +            // also don't show progress for stage which had started in 500 ms
    +            if (stage.numTasks > 1 && activeStages.contains(stage.stageId)
    +              && now - activeStages(stage.stageId) > DELAY_SHOW_UP) {
    +              tasks += stage.numTasks
    +              running += stage.numActiveTasks
    +              finished += stage.numCompletedTasks
    +              failed += stage.numFailedTasks
    +            }
    +          }
    +      }
    +      if (tasks > 0) {
    +        showProgressBar(stageIds, tasks, running, finished, failed)
    +      }
    +    }
    +  }, DELAY_SHOW_UP, UPDATE_PERIOD)
    +
    +  /**
    +   * Show progress in console (also in title). The progress bar is displayed in the next line
    +   * after your last output, keeps overwriting itself to hold in one line. The logging will follow
    +   * the progress bar, then progress bar will be showed in next line without overwrite logs.
    +   */
    +  private def showProgressBar(stageIds: Seq[Int], total: Int, running: Int, finished: Int,
    +                              failed: Int): Unit = {
    +    // show progress of all stages in one line progress bar
    +    val ids = stageIds.mkString("/")
    +    if (!log.isInfoEnabled) {
    +      if (finished < total) {
    +        val header = s"Stage $ids: ["
    +        val tailer = s"] $finished + $running / $total"
    --- End diff --
    
    This output is a little confusing:
    
    ```
        ] 1834 + 4 / 1000
    ```
    
    What about adding parentheses to make the associativity more clear?
    
    ```
        ] (1834 + 4) / 1000
    ```
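
    A minimal sketch of a bar with the parenthesized counts, loosely following the header/bar/tailer layout in the diff. `BarSketch` and `render` are hypothetical names, and the guard against `total == 0` is an added assumption:

    ```scala
    object BarSketch {
      // Renders one stage's bar into `width` columns, with the counts parenthesized
      // so that "(completed + active) / total" reads unambiguously.
      def render(stageId: Int, completed: Int, active: Int, total: Int, width: Int): String = {
        val header = s"[Stage $stageId:"
        val tailer = s"($completed + $active) / $total]"
        val w = width - header.length - tailer.length
        val bar = if (w > 0 && total > 0) {
          val filled = w * completed / total
          (0 until w).map { i =>
            if (i < filled) "=" else if (i == filled) ">" else " "
          }.mkString
        } else {
          "" // not enough room: show only the header and the counts
        }
        header + bar + tailer
      }
    }
    ```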




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63392560
  
    LGTM - however, I think this needs to be merged cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467709
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    + * up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
    + * of them will be combined together, showed in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    +    sys.env.get("COLUMNS").get.toInt
    +  } else {
    +    80
    +  }
    +
    +  var hasShowed = false
    +  var lastFinishTime = 0L
    +
    +  // Schedule a refresh thread to run in every 200ms
    +  private val timer = new Timer("refresh progress", true)
    +  timer.schedule(new TimerTask{
    +    override def run() {
    +      refresh()
    +    }
    +  }, DELAY_SHOW_UP, UPDATE_PERIOD)
    +
    +  /**
    +   * Try to refresh the progress bar in every cycle
    +   */
    +  private def refresh(): Unit = synchronized {
    +    val now = System.currentTimeMillis()
    +    if (now - lastFinishTime < DELAY_SHOW_UP) {
    +      return
    +    }
    +    val stageIds = sc.statusTracker.getActiveStageIds()
    +    val stages = stageIds.map(sc.statusTracker.getStageInfo).flatten.filter(_.numTasks() > 1)
    +      .filter(now - _.submissionTime() > DELAY_SHOW_UP).sortBy(_.stageId())
    +    if (stages.size > 0) {
    +      show(stages.take(3))  // display at most 3 stages in same time
    +      hasShowed = true
    +    }
    +  }
    +
    +  /**
    +   * Show progress bar in console. The progress bar is displayed in the next line
    +   * after your last output, keeps overwriting itself to hold in one line. The logging will follow
    +   * the progress bar, then progress bar will be showed in next line without overwrite logs.
    +   */
    +  private def show(stages: Seq[SparkStageInfo]) {
    +    System.err.print("\r")
    +    val width = TerminalWidth / stages.size
    +    stages.foreach { s =>
    +      val total = s.numTasks()
    +      val header = s"[Stage ${s.stageId()}:"
    +      val tailer = s"(${s.numCompletedTasks()} + ${s.numActiveTasks()}) / $total]"
    +      val w = width - header.size - tailer.size
    +      val bar = if (w > 0) {
    +        val percent = w * s.numCompletedTasks() / total
    +        (0 until w).map { i =>
    +          if (i < percent) "=" else if (i == percent) ">" else " "
    +        }.mkString("")
    +      } else {
    +        ""
    +      }
    +      System.err.print(header + bar + tailer)
    +    }
    +  }
    +
    +  /**
    +   * Clear the progress bar if showed.
    +   */
    +  private def clear() = {
    +    if (hasShowed) {
    --- End diff --
    
    It is more correct grammatically for this to be "isShown" rather than "hasShowed". This refers to whether there is currently a bar printed in the console, right?




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63206309
  
      [Test build #521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/521/consoleFull) for   PR 3029 at commit [`0cee236`](https://github.com/apache/spark/commit/0cee2368b09fb8167e0a992bccea8eb17257ad35).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20467949
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +
    +import org.apache.spark._
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
    + * status of active stages from `sc.statusTracker` in every 200ms, the progress bar will be showed
    --- End diff --
    
    I wouldn't hard-code the time in the comment. Just say "periodically" here




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61587953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22846/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61198910
  
      [Test build #22584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22584/consoleFull) for   PR 3029 at commit [`9e42208`](https://github.com/apache/spark/commit/9e42208c9acdfb3854ef7c6ec4dbdb12aed9dbc6).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61198507
  
    cc @JoshRosen @mateiz 




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63389665
  
    @pwendell @andrewor14 I should have addressed your comments; please have another look, thanks!
    
    This should work in the SQL CLI, and also on Windows.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r19702297
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -521,6 +529,51 @@ private[spark] class TaskSetManager(
         sched.dagScheduler.taskGettingResult(info)
       }
     
    +  private def progressBar(curr: Int, total: Int): Unit = {
    +    val now = clock.getTime()
    +    // Only update title once in one second
    +    if (now - lastUpdate < 100 && curr < total) {
    +      return
    +    }
    +    lastUpdate = now
    +
    +    // show progress in title
    +    if (Terminal.getTerminal.isANSISupported) {
    +      val ESC = "\033"
    +      val title = if (curr < total) {
    +        s"Spark Job: $curr/$total Finished, $runningTasks are running"
    +      } else {
    +        s"Spark Job: Finished in ${Utils.msDurationToString(now - startTime)}"
    +      }
    +      console.printf(s"$ESC]0; $title \007")
    +    }
    +
    +    // show one line progress bar
    +    if (!log.isInfoEnabled) {
    +      if (curr < total) {
    +        val header = s"Stage $stageId: ["
    +        val tailer = s"] $curr+$runningTasks/$total - ${Utils.msDurationToString(now - startTime)}"
    +        val width = Terminal.getTerminal.getTerminalWidth - header.size - tailer.size
    +        val percent = curr * width / total;
    +        val bar = (0 until width).map { i =>
    +          if (i < percent) "=" else if (i==percent) ">" else " "
    +        }.mkString("")
    +        console.printf("\r" + header + bar + tailer)
    --- End diff --
    
    I'm not familiar with the finer points of console, but does this overwrite the last log line? Or would it do so if the last log line was `print`'d instead of `println`'d?
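
To make the question concrete: `\r` only moves the cursor back to column 0, so what gets overwritten depends on whether the previous output ended with a newline. The sketch below is a toy model of a terminal's rows, not code from this patch; `CarriageReturnSketch` and `render` are hypothetical names, and real terminals add line wrapping and escape-sequence handling that are ignored here.

```scala
object CarriageReturnSketch {
  // Simulates what the terminal rows contain after a sequence of raw writes.
  // '\r' moves the cursor to column 0 without clearing the row;
  // '\n' starts a new row; any other character overwrites or appends.
  def render(output: String): Vector[String] = {
    val rows = scala.collection.mutable.ArrayBuffer(new StringBuilder)
    var col = 0
    output.foreach {
      case '\r' =>
        col = 0
      case '\n' =>
        rows += new StringBuilder
        col = 0
      case ch =>
        val row = rows.last
        if (col < row.length) row.setCharAt(col, ch) else row.append(ch)
        col += 1
    }
    rows.map(_.toString).toVector
  }
}
```

Under this model, a `println`'d log line survives because the cursor is already on a fresh row, a `print`'d line is overwritten by the bar, and a bar shorter than the previous text leaves the tail of that text visible.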




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63253165
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23442/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62503514
  
    > But if it doesn't play nicely with logging & multiple stages, this seems like a very small improvement for the initial user experience, but a big headache for serious users.
    
    If the logging level is DEBUG/INFO, the progress bar cannot contribute more value than the logging itself, so it's fine to disable it. At WARN level there should be much less logging (maybe none); if there is some, the output will look like this:
    ```
    [Stage 2] ==============>                                           ] 10 + 5/100 xxx
    [WARN] xxxxxxxx
    [Stage 2] =================>                                     ] 15 + 5/100 xxx
    ```
    I think that's not bad, especially since it's not the common case.
    
    If there are multiple stages that do not run concurrently (the common case), the progress bar will be shown stage by stage, like this:
    ```
    [Stage 1] Finished in xxx seconds. (med/avg/xxx)
    [Stage 2] =================>                                     ] 15 + 5/100 xxx
    ```
    
    We can improve the case in which multiple stages run concurrently, so that it looks like this:
    ```
    [Stage 1/2/3] [=============>                                      ] 140+39/340 xxxx
    ```
    
    So I cannot agree that the current approach does not play nicely with logging or multiple stages.
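
The one-line layout in the examples above can be sketched as a pure function. This is a minimal sketch modeled on the `header`/`tailer`/`width` arithmetic in the TaskSetManager diff, not the patch itself; `ProgressBarSketch`, `render`, and the explicit `terminalWidth` parameter are assumptions made for illustration.

```scala
object ProgressBarSketch {
  // Renders a single line of the form "[Stage 1/2/3] [====>     ] 140+39/340",
  // padded so that the whole line is exactly terminalWidth characters wide
  // (when the terminal is wide enough to hold the header and tailer).
  def render(
      stageIds: Seq[Int],
      finished: Int,
      running: Int,
      total: Int,
      terminalWidth: Int): String = {
    val ids = stageIds.mkString("/")
    val header = s"[Stage $ids] ["
    val tailer = s"] $finished+$running/$total"
    // Space left for the bar itself; never negative on very narrow terminals.
    val width = math.max(terminalWidth - header.length - tailer.length, 0)
    // Number of '=' cells, proportional to the finished tasks.
    val done = if (total > 0) finished * width / total else 0
    val bar = (0 until width).map { i =>
      if (i < done) '=' else if (i == done) '>' else ' '
    }.mkString
    header + bar + tailer
  }
}
```

Because the function only builds a string, the caller decides how to emit it, for example by prefixing `"\r"` so that each update overwrites the previous one.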
    
    > In this case it's just a couple more lines. If the stage took longer, then it would be even more lines, but that seems ok, since it's not that much noise per unit time.
    
    A progress bar is useful for slow jobs, where a stage is expected to take a long time (maybe more than 1 minute) to finish. With one line per update, the progress output would then occupy the whole screen (more than 40 lines), and users would need to scroll to see previous output (results).
    
    If we use INFO for the progress bar, we need a special trick to enable the progress bar while disabling the other INFO logging. If we could do that, we could already separate the progress logging from all the other logging right now. It would not be easy for users, especially those who are not familiar with log4j.





[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20414182
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala ---
    @@ -0,0 +1,143 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ui
    +
    +import java.util.{Timer, TimerTask}
    +import scala.collection.mutable.HashMap
    +
    +import org.apache.spark._
    +import org.apache.spark.scheduler.{SparkListenerStageSubmitted, SparkListenerStageCompleted, SparkListener}
    +
    +/**
    + * ConsoleProgressBar shows the progress of stages in the next line of the console. It polls the
    + * status of active stages from `sc.statusTracker` every 200ms; the progress bar is shown
    + * once a stage has run for at least 500ms. If multiple stages run at the same time, their
    + * status is combined and shown in one line.
    + */
    +private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging {
    +
    +  // Update period of progress bar, in milli seconds
    +  val UPDATE_PERIOD = 200L
    +  // Delay to show up a progress bar, in milli seconds
    +  val DELAY_SHOW_UP = 500L
    +  // The width of terminal
    +  val TerminalWidth = if (!sys.env.getOrElse("COLUMNS", "").isEmpty) {
    +    sys.env.get("COLUMNS").get.toInt
    +  } else {
    +    80
    +  }
    +
    +  @volatile var hasShowed = false
    +
    +  /**
    +   * Track the life cycle of stages
    +   */
    +  val activeStages = new HashMap[Int, Long]()
    +
    +  private class StageProgressListener extends SparkListener {
    +    override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted) = {
    +      activeStages.synchronized {
    +        activeStages.put(stageSubmitted.stageInfo.stageId, System.currentTimeMillis())
    +      }
    +    }
    +    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) = {
    +      activeStages.synchronized {
    +        activeStages.remove(stageCompleted.stageInfo.stageId)
    +        if (activeStages.isEmpty) {
    +          clearProgressBar()
    +        }
    +      }
    +    }
    +  }
    +  sc.listenerBus.addListener(new StageProgressListener)
    +
    +  // Schedule an update thread to run every 200ms
    +  private val timer = new Timer("show progress", true)
    +  timer.schedule(new TimerTask{
    +    override def run() {
    +      var running = 0
    +      var finished = 0
    +      var tasks = 0
    +      var failed = 0
    +      val now = System.currentTimeMillis()
    +      val stageIds = sc.statusTracker.getActiveStageIds()
    +      stageIds.map(sc.statusTracker.getStageInfo).foreach{
    +        case Some(stage) =>
    +          activeStages.synchronized {
    +            // Don't show progress for a stage that has only one task (useless),
    +            // or for a stage that started less than 500 ms ago
    +            if (stage.numTasks > 1 && activeStages.contains(stage.stageId)
    +              && now - activeStages(stage.stageId) > DELAY_SHOW_UP) {
    +              tasks += stage.numTasks
    +              running += stage.numActiveTasks
    +              finished += stage.numCompletedTasks
    +              failed += stage.numFailedTasks
    +            }
    +          }
    +      }
    +      if (tasks > 0) {
    +        showProgressBar(stageIds, tasks, running, finished, failed)
    +      }
    +    }
    +  }, DELAY_SHOW_UP, UPDATE_PERIOD)
    +
    +  /**
    +   * Show progress in the console (and in the title). The progress bar is displayed on the line
    +   * after your last output and keeps overwriting itself to stay on one line. Log output follows
    +   * the progress bar; the bar is then redrawn on the next line without overwriting the logs.
    +   */
    +  private def showProgressBar(stageIds: Seq[Int], total: Int, running: Int, finished: Int,
    +                              failed: Int): Unit = {
    +    // show progress of all stages in one line progress bar
    +    val ids = stageIds.mkString("/")
    --- End diff --
    
    It might be good to comma-separate these. Otherwise, if you have "Stages 1/2", it could be read as "1 out of 2".
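
The ambiguity is easy to see side by side. A minimal sketch, assuming the label is built from the stage ids with `mkString` as in the diff; `StageLabelSketch` and its two helpers are hypothetical names used only for comparison.

```scala
object StageLabelSketch {
  // Slash-separated ids, as in the diff: "Stages 1/2" can be misread as a ratio.
  def slashLabel(stageIds: Seq[Int]): String =
    "Stages " + stageIds.mkString("/")

  // Comma-separated ids: reads unambiguously as a list of stages.
  def commaLabel(stageIds: Seq[Int]): String =
    "Stages " + stageIds.mkString(", ")
}
```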




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-62476320
  
    @squito that's a good idea with the minimum time interval to avoid unnecessary clutter!
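
The original suggestion is not quoted in this message, so the following is only one possible reading of a "minimum time interval": a stage is reported only after it has been running for some minimum time, similar to the 500 ms `DELAY_SHOW_UP` in the `ConsoleProgressBar` diff. `ShowDelaySketch`, `visibleStages`, and the `startTimes` map are hypothetical names for illustration.

```scala
object ShowDelaySketch {
  // Minimum running time before a stage is reported, in milliseconds.
  val DelayShowUpMs = 500L

  // Given the start time of each active stage (stage id -> epoch millis),
  // return the ids of the stages that have run long enough to be displayed.
  def visibleStages(startTimes: Map[Int, Long], now: Long): Seq[Int] =
    startTimes.collect {
      case (id, start) if now - start > DelayShowUpMs => id
    }.toSeq.sorted
}
```

Stages that finish within the delay are never displayed, which is what keeps short stages from cluttering the console.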




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by davies <gi...@git.apache.org>.
Github user davies closed the pull request at:

    https://github.com/apache/spark/pull/3029




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61393355
  
      [Test build #22737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22737/consoleFull) for   PR 3029 at commit [`ea49fe0`](https://github.com/apache/spark/commit/ea49fe07d681d3821110954342f1c17cbbcf7ccc).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63399144
  
      [Test build #23506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23506/consoleFull) for   PR 3029 at commit [`95336d5`](https://github.com/apache/spark/commit/95336d575f3dc2a6e277a0d8778797c106a6098f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63204432
  
      [Test build #23434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23434/consoleFull) for   PR 3029 at commit [`0cee236`](https://github.com/apache/spark/commit/0cee2368b09fb8167e0a992bccea8eb17257ad35).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61357827
  
      [Test build #503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/503/consoleFull) for   PR 3029 at commit [`e6bb189`](https://github.com/apache/spark/commit/e6bb1895c51bef6201296601725821d12d4deb8e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3029#discussion_r20112947
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -554,12 +627,11 @@ private[spark] class TaskSetManager(
         val index = info.index
         info.markSuccessful()
         removeRunningTask(tid)
    -    sched.dagScheduler.taskEnded(
    -      tasks(index), Success, result.value(), result.accumUpdates, info, result.metrics)
         if (!successful(index)) {
           tasksSuccessful += 1
    -      logInfo("Finished task %s in stage %s (TID %d) in %d ms on %s (%d/%d)".format(
    -        info.id, taskSet.id, info.taskId, info.duration, info.host, tasksSuccessful, numTasks))
    +      logDebug("Finished task %s in stage %s (TID %d) in %.3fs on %s (%d/%d)".format(
    +        info.id, taskSet.id, info.taskId, info.duration/1000.0, info.host,
    --- End diff --
    
    Why did you change this message from ms to s?




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61394840
  
      [Test build #22739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22739/consoleFull) for   PR 3029 at commit [`5cae3f2`](https://github.com/apache/spark/commit/5cae3f22bd187d56b5bd0067dd9129f22ced4941).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61589367
  
      [Test build #22850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22850/consoleFull) for   PR 3029 at commit [`6fd30ff`](https://github.com/apache/spark/commit/6fd30ff1a1935c26d716271d729a38e26b953e49).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ExecutorLostFailure(execId: String) extends TaskFailedReason `
      * `class NullType(PrimitiveType):`
      * `class DecimalType(DataType):`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `  case class ScalaUdfBuilder[T: TypeTag](f: AnyRef) `
      * `case class UnscaledValue(child: Expression) extends UnaryExpression `
      * `case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression `
      * `case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)`
      * `abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging `
      * `case class PrecisionInfo(precision: Int, scale: Int)`
      * `case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType `
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `final class Decimal extends Ordered[Decimal] with Serializable `
      * `  trait DecimalIsConflicted extends Numeric[Decimal] `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `
      * `trait RunnableCommand extends logical.Command `
      * `case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan `
      * `  protected case class Keyword(str: String)`
      * `            sys.error(s"Failed to load class for data source: $provider")`
      * `case class EqualTo(attribute: String, value: Any) extends Filter`
      * `case class GreaterThan(attribute: String, value: Any) extends Filter`
      * `case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter`
      * `case class LessThan(attribute: String, value: Any) extends Filter`
      * `case class LessThanOrEqual(attribute: String, value: Any) extends Filter`
      * `trait RelationProvider `
      * `abstract class BaseRelation `
      * `abstract class TableScan extends BaseRelation `
      * `abstract class PrunedScan extends BaseRelation `
      * `abstract class PrunedFilteredScan extends BaseRelation `





[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-61393376
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22737/
    Test FAILed.




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63549732
  
    Okay pulling this in - thanks davies!




[GitHub] spark pull request: [SPARK-4017] show progress bar in console

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3029#issuecomment-63253164
  
      [Test build #23442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23442/consoleFull) for   PR 3029 at commit [`0081bcc`](https://github.com/apache/spark/commit/0081bcca2d67097c33ecbd0052e72cda8889935b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

