Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2018/02/01 21:23:00 UTC
[jira] [Updated] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning them up
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu updated SPARK-23307:
---------------------------------
Description:
When a long-running job completes, it may be removed from the UI almost immediately if a small job happens to finish right after it. This is pretty annoying when you run lots of jobs concurrently in the same driver (e.g., running multiple Structured Streaming queries). We should sort jobs/stages by their completed timestamp before cleaning them up.
In 2.2, Spark had a separate buffer for completed jobs/stages, so it didn't need to sort the jobs/stages.
The behavior I expect:
Set "spark.ui.retainedJobs" to 10 and run the following code; job 0 should be kept in the Spark UI.
{code:java}
// Start a long-running job (job 0) on a background thread.
new Thread() {
  override def run() {
    // job 0
    sc.makeRDD(1 to 1, 1).foreach { i =>
      Thread.sleep(10000)
    }
  }
}.start()

Thread.sleep(1000)

// Run 20 short jobs concurrently while job 0 is still running.
for (_ <- 1 to 20) {
  new Thread() {
    override def run() {
      sc.makeRDD(1 to 1, 1).foreach { i =>
      }
    }
  }.start()
}

Thread.sleep(15000)

// One more job after job 0 completes, which triggers the cleanup.
sc.makeRDD(1 to 1, 1).foreach { i =>
}
{code}
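The requested cleanup order can be sketched as follows. This is a hypothetical helper, not Spark's actual AppStatusListener bookkeeping; the {{JobInfo}} record and {{cleanupJobs}} name are assumptions for illustration. The idea is simply that when the number of completed jobs exceeds the retained limit, the oldest jobs by completion timestamp are dropped, rather than whatever insertion order the store happens to have:

{code:java}
// Hypothetical model of a completed job: its id and completion timestamp.
case class JobInfo(jobId: Int, completionTime: Long)

// Keep only the `retainedJobs` most recently completed jobs,
// sorting by completion timestamp before trimming.
def cleanupJobs(jobs: Seq[JobInfo], retainedJobs: Int): Seq[JobInfo] = {
  if (jobs.size <= retainedJobs) jobs
  else jobs.sortBy(_.completionTime).takeRight(retainedJobs)
}
{code}

Under this ordering, a long-running job that completes last is retained even if many short jobs started and finished after it began, which matches the expected behavior above.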
> Spark UI should sort jobs/stages with the completed timestamp before cleaning them up
> -------------------------------------------------------------------------------------
>
> Key: SPARK-23307
> URL: https://issues.apache.org/jira/browse/SPARK-23307
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 2.3.0
> Reporter: Shixiong Zhu
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org