You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2014/10/07 23:48:46 UTC

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/2696

    [SPARK-2321] [WIP] Stable pull-based progress / status API

    This is a WIP pull request that proposes a stable, pull-based progress / status API for Spark (see [SPARK-2321](https://issues.apache.org/jira/browse/SPARK-2321)).  For now, I'd like to discuss the basic implementation, API names, and overall interface design.  Once we arrive at a good design, I'll go back and add additional methods to expose more information via these API.
    
    #### Design goals:
    
    - Pull-based API
    - Usable from Java / Scala / Python
    - Can be extended to expose more information without introducing binary incompatibilities.
    - Returns immutable objects.
    - Don't leak any implementation details, preserving our freedom to change the implementation.
    
    #### Implementation:
    
    - Add public methods (`getJobInfo`, `getStageInfo`) to SparkContext to allow status / progress information to be retrieved.
    - Add public interfaces (`SparkJobInfo`, `SparkStageInfo`) for our API return values.  These interfaces consist entirely of Java-style getter methods.  The interfaces are currently implemented in Java.  I decided to explicitly separate the interface from its implementation (`SparkJobInfoImpl`, `SparkStageInfoImpl`) in order to prevent users from constructing these responses themselves.
    - Move the construction of the web UI's SparkListeners into SparkContext and out of the web UI tabs themselves.  This allows us to re-use these listeners in the implementation of this status API.  There are a few reasons why this listener re-use makes sense:
       - The status API and web UI are guaranteed to show consistent information.
       - These listeners are already well-tested.
       - The same garbage-collection / information retention configurations can apply to both this API and the web UI.
    - Extend JobProgressListener to maintain `jobId -> Job` and `stageId -> Stage` mappings.
    
    
    #### Open questions:
    
    - I'm open to naming suggestions / improvements.
    - There's a "job -> stages -> tasks" hierarchy; how should we expose this in the API responses?  Should `SparkJobInfo` only expose stages' ids, requiring consumers to call `getStageInfo` to get information on stages?  If it returns `SparkStageInfo` objects, then should it enforce "consistent snapshot" semantics such that once I get a `SparkJobInfo` I have a complete, immutable snapshot of the state of the job's stages and tasks at the time that I call `getJobInfo()`?  That would be nice, but potentially expensive for large jobs.
    - By implementing the interfaces in Java, we're limited in terms of how much Scala syntactic sugar we can provide for Scala users.  For example, paren-less accessor methods don't seem to work on Java interfaces, so this current API means that Scala users have to write
    
       ```scala
       sc.getJobInfo(0).getStages()(0)
       ```
       
       to get information on the first stage in a job.  On the other hand, this prevents us from introducing incompatibilities if we ever choose to overload the `getStages` method (or any other ones).
    
    #### TODOs:
    
    - [ ] Unit tests
    - [ ] Examples
    - [ ] Python API
    - [ ] Documentation of new configuration options

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark progress-reporting-api

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2696.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2696
    
----
commit 24de26347c4421d54393dc60263e462a2112a10b
Author: Josh Rosen <jo...@apache.org>
Date:   2014-10-07T04:02:14Z

    Create UI listeners in SparkContext instead of in Tabs:
    
    This change means that the listeners will always be registered, even if the web
    UI is disabled.  The motivation for this change is to allow these listeners to
    be used when implementing a stable pull-based status / progress API for 1.2.

commit ac2d13aae4fc226c9e625ae6a9e6bba0f71d2dd6
Author: Josh Rosen <jo...@apache.org>
Date:   2014-10-07T06:23:56Z

    Add jobId->stage, stageId->stage mappings in JobProgressListener

commit 08cbec9e7f5b38e40c7ab7c41782fef67db74024
Author: Josh Rosen <jo...@apache.org>
Date:   2014-10-07T06:24:35Z

    Begin to sketch the interfaces for a stable, public status API.
    
    Some design goals here:
    
    - Hide constructors and implementations from users; only expose interfaces.
    - Return only immutable objects.
    - Ensure API will be usable from Java (e.g. only expose Array collections,
      since they're nicely usable from Scala and Java without having to do any
      implicit conversions on the Scala side or wrapping into Java-friendly
      types on the Java side).

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19361954
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +94,60 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  def createLiveUI(
    +      sc: SparkContext,
    +      conf: SparkConf,
    +      listenerBus: SparkListenerBus,
    +      jobProgressListener: JobProgressListener,
    +      securityManager: SecurityManager,
    +      appName: String): SparkUI =  {
    +    create(Some(sc), conf, listenerBus, securityManager, appName,
    +      jobProgressListener = Some(jobProgressListener))
    +  }
    +
    +  def createHistoryUI(
    +      conf: SparkConf,
    +      listenerBus: ReplayListenerBus,
    --- End diff --
    
    I think that helps to clarify the intent behind the two methods, `createLiveUI` and `createHistoryUI`.  If we need a more general constructor in another context, we can add it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18864057
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -853,6 +873,12 @@ class SparkContext(config: SparkConf) extends Logging {
       /** The version of Spark on which this application is running. */
       def version = SPARK_VERSION
     
    +  def getJobsIdsForGroup(jobGroup: String): Array[Int] = statusApi.jobIdsForGroup(jobGroup)
    +
    +  def getJobInfo(jobId: Int): Option[SparkJobInfo] = statusApi.newJobInfo(jobId)
    --- End diff --
    
    The garbage collection / data retention semantics are the same as what's displayed in the Spark web UI, since we're built on top of the same listeners.  While a job is active, we'll keep information on it.  After the job completes / fails, a configurable maximum number of jobs and stages will be retained.  I'll be sure to clearly document this.
    
    Regarding snapshots / consistency, I added a note about this in one of my commit messages, reproduced here:
    
    ```
    - The "consistent snapshot of the entire job -> stage -> task mapping"
      semantics might be very expensive to implement for large jobs, so I've
      decided to remove chaining between SparkJobInfo and SparkStageInfo
      interfaces.  Concretely, this means that you can't write something like
    
         job.stages()(0).name
    
      to get the name of the first stage in a job.  Instead, you have to explicitly
      get the stage's ID from the job and then look up that stage using
      sc.getStageInfo().  This isn't to say that we can't implement methods like
      "getNumActiveStages" that reflect consistent state; the goal is mainly to
      avoid spending lots of time / memory to construct huge object graphs.
    ```
    
    My concern was that it may be expensive to snapshot large jobs with many stages and tasks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18855753
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -853,6 +873,12 @@ class SparkContext(config: SparkConf) extends Logging {
       /** The version of Spark on which this application is running. */
       def version = SPARK_VERSION
     
    +  def getJobsIdsForGroup(jobGroup: String): Array[Int] = statusApi.jobIdsForGroup(jobGroup)
    --- End diff --
    
    Probably a good idea to add scaladoc to all these new public methods.
    
    In this case in particular, should probably mention that the returned list is a view of the current state, and subsequent invocations may return a different list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60455677
  
    I added a couple of simple tests.  One subtlety here is the fact that this API is sort of "eventually consistent" in the sense that we might know about a job before we have any information on its stages.  Offering stronger guarantees, like "once you have a JobInfo for a job that's running, you should always be able to get a StageInfo", may require significant scheduler / listener refactoring.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18796584
  
    --- Diff: core/src/main/java/org/apache/spark/SparkStageInfo.java ---
    @@ -0,0 +1,33 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark;
    +
    +/**
    + * Exposes information about Spark Stages.
    + *
    + * This interface is not designed to be implemented outside of Spark.  We may add additional methods
    + * which may break binary compatibility with outside implementations.
    + */
    +public interface SparkStageInfo {
    --- End diff --
    
    do we need to encode attempt here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60136663
  
    @JoshRosen lgtm; I don't really like the listener changes in `SparkUI` (aside from the job one which is necessary), but can't really think of a much better approach (other than the previous one being just a ugly :-)).
    
    Also, what about unit tests for the new APIs? I don't see any...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18856537
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -853,6 +873,12 @@ class SparkContext(config: SparkConf) extends Logging {
       /** The version of Spark on which this application is running. */
       def version = SPARK_VERSION
     
    +  def getJobsIdsForGroup(jobGroup: String): Array[Int] = statusApi.jobIdsForGroup(jobGroup)
    +
    +  def getJobInfo(jobId: Int): Option[SparkJobInfo] = statusApi.newJobInfo(jobId)
    --- End diff --
    
    A question about the semantics of using all these methods together: is data about all jobs kept in memory for the whole duration of the job, or is it discarded after a job or job group is finished?
    
    Asking it in a different way: is it possible for someone to call `getJobsIdsForGroup` (should that be `getJobIdsForGroup`?) and, when calling this method to get each job's info, hit an error because the job data has been discarded?
    
    If that's possible, perhaps we need a different approach, where you get a snapshot of the whole state of a job / job group in a single method, instead of having separate finer-grained methods. Or some way to lock things so that the data is not discarded while someone is looking for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60013190
  
    Alright, I've updated this based on the latest round of feedback.  I've removed the WIP, since I think this is enough for a basic first-cut at this API; we can always expose more things via it in later commits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59129325
  
    @vanzin Yeah, maybe we should split SPARK-2321; this PR is only concerned with progress monitoring; I'm not proposing any stabilization of the listener APIs here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60460535
  
    LGTM. Thanks for adding the tests!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18855660
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -229,10 +233,26 @@ class SparkContext(config: SparkConf) extends Logging {
       private[spark] val metadataCleaner =
         new MetadataCleaner(MetadataCleanerType.SPARK_CONTEXT, this.cleanup, conf)
     
    -  // Initialize the Spark UI, registering all associated listeners
    +
    +  private[spark] val environmentListener = new EnvironmentListener
    --- End diff --
    
    Since you're touching this, let me suggest something different, since I'm not sure I like the approach you took very much. 
    
    How about making SparkUI a listener itself, that just delegates events to its internal listeners?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18993930
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +99,23 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  // Called by HistoryServer and Master when reconstituting a SparkUI from event logs:
    +  def create(conf: SparkConf, listenerBus: SparkListenerBus, securityManager: SecurityManager,
    --- End diff --
    
    Seems like the `listenerBus` here will always be a `ReplayListenerBus`. I think you can make it tighter and maybe rename this `createForReplay`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19363621
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +94,60 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  def createLiveUI(
    +      sc: SparkContext,
    +      conf: SparkConf,
    +      listenerBus: SparkListenerBus,
    +      jobProgressListener: JobProgressListener,
    +      securityManager: SecurityManager,
    +      appName: String): SparkUI =  {
    +    create(Some(sc), conf, listenerBus, securityManager, appName,
    +      jobProgressListener = Some(jobProgressListener))
    +  }
    +
    +  def createHistoryUI(
    +      conf: SparkConf,
    +      listenerBus: ReplayListenerBus,
    --- End diff --
    
    How do you create history UIs without the replay bus? Do you have your own implementation of listener bus?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18796050
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    --- End diff --
    
    would be great to document the behaviors of these methods explicitly


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60019534
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22013/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19235533
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala ---
    @@ -0,0 +1,142 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +import scala.collection.Map
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.scheduler.{SchedulingMode, Schedulable}
    +import org.apache.spark.storage.{StorageStatus, StorageUtils, RDDInfo}
    +
    +
    +/**
    + * Trait that implements Spark's status APIs.  This trait is designed to be mixed into
    + * SparkContext; it allows the status API code to live in its own file.
    + */
    +private[spark] trait SparkStatusAPI { this: SparkContext =>
    +
    +  /**
    +   * Return a map from the slave to the max memory available for caching and the remaining
    +   * memory available for caching.
    +   */
    +  def getExecutorMemoryStatus: Map[String, (Long, Long)] = {
    +    env.blockManager.master.getMemoryStatus.map { case(blockManagerId, mem) =>
    +      (blockManagerId.host + ":" + blockManagerId.port, mem)
    +    }
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return information about what RDDs are cached, if they are in mem or on disk, how much space
    +   * they take, etc.
    +   */
    +  @DeveloperApi
    +  def getRDDStorageInfo: Array[RDDInfo] = {
    +    val rddInfos = persistentRdds.values.map(RDDInfo.fromRdd).toArray
    +    StorageUtils.updateRddInfo(rddInfos, getExecutorStorageStatus)
    +    rddInfos.filter(_.isCached)
    +  }
    +
    +  /**
    +   * Returns an immutable map of RDDs that have marked themselves as persistent via cache() call.
    +   * Note that this does not necessarily mean the caching or computation was successful.
    +   */
    +  def getPersistentRDDs: Map[Int, RDD[_]] = persistentRdds.toMap
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return information about blocks stored in all of the slaves
    +   */
    +  @DeveloperApi
    +  def getExecutorStorageStatus: Array[StorageStatus] = {
    +    env.blockManager.master.getStorageStatus
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return pools for fair scheduler
    +   */
    +  @DeveloperApi
    +  def getAllPools: Seq[Schedulable] = {
    +    // TODO(xiajunluan): We should take nested pools into account
    +    taskScheduler.rootPool.schedulableQueue.toSeq
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return the pool associated with the given name, if one exists
    +   */
    +  @DeveloperApi
    +  def getPoolForName(pool: String): Option[Schedulable] = {
    +    Option(taskScheduler.rootPool.schedulableNameToSchedulable.get(pool))
    +  }
    +
    +  /**
    +   * Return current scheduling mode
    +   */
    +  def getSchedulingMode: SchedulingMode.SchedulingMode = {
    +    taskScheduler.schedulingMode
    +  }
    +
    +
    +  /**
    +   * Return a list of all known jobs in a particular job group.  The returned list may contain
    +   * running, failed, and completed jobs, and may vary across invocations of this method.
    +   */
    +  def getJobIdsForGroup(jobGroup: String): Array[Int] = {
    +    jobProgressListener.synchronized {
    +      val jobData = jobProgressListener.jobIdToData.valuesIterator
    +      jobData.filter(_.jobGroup.exists(_ == jobGroup)).map(_.jobId).toArray
    +    }
    +  }
    +
    +  /**
    +   * Returns job information, or None if the job info could not be found or was garbage collected.
    --- End diff --
    
    nit: `None` (in backticks, no idea how to escape that in this editor).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18864694
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    +
    +  def jobIdsForGroup(jobGroup: String): Array[Int] = {
    +    sc.jobProgressListener.synchronized {
    +      val jobData = sc.jobProgressListener.jobIdToData.valuesIterator
    +      jobData.filter(_.jobGroup == Some(jobGroup)).map(_.jobId).toArray
    --- End diff --
    
    I could also do
    
    ```scala
    _.jobGroup.exists(_ == jobGroup)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59145855
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21755/consoleFull) for   PR 2696 at commit [`249ca16`](https://github.com/apache/spark/commit/249ca16c1a9b6f101121b821718b360309463356).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60002148
  
    Hey @JoshRosen is this still WIP?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2696


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19126684
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -40,17 +40,25 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
     
       import JobProgressListener._
     
    +  type JobId = Int
    +  type StageId = Int
    +  type StageAttemptId = Int
    --- End diff --
    
    I think these should be public as long as the hashmaps using them are public fields.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59112305
  
    Hi @JoshRosen,
    
    I think this is going in the right direction and is definitely an improvement over the status quo. I left some comments about semantics of certain parts of the API and some general suggestions.
    
    One thing is that this PR seems to cover half of SPARK-2321 - the part about monitoring, but not the listener API. Perhaps that bug should be broken into separate sub-tasks?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by ChengXiangLi <gi...@git.apache.org>.

Github user ChengXiangLi commented on a diff in the pull request:

https://github.com/apache/spark/pull/2696#discussion_r18806856

--- Diff: core/src/main/java/org/apache/spark/SparkJobInfo.java ---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+/**
+ * Exposes information about Spark Jobs.
+ *
+ * This interface is not designed to be implemented outside of Spark. We may add additional methods
+ * which may break binary compatibility with outside implementations.
+ */
+public interface SparkJobInfo {
+ int jobId();
+ int[] stageIds();
+ String status();
--- End diff --

I suppose status here represent job current state, like RUNNING/SUCCEED/FAILED... could we use enum here so that user can switch state instead of a bunch if-else.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19235280
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -51,6 +50,7 @@ import org.apache.spark.scheduler.cluster.mesos.{CoarseMesosSchedulerBackend, Me
     import org.apache.spark.scheduler.local.LocalBackend
     import org.apache.spark.storage._
     import org.apache.spark.ui.SparkUI
    +import org.apache.spark.ui.jobs.JobProgressListener
    --- End diff --
    
    I'd at least put a "TODO" somewhere to move this to a different package. This listener is not UI-specific anymore.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18796556
  
    --- Diff: core/src/main/java/org/apache/spark/SparkJobInfo.java ---
    @@ -0,0 +1,30 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark;
    +
    +/**
    + * Exposes information about Spark Jobs.
    + *
    + * This interface is not designed to be implemented outside of Spark.  We may add additional methods
    + * which may break binary compatibility with outside implementations.
    + */
    +public interface SparkJobInfo {
    +  int jobId();
    +  int[] stageIds();
    +  String status();
    --- End diff --
    
    can u document what the return string might be?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18993758
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +99,23 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  // Called by HistoryServer and Master when reconstituting a SparkUI from event logs:
    +  def create(conf: SparkConf, listenerBus: SparkListenerBus, securityManager: SecurityManager,
    +             appName: String, basePath: String): SparkUI = {
    --- End diff --
    
    ```
    def create(
        conf: SparkConf,
        ...
        basePath: String): SparkUI = {
      ...
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19235882
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +94,60 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  def createLiveUI(
    +      sc: SparkContext,
    +      conf: SparkConf,
    +      listenerBus: SparkListenerBus,
    +      jobProgressListener: JobProgressListener,
    +      securityManager: SecurityManager,
    +      appName: String): SparkUI =  {
    +    create(Some(sc), conf, listenerBus, securityManager, appName,
    +      jobProgressListener = Some(jobProgressListener))
    +  }
    +
    +  def createHistoryUI(
    +      conf: SparkConf,
    +      listenerBus: ReplayListenerBus,
    --- End diff --
    
    I wouldn't force this to be a `ReplayListenerBus`, since `SparkUI` doesn't require it to be.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18994206
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---
    @@ -65,4 +66,10 @@ private[jobs] object UIData {
           var taskInfo: TaskInfo,
           var taskMetrics: Option[TaskMetrics] = None,
           var errorMessage: Option[String] = None)
    +
    +  case class JobUIData(
    --- End diff --
    
    Pretty minor, but I would move this above `StageUIData` so the organization of the code is Job > Stage > Task. Also, this should probably not be a case class (`TaskUIData` is a bad example)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18863507
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    --- End diff --
    
    I see how this is a confusing name.  The intent here was to keep all of the status API implementation in a single file, including the code that interacts with the listeners and the implementations of the return types.  I suppose that I could just remove this class and place these methods directly in SparkContext.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18994137
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -61,8 +69,30 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
     
       def blockManagerIds = executorIdToBlockManagerId.values.toSeq
     
    +  override def onJobStart(jobStart: SparkListenerJobStart) = synchronized {
    +    val jobGroup = Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
    +    val jobData: JobUIData =
    +      JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup,ExecutionStatus.RUNNING)
    +    jobIdToData(jobStart.jobId) = jobData
    +    activeJobs(jobStart.jobId) = jobData
    +  }
    +
    +  override def onJobEnd(jobEnd: SparkListenerJobEnd) = synchronized {
    +    val jobData: JobUIData = activeJobs(jobEnd.jobId)
    --- End diff --
    
    maybe we should `get` here just to be safe


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19235559
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala ---
    @@ -0,0 +1,142 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +import scala.collection.Map
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.scheduler.{SchedulingMode, Schedulable}
    +import org.apache.spark.storage.{StorageStatus, StorageUtils, RDDInfo}
    +
    +
    +/**
    + * Trait that implements Spark's status APIs.  This trait is designed to be mixed into
    + * SparkContext; it allows the status API code to live in its own file.
    + */
    +private[spark] trait SparkStatusAPI { this: SparkContext =>
    +
    +  /**
    +   * Return a map from the slave to the max memory available for caching and the remaining
    +   * memory available for caching.
    +   */
    +  def getExecutorMemoryStatus: Map[String, (Long, Long)] = {
    +    env.blockManager.master.getMemoryStatus.map { case(blockManagerId, mem) =>
    +      (blockManagerId.host + ":" + blockManagerId.port, mem)
    +    }
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return information about what RDDs are cached, if they are in mem or on disk, how much space
    +   * they take, etc.
    +   */
    +  @DeveloperApi
    +  def getRDDStorageInfo: Array[RDDInfo] = {
    +    val rddInfos = persistentRdds.values.map(RDDInfo.fromRdd).toArray
    +    StorageUtils.updateRddInfo(rddInfos, getExecutorStorageStatus)
    +    rddInfos.filter(_.isCached)
    +  }
    +
    +  /**
    +   * Returns an immutable map of RDDs that have marked themselves as persistent via cache() call.
    +   * Note that this does not necessarily mean the caching or computation was successful.
    +   */
    +  def getPersistentRDDs: Map[Int, RDD[_]] = persistentRdds.toMap
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return information about blocks stored in all of the slaves
    +   */
    +  @DeveloperApi
    +  def getExecutorStorageStatus: Array[StorageStatus] = {
    +    env.blockManager.master.getStorageStatus
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return pools for fair scheduler
    +   */
    +  @DeveloperApi
    +  def getAllPools: Seq[Schedulable] = {
    +    // TODO(xiajunluan): We should take nested pools into account
    +    taskScheduler.rootPool.schedulableQueue.toSeq
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return the pool associated with the given name, if one exists
    +   */
    +  @DeveloperApi
    +  def getPoolForName(pool: String): Option[Schedulable] = {
    +    Option(taskScheduler.rootPool.schedulableNameToSchedulable.get(pool))
    +  }
    +
    +  /**
    +   * Return current scheduling mode
    +   */
    +  def getSchedulingMode: SchedulingMode.SchedulingMode = {
    +    taskScheduler.schedulingMode
    +  }
    +
    +
    +  /**
    +   * Return a list of all known jobs in a particular job group.  The returned list may contain
    +   * running, failed, and completed jobs, and may vary across invocations of this method.
    +   */
    +  def getJobIdsForGroup(jobGroup: String): Array[Int] = {
    +    jobProgressListener.synchronized {
    +      val jobData = jobProgressListener.jobIdToData.valuesIterator
    +      jobData.filter(_.jobGroup.exists(_ == jobGroup)).map(_.jobId).toArray
    +    }
    +  }
    +
    +  /**
    +   * Returns job information, or None if the job info could not be found or was garbage collected.
    +   */
    +  def getJobInfo(jobId: Int): Option[SparkJobInfo] = {
    +    jobProgressListener.synchronized {
    +      jobProgressListener.jobIdToData.get(jobId).map { data =>
    +        new SparkJobInfoImpl(jobId, data.stageIds.toArray, data.status)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Returns stage information, or None if the stage info could not be found or was
    --- End diff --
    
    nit: `None` (backticks again)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18993986
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -61,8 +69,30 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
     
       def blockManagerIds = executorIdToBlockManagerId.values.toSeq
     
    +  override def onJobStart(jobStart: SparkListenerJobStart) = synchronized {
    +    val jobGroup = Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
    +    val jobData: JobUIData =
    +      JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup,ExecutionStatus.RUNNING)
    --- End diff --
    
    space after `jobGroup`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59145858
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21755/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-58276464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21407/Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59871647
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21956/consoleFull) for   PR 2696 at commit [`787444c`](https://github.com/apache/spark/commit/787444c4ee20693a8f8c4fb5320ee4c4133a0d91).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18863745
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -229,10 +233,26 @@ class SparkContext(config: SparkConf) extends Logging {
       private[spark] val metadataCleaner =
         new MetadataCleaner(MetadataCleanerType.SPARK_CONTEXT, this.cleanup, conf)
     
    -  // Initialize the Spark UI, registering all associated listeners
    +
    +  private[spark] val environmentListener = new EnvironmentListener
    --- End diff --
    
    The idea here is to use this listeners to maintain state that's used in both the web UI and the progress / status APIs.  Users can disable the web UI but might still want to use the progress APIs, so these listeners can't be created / owned by the web UI.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19235115
  
    --- Diff: core/src/main/java/org/apache/spark/JobExecutionStatus.java ---
    @@ -0,0 +1,25 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark;
    +
    +public enum JobExecutionStatus {
    --- End diff --
    
    Feels like it's missing `KILLED`, although I'm not sure whether it's possible to differentiate that from `FAILED` in the backend at the moment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60013181
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22013/consoleFull) for   PR 2696 at commit [`2707f98`](https://github.com/apache/spark/commit/2707f98482a471d9f668643f7248b5d99739a825).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18867651
  
    --- Diff: core/src/main/java/org/apache/spark/SparkJobInfo.java ---
    @@ -0,0 +1,30 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark;
    +
    +/**
    + * Exposes information about Spark Jobs.
    + *
    + * This interface is not designed to be implemented outside of Spark.  We may add additional methods
    --- End diff --
    
    I think that exposing interfaces helps to maximize the flexibility of our internal implementations of Java-friendly classes.  Imagine that I want to have a `private[spark]` method on an object that I expose to users.  If I implement this as a Scala class, my `private[spark]` methods and fields are public from Java users' points of view.  If I implement this as a `public final class` Java with package-private methods, then I need to add a separate class containing "forwarder" methods in order to allow the Java package-private methods to be called from Scala subpackages.
    
    The need for this kind of flexibility isn't much of a concern for this specific case, since SparkJobInfo and SparkStageInfo are simple immutable objects, but it _is_ a problem for classes like TaskContext.  Since other Java-friendly APIs will likely be exposed as interfaces, I was thinking of doing the same here for consistency's sake.
    
    Does this sound reasonable or are there other approaches / considerations that I'm missing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18993665
  
    --- Diff: core/src/main/java/org/apache/spark/ExecutionStatus.java ---
    @@ -0,0 +1,24 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark;
    +
    +public enum ExecutionStatus {
    --- End diff --
    
    This is really a `JobExecutionStatus`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18796034
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    +
    +  def jobIdsForGroup(jobGroup: String): Array[Int] = {
    +    sc.jobProgressListener.synchronized {
    +      val jobData = sc.jobProgressListener.jobIdToData.valuesIterator
    +      jobData.filter(_.jobGroup == Some(jobGroup)).map(_.jobId).toArray
    +    }
    +  }
    +
    +  def newJobInfo(jobId: Int): Option[SparkJobInfo] = {
    --- End diff --
    
    if i understand this correctly, this returns None if the job id doesn't exist. Kind of strange to call this newJobInfo


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19362366
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +94,60 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  def createLiveUI(
    +      sc: SparkContext,
    +      conf: SparkConf,
    +      listenerBus: SparkListenerBus,
    +      jobProgressListener: JobProgressListener,
    +      securityManager: SecurityManager,
    +      appName: String): SparkUI =  {
    +    create(Some(sc), conf, listenerBus, securityManager, appName,
    +      jobProgressListener = Some(jobProgressListener))
    +  }
    +
    +  def createHistoryUI(
    +      conf: SparkConf,
    +      listenerBus: ReplayListenerBus,
    --- End diff --
    
    Well, don't the method names already convey that difference?
    
    I ask because (i) the distinction is not needed and (ii) I actually have code - still not submitted for review - that creates history UIs but does not use a ReplayListenerBus...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18856178
  
    --- Diff: core/src/main/java/org/apache/spark/SparkJobInfo.java ---
    @@ -0,0 +1,30 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark;
    +
    +/**
    + * Exposes information about Spark Jobs.
    + *
    + * This interface is not designed to be implemented outside of Spark.  We may add additional methods
    --- End diff --
    
    It seems like your goals here are:
    
    - people shouldn't implement these interfaces
    - people shouldn't be able to instantiate their own objects of these types
    
    Instead of interface / private implementation, how about a `public final class` with a package-private constructor? Seems like this package is the only place where you're actually instantiating these types, so that should work just as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60455158
  
      [Test build #22171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22171/consoleFull) for   PR 2696 at commit [`e6aa78d`](https://github.com/apache/spark/commit/e6aa78d16314c66903aa7d1d2c52e5237a9a9d0d).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19361746
  
    --- Diff: core/src/main/java/org/apache/spark/JobExecutionStatus.java ---
    @@ -0,0 +1,25 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark;
    +
    +public enum JobExecutionStatus {
    --- End diff --
    
    I don't think that we have backend support for this, at least not at the JobProgressListener layer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-58269497
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21407/consoleFull) for   PR 2696 at commit [`08cbec9`](https://github.com/apache/spark/commit/08cbec9e7f5b38e40c7ab7c41782fef67db74024).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2696#discussion_r18868330

I understand your point, but as you say, that doesn't really apply here. And a final class would avoid the scary comment.

But not a big deal.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19363966
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +94,60 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  def createLiveUI(
    +      sc: SparkContext,
    +      conf: SparkConf,
    +      listenerBus: SparkListenerBus,
    +      jobProgressListener: JobProgressListener,
    +      securityManager: SecurityManager,
    +      appName: String): SparkUI =  {
    +    create(Some(sc), conf, listenerBus, securityManager, appName,
    +      jobProgressListener = Some(jobProgressListener))
    +  }
    +
    +  def createHistoryUI(
    +      conf: SparkConf,
    +      listenerBus: ReplayListenerBus,
    --- End diff --
    
    If you want to look at code:
    https://github.com/vanzin/spark/blob/yarn-timeline/yarn/timeline/src/main/scala/org/apache/spark/deploy/yarn/timeline/YarnTimelineProvider.scala#L107
    
    Short story: ReplayListenerBus is very tightly coupled with EventLoggingListener and the idea that events come from a file in a specific format. So unless you're reading a file created by EventLoggingListener, it's not very useful at the moment.
    
    But my main point is that the Spark UI code does not depend on that instance being a ReplayListenerBus, so I don't see the point in making the API enforce that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18863321
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ---
    @@ -132,6 +132,12 @@ class JavaSparkContext(val sc: SparkContext)
       /** Default min number of partitions for Hadoop RDDs when not given by user */
       def defaultMinPartitions: java.lang.Integer = sc.defaultMinPartitions
     
    +  def getJobsIdsForGroup(jobGroup: String): Array[Int] = sc.getJobsIdsForGroup(jobGroup)
    +
    +  def getJobInfo(jobId: Int): SparkJobInfo = sc.getJobInfo(jobId).orNull
    --- End diff --
    
    I suppose that I could use a Guava `Optional` here, although we were bitten in the past by exposing this third-party type in our APIs.  Returning `null` would be inconsistent with the use of `Optional` in other parts of the Java API (such as `JavaSparkContext.getCheckpointDir()`).  On the other hand, we may be stuck with `Optional` since other code exposes it.
    
    What do you think we should do here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59139518
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21755/consoleFull) for   PR 2696 at commit [`249ca16`](https://github.com/apache/spark/commit/249ca16c1a9b6f101121b821718b360309463356).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18864112
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    +
    +  def jobIdsForGroup(jobGroup: String): Array[Int] = {
    +    sc.jobProgressListener.synchronized {
    +      val jobData = sc.jobProgressListener.jobIdToData.valuesIterator
    +      jobData.filter(_.jobGroup == Some(jobGroup)).map(_.jobId).toArray
    +    }
    +  }
    +
    +  def newJobInfo(jobId: Int): Option[SparkJobInfo] = {
    --- End diff --
    
    This was bad naming.  I'll just remove this layer of indirection by moving this code into SparkContext.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19185681
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -40,17 +40,25 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
     
       import JobProgressListener._
     
    +  type JobId = Int
    +  type StageId = Int
    +  type StageAttemptId = Int
    +
       // How many stages to remember
       val retainedStages = conf.getInt("spark.ui.retainedStages", DEFAULT_RETAINED_STAGES)
    +  // How many jobs to remember
    +  val retailedJobs = conf.getInt("spark.ui.retainedJobs", DEFAULT_RETAINED_JOBS)
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19180655
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -21,47 +21,30 @@ import org.apache.spark.{Logging, SecurityManager, SparkConf, SparkContext}
     import org.apache.spark.scheduler._
     import org.apache.spark.storage.StorageStatusListener
     import org.apache.spark.ui.JettyUtils._
    -import org.apache.spark.ui.env.EnvironmentTab
    -import org.apache.spark.ui.exec.ExecutorsTab
    -import org.apache.spark.ui.jobs.JobProgressTab
    -import org.apache.spark.ui.storage.StorageTab
    +import org.apache.spark.ui.env.{EnvironmentListener, EnvironmentTab}
    +import org.apache.spark.ui.exec.{ExecutorsListener, ExecutorsTab}
    +import org.apache.spark.ui.jobs.{JobProgressListener, JobProgressTab}
    +import org.apache.spark.ui.storage.{StorageListener, StorageTab}
     
     /**
      * Top level user interface for a Spark application.
      */
    -private[spark] class SparkUI(
    -    val sc: SparkContext,
    +private[spark] class SparkUI private[ui] (
    --- End diff --
    
    I think you can even make the constructor here fully `private`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19127119
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -40,17 +40,25 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
     
       import JobProgressListener._
     
    +  type JobId = Int
    +  type StageId = Int
    +  type StageAttemptId = Int
    --- End diff --
    
    Ah I see, so we don't leak the types


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59875888
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21956/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18856060
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    +
    +  def jobIdsForGroup(jobGroup: String): Array[Int] = {
    +    sc.jobProgressListener.synchronized {
    +      val jobData = sc.jobProgressListener.jobIdToData.valuesIterator
    +      jobData.filter(_.jobGroup == Some(jobGroup)).map(_.jobId).toArray
    +    }
    +  }
    +
    +  def newJobInfo(jobId: Int): Option[SparkJobInfo] = {
    +    sc.jobProgressListener.synchronized {
    +      sc.jobProgressListener.jobIdToData.get(jobId).map { data =>
    +        new SparkJobInfoImpl(jobId, data.stageIds.toArray, data.status)
    +      }
    +    }
    +  }
    +
    +  def newStageInfo(stageId: Int): Option[SparkStageInfo] = {
    +    sc.jobProgressListener.synchronized {
    +      for (
    +        info <- sc.jobProgressListener.stageIdToInfo.get(stageId);
    +        data <- sc.jobProgressListener.stageIdToData.get((stageId, info.attemptId))
    +      ) yield {
    +        new SparkStageInfoImpl(
    +          stageId,
    +          info.name,
    +          numTasks = info.numTasks,
    +          numActiveTasks = data.numActiveTasks,
    +          numCompleteTasks = data.numCompleteTasks,
    +          numFailedTasks = data.numFailedTasks)
    +      }
    +    }
    +  }
    +}
    +
    +private class SparkJobInfoImpl (
    +  val jobId: Int,
    +  val stageIds: Array[Int],
    +  val status: String)
    + extends SparkJobInfo
    +
    +private class SparkStageInfoImpl(
    +  val stageId: Int,
    +  val name: String,
    +  val numTasks: Int,
    +  val numActiveTasks: Int,
    +  val numCompleteTasks: Int,
    +  val numFailedTasks: Int)
    + extends SparkStageInfo
    --- End diff --
    
    super nit: add new line. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18855937
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    +
    +  def jobIdsForGroup(jobGroup: String): Array[Int] = {
    +    sc.jobProgressListener.synchronized {
    +      val jobData = sc.jobProgressListener.jobIdToData.valuesIterator
    +      jobData.filter(_.jobGroup == Some(jobGroup)).map(_.jobId).toArray
    --- End diff --
    
    super nit: the filter could be a little more efficient by avoiding instantiating a new `Option` in each invocation.
    
        x => x.jobGroup.isDefined && x.jobGroup.get == jobGroup


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59875884
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21956/consoleFull) for   PR 2696 at commit [`787444c`](https://github.com/apache/spark/commit/787444c4ee20693a8f8c4fb5320ee4c4133a0d91).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging `
      * `  class JobUIData(`
      * `public final class JavaStatusAPITest `
      * `  public static final class IdentityWithDelay<T> implements Function<T, T> `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-59449668
  
    This looks pretty good to me. As discussed offline, it's worth considering putting the new logic in SparkContext into a trait and mix it in. This will mark the first step towards simplifying SparkContext through mixed-in traits. I don't think we can do something similar for Java, however, so that's a reason not to do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60459939
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22171/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19362398
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +94,60 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  def createLiveUI(
    +      sc: SparkContext,
    +      conf: SparkConf,
    +      listenerBus: SparkListenerBus,
    +      jobProgressListener: JobProgressListener,
    +      securityManager: SecurityManager,
    +      appName: String): SparkUI =  {
    +    create(Some(sc), conf, listenerBus, securityManager, appName,
    +      jobProgressListener = Some(jobProgressListener))
    +  }
    +
    +  def createHistoryUI(
    +      conf: SparkConf,
    +      listenerBus: ReplayListenerBus,
    --- End diff --
    
    Ah, fair enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2696#discussion_r19361659

--- Diff: core/src/main/java/org/apache/spark/SparkStageInfo.java ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+/**
+ * Exposes information about Spark Stages.
+ *
+ * This interface is not designed to be implemented outside of Spark. We may add additional methods
+ * which may break binary compatibility with outside implementations.
+ */
+public interface SparkStageInfo {
--- End diff --

I've updated this to include the current attempt id.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18993968
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -40,17 +40,25 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
     
       import JobProgressListener._
     
    +  type JobId = Int
    +  type StageId = Int
    +  type StageAttemptId = Int
    --- End diff --
    
    private type?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18794000
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ---
    @@ -132,6 +132,12 @@ class JavaSparkContext(val sc: SparkContext)
       /** Default min number of partitions for Hadoop RDDs when not given by user */
       def defaultMinPartitions: java.lang.Integer = sc.defaultMinPartitions
     
    +  def getJobsIdsForGroup(jobGroup: String): Array[Int] = sc.getJobsIdsForGroup(jobGroup)
    +
    +  def getJobInfo(jobId: Int): SparkJobInfo = sc.getJobInfo(jobId).orNull
    --- End diff --
    
    add java doc to these 3 methods and explain that this returns null when jobId doesn't exist.
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19180607
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
    @@ -116,4 +94,40 @@ private[spark] object SparkUI {
       def getUIPort(conf: SparkConf): Int = {
         conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
       }
    +
    +  /**
    +   * Create a new Spark UI.
    +   *
    +   * @param sc optional SparkContext; this can be None when reconstituting a UI from event logs.
    +   * @param jobProgressListener if supplied, this JobProgressListener will be used; otherwise, the
    +   *                            web UI will create and register its own JobProgressListener.
    +   */
    +  def create(
    +    sc: Option[SparkContext],
    +    conf: SparkConf,
    +    listenerBus: SparkListenerBus,
    +    securityManager: SecurityManager,
    +    appName: String,
    +    basePath: String = "",
    +    jobProgressListener: Option[JobProgressListener] = None): SparkUI = {
    --- End diff --
    
    This signature is a little ugly to be exposed even internally. Can we make this private and have two other methods `createLiveUI` and `createHistoryUI` or something that hides the optional arguments that would be null?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-58276452
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21407/consoleFull) for   PR 2696 at commit [`08cbec9`](https://github.com/apache/spark/commit/08cbec9e7f5b38e40c7ab7c41782fef67db74024).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class JobUIData(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18863409
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ---
    @@ -132,6 +132,12 @@ class JavaSparkContext(val sc: SparkContext)
       /** Default min number of partitions for Hadoop RDDs when not given by user */
       def defaultMinPartitions: java.lang.Integer = sc.defaultMinPartitions
     
    +  def getJobsIdsForGroup(jobGroup: String): Array[Int] = sc.getJobsIdsForGroup(jobGroup)
    +
    +  def getJobInfo(jobId: Int): SparkJobInfo = sc.getJobInfo(jobId).orNull
    --- End diff --
    
    null is fine - and definitely don't depend on teh Optional type in Guava. I was merely suggesting adding more documentation about this API


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18868548
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -229,10 +233,26 @@ class SparkContext(config: SparkConf) extends Logging {
       private[spark] val metadataCleaner =
         new MetadataCleaner(MetadataCleanerType.SPARK_CONTEXT, this.cleanup, conf)
     
    -  // Initialize the Spark UI, registering all associated listeners
    +
    +  private[spark] val environmentListener = new EnvironmentListener
    --- End diff --
    
    If that's your goal, then I'd suggest refactoring these listeners. The only one I see being used outside of the UI is the job progress one. So all the others could still be kept internal to the UI.
    
    It just feels really weird to be importing a bunch of UI-specific classes to service functionality that is not UI-specific.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-58451983
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21504/Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60459934
  
      [Test build #22171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22171/consoleFull) for   PR 2696 at commit [`e6aa78d`](https://github.com/apache/spark/commit/e6aa78d16314c66903aa7d1d2c52e5237a9a9d0d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging `
      * `  class JobUIData(`
      * `public final class JavaStatusAPIDemo `
      * `  public static final class IdentityWithDelay<T> implements Function<T, T> `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60019531
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22013/consoleFull) for   PR 2696 at commit [`2707f98`](https://github.com/apache/spark/commit/2707f98482a471d9f668643f7248b5d99739a825).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging `
      * `  class JobUIData(`
      * `public final class JavaStatusAPIDemo `
      * `  public static final class IdentityWithDelay<T> implements Function<T, T> `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2696#issuecomment-60474144
  
    Thanks for all of the review feedback!  I'm going to merge this now so that we can begin using it in other PRs and features, such as progress bars in the Spark shells.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] Stable pull-based progress / stat...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r19235394
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala ---
    @@ -0,0 +1,142 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +import scala.collection.Map
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.scheduler.{SchedulingMode, Schedulable}
    +import org.apache.spark.storage.{StorageStatus, StorageUtils, RDDInfo}
    +
    --- End diff --
    
    super nit: too many empty lines.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18794207
  
    --- Diff: core/src/main/scala/org/apache/spark/StatusAPIImpl.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +private[spark] class StatusAPIImpl(sc: SparkContext) {
    --- End diff --
    
    is this supposed to implement some interface?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org