Posted to reviews@spark.apache.org by mubarak <gi...@git.apache.org> on 2014/08/02 00:43:25 UTC

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

GitHub user mubarak opened a pull request:

    https://github.com/apache/spark/pull/1723

    [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI

    Please review: [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
    
    _Screenshot after fix:_
    
    ![screen shot 2014-07-18 at 7 31 54 am](https://cloud.githubusercontent.com/assets/668134/3785835/e70e3416-19ca-11e4-9d0e-44f3311d65df.png)
    
    _Sample code used: Tutorial.scala_
    ```scala
    1   package org.apache.spark.examples.streaming
    2
    3   import org.apache.spark._
    4   import org.apache.spark.SparkContext._
    5   import org.apache.spark.streaming._
    6   import org.apache.spark.streaming.twitter._
    7   import org.apache.spark.streaming.StreamingContext._
    8   import org.apache.spark.examples.streaming.TutorialHelper._
    9   import org.apache.spark.util.{CallSite, Utils}
    10
    11  object Tutorial {
    12    def main(args: Array[String]) {
    13
    14      // Checkpoint directory
    15      val checkpointDir = TutorialHelper.getCheckpointDirectory()
    16
    17      // Configure Twitter credentials
    18      val apiKey = ""
    19      val apiSecret = ""
    20      val accessToken = ""
    21      val accessTokenSecret = ""
    22      TutorialHelper.configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)
    23
    24      // Your code goes here
    25      val sc: SparkContext = new SparkContext(new SparkConf().setAppName("Twitter Example").setMaster("spark://ec2-54-241-226-42.us-west-1.compute.amazonaws.com:7077"))
    26      val ssc = new StreamingContext(sc, Seconds(5))
    27      val tweets = TwitterUtils.createStream(ssc, None)
    28
    29      val statuses = tweets.map(status => status.getText())
    30      val words = statuses.flatMap(status => status.split(" "))
    31
    32      val hashtags = words.filter(word => word.startsWith("#"))
    33      val counts = hashtags.map(tag => (tag, 1)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(60 * 5), Seconds(10))
    34
    35      val sortedCounts = counts.map { case (tag, count) => (count, tag) }
    36                               .transform(rdd => rdd.sortByKey(false))
    37      sortedCounts.foreach(rdd =>
    38        println("\nTop 10 hashtags:\n" + rdd.take(10).mkString("\n")))
    39
    40      ssc.checkpoint(checkpointDir)
    41      ssc.start()
    42      ssc.awaitTermination()
    43
    44    }
    45  }
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mubarak/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1723
    
----
commit d466d75341bf246bf57341ca6b9ea3ed11679040
Author: Mubarak Seyed <mu...@gmail.com>
Date:   2014-07-18T01:15:05Z

    Changes for spark streaming UI

commit 9d38d3cecedc37775cb5bd70c661c3acd2dd7f43
Author: Mubarak Seyed <mu...@gmail.com>
Date:   2014-07-18T01:30:46Z

    [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI

commit 1500deb44c7c484fae725144c288240b5b477fdb
Author: Mubarak Seyed <mu...@gmail.com>
Date:   2014-07-18T15:11:58Z

    Changes in Spark Streaming UI

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-56577498
  
    @mubarak, mind closing this?


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16337534
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -106,6 +106,27 @@ abstract class DStream[T: ClassTag] (
       /** Return the StreamingContext associated with this DStream */
       def context = ssc
     
    +  /* Find the creation callSite */
    +  val creationSite = Utils.getCallSite
    +
    +  /* Store the creation callSite in threadlocal */
    +  private[streaming] def setCreationCallSite() = {
    --- End diff --
    
    Can you change these method names to `setRDD***CallSite` to make it clear that they do not affect the call site of the DStream but that of the RDDs generated by the DStream? Also, just `private` is sufficient, as these methods are not used by any class other than DStream.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16337606
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -813,7 +816,10 @@ private[spark] object Utils extends Logging {
        * A regular expression to match classes of the "core" Spark API that we want to skip when
        * finding the call site of a method.
        */
    -  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala(\.util)?(\.collection)?(\.collection\.mutable)?(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""".r
    --- End diff --
    
    Ah, I guessed as much. But this also changes how call sites are interpreted for general (non-streaming) Spark programs, so we will need more input from others on this. @andrewor14 Can you take a look at this?
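To make the concern concrete, here is a small sketch (not code from the PR) that runs both versions of `SPARK_CLASS_REGEX`, copied verbatim from the diff, against a few class names. The wrapper object `CallSiteRegexes` and the helper `isSkipped` are hypothetical; in this sketch a match simply means the frame would be treated as Spark-internal when looking for the call site.

```scala
import scala.util.matching.Regex

// Both patterns are copied verbatim from the diff; only the wrapper and helper are new.
object CallSiteRegexes {
  val OldSparkClassRegex: Regex =
    """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r

  val NewSparkClassRegex: Regex =
    """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r

  /** True if `regex` matches at the start of `className`, i.e. the frame would be skipped. */
  def isSkipped(regex: Regex, className: String): Boolean =
    regex.findFirstIn(className).isDefined
}

for (cls <- Seq(
  "org.apache.spark.rdd.RDD",
  "org.apache.spark.streaming.dstream.DStream",
  "org.apache.spark.examples.streaming.Tutorial$$anonfun$8"
)) {
  val before = CallSiteRegexes.isSkipped(CallSiteRegexes.OldSparkClassRegex, cls)
  val after = CallSiteRegexes.isSkipped(CallSiteRegexes.NewSparkClassRegex, cls)
  println(s"$cls: old=$before new=$after")
}
// Expected output:
// org.apache.spark.rdd.RDD: old=true new=true
// org.apache.spark.streaming.dstream.DStream: old=false new=true
// org.apache.spark.examples.streaming.Tutorial$$anonfun$8: old=false new=false
```

This illustrates why the change is global: the same regex is consulted for every application, whether or not it uses streaming.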


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54700500
  
    Hey @mubarak, @andrewor14 and I discussed this PR, and it would be cool to use this opportunity to refactor the callSite stuff a bit and document it (it is very hard to understand, I think you would agree). So is it cool if I take your branch, add some documentation, and do a bit of refactoring to make this clearer? Your commits will remain and you will be one of the authors, so nothing to worry about.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54689188
  
    No no, we want to preserve the behavior for Spark, and only change it for streaming.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52450485
  
    New screenshot
    ![screen shot 2014-08-17 at 8 02 21 pm](https://cloud.githubusercontent.com/assets/668134/3947565/5550e9c0-268f-11e4-9c35-0001e232e8f6.png)



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16336535
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -302,8 +325,10 @@ abstract class DStream[T: ClassTag] (
                       " for checkpointing at time " + time)
                   }
                   generatedRDDs.put(time, newRDD)
    +              setCallSite(prevCallSite)
    --- End diff --
    
    Handling this separately in two code paths is error-prone going forward. Can you assign the result of `compute(time) match { ... }` to a variable and use that afterwards? Something like this:
    
    ```scala
    val rddOption = compute(time) match {
      // no setCallSite inside this
    }
    setCallSite(prevCallSite)   // only one instance of setCallSite, common to both code paths.
    return rddOption
    ```
    
    And while you're at it, can you also add an explicit `return` on the other code paths in this function, that is, the `None`s on lines 332 and 335 in this commit?
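A self-contained sketch of the suggested shape, under stated assumptions: `LocalCallSite` is a hypothetical thread-local holder standing in for SparkContext's local properties, and `SketchStream.getOrCompute` is a heavily simplified, hypothetical stand-in for `DStream.getOrCompute` in which an "RDD" is just a `String`. It keeps the single restore point requested above; wrapping the match in `try`/`finally` is an extra choice made here so that the previous call site is also restored if `compute` throws.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkContext's thread-local properties; not Spark's API.
object LocalCallSite {
  private val current = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }
  def get: Option[String] = current.get()
  def set(site: Option[String]): Unit = current.set(site)
}

// Hypothetical, heavily simplified shape of DStream.getOrCompute.
class SketchStream(creationSite: String, compute: Long => Option[String]) {
  val generated = mutable.Map.empty[Long, String]

  def getOrCompute(time: Long): Option[String] = {
    val prevCallSite = LocalCallSite.get
    LocalCallSite.set(Some(creationSite))
    val rddOption =
      try {
        compute(time) match {
          case Some(newRdd) =>
            generated.put(time, newRdd) // no call-site handling inside the match
            Some(newRdd)
          case None => None
        }
      } finally {
        LocalCallSite.set(prevCallSite) // single restore point shared by every path
      }
    rddOption
  }
}
```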


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54688918
  
    This sets a flag in `SparkContext` to indicate whether we want streaming call sites or normal call sites. Doesn't this mean that if we use the same `SparkContext` for other things, we will get the streaming call sites instead? I understand this goes through the thread-local properties, but it still has strange semantics: just because I am using this `SparkContext` for streaming, its call sites are suddenly different.
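The semantics being questioned can be reproduced with a plain `ThreadLocal`. In this hypothetical sketch, `SketchContext` and the key `callSite.short` are made up: a value set for streaming stays visible to any later job submitted from the same thread until it is reset, while a different thread does not see it. Spark's own local properties are, as far as we know, held in an `InheritableThreadLocal`, so a thread started after the property is set would typically inherit it; this sketch uses a plain `ThreadLocal`, so it does not.

```scala
// Hypothetical context with per-thread properties; not Spark's SparkContext.
class SketchContext {
  private val localProps = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  def setLocalProperty(key: String, value: String): Unit =
    localProps.set(localProps.get() + (key -> value))
  def getLocalProperty(key: String): Option[String] = localProps.get().get(key)
}

val CallSiteKey = "callSite.short" // hypothetical key name

val sc = new SketchContext
// Streaming code sets the property before generating RDDs...
sc.setLocalProperty(CallSiteKey, "map at Tutorial.scala:29")

// ...and a later, unrelated job submitted from the same thread still sees it.
val seenOnSameThread: Option[String] = sc.getLocalProperty(CallSiteKey)

// A different thread has its own, initially empty, copy.
var seenOnOtherThread: Option[String] = Some("not run")
val worker = new Thread(new Runnable {
  def run(): Unit = { seenOnOtherThread = sc.getLocalProperty(CallSiteKey) }
})
worker.start()
worker.join() // join() also makes the worker's write visible to this thread
```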


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54701472
  
    @tdas 
    +1. Can you please review? Thanks


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54694525
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52602246
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18821/consoleFull) for PR 1723 at commit [`f51fd9f`](https://github.com/apache/spark/commit/f51fd9f62ed25611c7cd078ed5207f8c5392e60a).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54701514
  
    @mubarak, I commented earlier about doing some small refactoring myself. Do you mind? See my earlier comment.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52450636
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18733/consoleFull) for PR 1723 at commit [`5051c58`](https://github.com/apache/spark/commit/5051c58c9fce1e6821a35b5ffe7cb670a07912ed).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-50952570
  
    Also, it doesn't merge cleanly with apache.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52450684
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18733/consoleFull) for PR 1723 at commit [`5051c58`](https://github.com/apache/spark/commit/5051c58c9fce1e6821a35b5ffe7cb670a07912ed).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16407308
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -106,6 +106,27 @@ abstract class DStream[T: ClassTag] (
       /** Return the StreamingContext associated with this DStream */
       def context = ssc
     
    +  /* Find the creation callSite */
    +  val creationSite = Utils.getCallSite
    +
    +  /* Store the RDD creation callSite in threadlocal */
    +  private def setRDDCreationCallSite() = {
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_SHORT, creationSite.shortForm)
    --- End diff --
    
    These two methods are nearly identical. Can you please merge them? Give the `callsite` parameter a default value of `creationSite`.
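One way the two methods could be merged, sketched with hypothetical stand-ins: `CallSite` here is a made-up case class with the `shortForm` field used in the diff plus an assumed `longForm` field, the property keys are invented (the diff uses `Utils.CALL_SITE_SHORT`), and a mutable map replaces `ssc.sparkContext.setLocalProperty`. The method is public only so the sketch can be exercised; per the earlier review comment, the real one would be `private`.

```scala
import scala.collection.mutable

// Hypothetical mirror of a (shortForm, longForm) call-site pair.
case class CallSite(shortForm: String, longForm: String)

// Hypothetical property keys; the diff uses Utils.CALL_SITE_SHORT.
val CallSiteShortKey = "callSite.short"
val CallSiteLongKey = "callSite.long"

class SketchDStream(val creationSite: CallSite) {
  // Stands in for ssc.sparkContext.setLocalProperty.
  val localProperties = mutable.Map.empty[String, String]

  // One method replaces the two near-duplicates: with no argument it installs the
  // creation site; otherwise it installs whatever site the caller passes in
  // (for example the previous one, to restore it).
  def setRDDCallSite(callSite: CallSite = creationSite): Unit = {
    localProperties(CallSiteShortKey) = callSite.shortForm
    localProperties(CallSiteLongKey) = callSite.longForm
  }
}
```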


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52598307
  
    *Tested a few examples:*
    
    ![screen shot 2014-08-18 at 11 45 04 pm](https://cloud.githubusercontent.com/assets/668134/3962778/a8f85360-2770-11e4-93ba-c7d26883760a.png)
    
    ![screen shot 2014-08-18 at 11 44 12 pm](https://cloud.githubusercontent.com/assets/668134/3962784/b4a8f480-2770-11e4-9bde-dccbca5053b3.png)
    
    
    



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16336958
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -813,7 +816,10 @@ private[spark] object Utils extends Logging {
        * A regular expression to match classes of the "core" Spark API that we want to skip when
        * finding the call site of a method.
        */
    -  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala(\.util)?(\.collection)?(\.collection\.mutable)?(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""".r
    --- End diff --
    
    I was getting this stack trace when I didn't filter `scala.*`; one of the stages was labeled `apply at List.###`:
    
    ```
    apply at List.scala:318 +details
    org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1283)
    scala.collection.immutable.List.foreach(List.scala:318)
    org.apache.spark.rdd.RDD.doCheckpoint(RDD.scala:1283)
    org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1283)
    org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1283)
    scala.collection.immutable.List.foreach(List.scala:318)
    org.apache.spark.rdd.RDD.doCheckpoint(RDD.scala:1283)
    org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1283)
    org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1283)
    scala.collection.immutable.List.foreach(List.scala:318)
    org.apache.spark.rdd.RDD.doCheckpoint(RDD.scala:1283)
    org.apache.spark.SparkContext.runJob(SparkContext.scala:1092)
    org.apache.spark.SparkContext.runJob(SparkContext.scala:1107)
    org.apache.spark.SparkContext.runJob(SparkContext.scala:1121)
    org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
    org.apache.spark.rdd.RDD.collect(RDD.scala:774)
    org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262)
    org.apache.spark.RangePartitioner.<init>(Partitioner.scala:124)
    org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63)
    org.apache.spark.examples.streaming.Tutorial$$anonfun$8.apply(Tutorial.scala:36)
    ```
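The stack trace above can be replayed through a simplified call-site finder to see where `apply at List.scala:318` comes from. This is a hypothetical sketch, not Spark's `Utils.getCallSite`: it walks frames innermost-first, remembers the method name of each internal frame, and at the first non-internal frame reports `<last internal method> at <file>:<line>`. The two regexes are copied from the diff; the frame list is abbreviated from the trace above.

```scala
import scala.util.matching.Regex

// Regexes copied from the diff; the walking logic below is a simplified, hypothetical finder.
val SparkClassRegex: Regex =
  """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r
val ScalaClassRegex: Regex =
  """^scala(\.util)?(\.collection)?(\.collection\.mutable)?(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""".r

/** Walks frames innermost-first and labels the first non-internal frame. */
def shortCallSite(frames: Seq[StackTraceElement], skipScala: Boolean): Option[String] = {
  def internal(cls: String): Boolean =
    SparkClassRegex.findFirstIn(cls).isDefined ||
      (skipScala && ScalaClassRegex.findFirstIn(cls).isDefined)
  var lastInternalMethod = "<unknown>"
  val it = frames.iterator
  while (it.hasNext) {
    val f = it.next()
    if (internal(f.getClassName)) lastInternalMethod = f.getMethodName
    else return Some(s"$lastInternalMethod at ${f.getFileName}:${f.getLineNumber}")
  }
  None // every frame was internal
}

def frame(cls: String, method: String, file: String, line: Int) =
  new StackTraceElement(cls, method, file, line)

// Abbreviated from the stack trace above, innermost frame first.
val trace = Seq(
  frame("org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1", "apply", "RDD.scala", 1283),
  frame("scala.collection.immutable.List", "foreach", "List.scala", 318),
  frame("org.apache.spark.rdd.RDD", "doCheckpoint", "RDD.scala", 1283),
  frame("org.apache.spark.SparkContext", "runJob", "SparkContext.scala", 1092),
  frame("org.apache.spark.rdd.RDD", "collect", "RDD.scala", 774),
  frame("org.apache.spark.RangePartitioner$", "sketch", "Partitioner.scala", 262),
  frame("org.apache.spark.RangePartitioner", "<init>", "Partitioner.scala", 124),
  frame("org.apache.spark.rdd.OrderedRDDFunctions", "sortByKey", "OrderedRDDFunctions.scala", 63),
  frame("org.apache.spark.examples.streaming.Tutorial$$anonfun$8", "apply", "Tutorial.scala", 36)
)

println(shortCallSite(trace, skipScala = false)) // Some(apply at List.scala:318)
println(shortCallSite(trace, skipScala = true))  // Some(sortByKey at Tutorial.scala:36)
```

Without the Scala filter the walk stops at `scala.collection.immutable.List.foreach`, yielding `apply at List.scala:318`; with it, the walk reaches the user frame and yields `sortByKey at Tutorial.scala:36`, which points at line 36 of the sample program.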


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-50952562
  
    This is a great start, @mubarak; thanks for navigating the whole control flow graph and putting this together. That said, I feel there is a better way of doing this that does not involve customizing every DStream. In particular, can you explain what purpose the "name" field in DStream serves?
    
    Also, if the local property needs to be set before the RDDs are created in `compute`, then it can be set once for all DStreams in `DStream.getOrCompute`, right before `compute` is called.
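A minimal sketch of that suggestion, assuming a stripped-down hierarchy: `MiniDStream`, its subclasses, and the `CurrentSite` holder are hypothetical, and an "RDD" is modeled as a string recording the call site that was active when it was built. The call site is set once in the base class's `getOrCompute`, so no subclass needs any call-site code of its own.

```scala
// Hypothetical thread-local holder; not Spark's API.
object CurrentSite {
  private val tl = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }
  def get: Option[String] = tl.get()
  def set(v: Option[String]): Unit = tl.set(v)
}

abstract class MiniDStream(val creationSite: String) {
  // Subclasses only implement compute; none of them touch the call site.
  protected def compute(time: Long): Option[String]

  // The call site is set once, here, for every subclass, and restored afterwards.
  final def getOrCompute(time: Long): Option[String] = {
    val prev = CurrentSite.get
    CurrentSite.set(Some(creationSite))
    try compute(time) finally CurrentSite.set(prev)
  }
}

// Each "RDD" is a string recording the call site active when it was built.
class MappedMini(site: String) extends MiniDStream(site) {
  protected def compute(time: Long): Option[String] =
    Some(s"mapped@$time by ${CurrentSite.get.getOrElse("?")}")
}

class FilteredMini(site: String) extends MiniDStream(site) {
  protected def compute(time: Long): Option[String] =
    Some(s"filtered@$time by ${CurrentSite.get.getOrElse("?")}")
}
```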


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52443994
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18724/consoleFull) for PR 1723 at commit [`ccde038`](https://github.com/apache/spark/commit/ccde038f664c46f2c48e1f073bcfc6d8917ecef1).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16334878
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala ---
    @@ -52,6 +52,8 @@ class ReducedWindowedDStream[K: ClassTag, V: ClassTag](
           "must be multiple of the slide duration of parent DStream (" + parent.slideDuration + ")"
       )
     
    +
    --- End diff --
    
    Unnecessary extra blank line.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52448650
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18731/consoleFull) for PR 1723 at commit [`a207eb7`](https://github.com/apache/spark/commit/a207eb7ba14e4f51011dea7d2ba0651c5466f218).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak closed the pull request at:

    https://github.com/apache/spark/pull/1723


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54376381
  
    Any updates? 


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54698936
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19880/consoleFull) for   PR 1723 at commit [`ceb43da`](https://github.com/apache/spark/commit/ceb43daeb0947acab5629031c6e35d29f96f6376).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52443825
  
    Jenkins, this is ok to test.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r17201452
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -883,8 +900,8 @@ private[spark] object Utils extends Logging {
     
         for (el <- trace) {
           if (insideSpark) {
    -        if (SPARK_CLASS_REGEX.findFirstIn(el.getClassName).isDefined) {
    -          lastSparkMethod = if (el.getMethodName == "<init>") {
    +        if (regexFunc(el.getClassName)) {
    +            lastSparkMethod = if (el.getMethodName == "<init>") {
    --- End diff --
    
    I don't think this extra indentation was intended.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54701548
  
    Thanks @mubarak!!!! I will create a new PR and ask you to take a look. 




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16337761
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -813,7 +816,10 @@ private[spark] object Utils extends Logging {
        * A regular expression to match classes of the "core" Spark API that we want to skip when
        * finding the call site of a method.
        */
    -  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala(\.util)?(\.collection)?(\.collection\.mutable)?(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""".r
    --- End diff --
    
    Actually, on second thought: should all Spark classes, that is, all of `org.apache.spark.*`, simply be ignored? Why list subpackages in the regular expressions? Are there internal Spark classes that we want to show up in the call-site short form?
    @andrewor14 What do you think?




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54701520
  
    @tdas I don't mind. Please.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54547524
  
    Will take a look and update you tonight. Thanks.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16463252
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -106,6 +106,27 @@ abstract class DStream[T: ClassTag] (
       /** Return the StreamingContext associated with this DStream */
       def context = ssc
     
    +  /* Find the creation callSite */
    +  val creationSite = Utils.getCallSite
    +
    +  /* Store the RDD creation callSite in threadlocal */
    +  private def setRDDCreationCallSite() = {
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_SHORT, creationSite.shortForm)
    --- End diff --
    
    Updated




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52746058
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18943/consoleFull) for   PR 1723 at commit [`491a1eb`](https://github.com/apache/spark/commit/491a1eb23aa94919a9a06a84a9cf3cab118fa628).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52750288
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18943/consoleFull) for   PR 1723 at commit [`491a1eb`](https://github.com/apache/spark/commit/491a1eb23aa94919a9a06a84a9cf3cab118fa628).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54697540
  
    ok to test




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52444020
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18724/consoleFull) for   PR 1723 at commit [`ccde038`](https://github.com/apache/spark/commit/ccde038f664c46f2c48e1f073bcfc6d8917ecef1).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16384185
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -813,7 +816,10 @@ private[spark] object Utils extends Logging {
        * A regular expression to match classes of the "core" Spark API that we want to skip when
        * finding the call site of a method.
        */
    -  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala(\.util)?(\.collection)?(\.collection\.mutable)?(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""".r
    --- End diff --
    
    In that case, the condition should be changed to
    `if ((isSparkClass && !isSparkExampleClass) || isScalaClass) { ... }`
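The condition suggested above can be sketched as three small predicates. This is a minimal sketch, assuming `isSparkClass` means "any class under `org.apache.spark.`" and `isSparkExampleClass` means "any class under `org.apache.spark.examples.`"; both of those patterns and the `CallSiteFilter` name are hypothetical, while the Scala pattern is copied from the diff.

```scala
import scala.util.matching.Regex

// Sketch of the suggested skip condition. SparkPrefix and SparkExamplesPrefix are
// hypothetical simplifications; ScalaClassRegex is the SCALA_CLASS_REGEX from the diff.
object CallSiteFilter {
  private val SparkPrefix: Regex = """^org\.apache\.spark\.""".r
  private val SparkExamplesPrefix: Regex = """^org\.apache\.spark\.examples\.""".r
  private val ScalaClassRegex: Regex =
    ("""^scala(\.util)?(\.collection)?(\.collection\.mutable)?""" +
      """(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""").r

  private def matches(r: Regex, className: String): Boolean =
    r.findFirstIn(className).isDefined

  def isSparkClass(c: String): Boolean = matches(SparkPrefix, c)
  def isSparkExampleClass(c: String): Boolean = matches(SparkExamplesPrefix, c)
  def isScalaClass(c: String): Boolean = matches(ScalaClassRegex, c)

  /** True if a frame of this class should be skipped when looking for the user call site. */
  def shouldSkip(c: String): Boolean =
    (isSparkClass(c) && !isSparkExampleClass(c)) || isScalaClass(c)
}
```

With this rule, `org.apache.spark.SparkContext` and `scala.collection.immutable.List` are skipped, while `org.apache.spark.examples.streaming.Tutorial$` is treated as user code.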





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52452666
  
    Screenshot is great! The tests don't pass because of style check errors. Please address them.
    
    error file=/home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/util/Utils.scala message=File line length exceeds 100 characters line=837
    error file=/home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/util/Utils.scala message=File line length exceeds 100 characters line=838
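One way to stay under the 100-character line limit reported by the style check is to assemble the long pattern from a list of subpackages. A minimal sketch, assuming the package list from the diff; `RegexBuilder` and `buildOptionalPackageRegex` are hypothetical names, not Spark code.

```scala
import scala.util.matching.Regex

// Builds ^root(\.p1)?(\.p2)?...\.[A-Z] from a list, so no single source line gets too long.
// The subpackage list mirrors the SPARK_CLASS_REGEX in the diff; the names are hypothetical.
object RegexBuilder {
  private val SparkSubpackages: Seq[String] = Seq(
    "api.java", "util", "rdd",
    "streaming", "streaming.dstream", "streaming.scheduler",
    "streaming.twitter", "streaming.kafka", "streaming.flume",
    "streaming.mqtt", "streaming.zeromq")

  /** Escapes the dots in each package name and wraps it in an optional group. */
  def buildOptionalPackageRegex(root: String, subpackages: Seq[String]): Regex = {
    def escape(s: String): String = s.replace(".", "\\.")
    val optionalGroups = subpackages.map(p => "(\\." + escape(p) + ")?").mkString
    ("^" + escape(root) + optionalGroups + "\\.[A-Z]").r
  }

  val SparkClassRegex: Regex = buildOptionalPackageRegex("org.apache.spark", SparkSubpackages)
}
```

The generated pattern string is identical to the single-line literal in the diff, so matching behavior does not change.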
    





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-50943156
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r15726777
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -106,6 +106,24 @@ abstract class DStream[T: ClassTag] (
       /** Return the StreamingContext associated with this DStream */
       def context = ssc
     
    +  private[streaming] val RDD_NAME: String = "rddName";
    +
    +  @transient var name: String = null
    +
    +  /** Assign a name to this DStream */
    +  def setName(_name: String) = {
    +    name = _name
    +  }
    +
    +  /* Find the creation callSite */
    +  val creationSite = Utils.getCallSite
    +
    +  /* Store the creation callSite in threadlocal */
    +  private[streaming] def setCallSite = {
    +    ssc.sparkContext.setLocalProperty(name + Utils.CALL_SITE_SHORT, creationSite.short)
    +    ssc.sparkContext.setLocalProperty(name + Utils.CALL_SITE_LONG, creationSite.long)
    --- End diff --
    
    What is the "name" used for? You seem to set a name in each DStream, set a local property keyed by that name, and then have RDDs generated in the same thread read that name and the corresponding property. The "name" doesn't even seem to be shown as part of the short/long call site; it is only used in the property key. Wouldn't this work without the "name" in the property key?
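A minimal sketch of the alternative the question points to: one fixed property key with no per-DStream name. `LocalProps`, `FakeRDD`, and `FakeDStream` are hypothetical stand-ins for `SparkContext` local properties, `RDD`, and `DStream`, and the key string is an assumption.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkContext's per-thread local properties (not Spark's API).
object LocalProps {
  private val props = new ThreadLocal[mutable.Map[String, String]] {
    override def initialValue(): mutable.Map[String, String] = mutable.Map.empty
  }
  def set(key: String, value: String): Unit = props.get().update(key, value)
  def get(key: String): Option[String] = props.get().get(key)
}

// One fixed key shared by every DStream (hypothetical name). No per-DStream prefix is
// needed: the key only has to hold the right value when each RDD is constructed.
object CallSiteKeys {
  val Short: String = "callSite.short"
}

// Hypothetical RDD stand-in that reads the call site once, at construction time.
final class FakeRDD {
  val creationSite: Option[String] = LocalProps.get(CallSiteKeys.Short)
}

// Hypothetical DStream stand-in: installs its own creation site, then builds its RDD.
final class FakeDStream(creationSite: String) {
  def compute(): FakeRDD = {
    LocalProps.set(CallSiteKeys.Short, creationSite)
    new FakeRDD
  }
}
```

Two DStreams computed one after another on the same thread each see their own call site, even though they share the key.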



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52605797
  
    ```
    [info] - call sites report correct locations *** FAILED ***
    [info]   "[apply]" did not equal "[makeRDD]" (SparkContextInfoSuite.scala:85)
    ```
    The CallSite-related test case failed. Will take a look.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16407131
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -106,6 +106,27 @@ abstract class DStream[T: ClassTag] (
       /** Return the StreamingContext associated with this DStream */
       def context = ssc
     
    +  /* Find the creation callSite */
    +  val creationSite = Utils.getCallSite
    +
    +  /* Store the RDD creation callSite in threadlocal */
    +  private def setRDDCreationCallSite() = {
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_SHORT, creationSite.shortForm)
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_LONG, creationSite.longForm)
    +  }
    +
    +  /* Store the supplied callSite in threadlocal */
    +  private def setRDDCallSite(callSite: CallSite) = {
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_SHORT, callSite.shortForm)
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_LONG, callSite.longForm)
    +  }
    +
    +  /* Return the current callSite */
    +  private[streaming] def getCallSite(): CallSite = {
    --- End diff --
    
    Any reason why this was not changed to `getRDDCreationCallSite()` to match the set method?
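Giving the getter and setter matching names also makes a save-and-restore helper easy to write, which matters when a parent DStream overwrites the property while the child is computing. A sketch under stated assumptions: `SiteProps` stands in for SparkContext local properties, and `getRDDCallSite`, `setRDDCallSite`, and `withRDDCallSite` are hypothetical names, not the PR's code.

```scala
import scala.collection.mutable

// Hypothetical per-thread property map standing in for SparkContext local properties.
object SiteProps {
  private val local = new ThreadLocal[mutable.Map[String, String]] {
    override def initialValue(): mutable.Map[String, String] = mutable.Map.empty
  }
  def get(k: String): Option[String] = local.get().get(k)
  def set(k: String, v: Option[String]): Unit = v match {
    case Some(value) => local.get().update(k, value)
    case None =>
      local.get().remove(k)
      ()
  }
}

object CallSiteScope {
  val Key: String = "callSite.short" // hypothetical key name

  // Symmetric getter/setter pair, named to match each other.
  def getRDDCallSite(): Option[String] = SiteProps.get(Key)
  def setRDDCallSite(site: Option[String]): Unit = SiteProps.set(Key, site)

  /** Runs `body` with `site` installed, then restores whatever value was there before. */
  def withRDDCallSite[T](site: String)(body: => T): T = {
    val previous = getRDDCallSite()
    setRDDCallSite(Some(site))
    try body finally setRDDCallSite(previous)
  }
}
```

Nested calls see the innermost site, and each level restores the outer value on exit (including on exceptions, via `finally`).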




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-57581072
  
    Fixed using https://github.com/apache/spark/pull/2464




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54700379
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19880/consoleFull) for   PR 1723 at commit [`ceb43da`](https://github.com/apache/spark/commit/ceb43daeb0947acab5629031c6e35d29f96f6376).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r15726784
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FilteredDStream.scala ---
    @@ -27,12 +27,17 @@ class FilteredDStream[T: ClassTag](
         filterFunc: T => Boolean
       ) extends DStream[T](parent.ssc) {
     
    +  setName("FilteredRDD")
    +
       override def dependencies = List(parent)
     
       override def slideDuration: Duration = parent.slideDuration
     
       override def compute(validTime: Time): Option[RDD[T]] = {
    -    parent.getOrCompute(validTime).map(_.filter(filterFunc))
    +    setCallSite
    +    val rdd: Option[RDD[T]] = parent.getOrCompute(validTime).map(_.filter(filterFunc))
    +    ssc.sparkContext.setLocalProperty(RDD_NAME, name)
    --- End diff --
    
    What's the point of setting the property here, after the RDD has been generated?
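The question turns on when the call site is read. Assuming, as the diffs above suggest, that an RDD captures its call site when it is constructed, a property set after construction has no effect on that RDD. A minimal sketch with hypothetical stand-ins (`Props`, `SnapshotRDD`, `PropertyOrdering`), not Spark classes:

```scala
import scala.collection.mutable

// Hypothetical per-thread property map standing in for SparkContext local properties.
object Props {
  private val local = new ThreadLocal[mutable.Map[String, String]] {
    override def initialValue(): mutable.Map[String, String] = mutable.Map.empty
  }
  def set(k: String, v: String): Unit = local.get().update(k, v)
  def get(k: String): Option[String] = local.get().get(k)
  def clear(): Unit = local.get().clear()
}

// Hypothetical RDD stand-in: snapshots the "site" property when it is constructed.
final class SnapshotRDD {
  val site: Option[String] = Props.get("site")
}

object PropertyOrdering {
  // Property set AFTER construction: the RDD has already taken its snapshot.
  def setAfter(): SnapshotRDD = {
    Props.clear()
    val rdd = new SnapshotRDD
    Props.set("site", "user code")
    rdd
  }

  // Property set BEFORE construction: the RDD sees the value.
  def setBefore(): SnapshotRDD = {
    Props.clear()
    Props.set("site", "user code")
    new SnapshotRDD
  }
}
```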



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-51712229
  
    @tdas 
    I have removed 'name' from DStream and addressed your review comments. Can you please review? Thanks.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16334743
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -813,7 +816,10 @@ private[spark] object Utils extends Logging {
        * A regular expression to match classes of the "core" Spark API that we want to skip when
        * finding the call site of a method.
        */
    -  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala(\.util)?(\.collection)?(\.collection\.mutable)?(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""".r
    --- End diff --
    
    Why were these new patterns added?
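A likely motivation for the new patterns can be checked directly: the original `SPARK_CLASS_REGEX` requires an uppercase class name right after `org.apache.spark`, `.api.java`, `.util`, or `.rdd`, so streaming classes such as `DStream` are never skipped. A small sketch comparing the two patterns copied from the diff (`RegexComparison` is an illustrative name):

```scala
import scala.util.matching.Regex

// Compares the original SPARK_CLASS_REGEX with the extended one from the diff.
object RegexComparison {
  val Original: Regex = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
  val Extended: Regex =
    ("""^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?""" +
      """(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?""" +
      """(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?""" +
      """\.[A-Z]""").r

  /** True if a frame with this class name would be skipped by the given pattern. */
  def skippedBy(r: Regex, className: String): Boolean = r.findFirstIn(className).isDefined
}
```

Neither pattern matches `org.apache.spark.examples.streaming.Tutorial$`, because `examples` is not in either list, so the example's own frames still count as user code.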





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16334830
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -38,7 +38,7 @@ import org.apache.spark.streaming.dstream._
     import org.apache.spark.streaming.receiver.{ActorSupervisorStrategy, ActorReceiver, Receiver}
     import org.apache.spark.streaming.scheduler._
     import org.apache.spark.streaming.ui.StreamingTab
    -import org.apache.spark.util.MetadataCleaner
    +import org.apache.spark.util.{Utils, MetadataCleaner}
    --- End diff --
    
    This seems unnecessary.




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54682054
  
    Jenkins is currently having issues, so I cannot test it. Otherwise this looks pretty good, except for my comment on the streaming regex: it should live in streaming/util/Utils.scala, not in Spark core. I have also asked @andrewor14 to take a look.
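Keeping the streaming pattern out of core suggests letting the core call-site search take its skip predicate as a parameter, in the spirit of the `regexFunc(el.getClassName)` call in the `Utils.scala` diff. A simplified sketch: the span-based search, the object and method names, and the streaming pattern are all hypothetical, and the real `Utils.getCallSite` has extra logic (for example for `<init>` frames).

```scala
import scala.util.matching.Regex

// Hypothetical sketch: core owns the search, each module supplies its own skip predicate.
object CallSiteFinder {
  final case class SimpleCallSite(shortForm: String)

  // Core's default pattern (the original SPARK_CLASS_REGEX).
  val CoreSparkRegex: Regex = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
  // Hypothetical streaming-side pattern, defined in the streaming module rather than core.
  val StreamingRegex: Regex = """^org\.apache\.spark\.streaming(\.dstream)?\.[A-Z]""".r

  val coreSkip: String => Boolean = c => CoreSparkRegex.findFirstIn(c).isDefined
  val streamingSkip: String => Boolean =
    c => coreSkip(c) || StreamingRegex.findFirstIn(c).isDefined

  /** "<last skipped method> at <file>:<line of the first non-skipped frame>". */
  def findCallSite(trace: Seq[StackTraceElement], skip: String => Boolean): SimpleCallSite = {
    val (skipped, rest) = trace.span(el => skip(el.getClassName))
    val method = skipped.lastOption.map(_.getMethodName).getOrElse("<unknown>")
    val where = rest.headOption
      .map(el => s"${el.getFileName}:${el.getLineNumber}")
      .getOrElse("<unknown>")
    SimpleCallSite(s"$method at $where")
  }
}
```

With a synthetic trace of a `DStream.map` frame followed by a user frame, the streaming predicate reports `map at` the user's file, while core's default predicate leaves core's behavior unchanged.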




[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-53229308
  
    @tdas
    I don't think the new (proposed) regex in `Utils.getCallSite` works for test suites. For instance,
    
    `bagel/src/test/scala/org/apache/spark/bagel/BagelSuite.scala`
    ```scala
     32 class BagelSuite extends FunSuite with Assertions with BeforeAndAfter with Timeouts {
     33 
     34   var sc: SparkContext = _
     35 
     36   after {
     37     if (sc != null) {
     38       sc.stop()
     39       sc = null
     40     }
     41   }
     42 
     43   test("halting by voting") {
     44     sc = new SparkContext("local", "test")
     45     println("getCallSite: " + sc.getCallSite.longForm)
     46 
     47     val rdd1 = sc.makeRDD(Array(1, 2, 3, 4))
     48     println("creationSite: " + rdd1.getCreationLongSite)
     49     println("creationShortSite: " + rdd1.getCreationSite)
    ```
    The creationSite for `makeRDD` (line #47) would look like
    
    ```
     org.apache.spark.SparkContext.makeRDD(SparkContext.scala:455)
     org.apache.spark.bagel.BagelSuite$$anonfun$2.apply$mcV$sp(BagelSuite.scala:47)
     org.apache.spark.bagel.BagelSuite$$anonfun$2.apply(BagelSuite.scala:43)
     org.apache.spark.bagel.BagelSuite$$anonfun$2.apply(BagelSuite.scala:43)
     org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
     org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
     org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
     org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
     org.scalatest.Transformer.apply(Transformer.scala:22)
     org.scalatest.Transformer.apply(Transformer.scala:20)
     org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158)
     org.scalatest.Suite$class.withFixture(Suite.scala:1121)
     org.scalatest.FunSuite.withFixture(FunSuite.scala:1559)
     org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155)
     org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
     org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
     org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
     org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167)
     org.apache.spark.bagel.BagelSuite.org$scalatest$BeforeAndAfter$$super$runTest(BagelSuite.scala:32)
     org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
    ```
    and with the new regex, the call site will be `apply at Transformer.scala:22`, because `org.apache.spark.bagel.*` classes will be treated as internal Spark classes (the same applies to `SparkContextInfoSuite`).
    
    Thoughts?
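A small Scala sketch of why the frames shift. It checks three class names from the trace above against the original `SPARK_CLASS_REGEX` (copied from the `Utils.scala` diff in this thread) and against the broader `^org\.apache\.spark` pattern from the PR. `RegexCheck` and `skipped` are illustrative names, not Spark APIs.

```scala
import scala.util.matching.Regex

object RegexCheck {
  // Original core regex, copied from the Utils.scala diff in this thread.
  val SparkClassRegex: Regex =
    """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
  // Broader pattern from the PR: every org.apache.spark class counts as internal.
  val BroadSparkRegex: Regex = """^org\.apache\.spark""".r

  /** True if a frame with this class name would be skipped when looking for the caller. */
  def skipped(regex: Regex, className: String): Boolean =
    regex.findFirstIn(className).isDefined

  def main(args: Array[String]): Unit = {
    val classNames = Seq(
      "org.apache.spark.SparkContext",
      "org.apache.spark.bagel.BagelSuite$$anonfun$2",
      "org.scalatest.Transformer$$anonfun$apply$1"
    )
    for (c <- classNames)
      println(s"$c original=${skipped(SparkClassRegex, c)} broad=${skipped(BroadSparkRegex, c)}")
  }
}
```

Only the broad pattern skips the `BagelSuite` frame (the original regex requires an uppercase class name right after the optional `api.java`/`util`/`rdd` segments, and `bagel` is lowercase). Under the original regex the search stops at `BagelSuite.scala:47`; under the broad one it continues into ScalaTest, which is why the short call site becomes `apply at Transformer.scala:22`.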


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52615460
  
    Oh yes, that failure makes sense. Since we are capturing all Spark classes, it is not able to identify the callsite inside SparkContextInfoSuite. This is because the default core behavior is being modified here.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r15726747
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -112,6 +112,7 @@ class StreamingContext private[streaming] (
         if (isCheckpointPresent) {
           new SparkContext(cp_.sparkConf)
         } else {
    +      sc_.setCallSite(Utils.getCallSite.short)
    --- End diff --
    
    This sets the call site of the streaming driver code (e.g., `StreamingContext at Tutorial.scala:26`). I think the same call also needs to be added for the `isCheckpointPresent = true` branch (line #112).


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-57575023
  
    Hi @mubarak,
    
    This issue has been fixed by #2464, so do you mind closing this? Thanks!
    
    (Due to the way that this GitHub mirror is set up, we don't have permission to close your PR).


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54701499
  
    Jenkins, this is ok to test.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52448704
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18731/consoleFull) for   PR 1723 at commit [`a207eb7`](https://github.com/apache/spark/commit/a207eb7ba14e4f51011dea7d2ba0651c5466f218).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52449035
  
    @mubarak This is what I had in mind. I have a few more comments in the code. Can you please comment/address those? Also, can you post a new screenshot?


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52444695
  
    This looks pretty promising!! However, why does every DStream need to change? The DStream parent class can do the following: before calling `compute()`, it can save the previous call site, set the new call site, and then call `compute()`. Upon returning from `compute()`, it can restore the previous call site. As far as I can see, that should work for all the DStreams without modifying every single stream.
    
    Also, can you show us a new screenshot of what it looks like now? Does it look the same as the one posted in this PR?
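A minimal sketch of the save/set/restore pattern tdas describes, assuming a per-thread property map as a stand-in for `SparkContext`'s local properties. `LocalProps`, `CallSiteKeys`, `SketchDStream`, and `computeWithCallSite` are hypothetical names for illustration; the code in this PR goes through `ssc.sparkContext.setLocalProperty` instead. The `try`/`finally` restores the previous call site even if `compute()` throws, and the helpers carry explicit return types.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkContext local properties: one map per thread.
object LocalProps {
  private val props = new ThreadLocal[mutable.Map[String, String]] {
    override def initialValue(): mutable.Map[String, String] = mutable.Map.empty
  }
  def get(key: String): Option[String] = props.get().get(key)
  // Setting None removes the key, so "no previous call site" is restored faithfully.
  def set(key: String, value: Option[String]): Unit = value match {
    case Some(v) => props.get().update(key, v)
    case None    => props.get().remove(key)
  }
}

// Hypothetical property key; not Spark's actual key name.
object CallSiteKeys { val Short: String = "callSite.short" }

abstract class SketchDStream[T](val creationSite: String) {
  /** Subclasses implement only this and never touch the call site. */
  protected def compute(time: Long): Option[T]

  /** Parent-class wrapper: save the previous call site, set ours, compute, then restore. */
  final def computeWithCallSite(time: Long): Option[T] = {
    val previous = LocalProps.get(CallSiteKeys.Short)
    LocalProps.set(CallSiteKeys.Short, Some(creationSite))
    try compute(time)
    finally LocalProps.set(CallSiteKeys.Short, previous)
  }
}
```

Because the wrapper lives in the parent class, concrete streams only implement `compute()`; any Spark job launched inside it would see the stream's creation site, and the caller's call site is back in place afterwards.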


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52598093
  
    Jenkins, this is ok to test.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52598483
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18821/consoleFull) for   PR 1723 at commit [`f51fd9f`](https://github.com/apache/spark/commit/f51fd9f62ed25611c7cd078ed5207f8c5392e60a).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r15726022
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -112,6 +112,7 @@ class StreamingContext private[streaming] (
         if (isCheckpointPresent) {
           new SparkContext(cp_.sparkConf)
         } else {
    +      sc_.setCallSite(Utils.getCallSite.short)
    --- End diff --
    
    What is this for?



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54698297
  
    Jenkins, this is ok to test.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54676763
  
    Jenkins, test this please. 


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-53509499
  
    Yeah, better not to modify Spark's default behavior. One solution is to modify `Utils.getCallSite` to take a parameter: a filter function. The default filter function would be the same as the existing SPARK_REGEX check, which preserves the default functionality. The streaming code would then plug in its own filter function, which can do all of the SPARK_REGEX + STREAMING_REGEX + SCALA_REGEX checks.
    
    How does that sound?
    
    Also, in the screenshot the only active stage is `<unknown>:0`, but it should be the file:line number of the site where `streamingContext.start()` was called. See the code in `ReceiverTracker`. Can you set the call site correctly there, so that it does not show `<unknown>:0`?
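A minimal sketch of the filter-function proposal, assuming frames are plain `StackTraceElement`s and that the short form is `<last internal method> at <file>:<line>`, the shape seen in this thread. `CallSiteSketch`, `defaultSkip`, and `shortCallSite` are hypothetical names, not Spark's actual `Utils.getCallSite` signature. The default argument keeps core behavior unchanged; a caller such as streaming passes a wider predicate.

```scala
object CallSiteSketch {
  private val SparkClassRegex =
    """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r

  /** Default filter: the existing core check, so core behavior is unchanged. */
  def defaultSkip(className: String): Boolean =
    SparkClassRegex.findFirstIn(className).isDefined

  /**
   * Walks frames from innermost to outermost. The leading run of "internal" frames
   * (per skipClass) supplies the method name; the first non-internal frame supplies file:line.
   */
  def shortCallSite(
      frames: Seq[StackTraceElement],
      skipClass: String => Boolean = defaultSkip(_)): String = {
    val (internal, rest) = frames.span(f => skipClass(f.getClassName))
    val lastInternalMethod = internal.lastOption.map(_.getMethodName).getOrElse("<unknown>")
    rest.headOption match {
      case Some(caller) => s"$lastInternalMethod at ${caller.getFileName}:${caller.getLineNumber}"
      case None         => s"$lastInternalMethod at <unknown>:0" // no caller frame found
    }
  }
}
```

Fed the first three frames of the `makeRDD` trace quoted earlier, the default filter yields `makeRDD at BagelSuite.scala:47`, while a predicate that skips every `org.apache.spark` class yields `apply at Transformer.scala:22`, reproducing the `SparkContextInfoSuite`-style failure only for the caller that opts in.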


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54606650
  
    Jenkins, this is ok to test.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r17196753
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -856,13 +859,27 @@ private[spark] object Utils extends Logging {
        * finding the call site of a method.
        */
       private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala""".r
    +  private val SPARK_STREAMING_CLASS_REGEX = """^org\.apache\.spark""".r
    +  private val SPARK_EXAMPLES_CLASS_REGEX = """^org\.apache\.spark\.examples""".r
    +
    +  private def defaultRegexFunc(className: String): Boolean = {
    +    SPARK_CLASS_REGEX.findFirstIn(className).isDefined ||
    +    SCALA_CLASS_REGEX.findFirstIn(className).isDefined
    +  }
    +
    +  def streamingRegexFunc(className: String): Boolean = {
    --- End diff --
    
    @mubarak Sorry if I wasn't clear in my previous comment, but I wanted the streaming regex to be added to `spark.streaming.util.Utils` (a new file is needed). Logically, core Spark code should not reference higher-level components such as Spark Streaming.
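A sketch of how the regexes from the diff above could live on the streaming side instead of in core, assuming core only exposes a generic `String => Boolean` filter hook as proposed in tdas's earlier comment. `StreamingCallSiteFilter` is a hypothetical name (tdas suggests a new `spark.streaming.util.Utils` file). The regexes are copied from the diff; treating `org.apache.spark.examples` classes as user code is an assumption based on the `SPARK_EXAMPLES_CLASS_REGEX` in the diff, whose use is not shown.

```scala
import scala.util.matching.Regex

// Hypothetical streaming-side helper; core never references it.
object StreamingCallSiteFilter {
  private val SparkClassRegex: Regex = """^org\.apache\.spark""".r
  // As written in the diff; note it also matches any package that merely starts with "scala".
  private val ScalaClassRegex: Regex = """^scala""".r
  private val SparkExamplesClassRegex: Regex = """^org\.apache\.spark\.examples""".r

  private def matches(r: Regex, className: String): Boolean =
    r.findFirstIn(className).isDefined

  /** True if a frame should be skipped: Spark or Scala library code, but not the examples. */
  def skipClass(className: String): Boolean =
    (matches(SparkClassRegex, className) || matches(ScalaClassRegex, className)) &&
      !matches(SparkExamplesClassRegex, className)
}
```

With this split, core keeps its narrow default and streaming passes `StreamingCallSiteFilter.skipClass` when it asks for a call site, so example programs such as `Tutorial.scala` are still reported as the caller.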


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16334866
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -106,6 +106,27 @@ abstract class DStream[T: ClassTag] (
       /** Return the StreamingContext associated with this DStream */
       def context = ssc
     
    +  /* Find the creation callSite */
    +  val creationSite = Utils.getCallSite
    +
    +  /* Store the creation callSite in threadlocal */
    +  private[streaming] def setCreationCallSite() = {
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_SHORT, creationSite.shortForm)
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_LONG, creationSite.longForm)
    +  }
    +
    +  /* Store the supplied callSite in threadlocal */
    +  private[streaming] def setCallSite(callSite: CallSite) = {
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_SHORT, callSite.shortForm)
    +    ssc.sparkContext.setLocalProperty(Utils.CALL_SITE_LONG, callSite.longForm)
    +  }
    +
    +  /* Return the current callSite */
    +  private[streaming] def getCallSite() = {
    --- End diff --
    
    Please specify the return types of these functions. I think we are enforcing explicit return types nowadays, even for `Unit`.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-56239335
  
    @mubarak I have opened a new PR with my changes. Could you please take a look at https://github.com/apache/spark/pull/2464 
    
    And can you close this PR?


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52598991
  
    *Streaming example:*
    
    ![screen shot 2014-08-19 at 12 26 23 am](https://cloud.githubusercontent.com/assets/668134/3962878/189b730e-2772-11e4-8fcc-79ce694a36f0.png)



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r15727686
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -112,6 +112,7 @@ class StreamingContext private[streaming] (
         if (isCheckpointPresent) {
           new SparkContext(cp_.sparkConf)
         } else {
    +      sc_.setCallSite(Utils.getCallSite.short)
    --- End diff --
    
    Why do you need to set the call site of the streaming driver? Which Spark job requires that call site?


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r16339260
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -813,7 +816,10 @@ private[spark] object Utils extends Logging {
        * A regular expression to match classes of the "core" Spark API that we want to skip when
        * finding the call site of a method.
        */
    -  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.streaming)?(\.streaming\.dstream)?(\.streaming\.scheduler)?(\.streaming\.twitter)?(\.streaming\.kafka)?(\.streaming\.flume)?(\.streaming\.mqtt)?(\.streaming\.zeromq)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala(\.util)?(\.collection)?(\.collection\.mutable)?(\.collection\.immutable)?(\.concurrent\.forkjoin)?\.[A-Z]""".r
    --- End diff --
    
    I think example programs from `org.apache.spark.examples` need to be included.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-52746109
  
    Jenkins, this is ok to test.


[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by mubarak <gi...@git.apache.org>.
Github user mubarak commented on the pull request:

    https://github.com/apache/spark/pull/1723#issuecomment-54606563
  
    @tdas 
    Can you please review? Thanks
    ![screen shot 2014-09-05 at 1 42 28 am](https://cloud.githubusercontent.com/assets/668134/4163160/b9b9b538-34e3-11e4-9fae-0e70f3ba1693.png)



[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1723#discussion_r17201437
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -856,13 +859,27 @@ private[spark] object Utils extends Logging {
        * finding the call site of a method.
        */
       private val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?\.[A-Z]""".r
    +  private val SCALA_CLASS_REGEX = """^scala""".r
    +  private val SPARK_STREAMING_CLASS_REGEX = """^org\.apache\.spark""".r
    --- End diff --
    
    This is just the Spark class regex, right? No need to add `STREAMING` here.

