Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2015/04/09 00:35:13 UTC

[GitHub] spark pull request: [Spark-6752]

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/5428

    [Spark-6752] 

    Currently, if you want to create a StreamingContext from checkpoint information, the system will create a new SparkContext. This prevents StreamingContext from being recreated from checkpoints in managed environments where the SparkContext is precreated.
    
    The solution in this PR is to introduce the following methods on StreamingContext:
    1. `new StreamingContext(checkpointDirectory, sparkContext)`
       Recreate a StreamingContext from a checkpoint using the provided SparkContext.
    2. `StreamingContext.getOrCreate(checkpointDirectory, sparkContext, createFunction: SparkContext => StreamingContext)`
       If a checkpoint file exists, recreate the StreamingContext using the provided SparkContext (that is, call 1.); otherwise, create the StreamingContext using the provided createFunction.
    
    TODO: the corresponding Java and Python APIs have to be added as well.
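A minimal Java sketch of the `getOrCreate` control flow described above, not the PR's implementation. `SharedContext`, `StreamCtx`, and `GetOrCreateSketch` are hypothetical stand-ins for SparkContext and StreamingContext, and the `checkpoint-` file-name prefix is an assumption. What it illustrates is that both branches reuse the caller's existing context rather than creating a new one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical stand-in for a SparkContext that the managed environment already created.
final class SharedContext {
    final String appName;
    SharedContext(String appName) { this.appName = appName; }
}

// Hypothetical stand-in for StreamingContext; records whether it came from a checkpoint.
final class StreamCtx {
    final SharedContext sc;
    final boolean recovered;
    StreamCtx(SharedContext sc, boolean recovered) {
        this.sc = sc;
        this.recovered = recovered;
    }
}

final class GetOrCreateSketch {
    // Assumption: checkpoint files are named "checkpoint-<time>" inside the directory.
    static boolean hasCheckpoint(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.anyMatch(p -> p.getFileName().toString().startsWith("checkpoint-"));
        }
    }

    // Mirrors the two cases in the PR description; neither path creates a new context.
    static StreamCtx getOrCreate(Path checkpointDir, SharedContext sc,
                                 Function<SharedContext, StreamCtx> createFunc) throws IOException {
        if (hasCheckpoint(checkpointDir)) {
            // Case 1: "recover" from the checkpoint, reusing the provided context.
            return new StreamCtx(sc, true);
        }
        // Case 2: no checkpoint, so build a fresh StreamCtx on top of the provided context.
        return createFunc.apply(sc);
    }
}
```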

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-6752

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5428
    
----
commit 204814ea2be868257b32f686e1455254f5d60582
Author: Tathagata Das <ta...@gmail.com>
Date:   2015-04-08T22:30:48Z

    Added StreamingContext.getOrCreate with existing SparkContext

commit 36a782356e9fc032a0d6a42251ae82e2af25aeaa
Author: Tathagata Das <ta...@gmail.com>
Date:   2015-04-08T22:32:02Z

    Minor changes.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91180116
  
    @tdas, can this PR resolve [this issue](https://issues.apache.org/jira/browse/SPARK-5206)?
    Restarting a streaming app from a checkpoint works incorrectly if it uses accumulators.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-94978575
  
      [Test build #691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/691/consoleFull) for   PR 5428 at commit [`eabd092`](https://github.com/apache/spark/commit/eabd092fd794e50c67c82a926b44b173a8dfc5e6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28085485
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
    @@ -77,7 +77,8 @@ object Checkpoint extends Logging {
       }
     
       /** Get checkpoint files present in the give directory, ordered by oldest-first */
    -  def getCheckpointFiles(checkpointDir: String, fs: FileSystem): Seq[Path] = {
    +  def getCheckpointFiles(checkpointDir: String, fsOption: Option[FileSystem] = None): Seq[Path] = {
    --- End diff --
    
    This change seems unrelated to this fix. I think we can simply do a null check inside this method and create a FileSystem if needed, which avoids unnecessary changes to the call sites (every `fs` being passed in changing to `Some(fs)`) and keeps the git history sane.
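A hedged sketch of the null-check alternative suggested here, written in Java with hypothetical stand-ins: `FsHandle` plays the role of Hadoop's `FileSystem` and `NullCheckStyle.resolveFs` the role of the lookup inside `getCheckpointFiles`. The scheme parsing is an assumption made only to keep the example self-contained.

```java
// Hypothetical stand-in for Hadoop's FileSystem, identified only by its URI scheme.
final class FsHandle {
    final String scheme;
    FsHandle(String scheme) { this.scheme = scheme; }

    // Assumption: derive the scheme from the path, defaulting to the local file system.
    static FsHandle forPath(String path) {
        int idx = path.indexOf("://");
        return new FsHandle(idx > 0 ? path.substring(0, idx) : "file");
    }
}

final class NullCheckStyle {
    // Signature unchanged, so existing call sites keep passing `fs` exactly as before.
    static FsHandle resolveFs(String checkpointDir, FsHandle fs) {
        // Null check: derive a file system from the path only when the caller passed none.
        return (fs != null) ? fs : FsHandle.forPath(checkpointDir);
    }
}
```

The trade-off is that callers stay untouched, but the "no file system" case is signalled by `null` rather than by the type.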


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28060398
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
    @@ -271,7 +282,10 @@ object CheckpointReader extends Logging {
         })
     
         // If none of checkpoint files could be read, then throw exception
    -    throw new SparkException("Failed to read checkpoint from directory " + checkpointPath)
    +    if (!ignoreReadError) {
    +      throw new SparkException("Failed to read checkpoint from directory " + checkpointPath)
    --- End diff --
    
    Would it be better to switch to string interpolation style?


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-93671163
  
    Jenkins, test this again.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95323488
  
    LGTM pending Jenkins.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28384336
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
    @@ -271,7 +282,10 @@ object CheckpointReader extends Logging {
         })
     
         // If none of checkpoint files could be read, then throw exception
    -    throw new SparkException("Failed to read checkpoint from directory " + checkpointPath)
    +    if (!ignoreReadError) {
    +      throw new SparkException("Failed to read checkpoint from directory " + checkpointPath)
    --- End diff --
    
    Yes, I will. Thanks!


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-97531627
  
    This PR was reverted because I had used MutableBoolean which does not seem to work well with Hadoop 1.0.4. I reopened the PR in #5773.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28086500
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -114,11 +123,15 @@ class StreamingContext private[streaming] (
     
       private[streaming] val isCheckpointPresent = (cp_ != null)
     
    +  private[streaming] val isSparkContextPresent = (sc_ != null)
    --- End diff --
    
    This seems to have been defined for setting `sc` below, but once `sc` is set, this variable can still be `false` even though `sc` is no longer `null`. This `val` should probably not be added; otherwise we risk bugs caused by it being `false`, a state that should never occur, since a SparkContext is always present.
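A minimal Java illustration of the concern, using a hypothetical class rather than Spark code: a flag computed once from the constructor argument goes stale as soon as the field is filled in, whereas a value derived on demand always reflects the current state.

```java
// Hypothetical sketch of the pattern under review; not Spark code.
final class StreamingCtxSketch {
    private final Object sc;

    // Snapshot taken once at construction, like a Scala `val` initialized from `sc_ != null`.
    final boolean isScPresentSnapshot;

    StreamingCtxSketch(Object scArg) {
        this.isScPresentSnapshot = (scArg != null);
        // Mirrors creating a context when none was passed in.
        this.sc = (scArg != null) ? scArg : new Object();
    }

    // Derived on demand, so it can never disagree with the field.
    boolean isScPresent() {
        return sc != null;
    }
}
```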


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95206224
  
      [Test build #693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/693/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95497583
  
      [Test build #697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/697/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-93620037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30389/


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95147426
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30751/


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91214515
  
    @tdas Yeah, will do.
    
    @zzcclp I'm not sure; maybe you can give it a try. My guess is that this could work, since accumulators are registered in the SparkContext, and the SparkContext still exists.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91394800
  
    @jerryshao , thanks, I will test this case later.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28910108
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala ---
    @@ -655,6 +656,7 @@ object JavaStreamingContext {
        * @param checkpointPath Checkpoint directory used in an earlier JavaStreamingContext program
        * @param factory        JavaStreamingContextFactory object to create a new JavaStreamingContext
        */
    +  @deprecated("use getOrCreate without JavaStreamingContextFactor", "1.4.0")
    --- End diff --
    
    This still needs to be fixed on this line!


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91073266
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29898/


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28384469
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
    @@ -77,7 +77,8 @@ object Checkpoint extends Logging {
       }
     
       /** Get checkpoint files present in the give directory, ordered by oldest-first */
    -  def getCheckpointFiles(checkpointDir: String, fs: FileSystem): Seq[Path] = {
    +  def getCheckpointFiles(checkpointDir: String, fsOption: Option[FileSystem] = None): Seq[Path] = {
    --- End diff --
    
    The reason I added this is so that we do not have to handle nulls. Dealing with nulls is strongly frowned upon in Scala, which is precisely why Option was introduced. There are many places where nulls are handled this way, and I have been slowly fixing those. I think this is a small enough change (it doesn't change functionality or existing code paths) that it is okay to do.
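A hedged Java sketch of the `Option`-based signature: `java.util.Optional` stands in for Scala's `Option`, and because Java has no default arguments, an overload plays the role of `fsOption: Option[FileSystem] = None`. `FileSys` and `OptionStyle` are hypothetical names, and the scheme parsing is an assumption for self-containment.

```java
import java.util.Optional;

// Hypothetical stand-in for Hadoop's FileSystem, identified only by its URI scheme.
final class FileSys {
    final String scheme;
    FileSys(String scheme) { this.scheme = scheme; }

    // Assumption: derive the scheme from the path, defaulting to the local file system.
    static FileSys forPath(String path) {
        int idx = path.indexOf("://");
        return new FileSys(idx > 0 ? path.substring(0, idx) : "file");
    }
}

final class OptionStyle {
    // Overload standing in for Scala's `= None` default argument.
    static FileSys resolveFs(String checkpointDir) {
        return resolveFs(checkpointDir, Optional.empty());
    }

    // Absence is explicit in the parameter type, so no null check is needed.
    static FileSys resolveFs(String checkpointDir, Optional<FileSys> fsOption) {
        return fsOption.orElseGet(() -> FileSys.forPath(checkpointDir));
    }
}
```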


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28841272
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
    @@ -271,7 +282,10 @@ object CheckpointReader extends Logging {
         })
     
         // If none of checkpoint files could be read, then throw exception
    -    throw new SparkException("Failed to read checkpoint from directory " + checkpointPath)
    +    if (!ignoreReadError) {
    +      throw new SparkException("Failed to read checkpoint from directory " + checkpointPath)
    --- End diff --
    
    Do you think that we should log a warning message in the case where we ignore the error?
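One way to act on this suggestion, sketched in Java under stated assumptions: `CheckpointReadSketch.read` and its `reader` callback are hypothetical, `IllegalStateException` stands in for `SparkException`, and `java.util.logging` stands in for Spark's `Logging` trait. When `ignoreReadError` is set, the method still returns an empty result, but the ignored failure leaves a warning in the log.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

final class CheckpointReadSketch {
    private static final Logger LOG = Logger.getLogger(CheckpointReadSketch.class.getName());

    // Hypothetical: `reader` parses one checkpoint file and throws on a corrupt file.
    static Optional<String> read(List<String> candidates, Function<String, String> reader,
                                 String checkpointPath, boolean ignoreReadError) {
        for (String file : candidates) {
            try {
                return Optional.of(reader.apply(file));
            } catch (RuntimeException e) {
                LOG.warning("Error reading checkpoint file " + file + ": " + e.getMessage());
            }
        }
        if (!ignoreReadError) {
            throw new IllegalStateException("Failed to read checkpoint from directory " + checkpointPath);
        }
        // Surface the ignored failure instead of swallowing it silently.
        LOG.warning("Failed to read checkpoint from directory " + checkpointPath + "; ignoring the error");
        return Optional.empty();
    }
}
```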


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95074534
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30743/


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-93620032
  
      [Test build #30389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30389/consoleFull) for   PR 5428 at commit [`eabd092`](https://github.com/apache/spark/commit/eabd092fd794e50c67c82a926b44b173a8dfc5e6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28841856
  
    --- Diff: streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
    @@ -1707,6 +1708,71 @@ public Integer call(String s) throws Exception {
         Utils.deleteRecursively(tempDir);
       }
     
    +  @SuppressWarnings("unchecked")
    +  @Test
    +  public void testContextGetOrCreate() throws InterruptedException {
    +
    +    final SparkConf conf = new SparkConf()
    +        .setMaster("local[2]")
    +        .setAppName("test")
    +        .set("newContext", "true");
    +
    +    File emptyDir = Files.createTempDir();
    +    emptyDir.deleteOnExit();
    +    StreamingContextSuite contextSuite = new StreamingContextSuite();
    +    String corruptedCheckpointDir = contextSuite.createCorruptedCheckpoint();
    +    String checkpointDir = contextSuite.createValidCheckpoint();
    +
    +    // Function to create JavaStreamingContext without any output operations
    +    // (used to detect the new context)
    +    Function0<JavaStreamingContext> creatingFunc = new Function0<JavaStreamingContext>() {
    +      public JavaStreamingContext call() {
    +        return new JavaStreamingContext(conf, Seconds.apply(1));
    +      }
    +    };
    +
    +    ssc = JavaStreamingContext.getOrCreate(emptyDir.getAbsolutePath(), creatingFunc);
    +    Assert.assertTrue("new context not created",
    --- End diff --
    
    Is your goal here to assert that `creatingFunc` was not called?  This seems like kind of a roundabout way of doing that.  I know that Mockito makes it pretty easy to write assertions that check the number of times that methods are invoked; alternatively, I guess you could just stick a "number of times called" counter in your function and check it in your assert.
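A minimal sketch of the counter approach, in Java with hypothetical stand-ins: `Supplier` plays the role of `Function0<JavaStreamingContext>`, and `CountingDemo.getOrCreate` is a simplified, hypothetical stand-in for `JavaStreamingContext.getOrCreate`. Mockito's `verify(mock, never())` could express the same check, at the cost of a mocking dependency.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for JavaStreamingContext.
final class FakeStreamingContext { }

// Wraps the creating function and counts how many times it is invoked.
final class CountingFactory implements Supplier<FakeStreamingContext> {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    public FakeStreamingContext get() {
        calls.incrementAndGet();
        return new FakeStreamingContext();
    }
}

final class CountingDemo {
    // Hypothetical getOrCreate: recovers when a checkpoint exists, else calls the factory.
    static FakeStreamingContext getOrCreate(boolean checkpointExists,
                                            Supplier<FakeStreamingContext> factory) {
        return checkpointExists ? new FakeStreamingContext() : factory.get();
    }
}
```

The test can then assert on `calls.get()` directly instead of inferring from side effects whether a new context was built.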


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95305068
  
      [Test build #694 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/694/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91232170
  
    It looks good to me. I'm curious about the scenarios for this usage: is there any situation where the StreamingContext fails but the SparkContext still exists after a driver failure?


[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28850515
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -621,19 +636,59 @@ object StreamingContext extends Logging {
           hadoopConf: Configuration = new Configuration(),
           createOnError: Boolean = false
         ): StreamingContext = {
    -    val checkpointOption = try {
    -      CheckpointReader.read(checkpointPath,  new SparkConf(), hadoopConf)
    -    } catch {
    -      case e: Exception =>
    -        if (createOnError) {
    -          None
    -        } else {
    -          throw e
    -        }
    -    }
    +    val checkpointOption = CheckpointReader.read(
    +      checkpointPath, new SparkConf(), hadoopConf, createOnError)
         checkpointOption.map(new StreamingContext(null, _, null)).getOrElse(creatingFunc())
       }
     
    +
    +  /**
    +   * Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
    +   * If checkpoint data exists in the provided `checkpointPath`, then StreamingContext will be
    +   * recreated from the checkpoint data. If the data does not exist, then the StreamingContext
    +   * will be created by called the provided `creatingFunc` on the provided `sparkContext`. Note
    +   * that the SparkConf configuration in the checkpoint data will not be restored as the
    +   * SparkContext has already been created.
    +   *
    +   * @param checkpointPath Checkpoint directory used in an earlier StreamingContext program
    +   * @param creatingFunc   Function to create a new StreamingContext using the given SparkContext
    +   * @param sparkContext   SparkContext using which the StreamingContext will be created
    +   */
    +  def getOrCreate(
    --- End diff --
    
    This is needed because there is already a version of getOrCreate with default arguments, and Scala allows only one overloaded alternative of a method to define default arguments.
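The restriction tdas refers to is a Scala compiler rule: within one object, at most one overloaded alternative of a method may declare default arguments. Below is a minimal, Spark-free sketch of the resulting shape, under stated assumptions: `Ctx`, `Checkpoint`, the in-memory `store`, and the `String` standing in for a shared SparkContext are all hypothetical. Only the first `getOrCreate` keeps its defaults, and the overload that accepts an existing context takes every argument explicitly. The `read` helper loosely mirrors the refactor in the diff above, where the try/catch moved into the reader behind a `createOnError`-style flag.

```scala
import scala.collection.mutable

object GetOrCreateSketch {
  // Hypothetical stand-ins; these are not Spark classes.
  final case class Checkpoint(appName: String)
  final case class Ctx(appName: String, restored: Boolean, shared: Option[String])

  // Hypothetical checkpoint store keyed by directory.
  val store: mutable.Map[String, Checkpoint] = mutable.Map.empty

  // Read errors become None only when the caller asks for it.
  def read(path: String, ignoreReadError: Boolean): Option[Checkpoint] =
    try {
      if (path.isEmpty) throw new IllegalArgumentException("empty checkpoint path")
      store.get(path)
    } catch {
      case _: IllegalArgumentException if ignoreReadError => None
    }

  // The only overloaded alternative that declares a default argument.
  def getOrCreate(path: String, creatingFunc: () => Ctx, createOnError: Boolean = false): Ctx =
    read(path, createOnError)
      .map(cp => Ctx(cp.appName, restored = true, shared = None))
      .getOrElse(creatingFunc())

  // Giving createOnError a default here too would not compile, so all arguments are explicit.
  def getOrCreate(
      path: String,
      sharedContext: String,
      creatingFunc: String => Ctx,
      createOnError: Boolean): Ctx =
    read(path, createOnError)
      .map(cp => Ctx(cp.appName, restored = true, shared = Some(sharedContext)))
      .getOrElse(creatingFunc(sharedContext))
}
```

Adding `= false` to the second overload's last parameter should make scalac reject the object with an error about multiple overloaded alternatives defining default arguments.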



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91176687
  
    @jerryshao Mind taking a look at this? It's still WIP, as the unit tests are commented out.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-94586606
  
    Jenkins, test this.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95128390
  
      [Test build #30751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30751/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28847514
  
    --- Diff: streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
    @@ -1707,6 +1708,71 @@ public Integer call(String s) throws Exception {
         Utils.deleteRecursively(tempDir);
       }
     
    +  @SuppressWarnings("unchecked")
    +  @Test
    +  public void testContextGetOrCreate() throws InterruptedException {
    +
    +    final SparkConf conf = new SparkConf()
    +        .setMaster("local[2]")
    +        .setAppName("test")
    +        .set("newContext", "true");
    +
    +    File emptyDir = Files.createTempDir();
    +    emptyDir.deleteOnExit();
    +    StreamingContextSuite contextSuite = new StreamingContextSuite();
    +    String corruptedCheckpointDir = contextSuite.createCorruptedCheckpoint();
    +    String checkpointDir = contextSuite.createValidCheckpoint();
    +
    +    // Function to create JavaStreamingContext without any output operations
    +    // (used to detect the new context)
    +    Function0<JavaStreamingContext> creatingFunc = new Function0<JavaStreamingContext>() {
    +      public JavaStreamingContext call() {
    +        return new JavaStreamingContext(conf, Seconds.apply(1));
    +      }
    +    };
    +
    +    ssc = JavaStreamingContext.getOrCreate(emptyDir.getAbsolutePath(), creatingFunc);
    +    Assert.assertTrue("new context not created",
    --- End diff --
    
    My knowledge of Mockito is limited, but it seemed to me that when you configure the mock with `when...thenReturn`, you have to return an object; you cannot call a function on demand to create the object to return. That is necessary in this case, as a new StreamingContext has to be created when getOrCreate is called, and you cannot configure `when...thenReturn` with an already created context.
    
    That said, I think you are right that the current approach is roundabout. I was trying to avoid an extra flag variable to signify whether the function was called, but adding one would make the test easier to understand. I will change it.
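tdas settles on an explicit flag to detect whether the creating function ran. A minimal sketch of that test pattern, assuming a hypothetical generic `getOrCreate` that restores when the directory is in a set of known checkpoints and otherwise calls the creating function; `FlagSketch` and its helpers are illustrative names, not the actual JavaAPISuite code:

```scala
object FlagSketch {
  // Hypothetical: `existing` holds directories that contain valid checkpoint data.
  def getOrCreate[C](path: String, existing: Set[String], restore: String => C, creatingFunc: () => C): C =
    if (existing.contains(path)) restore(path) else creatingFunc()

  /** Returns true if getOrCreate had to call the creating function for `path`. */
  def createdNew(path: String, existing: Set[String]): Boolean = {
    var newContextCreated = false // the explicit flag discussed in the review
    val creatingFunc = () => { newContextCreated = true; s"new:$path" }
    getOrCreate(path, existing, (p: String) => s"restored:$p", creatingFunc)
    newContextCreated
  }
}
```

The flag makes the assertion read directly as "the new context was created", instead of inferring it from properties of the returned context.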



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28847268
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -107,6 +107,15 @@ class StreamingContext private[streaming] (
        */
       def this(path: String) = this(path, new Configuration)
     
    +
    +  def this(path: String, sparkContext: SparkContext) = {
    --- End diff --
    
    Right. Damn, missed that.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28478175
  
    --- Diff: streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
    @@ -987,12 +988,12 @@ public void testPairMap2() { // Maps pair -> single
         JavaDStream<Tuple2<String, Integer>> stream = JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
         JavaPairDStream<String, Integer> pairStream = JavaPairDStream.fromJavaDStream(stream);
         JavaDStream<Integer> reversed = pairStream.map(
    --- End diff --
    
    Some of these changes are unrelated to the PR; they just clean up the formatting of JavaAPISuite, which is quite badly formatted.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95002858
  
    Yes, this is not intended to solve SPARK-5206. 



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28841661
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -621,19 +636,59 @@ object StreamingContext extends Logging {
           hadoopConf: Configuration = new Configuration(),
           createOnError: Boolean = false
         ): StreamingContext = {
    -    val checkpointOption = try {
    -      CheckpointReader.read(checkpointPath,  new SparkConf(), hadoopConf)
    -    } catch {
    -      case e: Exception =>
    -        if (createOnError) {
    -          None
    -        } else {
    -          throw e
    -        }
    -    }
    +    val checkpointOption = CheckpointReader.read(
    +      checkpointPath, new SparkConf(), hadoopConf, createOnError)
         checkpointOption.map(new StreamingContext(null, _, null)).getOrElse(creatingFunc())
       }
     
    +
    +  /**
    +   * Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
    +   * If checkpoint data exists in the provided `checkpointPath`, then StreamingContext will be
    +   * recreated from the checkpoint data. If the data does not exist, then the StreamingContext
    +   * will be created by calling the provided `creatingFunc` on the provided `sparkContext`. Note
    +   * that the SparkConf configuration in the checkpoint data will not be restored as the
    +   * SparkContext has already been created.
    +   *
    +   * @param checkpointPath Checkpoint directory used in an earlier StreamingContext program
    +   * @param creatingFunc   Function to create a new StreamingContext using the given SparkContext
    +   * @param sparkContext   SparkContext using which the StreamingContext will be created
    +   */
    +  def getOrCreate(
    --- End diff --
    
    Do you need this particular overloaded method?  Couldn't you just provide `false` as a default argument to `createOnError` in the next line?



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-93120421
  
    @zzcclp I don't think it will solve this issue directly, but it may allow the SparkContext to be re-initialized properly before the StreamingContext is recreated from checkpoints.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95383691
  
      [Test build #695 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/695/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-93611520
  
    @JoshRosen Please take a quick look at the Function. 
    @jerryshao @harishreedharan I have updated the patch with the Java API and unit tests. I think I am going to create a separate JIRA for the Python API.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95398368
  
      [Test build #695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/695/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91073251
  
      [Test build #29898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29898/consoleFull) for   PR 5428 at commit [`36a7823`](https://github.com/apache/spark/commit/36a782356e9fc032a0d6a42251ae82e2af25aeaa).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28841304
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -107,6 +107,15 @@ class StreamingContext private[streaming] (
        */
       def this(path: String) = this(path, new Configuration)
     
    +
    +  def this(path: String, sparkContext: SparkContext) = {
    --- End diff --
    
    This should probably have scaladoc to explain that it restores from a checkpoint?
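A sketch of the kind of scaladoc JoshRosen asks for on the checkpoint-restoring auxiliary constructor. Everything here is hypothetical: `SharedContext` stands in for SparkContext, `CheckpointStore` for reading checkpoint data, and only a batch interval is restored. The doc comment borrows the note from the `getOrCreate` scaladoc quoted earlier in this thread, that configuration stored in the checkpoint is not applied because the shared context already exists.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an already-created SparkContext.
final case class SharedContext(appName: String)

object CheckpointStore {
  // Hypothetical in-memory checkpoint data: directory -> batch interval in milliseconds.
  val data: mutable.Map[String, Long] = mutable.Map.empty

  def readBatchIntervalMs(path: String): Long =
    data.getOrElse(path, throw new IllegalArgumentException(s"No checkpoint found in $path"))
}

class SketchStreamingContext private (val sc: SharedContext, val batchIntervalMs: Long) {

  /**
   * Recreate a SketchStreamingContext from checkpoint data, reusing an existing context.
   * Configuration stored in the checkpoint is not applied to `sharedContext`,
   * because that context has already been created.
   *
   * @param path          Checkpoint directory written by an earlier program
   * @param sharedContext Existing context that the recreated streaming context will use
   */
  def this(path: String, sharedContext: SharedContext) =
    this(sharedContext, CheckpointStore.readBatchIntervalMs(path))
}
```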



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-94996743
  
      [Test build #692 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/692/consoleFull) for   PR 5428 at commit [`eabd092`](https://github.com/apache/spark/commit/eabd092fd794e50c67c82a926b44b173a8dfc5e6).



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28850543
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala ---
    @@ -655,6 +656,7 @@ object JavaStreamingContext {
        * @param checkpointPath Checkpoint directory used in an earlier JavaStreamingContext program
        * @param factory        JavaStreamingContextFactory object to create a new JavaStreamingContext
        */
    +  @deprecated("use getOrCreate without JavaStreamingContextFactor", "1.4.0")
    --- End diff --
    
    Fixed. Good catch.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5428



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95074518
  
      [Test build #30743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30743/consoleFull) for   PR 5428 at commit [`524f519`](https://github.com/apache/spark/commit/524f519ae69d7e1e70a637f826e8f0d859690aaf).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28847303
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -328,6 +330,138 @@ class StreamingContextSuite extends FunSuite with BeforeAndAfter with Timeouts w
         }
       }
     
    +  test("getOrCreate") {
    --- End diff --
    
    Thank you!!



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-93122446
  
    @all This is still a WIP. Adding the equivalent Java API requires refactoring the existing `JavaStreamingContext.getOrCreate` to use `o.a.s.api.java.function.Function0` (which needs to be added) instead of `JavaStreamingContextFactory`. I am going to open separate PRs for those changes before this one can be merged.
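The refactoring described here keeps the old factory-based entry point but deprecates it in favour of one that takes a zero-argument function. A hedged, self-contained sketch of that migration pattern follows. `ContextFactory`, `Function0Like`, `Ctx`, and the `checkpoints` set are hypothetical stand-ins, not the actual `JavaStreamingContextFactory` or Spark `Function0` types; the version string in the annotation simply echoes the `@deprecated` annotation quoted in this thread.

```scala
object DeprecationSketch {
  final case class Ctx(source: String)

  // Hypothetical stand-in for the old factory-style interface.
  trait ContextFactory { def create(): Ctx }

  // Hypothetical stand-in for a Java-friendly zero-argument function with a `call()` method.
  trait Function0Like[R] { def call(): R }

  // Hypothetical set of directories that hold valid checkpoint data.
  private val checkpoints = Set("ckpt")

  /** New entry point: takes a zero-argument function instead of a factory. */
  def getOrCreate(checkpointPath: String, creatingFunc: Function0Like[Ctx]): Ctx =
    if (checkpoints.contains(checkpointPath)) Ctx("checkpoint") else creatingFunc.call()

  /** Old entry point, kept for source compatibility; it only adapts and delegates. */
  @deprecated("use getOrCreate with a Function0Like instead of a ContextFactory", "1.4.0")
  def getOrCreate(checkpointPath: String, factory: ContextFactory): Ctx =
    getOrCreate(checkpointPath, new Function0Like[Ctx] { def call(): Ctx = factory.create() })
}
```

Having the deprecated overload delegate keeps a single code path for the restore-or-create logic, so the two entry points cannot drift apart.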



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-91056649
  
      [Test build #29898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29898/consoleFull) for   PR 5428 at commit [`36a7823`](https://github.com/apache/spark/commit/36a782356e9fc032a0d6a42251ae82e2af25aeaa).



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95174821
  
      [Test build #693 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/693/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-93610784
  
      [Test build #30389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30389/consoleFull) for   PR 5428 at commit [`eabd092`](https://github.com/apache/spark/commit/eabd092fd794e50c67c82a926b44b173a8dfc5e6).



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95147418
  
      [Test build #30751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30751/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28841681
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala ---
    @@ -655,6 +656,7 @@ object JavaStreamingContext {
        * @param checkpointPath Checkpoint directory used in an earlier JavaStreamingContext program
        * @param factory        JavaStreamingContextFactory object to create a new JavaStreamingContext
        */
    +  @deprecated("use getOrCreate without JavaStreamingContextFactor", "1.4.0")
    --- End diff --
    
    Typo: missing a `y` in "Factory"; same for other annotations.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-96484530
  
    Hi @tdas, why was this PR reverted?



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95678011
  
    Merging this.



[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28388452
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
    @@ -77,7 +77,8 @@ object Checkpoint extends Logging {
       }
     
       /** Get checkpoint files present in the give directory, ordered by oldest-first */
    -  def getCheckpointFiles(checkpointDir: String, fs: FileSystem): Seq[Path] = {
    +  def getCheckpointFiles(checkpointDir: String, fsOption: Option[FileSystem] = None): Seq[Path] = {
    --- End diff --
    
    BTW, this file has to change anyway as part of the effort to make the semantics of `read` clearer.
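    For readers following the signature change in this diff (`fs: FileSystem` becoming `fsOption: Option[FileSystem] = None`), here is a minimal sketch of the optional-filesystem-with-fallback pattern. It is not the PR's code. It swaps in `java.nio.file` for Hadoop's `FileSystem` so the example is self-contained, and the `checkpoint-<timeMs>` file-name scheme and the object name `CheckpointFilesSketch` are assumptions for illustration only.

```scala
import java.nio.file.{FileSystem, FileSystems, Files, Path}
import scala.collection.mutable.ArrayBuffer

object CheckpointFilesSketch {
  // Assumed naming scheme: "checkpoint-<timeMs>" followed by an optional suffix such as ".bk".
  private val CheckpointRegex = """checkpoint-(\d+)([\w.]*)""".r

  /** Lists checkpoint files in `checkpointDir`, ordered oldest-first.
    * When no FileSystem is passed, the default one is used (java.nio stand-in for Hadoop). */
  def getCheckpointFiles(checkpointDir: String, fsOption: Option[FileSystem] = None): Seq[Path] = {
    val fs = fsOption.getOrElse(FileSystems.getDefault)
    val dir = fs.getPath(checkpointDir)
    if (!Files.isDirectory(dir)) return Seq.empty

    val found = ArrayBuffer.empty[(Long, Path)]
    val stream = Files.newDirectoryStream(dir)
    try {
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        p.getFileName.toString match {
          // Pattern matching on a Regex requires the whole file name to match.
          case CheckpointRegex(time, _) => found += ((time.toLong, p))
          case _                        => // not a checkpoint file; ignore it
        }
      }
    } finally stream.close()

    // Sort by the embedded timestamp so the oldest checkpoint comes first.
    found.sortBy(_._1).map(_._2).toList
  }
}
```

    Callers that already hold a filesystem pass `Some(fs)`; everyone else can omit the argument and get the default, which is the convenience the `Option` default buys over a required parameter.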




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95439240
  
      [Test build #696 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/696/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28841893
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -328,6 +330,138 @@ class StreamingContextSuite extends FunSuite with BeforeAndAfter with Timeouts w
         }
       }
     
    +  test("getOrCreate") {
    --- End diff --
    
    This is a nice test; very comprehensive!




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95474793
  
      [Test build #697 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/697/consoleFull) for   PR 5428 at commit [`94db63c`](https://github.com/apache/spark/commit/94db63c7603c159d2156bd5fe55acf1149a3b89b).




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5428#discussion_r28384609
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -114,11 +123,15 @@ class StreamingContext private[streaming] (
     
       private[streaming] val isCheckpointPresent = (cp_ != null)
     
    +  private[streaming] val isSparkContextPresent = (sc_ != null)
    --- End diff --
    
    Good point. My idea was to keep it consistent with `isCheckpointPresent`, which has the semantics of "was a checkpoint present at the time the StreamingContext was created". So this was meant to mean "was an existing SparkContext used to create the StreamingContext". But I can see that it can be confusing. I will remove it completely.




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95010237
  
      [Test build #692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/692/consoleFull) for   PR 5428 at commit [`eabd092`](https://github.com/apache/spark/commit/eabd092fd794e50c67c82a926b44b173a8dfc5e6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95000892
  
    @tdas, I tested streaming recovery from a checkpoint with this PR, and it failed when the job uses accumulators, so this PR cannot directly solve [issue SPARK-5206](https://issues.apache.org/jira/browse/SPARK-5206). How should [issue SPARK-5206](https://issues.apache.org/jira/browse/SPARK-5206) be solved?




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-94967558
  
      [Test build #691 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/691/consoleFull) for   PR 5428 at commit [`eabd092`](https://github.com/apache/spark/commit/eabd092fd794e50c67c82a926b44b173a8dfc5e6).




[GitHub] spark pull request: [SPARK-6752][Streaming] Allow StreamingContext...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5428#issuecomment-95073586
  
      [Test build #30743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30743/consoleFull) for   PR 5428 at commit [`524f519`](https://github.com/apache/spark/commit/524f519ae69d7e1e70a637f826e8f0d859690aaf).

