You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2018/04/02 09:07:53 UTC

[GitHub] spark pull request #20958: []Fix socket source honors recovered offsets issu...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/20958

    []Fix socket source honors recovered offsets issue

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-23844

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20958
    
----
commit 5538a0229f6df62b33360925238cff4fafbad1ff
Author: jerryshao <ss...@...>
Date:   2018-04-02T09:02:30Z

    Fix socket source honors recovered offsets issue

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Jenkins, test this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    test this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2074/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Not sure why the test is not triggered, maybe jenkins is down.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20958: [SPARK-23844][SS] Fix socket source honors recove...

Posted by attilapiros <gi...@git.apache.org>.
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20958#discussion_r214063180
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala ---
    @@ -256,6 +257,58 @@ class TextSocketStreamSuite extends StreamTest with SharedSQLContext with Before
         }
       }
     
    +  test("fail on recovery - true") {
    +    val checkpointDir = Files.createTempDirectory("checkpoint").toFile.getAbsolutePath
    +    runSocketStream(checkpointDir, true, Seq(("hello", "hello"), ("world", "world")))
    +
    +    // Rerun socket stream will throw an exception, because it wrongly honors the recovered offsets.
    +    val exception = intercept[StreamingQueryException](
    +      runSocketStream(checkpointDir, true, Seq(("hello", "hello"), ("world", "world"))))
    +    assert(exception.getMessage.contains(
    +      "terminated with exception: Offsets committed out of order"))
    +  }
    +
    +  test("fail on recovery - false") {
    +    val checkpointDir = Files.createTempDirectory("checkpoint").toFile.getAbsolutePath
    +    runSocketStream(checkpointDir, false, Seq(("hello", "hello"), ("world", "world")))
    +
    +    // Rerun socket stream will not throw exception
    +    runSocketStream(checkpointDir, false, Seq(("hello", "hello"), ("world", "world")))
    +  }
    +
    +  private def runSocketStream(
    +      chkpointDir: String,
    +      failOnRecovery: Boolean,
    +      inputAndRets: Seq[(String, String)]): Unit = withSQLConf(
    +    "spark.sql.streaming.unsupportedOperationCheck" -> "false") {
    +      serverThread = new ServerThread()
    +      serverThread.start()
    +
    +      try {
    +        val ref = spark
    +        import ref.implicits._
    +        val socket = spark
    +          .readStream
    +          .format("socket")
    +          .options(Map("host" -> "localhost", "port" -> serverThread.port.toString))
    +          .option("failonrecovery", failOnRecovery.toString)
    +          .load()
    +          .as[String]
    +
    +        val actions = Seq(StartStream(checkpointLocation = chkpointDir)) ++
    +          inputAndRets.flatMap {
    +            case (input, ret) => Seq(AddSocketData(input), CheckLastBatch(ret))
    +          } ++ Seq(StopStream)
    +        testStream(socket)(actions : _*)
    +      } finally {
    +        if (serverThread != null) {
    +          serverThread.interrupt()
    +          serverThread.join()
    +          serverThread = null
    --- End diff --
    
    Nit: assignment not needed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1555/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Jenkins, retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Thanks @tdas for your comments. I agree that socket source should only be used in testing. But it doesn't mean that it can throw weird exception in testing env. For example, if we're dumping socket source data to parquet in test env, parquet source requires checkpoint (even in test env), but socket source should not honor checkpoint data, this basically not works also in test env. My thinking is that cases in test env should also be worked basically, but doesn't have in-production features like checkpointing.
    
    Maybe we can improve error log, but still cases like socket -> parquet doesn't work. 
    
    Let me think about how to better address this issue. Thanks again.
    
    
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20958: [SPARK-23844][SS] Fix socket source honors recove...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20958#discussion_r178646725
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -238,6 +238,10 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
             "write files of Hive data source directly.")
         }
     
    +    val isSocketExists = df.queryExecution.analyzed.collect {
    --- End diff --
    
    I see what you are trying to do. But, honestly, we should NOT add any more special cases for specific sources. We already have memory and foreach, because it is hard to get rid of those. We should not add more.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    **[Test build #89025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89025/testReport)** for PR 20958 at commit [`919bd11`](https://github.com/apache/spark/commit/919bd112a4a556abe21aa0d2c609e2468af89316).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    **[Test build #89028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89028/testReport)** for PR 20958 at commit [`919bd11`](https://github.com/apache/spark/commit/919bd112a4a556abe21aa0d2c609e2468af89316).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89025/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5210/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    We have made it clear that sockets is ONLY for testing and will not recover data from checkpoints. So I see no problem that it throws errors when attempting to recover. May we can improve the error message by making it clear that recovery is not supported. 
    
    If you indeed want to forget lost data and proceed, then that should be an opt-in. We could do this by explicitly setting a source option (like failOnDataLoss = false in Kafka source). 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    **[Test build #89025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89025/testReport)** for PR 20958 at commit [`919bd11`](https://github.com/apache/spark/commit/919bd112a4a556abe21aa0d2c609e2468af89316).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    **[Test build #89027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89027/testReport)** for PR 20958 at commit [`919bd11`](https://github.com/apache/spark/commit/919bd112a4a556abe21aa0d2c609e2468af89316).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89027/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    **[Test build #89027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89027/testReport)** for PR 20958 at commit [`919bd11`](https://github.com/apache/spark/commit/919bd112a4a556abe21aa0d2c609e2468af89316).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2072/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89028/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Jenkins, retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    **[Test build #89028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89028/testReport)** for PR 20958 at commit [`919bd11`](https://github.com/apache/spark/commit/919bd112a4a556abe21aa0d2c609e2468af89316).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    @tdas , by thought about your suggestion about "failOnDataLoss" option, I made a similar proposal on socket source, would you please review again. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20958
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2073/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org