You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2017/08/01 18:27:54 UTC

[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/18799

    [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.get check the return value

    ## What changes were proposed in this pull request?
    
    When I was investigating a flaky test, I realized that many places don't check the return value of `HDFSMetadataLog.get(batchId: Long): Option[T]`. When a batch is supposed to be there, the caller just ignores None rather than throwing an error. If some bug causes a query doesn't generate a batch metadata file, this behavior will hide it and allow the query continuing to run and finally delete metadata logs and make it hard to debug.
    
    This PR ensures that places calling HDFSMetadataLog.get always check the return value.
    
    ## How was this patch tested?
    
    Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-21596

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18799.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18799
    
----
commit 7090fc064b9df22f6082f6de03b82a5bdfb29210
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-08-01T18:26:02Z

    Ensure places calling HDFSMetadataLog.get check the return value

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80429/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r132067907
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -1314,6 +1314,7 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
         val metadataLog =
           new FileStreamSourceLog(FileStreamSourceLog.VERSION, spark, dir.getAbsolutePath)
           assert(metadataLog.add(0, Array(FileEntry(s"$scheme:///file1", 100L, 0))))
    +      assert(metadataLog.add(1, Array(FileEntry(s"$scheme:///file2", 200L, 0))))
    --- End diff --
    
    `newSource.getBatch(None, FileStreamSourceOffset(1))` will fail without this because batch `1` doesn't exist.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80418/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80418/testReport)** for PR 18799 at commit [`7f85b63`](https://github.com/apache/spark/commit/7f85b637467b559e9fb6065bdc2ac59f82a35333).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Never mind @zsxwing I already opened a PR #18890


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80129/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r131804073
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala ---
    @@ -186,7 +190,11 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
           if (latestId >= 0) {
             try {
               val logs =
    -            getAllValidBatches(latestId, compactInterval).flatMap(id => super.get(id)).flatten
    +            getAllValidBatches(latestId, compactInterval).map { id =>
    +              super.get(id).getOrElse {
    +                throw new IllegalStateException(s"batch $id doesn't exist")
    --- End diff --
    
    would be good give more context information. like latestId, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80370/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r132050794
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -1314,6 +1314,7 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
         val metadataLog =
           new FileStreamSourceLog(FileStreamSourceLog.VERSION, spark, dir.getAbsolutePath)
           assert(metadataLog.add(0, Array(FileEntry(s"$scheme:///file1", 100L, 0))))
    +      assert(metadataLog.add(1, Array(FileEntry(s"$scheme:///file2", 200L, 0))))
    --- End diff --
    
    what was this change for?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80138/testReport)** for PR 18799 at commit [`91efeb3`](https://github.com/apache/spark/commit/91efeb3553e5e9f0f6fb45ae7574231c2e70d845).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80138/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80418/testReport)** for PR 18799 at commit [`7f85b63`](https://github.com/apache/spark/commit/7f85b637467b559e9fb6065bdc2ac59f82a35333).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Merged to master. But there were conflicts with 2.2. Can you make another PR for 2.2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80129/testReport)** for PR 18799 at commit [`7090fc0`](https://github.com/apache/spark/commit/7090fc064b9df22f6082f6de03b82a5bdfb29210).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80138/testReport)** for PR 18799 at commit [`91efeb3`](https://github.com/apache/spark/commit/91efeb3553e5e9f0f6fb45ae7574231c2e70d845).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80429/testReport)** for PR 18799 at commit [`16b02da`](https://github.com/apache/spark/commit/16b02da4f75f7838006ce8a03d63e04d8bc21455).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80129/testReport)** for PR 18799 at commit [`7090fc0`](https://github.com/apache/spark/commit/7090fc064b9df22f6082f6de03b82a5bdfb29210).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80370/testReport)** for PR 18799 at commit [`5170950`](https://github.com/apache/spark/commit/5170950784235984978d9a29f50f8317204c9e19).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80429/testReport)** for PR 18799 at commit [`16b02da`](https://github.com/apache/spark/commit/16b02da4f75f7838006ce8a03d63e04d8bc21455).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    **[Test build #80370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80370/testReport)** for PR 18799 at commit [`5170950`](https://github.com/apache/spark/commit/5170950784235984978d9a29f50f8317204c9e19).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    @tdas addressed your comments


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r132050716
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala ---
    @@ -259,6 +259,19 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext {
           fm.rename(path2, path3)
         }
       }
    +
    +  test("verifyBatchIds") {
    +    import HDFSMetadataLog.verifyBatchIds
    +    verifyBatchIds(Seq(1L, 2L, 3L), Some(1L), Some(3L))
    +    verifyBatchIds(Seq(1L), Some(1L), Some(1L))
    --- End diff --
    
    you didnt test the valid cases when one of the start or end is None.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r132050517
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
    @@ -437,4 +441,48 @@ object HDFSMetadataLog {
           }
         }
       }
    +
    +  /**
    +   * Verify if batchIds are continuous and between `startId` and `endId`.
    +   *
    +   * @param batchIds the sorted ids to verify.
    +   * @param startId the start id. If it's set, batchIds should start with this id.
    +   * @param endId the start id. If it's set, batchIds should end with this id.
    +   */
    +  def verifyBatchIds(batchIds: Seq[Long], startId: Option[Long], endId: Option[Long]): Unit = {
    +    // Verify that we can get all batches between `startId` and `endId`.
    +    if (startId.isDefined || endId.isDefined) {
    +      if (batchIds.isEmpty) {
    +        throw new IllegalStateException(s"batch ${startId.orElse(endId).get} doesn't exist")
    --- End diff --
    
    would be good to print the range that was asked for. otherwise its hard to see what was expected while debugging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18799


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r131804168
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
    @@ -396,6 +400,7 @@ object HDFSMetadataLog {
     
         /**
          * Rename a path. Note that this implementation is not atomic.
    +     *
    --- End diff --
    
    is this needed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r131798109
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala ---
    @@ -169,7 +169,11 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
        */
       private def compact(batchId: Long, logs: Array[T]): Boolean = {
         val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
    -    val allLogs = validBatches.flatMap(batchId => super.get(batchId)).flatten ++ logs
    +    val allLogs = validBatches.map { batchId =>
    +      super.get(batchId).getOrElse {
    +        throw new IllegalStateException(s"batch $batchId doesn't exist")
    --- End diff --
    
    what does it mean when this get return None? somehow the batch file is missing? is there anyway to improve the message any more?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18799: [SPARK-21596][SS]Ensure places calling HDFSMetada...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18799#discussion_r131793251
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
    @@ -123,7 +123,7 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
               serialize(metadata, output)
               return Some(tempPath)
             } finally {
    -          IOUtils.closeQuietly(output)
    +          output.close()
    --- End diff --
    
    The output stream may fail to close (e.g., fail to flush the internal buffer), if it happens, we should fail the query rather than ignoring it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18799: [SPARK-21596][SS]Ensure places calling HDFSMetadataLog.g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18799
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org