Posted to reviews@spark.apache.org by lw-lin <gi...@git.apache.org> on 2017/03/19 10:31:53 UTC

[GitHub] spark pull request #17346: [WIP] DataFrame batch reader may fail to infer pa...

GitHub user lw-lin opened a pull request:

    https://github.com/apache/spark/pull/17346

    [WIP] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

    WIP of SPARK-19965
    
    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark filter-metadata

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17346
    
----
commit 19d0d485f0190cc8b1df8bec3a9f3b56bca3883e
Author: Liwei Lin <lw...@gmail.com>
Date:   2017-03-19T09:43:33Z

    Initial commit

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468821
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala ---
    @@ -53,6 +53,29 @@ object FileStreamSink extends Logging {
           case _ => false
         }
       }
    +
    +  /**
    +   * Returns true if the path is the metadata dir or its ancestor is the metadata dir.
    +   * E.g.:
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata) => true
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata/0) => true
    +   *  - ancestorIsMetadataDirectory(/a/b/c) => false
    +   */
    +  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    +    require(path.isAbsolute, s"$path is required to be absolute")
    +    var currentPath = path
    +    var finished = false
    +    while (!finished) {
    --- End diff --
    
    fixed. good point!




[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114396634
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
    @@ -145,6 +147,41 @@ class FileStreamSinkSuite extends StreamTest {
         }
       }
     
    +  test("partitioned writing and batch reading with 'basePath'") {
    +    val inputData = MemoryStream[Int]
    +    val ds = inputData.toDS()
    +
    +    val outputDir = Utils.createTempDir(namePrefix = "stream.output").getCanonicalPath
    +    val checkpointDir = Utils.createTempDir(namePrefix = "stream.checkpoint").getCanonicalPath
    --- End diff --
    
    nit: same as above




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76301/testReport)** for PR 17346 at commit [`59ee112`](https://github.com/apache/spark/commit/59ee11219764745abe40ea7f846b0f68e5833e94).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Jenkins retest this please




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Comments have been addressed -- @zsxwing it'd be great if you could take another look




[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #74819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74819/testReport)** for PR 17346 at commit [`19d0d48`](https://github.com/apache/spark/commit/19d0d485f0190cc8b1df8bec3a9f3b56bca3883e).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17346




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76348/
    Test PASSed.




[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74819/
    Test FAILed.




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Jenkins retest this please




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76408/testReport)** for PR 17346 at commit [`49ee54d`](https://github.com/apache/spark/commit/49ee54d7a644b916e5c1c2c58f4cd1e011c7abc6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114396542
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
    @@ -145,6 +147,41 @@ class FileStreamSinkSuite extends StreamTest {
         }
       }
     
    +  test("partitioned writing and batch reading with 'basePath'") {
    +    val inputData = MemoryStream[Int]
    +    val ds = inputData.toDS()
    +
    +    val outputDir = Utils.createTempDir(namePrefix = "stream.output").getCanonicalPath
    --- End diff --
    
    nit: use `withTempDir` to create temp dir instead
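
The suggestion above refers to a loan-pattern helper: the helper creates the directory, lends it to the test body, and cleans it up afterwards. The following is only a minimal sketch of that idea under stated assumptions; `TempDirs`, its `withTempDir` signature, and `deleteRecursively` are hypothetical stand-ins written for illustration and are not Spark's own test utility.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper object, not part of Spark.
object TempDirs {
  /** Creates a temp dir, lends it to `f`, and deletes it afterwards even if `f` throws. */
  def withTempDir[T](prefix: String)(f: File => T): T = {
    val dir = Files.createTempDirectory(prefix).toFile
    try f(dir) finally deleteRecursively(dir)
  }

  private def deleteRecursively(file: File): Unit = {
    // listFiles returns null for non-directories, so wrap it in Option.
    Option(file.listFiles).foreach(_.foreach(deleteRecursively))
    file.delete()
  }
}

// Usage: the directory exists only inside the block.
val usedPath: String = TempDirs.withTempDir("stream.output") { dir =>
  dir.getCanonicalPath
}
```

Compared with calling a "create temp dir" function directly, the caller cannot forget the cleanup step, which is presumably why the reviewer prefers it in tests.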




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    LGTM. Thanks! Merging to master and 2.2.




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #75336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75336/testReport)** for PR 17346 at commit [`0e35db7`](https://github.com/apache/spark/commit/0e35db701342ff426a037c519e50c17d003931fb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468801
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala ---
    @@ -53,6 +53,29 @@ object FileStreamSink extends Logging {
           case _ => false
         }
       }
    +
    +  /**
    +   * Returns true if the path is the metadata dir or its ancestor is the metadata dir.
    +   * E.g.:
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata) => true
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata/0) => true
    +   *  - ancestorIsMetadataDirectory(/a/b/c) => false
    +   */
    +  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    +    require(path.isAbsolute, s"$path is required to be absolute")
    --- End diff --
    
    switched to `makeQualified`




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Rebased to master to resolve conflicts




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76081/
    Test PASSed.




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76301/testReport)** for PR 17346 at commit [`59ee112`](https://github.com/apache/spark/commit/59ee11219764745abe40ea7f846b0f68e5833e94).




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #75565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75565/testReport)** for PR 17346 at commit [`0e35db7`](https://github.com/apache/spark/commit/0e35db701342ff426a037c519e50c17d003931fb).




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76408/testReport)** for PR 17346 at commit [`49ee54d`](https://github.com/apache/spark/commit/49ee54d7a644b916e5c1c2c58f4cd1e011c7abc6).




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468833
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
    @@ -145,6 +147,41 @@ class FileStreamSinkSuite extends StreamTest {
         }
       }
     
    +  test("partitioned writing and batch reading with 'basePath'") {
    +    val inputData = MemoryStream[Int]
    +    val ds = inputData.toDS()
    +
    +    val outputDir = Utils.createTempDir(namePrefix = "stream.output").getCanonicalPath
    --- End diff --
    
    done




[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76081/testReport)** for PR 17346 at commit [`59ee112`](https://github.com/apache/spark/commit/59ee11219764745abe40ea7f846b0f68e5833e94).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #74819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74819/testReport)** for PR 17346 at commit [`19d0d48`](https://github.com/apache/spark/commit/19d0d485f0190cc8b1df8bec3a9f3b56bca3883e).




[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468906
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
    @@ -36,20 +37,27 @@ import org.apache.spark.util.SerializableConfiguration
      * A [[FileIndex]] that generates the list of files to process by recursively listing all the
      * files present in `paths`.
      *
    - * @param rootPaths the list of root table paths to scan
    + * @param rootPathsSpecified the list of root table paths to scan (some of which might be
    + *                           filtered out later)
      * @param parameters as set of options to control discovery
      * @param partitionSchema an optional partition schema that will be use to provide types for the
      *                        discovered partitions
      */
     class InMemoryFileIndex(
         sparkSession: SparkSession,
    -    override val rootPaths: Seq[Path],
    +    rootPathsSpecified: Seq[Path],
         parameters: Map[String, String],
         partitionSchema: Option[StructType],
         fileStatusCache: FileStatusCache = NoopCache)
       extends PartitioningAwareFileIndex(
         sparkSession, parameters, partitionSchema, fileStatusCache) {
     
    +  // Filter out streaming metadata dirs or files such as "/.../_spark_metadata" (the metadata dir)
    +  // or "/.../_spark_metadata/0" (a file in the metadata dir). `rootPathsSpecified` might contain
    +  // such streaming metadata dir or files, e.g. when after globbing "basePath/*" where "basePath"
    +  // is the output of a streaming query.
    +  override val rootPaths = rootPathsSpecified.filterNot(FileStreamSink.ancestorIsMetadataDirectory)
    --- End diff --
    
    Yeah, you are quite correct! They will be filtered by `InMemoryFileIndex.shouldFilterOut`.




[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114395863
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala ---
    @@ -53,6 +53,29 @@ object FileStreamSink extends Logging {
           case _ => false
         }
       }
    +
    +  /**
    +   * Returns true if the path is the metadata dir or its ancestor is the metadata dir.
    +   * E.g.:
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata) => true
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata/0) => true
    +   *  - ancestorIsMetadataDirectory(/a/b/c) => false
    +   */
    +  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    +    require(path.isAbsolute, s"$path is required to be absolute")
    +    var currentPath = path
    +    var finished = false
    +    while (!finished) {
    --- End diff --
    
    How about changing the loop condition to `currentPath != null`? Then you don't need `finished`.
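
A minimal sketch of the loop shape suggested here, assuming `getParent` returns null once the root has been passed. It uses `java.nio.file.Path` as a stand-in for Hadoop's `Path` so that it runs without extra dependencies. The object name `MetadataDirCheck` and the loop body are illustrative assumptions, not the PR's actual code.

```scala
import java.nio.file.{Path, Paths}

object MetadataDirCheck {
  // Directory name taken from the doc comment in the diff.
  val metadataDir = "_spark_metadata"

  // Walks from `path` up towards the root. The loop ends when
  // getParent returns null, so no separate `finished` flag is needed.
  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    require(path.isAbsolute, s"$path is required to be absolute")
    var currentPath = path
    while (currentPath != null) {
      // getFileName is null for the root itself, hence the null check.
      val name = currentPath.getFileName
      if (name != null && name.toString == metadataDir) {
        return true
      }
      currentPath = currentPath.getParent
    }
    false
  }
}
```

With this shape the loop variable doubles as the termination condition, which is the simplification the comment asks for.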


[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #74820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74820/testReport)** for PR 17346 at commit [`0e35db7`](https://github.com/apache/spark/commit/0e35db701342ff426a037c519e50c17d003931fb).


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76348/testReport)** for PR 17346 at commit [`59ee112`](https://github.com/apache/spark/commit/59ee11219764745abe40ea7f846b0f68e5833e94).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74820/
    Test PASSed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76081/testReport)** for PR 17346 at commit [`59ee112`](https://github.com/apache/spark/commit/59ee11219764745abe40ea7f846b0f68e5833e94).


[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #74820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74820/testReport)** for PR 17346 at commit [`0e35db7`](https://github.com/apache/spark/commit/0e35db701342ff426a037c519e50c17d003931fb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76293/
    Test FAILed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    @zsxwing would you take a look at your convenience? Thanks!


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76348/testReport)** for PR 17346 at commit [`59ee112`](https://github.com/apache/spark/commit/59ee11219764745abe40ea7f846b0f68e5833e94).


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Jenkins retest this please


[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114395114
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala ---
    @@ -53,6 +53,29 @@ object FileStreamSink extends Logging {
           case _ => false
         }
       }
    +
    +  /**
    +   * Returns true if the path is the metadata dir or its ancestor is the metadata dir.
    +   * E.g.:
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata) => true
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata/0) => true
    +   *  - ancestorIsMetadataDirectory(/a/b/c) => false
    +   */
    +  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    +    require(path.isAbsolute, s"$path is required to be absolute")
    --- End diff --
    
    I'm wondering if we can call `makeQualified` instead.
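
A minimal sketch of the alternative raised here: normalize the input to an absolute path instead of rejecting relative paths with `require`. In Hadoop, `FileSystem.makeQualified` resolves a path against the file system's working directory and adds the scheme and authority. The code below only assumes `java.nio.file.Path.toAbsolutePath` as a dependency-free stand-in for that call; `QualifySketch` and `qualify` are hypothetical names.

```scala
import java.nio.file.{Path, Paths}

object QualifySketch {
  // Resolve a possibly relative path against the current working directory
  // and collapse "." and ".." components, rather than failing on relative input.
  def qualify(path: Path): Path = path.toAbsolutePath.normalize()
}
```

A caller could then run the ancestor check on `qualify(path)` for any input, at the cost of depending on the working directory for relative paths.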


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #75336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75336/testReport)** for PR 17346 at commit [`0e35db7`](https://github.com/apache/spark/commit/0e35db701342ff426a037c519e50c17d003931fb).


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Thank you, @zsxwing!


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    @zsxwing would you take a look at this? Thanks!


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76408/
    Test PASSed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75336/
    Test PASSed.


[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Jenkins retest this please


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #76293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76293/testReport)** for PR 17346 at commit [`59ee112`](https://github.com/apache/spark/commit/59ee11219764745abe40ea7f846b0f68e5833e94).


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    **[Test build #75565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75565/testReport)** for PR 17346 at commit [`0e35db7`](https://github.com/apache/spark/commit/0e35db701342ff426a037c519e50c17d003931fb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75565/
    Test PASSed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76301/
    Test PASSed.


[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Jenkins retest this please


[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114397372
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
    @@ -36,20 +37,27 @@ import org.apache.spark.util.SerializableConfiguration
      * A [[FileIndex]] that generates the list of files to process by recursively listing all the
      * files present in `paths`.
      *
    - * @param rootPaths the list of root table paths to scan
    + * @param rootPathsSpecified the list of root table paths to scan (some of which might be
    + *                           filtered out later)
      * @param parameters as set of options to control discovery
      * @param partitionSchema an optional partition schema that will be use to provide types for the
      *                        discovered partitions
      */
     class InMemoryFileIndex(
         sparkSession: SparkSession,
    -    override val rootPaths: Seq[Path],
    +    rootPathsSpecified: Seq[Path],
         parameters: Map[String, String],
         partitionSchema: Option[StructType],
         fileStatusCache: FileStatusCache = NoopCache)
       extends PartitioningAwareFileIndex(
         sparkSession, parameters, partitionSchema, fileStatusCache) {
     
    +  // Filter out streaming metadata dirs or files such as "/.../_spark_metadata" (the metadata dir)
    +  // or "/.../_spark_metadata/0" (a file in the metadata dir). `rootPathsSpecified` might contain
    +  // such streaming metadata dir or files, e.g. when after globbing "basePath/*" where "basePath"
    +  // is the output of a streaming query.
    +  override val rootPaths = rootPathsSpecified.filterNot(FileStreamSink.ancestorIsMetadataDirectory)
    --- End diff --
    
    Just to confirm one thing: metadata files located under `rootPaths` or their subdirectories will be dropped by `InMemoryFileIndex.shouldFilterOut`, right?
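
A rough sketch of the two filtering stages discussed in this comment, using plain strings instead of Hadoop paths. `FilterSketch`, `underMetadataDir`, `keepRoots`, and `looksHidden` are hypothetical names. `looksHidden` is a deliberate simplification and is not Spark's actual `shouldFilterOut` rule, which has exceptions that are not shown here.

```scala
object FilterSketch {
  // Stage 1: root paths passed in explicitly (e.g. after globbing "basePath/*")
  // are dropped if any path component is the streaming metadata dir.
  def underMetadataDir(path: String): Boolean =
    path.split("/").contains("_spark_metadata")

  def keepRoots(roots: Seq[String]): Seq[String] =
    roots.filterNot(underMetadataDir)

  // Stage 2 (simplified assumption): while listing files under the remaining
  // roots, names starting with "_" or "." are treated as hidden and skipped.
  def looksHidden(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")
}
```

Under these assumptions, a `_spark_metadata` directory given as a root is removed in the first stage, while one found during listing is skipped by name in the second.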


[GitHub] spark issue #17346: [WIP] DataFrame batch reader may fail to infer partition...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17346
  
    Merged build finished. Test PASSed.

