Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/05/02 02:43:05 UTC

[GitHub] spark pull request: [SPARK-14993] Fix Partition Discovery Inconsis...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/12828

    [SPARK-14993] Fix Partition Discovery Inconsistency when Input is a Path to Parquet File 

    #### What changes were proposed in this pull request?
    When we load a dataset, if we set the path to ```/path/a=1```, we will not take `a` as the partitioning column. However, if we set the path to ```/path/a=1/file.parquet```, we take `a` as the partitioning column and it shows up in the schema. 
    
    This PR fixes the issue: if users set the path to a Parquet file, the partitioning columns are not included in the schema.
    
    The related PRs: 
    - https://github.com/apache/spark/pull/9651
    - https://github.com/apache/spark/pull/10211
    
    #### How was this patch tested?
    Added a couple of test cases
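
To make the inconsistency concrete, here is a minimal sketch of the stopping rule in plain Scala. It is a simplified model, not Spark's `PartitioningUtils`: the object name `PartitionDiscoverySketch`, the helper `partitionColumns`, and the use of plain strings instead of Hadoop `Path` objects are assumptions made for illustration. Discovery walks up from a leaf directory, collects `name=value` segments, and stops at the first base path.

```scala
// Simplified model of partition discovery; NOT Spark's actual implementation.
object PartitionDiscoverySketch {
  // Split "/path/a=1/" into Vector("path", "a=1"), ignoring empty segments.
  private def components(path: String): Vector[String] =
    path.split('/').filter(_.nonEmpty).toVector

  /** Walks up from `leafDir`, collecting `name=value` segments until a base path is hit. */
  def partitionColumns(leafDir: String, basePaths: Set[String]): Vector[(String, String)] = {
    val bases = basePaths.map(components)

    @annotation.tailrec
    def loop(current: Vector[String], acc: Vector[(String, String)]): Vector[(String, String)] =
      if (current.isEmpty || bases.contains(current)) acc
      else current.last.split("=", 2) match {
        // A partition directory: record it and continue with the parent.
        case Array(name, value) if name.nonEmpty => loop(current.init, (name, value) +: acc)
        // Any other directory ends discovery in this simplified model.
        case _ => acc
      }

    loop(components(leafDir), Vector.empty)
  }
}
```

In this model, the leaf directory `/path/a=1` with base path `/path/a=1` yields no partition columns. With base path `/path/a=1/file.parquet`, the walk never reaches the base path, so `a` is collected; using the file's parent directory as the base path removes the difference.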

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark readPartitionedTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12828.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12828
    
----
commit 461441c8acdfce8e175f34ba44b0d08dea761225
Author: gatorsmile <ga...@gmail.com>
Date:   2016-05-02T02:29:49Z

    initial fix.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216968024
  
    **[Test build #57787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57787/consoleFull)** for PR 12828 at commit [`e92e9b2`](https://github.com/apache/spark/commit/e92e9b291b00dd19e7ce1656862d0d4699a8b65f).



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216991511
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57787/



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216412618
  
    @gatorsmile When we call `PartitioningUtils.parsePartitions`, we should provide a `Seq[Path]` representing leaf dirs, right? Is this problem caused by the fact that we actually pass leaf files in?



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216749648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57711/



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61971061
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
    @@ -423,23 +423,34 @@ class HDFSFileCatalog(
       /**
        * Contains a set of paths that are considered as the base dirs of the input datasets.
        * The partitioning discovery logic will make sure it will stop when it reaches any
    -   * base path. By default, the paths of the dataset provided by users will be base paths.
    -   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    -   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    -   * `something`. If users want to override the basePath. They can set `basePath` in the options
    -   * to pass the new base path to the data source.
    -   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * base path.
    +   *
    +   * By default, the paths of the dataset provided by users will be base paths.
    +   * Below are three typical examples,
    +   * Case 1) `sqlContext.read.parquet("/path/something=true/")`: the base path will be
    +   * `/path/something=true/`, and the returned DataFrame will not contain a column of `something`.
    +   * Case 2) `sqlContext.read.parquet("/path/something=true/a.parquet")`: the base path will be
    +   * still `/path/something=true/`, and the returned DataFrame will also not contain a column of
    +   * `something`.
    +   * Case 3) `sqlContext.read.parquet("/path/")`: the base path will be `/path/`, and the returned
        * DataFrame will have the column of `something`.
    +   *
    +   * Users also can override the basePath by setting `basePath` in the options to pass the new base
    +   * path to the data source.
    +   * For example, `sqlContext.read.option("basePath", "/path/").parquet("/path/something=true/")`,
    +   * and the returned DataFrame will have the column of `something`.
        */
       private def basePaths: Set[Path] = {
    -    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    -    userDefinedBasePath.getOrElse {
    -      // If the user does not provide basePath, we will just use paths.
    -      paths.toSet
    -    }.map { hdfsPath =>
    -      // Make the path qualified (consistent with listLeafFiles and listLeafFilesInParallel).
    -      val fs = hdfsPath.getFileSystem(hadoopConf)
    -      hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    --- End diff --
    
    I am wondering why I did not just call `makeQualified(fs)` when I wrote this part...



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61832127
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -184,8 +184,10 @@ private[sql] object PartitioningUtils {
             return (None, None)
           }
     
    -      if (basePaths.contains(currentPath)) {
    +      if (basePaths.contains(currentPath) ||
    +        basePaths.exists(_.toString.startsWith(currentPath.toString))) {
    --- End diff --
    
    Sure, please include the test case in 
    https://github.com/apache/spark/pull/12828/files#diff-cf57fe1c329fb21ac00a8528f049da4aR435
    
    This test case checks three typical cases. 



[GitHub] spark pull request: [SPARK-14993] Fix Partition Discovery Inconsis...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216095931
  
    **[Test build #57500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57500/consoleFull)** for PR 12828 at commit [`461441c`](https://github.com/apache/spark/commit/461441c8acdfce8e175f34ba44b0d08dea761225).



[GitHub] spark pull request: [SPARK-14993] Fix Partition Discovery Inconsis...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216102217
  
    **[Test build #57500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57500/consoleFull)** for PR 12828 at commit [`461441c`](https://github.com/apache/spark/commit/461441c8acdfce8e175f34ba44b0d08dea761225).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216452958
  
    **[Test build #57599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57599/consoleFull)** for PR 12828 at commit [`bf98150`](https://github.com/apache/spark/commit/bf98150cd1c36368d38d934a6590da4419ab9fae).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61976113
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
    @@ -423,23 +423,34 @@ class HDFSFileCatalog(
       /**
        * Contains a set of paths that are considered as the base dirs of the input datasets.
        * The partitioning discovery logic will make sure it will stop when it reaches any
    -   * base path. By default, the paths of the dataset provided by users will be base paths.
    -   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    -   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    -   * `something`. If users want to override the basePath. They can set `basePath` in the options
    -   * to pass the new base path to the data source.
    -   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * base path.
    +   *
    +   * By default, the paths of the dataset provided by users will be base paths.
    +   * Below are three typical examples,
    +   * Case 1) `sqlContext.read.parquet("/path/something=true/")`: the base path will be
    +   * `/path/something=true/`, and the returned DataFrame will not contain a column of `something`.
    +   * Case 2) `sqlContext.read.parquet("/path/something=true/a.parquet")`: the base path will be
    +   * still `/path/something=true/`, and the returned DataFrame will also not contain a column of
    +   * `something`.
    +   * Case 3) `sqlContext.read.parquet("/path/")`: the base path will be `/path/`, and the returned
        * DataFrame will have the column of `something`.
    +   *
    +   * Users also can override the basePath by setting `basePath` in the options to pass the new base
    +   * path to the data source.
    +   * For example, `sqlContext.read.option("basePath", "/path/").parquet("/path/something=true/")`,
    +   * and the returned DataFrame will have the column of `something`.
        */
       private def basePaths: Set[Path] = {
    -    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    -    userDefinedBasePath.getOrElse {
    -      // If the user does not provide basePath, we will just use paths.
    -      paths.toSet
    -    }.map { hdfsPath =>
    -      // Make the path qualified (consistent with listLeafFiles and listLeafFilesInParallel).
    -      val fs = hdfsPath.getFileSystem(hadoopConf)
    -      hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    +    parameters.get("basePath").map(new Path(_)) match {
    +      case Some(userDefinedBasePath) =>
    +        val fs = userDefinedBasePath.getFileSystem(hadoopConf)
    +        if (!fs.isDirectory(userDefinedBasePath)) {
    +          throw new IllegalArgumentException("Option 'basePath' must be a directory")
    +        }
    +        Set(userDefinedBasePath.makeQualified(fs.getUri, fs.getWorkingDirectory))
    +
    +      case None =>
    +        paths.map { path => if (leafFiles.contains(path)) path.getParent else path }.toSet
    --- End diff --
    
    I believe `leafFiles` contains only qualified paths. There were comments elsewhere in the file that say so.
    
    Here it is - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala#L468



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61969829
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
    @@ -423,23 +423,34 @@ class HDFSFileCatalog(
       /**
        * Contains a set of paths that are considered as the base dirs of the input datasets.
        * The partitioning discovery logic will make sure it will stop when it reaches any
    -   * base path. By default, the paths of the dataset provided by users will be base paths.
    -   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    -   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    -   * `something`. If users want to override the basePath. They can set `basePath` in the options
    -   * to pass the new base path to the data source.
    -   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * base path.
    +   *
    +   * By default, the paths of the dataset provided by users will be base paths.
    +   * Below are three typical examples,
    +   * Case 1) `sqlContext.read.parquet("/path/something=true/")`: the base path will be
    +   * `/path/something=true/`, and the returned DataFrame will not contain a column of `something`.
    +   * Case 2) `sqlContext.read.parquet("/path/something=true/a.parquet")`: the base path will be
    +   * still `/path/something=true/`, and the returned DataFrame will also not contain a column of
    +   * `something`.
    +   * Case 3) `sqlContext.read.parquet("/path/")`: the base path will be `/path/`, and the returned
        * DataFrame will have the column of `something`.
    +   *
    +   * Users also can override the basePath by setting `basePath` in the options to pass the new base
    +   * path to the data source.
    +   * For example, `sqlContext.read.option("basePath", "/path/").parquet("/path/something=true/")`,
    +   * and the returned DataFrame will have the column of `something`.
        */
       private def basePaths: Set[Path] = {
    -    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    -    userDefinedBasePath.getOrElse {
    -      // If the user does not provide basePath, we will just use paths.
    -      paths.toSet
    -    }.map { hdfsPath =>
    -      // Make the path qualified (consistent with listLeafFiles and listLeafFilesInParallel).
    -      val fs = hdfsPath.getFileSystem(hadoopConf)
    -      hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    +    parameters.get("basePath").map(new Path(_)) match {
    +      case Some(userDefinedBasePath) =>
    +        val fs = userDefinedBasePath.getFileSystem(hadoopConf)
    +        if (!fs.isDirectory(userDefinedBasePath)) {
    +          throw new IllegalArgumentException("Option 'basePath' must be a directory")
    +        }
    +        Set(userDefinedBasePath.makeQualified(fs.getUri, fs.getWorkingDirectory))
    +
    +      case None =>
    +        paths.map { path => if (leafFiles.contains(path)) path.getParent else path }.toSet
    --- End diff --
    
    `leafFiles` only contains qualified paths, right?



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216453097
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-217056363
  
    LGTM. Merging to master and branch 2.0



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61969371
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
    @@ -423,23 +423,34 @@ class HDFSFileCatalog(
       /**
        * Contains a set of paths that are considered as the base dirs of the input datasets.
        * The partitioning discovery logic will make sure it will stop when it reaches any
    -   * base path. By default, the paths of the dataset provided by users will be base paths.
    -   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    -   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    -   * `something`. If users want to override the basePath. They can set `basePath` in the options
    -   * to pass the new base path to the data source.
    -   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * base path.
    +   *
    +   * By default, the paths of the dataset provided by users will be base paths.
    +   * Below are three typical examples,
    +   * Case 1) `sqlContext.read.parquet("/path/something=true/")`: the base path will be
    +   * `/path/something=true/`, and the returned DataFrame will not contain a column of `something`.
    +   * Case 2) `sqlContext.read.parquet("/path/something=true/a.parquet")`: the base path will be
    +   * still `/path/something=true/`, and the returned DataFrame will also not contain a column of
    +   * `something`.
    +   * Case 3) `sqlContext.read.parquet("/path/")`: the base path will be `/path/`, and the returned
        * DataFrame will have the column of `something`.
    +   *
    +   * Users also can override the basePath by setting `basePath` in the options to pass the new base
    +   * path to the data source.
    +   * For example, `sqlContext.read.option("basePath", "/path/").parquet("/path/something=true/")`,
    +   * and the returned DataFrame will have the column of `something`.
        */
       private def basePaths: Set[Path] = {
    -    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    -    userDefinedBasePath.getOrElse {
    -      // If the user does not provide basePath, we will just use paths.
    -      paths.toSet
    -    }.map { hdfsPath =>
    -      // Make the path qualified (consistent with listLeafFiles and listLeafFilesInParallel).
    -      val fs = hdfsPath.getFileSystem(hadoopConf)
    -      hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    +    parameters.get("basePath").map(new Path(_)) match {
    +      case Some(userDefinedBasePath) =>
    +        val fs = userDefinedBasePath.getFileSystem(hadoopConf)
    +        if (!fs.isDirectory(userDefinedBasePath)) {
    +          throw new IllegalArgumentException("Option 'basePath' must be a directory")
    +        }
    +        Set(userDefinedBasePath.makeQualified(fs.getUri, fs.getWorkingDirectory))
    +
    +      case None =>
    +        paths.map { path => if (leafFiles.contains(path)) path.getParent else path }.toSet
    --- End diff --
    
    Do we need to make this `path` qualified?
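
The qualification question can be illustrated with a small model. The sketch below uses `java.net.URI` in place of Hadoop's `Path`; the object name `BasePathSketch`, the helpers `qualify` and `parent`, and the `hdfs://nn:8020` file system used in the note that follows are hypothetical stand-ins. It assumes, as suggested in this thread, that `leafFiles` holds only qualified paths.

```scala
import java.net.URI

// Simplified model of the default-base-path logic in the diff; NOT Spark's code.
object BasePathSketch {
  /** Hypothetical stand-in for qualification: adds the default scheme and authority. */
  def qualify(path: String, defaultFs: URI): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri
    else new URI(defaultFs.getScheme, defaultFs.getAuthority, uri.getPath, null, null)
  }

  /** Parent directory of a URI whose path contains at least one '/'. */
  def parent(uri: URI): URI = {
    val p = uri.getPath
    new URI(uri.getScheme, uri.getAuthority, p.substring(0, p.lastIndexOf('/')), null, null)
  }

  /** Mirrors the shape of the diff: a user path that is a leaf file is replaced by its parent. */
  def basePaths(paths: Seq[URI], leafFiles: Set[URI]): Set[URI] =
    paths.map(p => if (leafFiles.contains(p)) parent(p) else p).toSet
}
```

In this model, an unqualified `/path/a=1/file.parquet` does not equal the qualified entry `hdfs://nn:8020/path/a=1/file.parquet` in `leafFiles`, so the lookup misses and the file itself stays in the result. Qualifying the user path first makes the lookup succeed, and the base path becomes the parent directory.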



[GitHub] spark pull request: [SPARK-14993] Fix Partition Discovery Inconsis...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216102278
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216717998
  
    Sure, I will wait for it. Thanks for letting me know!



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216991165
  
    **[Test build #57787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57787/consoleFull)** for PR 12828 at commit [`e92e9b2`](https://github.com/apache/spark/commit/e92e9b291b00dd19e7ce1656862d0d4699a8b65f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61832023
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -184,8 +184,10 @@ private[sql] object PartitioningUtils {
             return (None, None)
           }
     
    -      if (basePaths.contains(currentPath)) {
    +      if (basePaths.contains(currentPath) ||
    +        basePaths.exists(_.toString.startsWith(currentPath.toString))) {
    --- End diff --
    
    I see. We are trying to check whether any basePath starts with the currentPath.
    
    So, the actual problem is that `basePaths` in HDFSFileCatalog contains files, right? I discussed it with @tdas. He will open a PR to change `basePaths`. Let's review his fix together. What do you think?
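
For reference, the check in the diff can be modeled on plain strings. This is a hedged sketch rather than the code under review: the object and method names are hypothetical, and strings stand in for Hadoop `Path` objects. It contrasts the exact match, the string-prefix match from the diff, and a component-wise variant that is not part of the diff and is included only for comparison.

```scala
// Illustrative model of the base-path checks; names are hypothetical.
object PrefixCheckSketch {
  private def components(path: String): Vector[String] =
    path.split('/').filter(_.nonEmpty).toVector

  /** Original check: the current path must itself be a base path. */
  def exactMatch(basePaths: Set[String], current: String): Boolean =
    basePaths.contains(current)

  /** In the spirit of the diff: some base path starts with the current path, as a string. */
  def stringPrefix(basePaths: Set[String], current: String): Boolean =
    basePaths.exists(_.startsWith(current))

  /** Comparison variant: the current path is a base path or an ancestor of one, by path component. */
  def componentPrefix(basePaths: Set[String], current: String): Boolean = {
    val cur = components(current)
    basePaths.exists(base => components(base).startsWith(cur))
  }
}
```

With a base path of `/path/a=1/file.parquet` and a current path of `/path/a=1`, the exact match fails while both prefix variants succeed, which is the case the diff targets. A general property of string prefixes, not raised in this thread, is that `/path/a=10/file.parquet` also starts with `/path/a=1`; the component-wise variant does not match there. Making `basePaths` contain only directories, as proposed above, lets the exact match suffice.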



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216453098
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57599/



[GitHub] spark pull request: [SPARK-14993] Fix Partition Discovery Inconsis...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216102279
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57500/



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61985328
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
    @@ -423,23 +423,34 @@ class HDFSFileCatalog(
       /**
        * Contains a set of paths that are considered as the base dirs of the input datasets.
        * The partitioning discovery logic will make sure it will stop when it reaches any
    -   * base path. By default, the paths of the dataset provided by users will be base paths.
    -   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    -   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    -   * `something`. If users want to override the basePath. They can set `basePath` in the options
    -   * to pass the new base path to the data source.
    -   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * base path.
    +   *
    +   * By default, the paths of the dataset provided by users will be base paths.
    +   * Below are three typical examples,
    +   * Case 1) `sqlContext.read.parquet("/path/something=true/")`: the base path will be
    +   * `/path/something=true/`, and the returned DataFrame will not contain a column of `something`.
    +   * Case 2) `sqlContext.read.parquet("/path/something=true/a.parquet")`: the base path will be
    +   * still `/path/something=true/`, and the returned DataFrame will also not contain a column of
    +   * `something`.
    +   * Case 3) `sqlContext.read.parquet("/path/")`: the base path will be `/path/`, and the returned
        * DataFrame will have the column of `something`.
    +   *
    +   * Users also can override the basePath by setting `basePath` in the options to pass the new base
    +   * path to the data source.
    +   * For example, `sqlContext.read.option("basePath", "/path/").parquet("/path/something=true/")`,
    +   * and the returned DataFrame will have the column of `something`.
        */
       private def basePaths: Set[Path] = {
    -    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    -    userDefinedBasePath.getOrElse {
    -      // If the user does not provide basePath, we will just use paths.
    -      paths.toSet
    -    }.map { hdfsPath =>
    -      // Make the path qualified (consistent with listLeafFiles and listLeafFilesInParallel).
    -      val fs = hdfsPath.getFileSystem(hadoopConf)
    -      hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    --- End diff --
    
    Will change it to `fs.makeQualified(userDefinedBasePath)`. Thanks!



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216749556
  
    **[Test build #57711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57711/consoleFull)** for PR 12828 at commit [`252065c`](https://github.com/apache/spark/commit/252065cb0afa1c624274e5f589a38d8614a3d91f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216415109
  
    @yhuai We pass leaf dirs as `path`, but `basePaths` contains the path to a Parquet file. For example, 
    ```
        parsePartition(
          path = new Path("file://path/a=10"),
          defaultPartitionName = defaultPartitionName,
          typeInference = true,
          basePaths = Set(new Path("file://path/a=10/p.parquet")))
    ```
    
    In this case, we need to follow what we did in https://github.com/apache/spark/pull/9651. 
    
    The current behavior is shown in the test case:
    https://github.com/apache/spark/pull/12828/files#diff-cf57fe1c329fb21ac00a8528f049da4aR435
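
The effect of the base path on discovery in the example above can be sketched roughly as follows. This is a simplified illustration, not Spark's `parsePartition`: it uses `java.nio.file.Path` instead of Hadoop's `Path`, does no type inference, and the object and method names are hypothetical.

```scala
import java.nio.file.{Path, Paths}
import scala.annotation.tailrec

object ParsePartitionSketch {
  // Walk upward from a leaf directory, collecting `column=value` directory
  // names. Stop at a base path, at the root, or at the first directory
  // whose name is not of the form `column=value` (a simplification).
  def partitionColumns(leafDir: Path, basePaths: Set[Path]): List[(String, String)] = {
    @tailrec
    def loop(current: Path, acc: List[(String, String)]): List[(String, String)] = {
      val atRoot = current == null || current.getFileName == null
      if (atRoot || basePaths.contains(current)) acc
      else {
        val name = current.getFileName.toString
        val idx = name.indexOf('=')
        if (idx <= 0) acc
        else loop(current.getParent, (name.substring(0, idx), name.substring(idx + 1)) :: acc)
      }
    }
    loop(leafDir, Nil)
  }
}
```

When the base path set holds the file `/path/a=10/p.parquet`, the leaf directory `/path/a=10` is never equal to a base path, so `a` is picked up as a partition column. When the base path is the directory `/path/a=10`, the walk stops immediately and no column is produced.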



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216424273
  
    @yhuai and I discussed this, and the substring-match solution seems very hacky. 
    
    The real problem is that basePaths should never contain files, because a basePath that is not a directory does not make sense. So, our strategy in HDFSFileCatalog of using the set of input files as the default basePath is incorrect. The correct fix is to set the default base path to [dirs in input paths] UNION [parent dirs of files in input paths]. 
    
    Here is the fix - https://github.com/apache/spark/commit/fbef90f47db7c0a81ec29db27e83d0daf56673bd
    Please update your PR with this. You don't have to change `parsePartition` in that case. 
    
    Consider updating the Scaladoc to make this implicit assumption about `basePath` clear in the code.
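
The rule described above (directories stay as they are, files contribute their parent directory) could be sketched as below. This is only an illustration under assumptions: `java.nio.file.Path` stands in for Hadoop's `Path`, the `isFile` predicate is passed in so that no real file system is touched, and the names are hypothetical rather than those used in the linked commit.

```scala
import java.nio.file.{Path, Paths}

object DefaultBasePathsSketch {
  // Default base paths = [dirs in input paths] UNION [parent dirs of files in input paths].
  // `isFile` is a hypothetical predicate standing in for a leaf-file lookup.
  def defaultBasePaths(inputPaths: Seq[Path], isFile: Path => Boolean): Set[Path] =
    inputPaths.map { p => if (isFile(p)) p.getParent else p }.toSet
}
```

For the inputs `/path/something=true/a.parquet` (a file) and `/path/other` (a directory), this yields the base paths `/path/something=true` and `/path/other`, so no base path is ever a file.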
    




[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61975976
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
    @@ -423,23 +423,34 @@ class HDFSFileCatalog(
       /**
        * Contains a set of paths that are considered as the base dirs of the input datasets.
        * The partitioning discovery logic will make sure it will stop when it reaches any
    -   * base path. By default, the paths of the dataset provided by users will be base paths.
    -   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    -   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    -   * `something`. If users want to override the basePath. They can set `basePath` in the options
    -   * to pass the new base path to the data source.
    -   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * base path.
    +   *
    +   * By default, the paths of the dataset provided by users will be base paths.
    +   * Below are three typical examples,
    +   * Case 1) `sqlContext.read.parquet("/path/something=true/")`: the base path will be
    +   * `/path/something=true/`, and the returned DataFrame will not contain a column of `something`.
    +   * Case 2) `sqlContext.read.parquet("/path/something=true/a.parquet")`: the base path will be
    +   * still `/path/something=true/`, and the returned DataFrame will also not contain a column of
    +   * `something`.
    +   * Case 3) `sqlContext.read.parquet("/path/")`: the base path will be `/path/`, and the returned
        * DataFrame will have the column of `something`.
    +   *
    +   * Users also can override the basePath by setting `basePath` in the options to pass the new base
    +   * path to the data source.
    +   * For example, `sqlContext.read.option("basePath", "/path/").parquet("/path/something=true/")`,
    +   * and the returned DataFrame will have the column of `something`.
        */
       private def basePaths: Set[Path] = {
    -    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    -    userDefinedBasePath.getOrElse {
    -      // If the user does not provide basePath, we will just use paths.
    -      paths.toSet
    -    }.map { hdfsPath =>
    -      // Make the path qualified (consistent with listLeafFiles and listLeafFilesInParallel).
    -      val fs = hdfsPath.getFileSystem(hadoopConf)
    -      hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    --- End diff --
    
    I have wondered the same thing multiple times in other PRs.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216991509
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61830374
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -184,8 +184,10 @@ private[sql] object PartitioningUtils {
             return (None, None)
           }
     
    -      if (basePaths.contains(currentPath)) {
    +      if (basePaths.contains(currentPath) ||
    +        basePaths.exists(_.toString.startsWith(currentPath.toString))) {
    --- End diff --
    
    Can you explain this and provide an example?



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216737469
  
    **[Test build #57711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57711/consoleFull)** for PR 12828 at commit [`252065c`](https://github.com/apache/spark/commit/252065cb0afa1c624274e5f589a38d8614a3d91f).



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12828



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216749645
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216442484
  
    **[Test build #57599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57599/consoleFull)** for PR 12828 at commit [`bf98150`](https://github.com/apache/spark/commit/bf98150cd1c36368d38d934a6590da4419ab9fae).



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12828#discussion_r61985799
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
    @@ -423,23 +423,34 @@ class HDFSFileCatalog(
       /**
        * Contains a set of paths that are considered as the base dirs of the input datasets.
        * The partitioning discovery logic will make sure it will stop when it reaches any
    -   * base path. By default, the paths of the dataset provided by users will be base paths.
    -   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    -   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    -   * `something`. If users want to override the basePath. They can set `basePath` in the options
    -   * to pass the new base path to the data source.
    -   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * base path.
    +   *
    +   * By default, the paths of the dataset provided by users will be base paths.
    +   * Below are three typical examples,
    +   * Case 1) `sqlContext.read.parquet("/path/something=true/")`: the base path will be
    +   * `/path/something=true/`, and the returned DataFrame will not contain a column of `something`.
    +   * Case 2) `sqlContext.read.parquet("/path/something=true/a.parquet")`: the base path will be
    +   * still `/path/something=true/`, and the returned DataFrame will also not contain a column of
    +   * `something`.
    +   * Case 3) `sqlContext.read.parquet("/path/")`: the base path will be `/path/`, and the returned
        * DataFrame will have the column of `something`.
    +   *
    +   * Users also can override the basePath by setting `basePath` in the options to pass the new base
    +   * path to the data source.
    +   * For example, `sqlContext.read.option("basePath", "/path/").parquet("/path/something=true/")`,
    +   * and the returned DataFrame will have the column of `something`.
        */
       private def basePaths: Set[Path] = {
    -    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    -    userDefinedBasePath.getOrElse {
    -      // If the user does not provide basePath, we will just use paths.
    -      paths.toSet
    -    }.map { hdfsPath =>
    -      // Make the path qualified (consistent with listLeafFiles and listLeafFilesInParallel).
    -      val fs = hdfsPath.getFileSystem(hadoopConf)
    -      hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    +    parameters.get("basePath").map(new Path(_)) match {
    +      case Some(userDefinedBasePath) =>
    +        val fs = userDefinedBasePath.getFileSystem(hadoopConf)
    +        if (!fs.isDirectory(userDefinedBasePath)) {
    +          throw new IllegalArgumentException("Option 'basePath' must be a directory")
    +        }
    +        Set(userDefinedBasePath.makeQualified(fs.getUri, fs.getWorkingDirectory))
    +
    +      case None =>
    +        paths.map { path => if (leafFiles.contains(path)) path.getParent else path }.toSet
    --- End diff --
    
    Will make `path` qualified before comparison. Thanks!
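
A rough sketch of why qualification matters before a set lookup such as `leafFiles.contains(path)`: an unqualified path and its qualified form do not compare equal. The example uses `java.net.URI` instead of Hadoop's `Path`, and `qualify` is a hypothetical helper that only fills in a default scheme; it is not Hadoop's `makeQualified`.

```scala
import java.net.URI

object QualifySketch {
  // Hypothetical stand-in for path qualification: add a default scheme
  // when the input has none, and leave already-qualified inputs unchanged.
  def qualify(raw: String, defaultScheme: String = "file"): URI = {
    val uri = new URI(raw)
    if (uri.getScheme != null) uri
    else new URI(defaultScheme, null, uri.getPath, null)
  }
}
```

A set of qualified leaf files does not contain the raw form `/path/a=1/f.parquet`, but it does contain the same path once it has been qualified.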



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216705514
  
    Just a heads-up: I have a PR that refactors FileCatalog significantly (#12879). I want to merge that first, which will cause conflicts in this PR as well as in my PR #12856.



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216513574
  
    @tdas @yhuai Based on the fix https://github.com/apache/spark/commit/fbef90f47db7c0a81ec29db27e83d0daf56673bd, I updated the Scaladoc, test cases, and PR description.
    
    Please let me know if anything is not appropriate. Thanks again!



[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216428558
  
    @tdas Thank you very much! Will do it soon. 



[GitHub] spark pull request: [SPARK-14993] Fix Partition Discovery Inconsis...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216102318
  
    cc @yhuai 

