You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by yhuai <gi...@git.apache.org> on 2015/11/12 06:24:51 UTC

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/9651

    [SPARK-11678] [SQL] Partition discovery should stop at the root path of the table.

    https://issues.apache.org/jira/browse/SPARK-11678
    
    The change of this PR is to pass root paths of table to the partition discovery logic. So, the process of partition discovery stops at those root paths instead of going all the way to the root path of the file system.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark SPARK-11678

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9651
    
----
commit e406792a2a5b69c40864ff16dbdbb534a25e5c2f
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-12T05:22:56Z

    Partition discovery stops at the root path of the table.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156312901
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156332449
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156351030
  
    Hopefully PR #9677 fixes the flaky MLlib tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156303057
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44622658
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
    @@ -76,7 +76,37 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
           "hdfs://host:9000/path/a=10/b=20",
           "hdfs://host:9000/path/_temporary/path")
     
    -    parsePartitions(paths.map(new Path(_)), defaultPartitionName, true)
    +    parsePartitions(
    +      paths.map(new Path(_)),
    +      defaultPartitionName,
    +      true,
    +      Set(new Path("hdfs://host:9000/path/")))
    +
    +    // Valid
    +    paths = Seq(
    +      "hdfs://host:9000/path/something=true/table/",
    +      "hdfs://host:9000/path/something=true/table/_temporary",
    +      "hdfs://host:9000/path/something=true/table/a=10/b=20",
    +      "hdfs://host:9000/path/something=true/table/_temporary/path")
    +
    +    parsePartitions(
    +      paths.map(new Path(_)),
    +      defaultPartitionName,
    +      true,
    +      Set(new Path("hdfs://host:9000/path/something=true/table")))
    +
    +    // Valid
    +    paths = Seq(
    +      "hdfs://host:9000/path/table=true/",
    +      "hdfs://host:9000/path/table=true/_temporary",
    +      "hdfs://host:9000/path/table=true/a=10/b=20",
    +      "hdfs://host:9000/path/table=true/_temporary/path")
    --- End diff --
    
    If we the table's root is `hdfs://host:9000/path/table=true`, we actually treat table as a partition column without this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156286371
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156290411
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156332601
  
    **[Test build #45838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45838/consoleFull)** for PR 9651 at commit [`240bcf3`](https://github.com/apache/spark/commit/240bcf3c226d0e71601d854aaf34b136caf7f547).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44752326
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -519,13 +526,33 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
       }
     
       /**
    -   * Base paths of this relation.  For partitioned relations, it should be either root directories
    +   * Paths of this relation.  For partitioned relations, it should be root directories
        * of all partition directories.
        *
        * @since 1.4.0
        */
       def paths: Array[String]
     
    +  /**
    +   * Contains a set of paths that are considered as the base dirs of the input datasets.
    +   * The partitioning discovery logic will make sure it will stop when it reaches any
    +   * base path. By default, the paths of the dataset provided by users will be base paths.
    +   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    +   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    +   * `something`. If users want to override the basePath. They can set `basePath` in the options
    +   * to pass the new base path to the data source.
    +   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * DataFrame will have the column of `something`.
    +   */
    +  private def basePaths: Set[Path] = {
    +    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    +    userDefinedBasePath.getOrElse {
    +      // If the user does not provide basePath, we will just use paths.
    +      val pathSet = paths.toSet
    +      pathSet.map(p => new Path(p))
    +    }
    --- End diff --
    
    Ah, actually the `basePaths` in `parsePartitions` is completely something else with the same name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156346396
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45838/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156283864
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156388123
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156346131
  
    **[Test build #45838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45838/consoleFull)** for PR 9651 at commit [`240bcf3`](https://github.com/apache/spark/commit/240bcf3c226d0e71601d854aaf34b136caf7f547).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156312868
  
    **[Test build #45809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45809/consoleFull)** for PR 9651 at commit [`d784a52`](https://github.com/apache/spark/commit/d784a52ee80a9f90c3d6a3698c5b97945ca14253).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44752292
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -152,32 +157,59 @@ private[sql] object PartitioningUtils {
       private[sql] def parsePartition(
           path: Path,
           defaultPartitionName: String,
    -      typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
    +      typeInference: Boolean,
    +      basePaths: Set[Path]): (Option[PartitionValues], Option[Path]) = {
         val columns = ArrayBuffer.empty[(String, Literal)]
         // Old Hadoop versions don't have `Path.isRoot`
         var finished = path.getParent == null
    -    var chopped = path
    -    var basePath = path
    +    // currentPath is the current path that we will use to parse partition column value.
    +    var currentPath = path
    +    // childPath will be the child of currentPath in the loop below.
    +    var childPath = path
    --- End diff --
    
    Oh right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156035099
  
    Seems that this PR breaks another existing feature, namely explicitly specifying a subset of partitions. E.g.:
    
    ```scala
    sqlContext.read.parquet("base/p1=a/p2=1", "base/p1=a/p2=2")
    sqlContext.read.parquet("base/year=201?/month=10")
    ```
    
    In the second case, the the glob pattern is expanded into multiple input paths. And these paths are considered to be root paths, which prevents partition discovery.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156004499
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156322107
  
    **[Test build #45828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45828/consoleFull)** for PR 9651 at commit [`d784a52`](https://github.com/apache/spark/commit/d784a52ee80a9f90c3d6a3698c5b97945ca14253).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44622623
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
    @@ -76,7 +76,37 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
           "hdfs://host:9000/path/a=10/b=20",
           "hdfs://host:9000/path/_temporary/path")
     
    -    parsePartitions(paths.map(new Path(_)), defaultPartitionName, true)
    +    parsePartitions(
    +      paths.map(new Path(_)),
    +      defaultPartitionName,
    +      true,
    +      Set(new Path("hdfs://host:9000/path/")))
    +
    +    // Valid
    +    paths = Seq(
    +      "hdfs://host:9000/path/something=true/table/",
    +      "hdfs://host:9000/path/something=true/table/_temporary",
    +      "hdfs://host:9000/path/something=true/table/a=10/b=20",
    +      "hdfs://host:9000/path/something=true/table/_temporary/path")
    +
    +    parsePartitions(
    +      paths.map(new Path(_)),
    +      defaultPartitionName,
    +      true,
    +      Set(new Path("hdfs://host:9000/path/something=true/table")))
    --- End diff --
    
    This will fail without the change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156321029
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156316649
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44745604
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -144,7 +149,7 @@ private[sql] object PartitioningUtils {
        *       Literal.create("hello", StringType),
        *       Literal.create(3.14, FloatType)))
        * }}}
    -   * and the base path:
    +   * and the path when we stop the discovery is:
        * {{{
        *   /path/to/partition
    --- End diff --
    
    Should we add the `hdfs://<host>:<port>` part? I think `basePath` is required to be a canonical/qualified HDFS path, would be better to document this explicitly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156305194
  
    **[Test build #45809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45809/consoleFull)** for PR 9651 at commit [`d784a52`](https://github.com/apache/spark/commit/d784a52ee80a9f90c3d6a3698c5b97945ca14253).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156303505
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156013486
  
    **[Test build #45715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45715/consoleFull)** for PR 9651 at commit [`28a1227`](https://github.com/apache/spark/commit/28a122752cbcc8ea2abff314441522b12f96a6f9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156010397
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156285651
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156290940
  
    **[Test build #45802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45802/consoleFull)** for PR 9651 at commit [`d784a52`](https://github.com/apache/spark/commit/d784a52ee80a9f90c3d6a3698c5b97945ca14253).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156013199
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156290397
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44747423
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -519,13 +526,33 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
       }
     
       /**
    -   * Base paths of this relation.  For partitioned relations, it should be either root directories
    +   * Paths of this relation.  For partitioned relations, it should be root directories
        * of all partition directories.
        *
        * @since 1.4.0
        */
       def paths: Array[String]
     
    +  /**
    +   * Contains a set of paths that are considered as the base dirs of the input datasets.
    +   * The partitioning discovery logic will make sure it will stop when it reaches any
    +   * base path. By default, the paths of the dataset provided by users will be base paths.
    +   * For example, if a user uses `sqlContext.read.parquet("/path/something=true/")`, the base path
    +   * will be `/path/something=true/`, and the returned DataFrame will not contain a column of
    +   * `something`. If users want to override the basePath. They can set `basePath` in the options
    +   * to pass the new base path to the data source.
    +   * For the above example, if the user-provided base path is `/path/`, the returned
    +   * DataFrame will have the column of `something`.
    +   */
    +  private def basePaths: Set[Path] = {
    +    val userDefinedBasePath = parameters.get("basePath").map(basePath => Set(new Path(basePath)))
    +    userDefinedBasePath.getOrElse {
    +      // If the user does not provide basePath, we will just use paths.
    +      val pathSet = paths.toSet
    +      pathSet.map(p => new Path(p))
    +    }
    --- End diff --
    
    Considering `parsePartitions` asserts that `basePaths.distinct.size == 1`, users should either provide a `basePath` or passing only a single input path, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156303515
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156029793
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156029795
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45715/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156346392
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44625371
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -166,11 +170,37 @@ private[sql] object PartitioningUtils {
             return (None, None)
           }
     
    +      // Let's say chopped is a path of /table/a=1/, chopped.getName will give us a=1.
    +      // Once we get the string, we try to parse it and find the partition column and value.
           val maybeColumn = parsePartitionColumn(chopped.getName, defaultPartitionName, typeInference)
    -      maybeColumn.foreach(columns += _)
    +
    +      // Now, basePath will be /table/a=1/
           basePath = chopped
    +      // chopped will be /table/
           chopped = chopped.getParent
    -      finished = (maybeColumn.isEmpty && !columns.isEmpty) || chopped.getParent == null
    +
    +      // Now, we determine if we should continue.
    +      // When we hit any of the following three cases, we will not continue:
    +      //  - In this iteration, we could not parse the value of partition column and value,
    +      //    i.e. maybeColumn is None, and columns is not empty. At here we check if columns is
    +      //    empty to handle cases like /table/a=1/_temporary/something (we need to find a=1 in
    +      //    this case).
    +      //  - After we get the new chopped, this new chopped represent the path of "/table", i.e.
    +      //    chopped.getParent == null.
    +      //  - The chopped we used to parse partition column and value (right now, it is basePath),
    +      //    is already the root path of a table. For the example of /table/a=1/, /table/ is the
    +      //    root path.
    +      finished =
    +        (maybeColumn.isEmpty && !columns.isEmpty) ||
    +          chopped.getParent == null ||
    +          rootPaths.contains(basePath)
    +
    +      if (maybeColumn.isDefined && !rootPaths.contains(basePath)) {
    --- End diff --
    
    As we will stop when `rootPaths.contains(basePath) == true` and in this case `columns` will not be modified and we don't care the content of `maybeColumn`, maybe we can skip `parsePartitionColumn` too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156329263
  
    **[Test build #45828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45828/consoleFull)** for PR 9651 at commit [`d784a52`](https://github.com/apache/spark/commit/d784a52ee80a9f90c3d6a3698c5b97945ca14253).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156302949
  
    **[Test build #45802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45802/consoleFull)** for PR 9651 at commit [`d784a52`](https://github.com/apache/spark/commit/d784a52ee80a9f90c3d6a3698c5b97945ca14253).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156350536
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156350581
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156283815
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156288352
  
    need to work on the doc and comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156005172
  
    **[Test build #45707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45707/consoleFull)** for PR 9651 at commit [`e406792`](https://github.com/apache/spark/commit/e406792a2a5b69c40864ff16dbdbb534a25e5c2f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156317077
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156318343
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156321004
  
    retest this please...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156318344
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45823/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44625510
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
    @@ -76,7 +76,37 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
           "hdfs://host:9000/path/a=10/b=20",
           "hdfs://host:9000/path/_temporary/path")
     
    -    parsePartitions(paths.map(new Path(_)), defaultPartitionName, true)
    +    parsePartitions(
    +      paths.map(new Path(_)),
    +      defaultPartitionName,
    +      true,
    +      Set(new Path("hdfs://host:9000/path/")))
    +
    +    // Valid
    +    paths = Seq(
    +      "hdfs://host:9000/path/something=true/table/",
    +      "hdfs://host:9000/path/something=true/table/_temporary",
    +      "hdfs://host:9000/path/something=true/table/a=10/b=20",
    +      "hdfs://host:9000/path/something=true/table/_temporary/path")
    +
    +    parsePartitions(
    +      paths.map(new Path(_)),
    +      defaultPartitionName,
    +      true,
    +      Set(new Path("hdfs://host:9000/path/something=true/table")))
    +
    +    // Valid
    +    paths = Seq(
    +      "hdfs://host:9000/path/table=true/",
    +      "hdfs://host:9000/path/table=true/_temporary",
    +      "hdfs://host:9000/path/table=true/a=10/b=20",
    +      "hdfs://host:9000/path/table=true/_temporary/path")
    --- End diff --
    
    Yeah, I don't realize that it is possible to have such base path.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44751563
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -144,7 +149,7 @@ private[sql] object PartitioningUtils {
        *       Literal.create("hello", StringType),
        *       Literal.create(3.14, FloatType)))
        * }}}
    -   * and the base path:
    +   * and the path when we stop the discovery is:
        * {{{
        *   /path/to/partition
    --- End diff --
    
    Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156329298
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156029744
  
    **[Test build #45715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45715/consoleFull)** for PR 9651 at commit [`28a1227`](https://github.com/apache/spark/commit/28a122752cbcc8ea2abff314441522b12f96a6f9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156285630
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156388126
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45840/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156317093
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44630535
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -294,7 +294,7 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
             // If the "part = 1" filter gets pushed down, this query will throw an exception since
             // "part" is not a valid column in the actual Parquet file
             checkAnswer(
    -          sqlContext.read.parquet(path).filter("part = 1"),
    +          sqlContext.read.parquet(dir.getCanonicalPath).filter("part = 1"),
    --- End diff --
    
    Why do we need `getCanonicalPath` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156353605
  
    **[Test build #45840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45840/consoleFull)** for PR 9651 at commit [`240bcf3`](https://github.com/apache/spark/commit/240bcf3c226d0e71601d854aaf34b136caf7f547).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156003791
  
    @liancheng @viirya can you guys review it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156287761
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45793/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156332440
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156394772
  
    Thanks, merged to master and branch-1.6.
    
    cc @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156312902
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45809/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44751717
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -152,32 +157,59 @@ private[sql] object PartitioningUtils {
       private[sql] def parsePartition(
           path: Path,
           defaultPartitionName: String,
    -      typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
    +      typeInference: Boolean,
    +      basePaths: Set[Path]): (Option[PartitionValues], Option[Path]) = {
         val columns = ArrayBuffer.empty[(String, Literal)]
         // Old Hadoop versions don't have `Path.isRoot`
         var finished = path.getParent == null
    -    var chopped = path
    -    var basePath = path
    +    // currentPath is the current path that we will use to parse partition column value.
    +    var currentPath = path
    +    // childPath will be the child of currentPath in the loop below.
    +    var childPath = path
    --- End diff --
    
    Seems we will not need childPath anymore.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156010399
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45707/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156004451
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156387936
  
    **[Test build #45840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45840/consoleFull)** for PR 9651 at commit [`240bcf3`](https://github.com/apache/spark/commit/240bcf3c226d0e71601d854aaf34b136caf7f547).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156326244
  
    The previous `LDASuite` test failures were because of flaky tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156329299
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45828/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156349621
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156321042
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9651


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156287760
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156286373
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45792/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156302997
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45802/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44671945
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -294,7 +294,7 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
             // If the "part = 1" filter gets pushed down, this query will throw an exception since
             // "part" is not a valid column in the actual Parquet file
             checkAnswer(
    -          sqlContext.read.parquet(path).filter("part = 1"),
    +          sqlContext.read.parquet(dir.getCanonicalPath).filter("part = 1"),
    --- End diff --
    
    path is a partition dir and if we load that single dir, I am not sure we should attach part as a column to your table.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156013181
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44629735
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -166,11 +170,37 @@ private[sql] object PartitioningUtils {
             return (None, None)
           }
     
    +      // Let's say chopped is a path of /table/a=1/, chopped.getName will give us a=1.
    +      // Once we get the string, we try to parse it and find the partition column and value.
           val maybeColumn = parsePartitionColumn(chopped.getName, defaultPartitionName, typeInference)
    -      maybeColumn.foreach(columns += _)
    +
    +      // Now, basePath will be /table/a=1/
           basePath = chopped
    +      // chopped will be /table/
           chopped = chopped.getParent
    -      finished = (maybeColumn.isEmpty && !columns.isEmpty) || chopped.getParent == null
    +
    +      // Now, we determine if we should continue.
    +      // When we hit any of the following three cases, we will not continue:
    +      //  - In this iteration, we could not parse the value of partition column and value,
    +      //    i.e. maybeColumn is None, and columns is not empty. At here we check if columns is
    +      //    empty to handle cases like /table/a=1/_temporary/something (we need to find a=1 in
    +      //    this case).
    +      //  - After we get the new chopped, this new chopped represent the path of "/table", i.e.
    +      //    chopped.getParent == null.
    +      //  - The chopped we used to parse partition column and value (right now, it is basePath),
    +      //    is already the root path of a table. For the example of /table/a=1/, /table/ is the
    +      //    root path.
    +      finished =
    +        (maybeColumn.isEmpty && !columns.isEmpty) ||
    +          chopped.getParent == null ||
    +          rootPaths.contains(basePath)
    +
    +      if (maybeColumn.isDefined && !rootPaths.contains(basePath)) {
    --- End diff --
    
    We can do it in this way:
    
    ```scala
    if (rootPaths.contains(basePath)) {
      finished = true
    } else {
      val maybeColumn = parsePartitionColumn(chopped.getName, defaultPartitionName, typeInference)
      maybeColumn.foreach(columns += _)
      basePath = chopped
      chopped = chopped.getParent
      finished = (maybeColumn.isEmpty && columns.nonEmpty) || chopped.getParent == null
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9651#discussion_r44746035
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -152,32 +157,59 @@ private[sql] object PartitioningUtils {
       private[sql] def parsePartition(
           path: Path,
           defaultPartitionName: String,
    -      typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
    +      typeInference: Boolean,
    +      basePaths: Set[Path]): (Option[PartitionValues], Option[Path]) = {
         val columns = ArrayBuffer.empty[(String, Literal)]
         // Old Hadoop versions don't have `Path.isRoot`
         var finished = path.getParent == null
    -    var chopped = path
    -    var basePath = path
    +    // currentPath is the current path that we will use to parse partition column value.
    +    var currentPath = path
    +    // childPath will be the child of currentPath in the loop below.
    +    var childPath = path
    --- End diff --
    
    Maybe it's better to initialize `childPath` as `null` since it should point to a subdirectory of `currentPath`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11678] [SQL] Partition discovery should...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9651#issuecomment-156010374
  
    **[Test build #45707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45707/consoleFull)** for PR 9651 at commit [`e406792`](https://github.com/apache/spark/commit/e406792a2a5b69c40864ff16dbdbb534a25e5c2f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org