Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2015/11/02 12:09:23 UTC

[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/9408

    [SPARK-11453][SQL] append data to partitioned table will mess up the result

    The reason is:
    
    1. For a partitioned Hive table, we move the partition columns after the data columns (e.g. `<a: Int, b: Int>` partitioned by `a` becomes `<b: Int, a: Int>`).
    2. When appending data to a table, we match input columns to the table's columns by position.
    
    So when we append data to a partitioned table, input columns are matched to the wrong table columns. A solution is to reorder the input columns before matching by position.
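
To make the mismatch concrete, here is a minimal sketch in plain Scala with no Spark dependency. The `Column` case class and the `reorderForTable` and `matchByPosition` helpers are hypothetical names introduced only for illustration; they are not part of Spark's API, and the patch itself works on logical plan attributes rather than on values like these.

```scala
// Hypothetical stand-in for a column; not a Spark class.
final case class Column(name: String, value: Int)

object AppendByPosition {
  // Put data columns first (in input order), followed by the partition
  // columns (in the order given by partitionCols). This mirrors the layout
  // of a partitioned table as described above.
  def reorderForTable(input: Seq[Column], partitionCols: Seq[String]): Seq[Column] = {
    val dataCols = input.filterNot(c => partitionCols.contains(c.name))
    val partCols = partitionCols.flatMap(p => input.find(_.name == p))
    dataCols ++ partCols
  }

  // Match input values to table column names purely by position.
  def matchByPosition(table: Seq[String], input: Seq[Column]): Map[String, Int] =
    table.zip(input.map(_.value)).toMap
}
```

With a table `<a, b>` partitioned by `a` (stored as `<b, a>`) and input `Seq(Column("a", 1), Column("b", 2))`, `matchByPosition` on the raw input assigns `1` to `b` and `2` to `a`. Calling `reorderForTable(input, Seq("a"))` first restores the intended assignment.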

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark append

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9408
    
----
commit b1512b0bcb80d5621f43954989403a85fdab0960
Author: Wenchen Fan <we...@databricks.com>
Date:   2015-11-02T10:53:30Z

    fix bug of appending data to partitioned table

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153930370
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153932901
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45082/
    Test FAILed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153709196
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154827497
  
    retest this please



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-152990097
  
    cc @yhuai



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153931159
  
    **[Test build #45084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45084/consoleFull)** for PR 9408 at commit [`e682c86`](https://github.com/apache/spark/commit/e682c86244c8b6e7d4035b0cc8e340e303319cf5).



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9408#discussion_r43853474
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -169,11 +169,21 @@ final class DataFrameWriter private[sql](df: DataFrame) {
       private def insertInto(tableIdent: TableIdentifier): Unit = {
         val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
         val overwrite = mode == SaveMode.Overwrite
    +
    +    // A partitioned relation schema's can be different from the input logicalPlan, since
    +    // partition columns are all moved after data column. We Project to adjust the ordering.
    +    // TODO: this belongs in the analyzer.
    +    val input = partitioningColumns.map { parCols =>
    +      val projectList = df.logicalPlan.output.filterNot(c => parCols.contains(c.name)) ++
    --- End diff --
    
    Do we need to consider case sensitivity for partition column names here?
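
One possible way to address this, sketched in plain Scala: resolve each requested partition column name against the input column names with a pluggable comparison function, then use the resolved name from that point on. The `Resolver` alias, `caseSensitive`, `caseInsensitive`, and `normalize` below are defined in the snippet for illustration and are assumptions, not Spark's own definitions. A later revision of the diff in this thread adds `normalizedParCols`, which follows a similar idea.

```scala
object PartitionNames {
  // A name comparison function; defined here for illustration only.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Map each requested partition column to the matching input column name.
  // Returns Left with a message for the first requested name that matches
  // no input column, otherwise Right with the resolved names.
  def normalize(
      inputCols: Seq[String],
      requested: Seq[String],
      resolver: Resolver): Either[String, Seq[String]] = {
    val resolved = requested.map(r => inputCols.find(resolver(_, r)).toRight(r))
    resolved.collectFirst { case Left(missing) => missing } match {
      case Some(missing) =>
        Left(s"Partition column $missing not found in ${inputCols.mkString(", ")}")
      case None =>
        Right(resolved.collect { case Right(name) => name })
    }
  }
}
```

With input columns `a` and `B`, requesting partition column `b` resolves to `B` under `caseInsensitive` and yields a `Left` under `caseSensitive`.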



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153013113
  
    **[Test build #44804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44804/consoleFull)** for PR 9408 at commit [`b1512b0`](https://github.com/apache/spark/commit/b1512b0bcb80d5621f43954989403a85fdab0960).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `case class Corr(`
       * `case class Corr(left: Expression, right: Expression)`
       * `case class RepartitionByExpression(`
       * `        logInfo(s"Hive class not found $e")`
       * `        logDebug("Hive class not found", e)`



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154837760
  
    **[Test build #2013 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2013/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9408#discussion_r43853585
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -169,11 +169,21 @@ final class DataFrameWriter private[sql](df: DataFrame) {
       private def insertInto(tableIdent: TableIdentifier): Unit = {
         val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
         val overwrite = mode == SaveMode.Overwrite
    +
    +    // A partitioned relation schema's can be different from the input logicalPlan, since
    +    // partition columns are all moved after data column. We Project to adjust the ordering.
    +    // TODO: this belongs in the analyzer.
    +    val input = partitioningColumns.map { parCols =>
    +      val projectList = df.logicalPlan.output.filterNot(c => parCols.contains(c.name)) ++
    +        parCols.map(UnresolvedAttribute(_))
    +      Project(projectList, df.logicalPlan)
    --- End diff --
    
    I find this block a little hard to understand. The following version is arguably more readable:
    
    ```scala
          val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition {
            c => parCols.contains(c.name)
          }
          val projectList = inputDataCols ++ inputPartCols.map(c => UnresolvedAttribute(c.name))
          Project(projectList, df.logicalPlan)
    ```
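
For reference when reading the snippet, Scala's `partition` returns a pair whose first element holds the items that satisfy the predicate and whose second element holds the rest. A small self-contained check, using made-up column names rather than Spark attributes:

```scala
object PartitionOrder {
  val columns = Seq("a", "b", "c")
  val parCols = Seq("a")

  // First element: columns matching the predicate (the partition columns).
  // Second element: everything else (the data columns).
  val (inputPartCols, inputDataCols) = columns.partition(c => parCols.contains(c))

  // Data columns first, partition columns last.
  val projectList = inputDataCols ++ inputPartCols
}
```

Here `PartitionOrder.projectList` is `Seq("b", "c", "a")`, so the tuple names on the left-hand side need to follow the (matching, non-matching) order.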



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154870121
  
    **[Test build #2015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2015/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154858795
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9408#discussion_r44092650
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -167,17 +167,38 @@ final class DataFrameWriter private[sql](df: DataFrame) {
       }
     
       private def insertInto(tableIdent: TableIdentifier): Unit = {
    -    val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
    +    val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
         val overwrite = mode == SaveMode.Overwrite
    +
    +    // A partitioned relation's schema can be different from the input logicalPlan, since
    +    // partition columns are all moved after data columns. We Project to adjust the ordering.
    +    // TODO: this belongs to the analyzer.
    +    val input = normalizedParCols.map { parCols =>
    +      val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
    +        parCols.contains(attr.name)
    +      }
    +      Project(inputDataCols ++ inputPartCols, df.logicalPlan)
    --- End diff --
    
    Hmm, I think `normalizedParCols` already does this check?



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153951355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45084/
    Test PASSed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154839612
  
    **[Test build #2013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2013/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153930956
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153753781
  
    **[Test build #45015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45015/consoleFull)** for PR 9408 at commit [`06b96ed`](https://github.com/apache/spark/commit/06b96ed704dfaeb795575312e1a092f0a8aea6d5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9408#discussion_r43939051
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -167,17 +167,39 @@ final class DataFrameWriter private[sql](df: DataFrame) {
       }
     
       private def insertInto(tableIdent: TableIdentifier): Unit = {
    -    val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
    +    val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
         val overwrite = mode == SaveMode.Overwrite
    +
    +    // A partitioned relation's schema can be different from the input logicalPlan, since
    +    // partition columns are all moved after data columns. We Project to adjust the ordering.
    +    // TODO: this belongs to the analyzer.
    +    val input = normalizedParCols.map { parCols =>
    +      val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
    +        parCols.contains(attr.name)
    +      }
    +      val projectList = inputDataCols ++ inputPartCols.map(c => UnresolvedAttribute(c.name))
    +      Project(projectList, df.logicalPlan)
    +    }.getOrElse(df.logicalPlan)
    +
         df.sqlContext.executePlan(
           InsertIntoTable(
             UnresolvedRelation(tableIdent),
             partitions.getOrElse(Map.empty[String, Option[String]]),
    -        df.logicalPlan,
    +        input,
             overwrite,
             ifNotExists = false)).toRdd
       }
     
    +  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { parCols =>
    +    parCols.map { col =>
    +      df.logicalPlan.output
    +        .map(_.name)
    +        .find(df.queryExecution.analyzer.resolver(_, col))
    +        .getOrElse(throw new AnalysisException(
    +          s"Partition column $col not found in schema ${df.logicalPlan.schema}"))
    --- End diff --
    
    We only need to print column names, right?



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153711906
  
    **[Test build #45015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45015/consoleFull)** for PR 9408 at commit [`06b96ed`](https://github.com/apache/spark/commit/06b96ed704dfaeb795575312e1a092f0a8aea6d5).



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153013228
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153930936
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-152990881
  
    **[Test build #44804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44804/consoleFull)** for PR 9408 at commit [`b1512b0`](https://github.com/apache/spark/commit/b1512b0bcb80d5621f43954989403a85fdab0960).



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9408



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153222624
  
    cc @liancheng 



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9408#discussion_r43939622
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -167,17 +167,39 @@ final class DataFrameWriter private[sql](df: DataFrame) {
       }
     
       private def insertInto(tableIdent: TableIdentifier): Unit = {
    -    val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
    +    val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
         val overwrite = mode == SaveMode.Overwrite
    +
    +    // A partitioned relation's schema can be different from the input logicalPlan, since
    +    // partition columns are all moved after data columns. We Project to adjust the ordering.
    +    // TODO: this belongs to the analyzer.
    +    val input = normalizedParCols.map { parCols =>
    +      val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
    +        parCols.contains(attr.name)
    +      }
    +      val projectList = inputDataCols ++ inputPartCols.map(c => UnresolvedAttribute(c.name))
    --- End diff --
    
    Do we need to make them unresolved?



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154858788
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153754013
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-152990301
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154859299
  
    **[Test build #45305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45305/consoleFull)** for PR 9408 at commit [`4a61037`](https://github.com/apache/spark/commit/4a610371b0a7c2a22f67a97534c91b0a422f14b0).



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9408#discussion_r44084011
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -167,17 +167,38 @@ final class DataFrameWriter private[sql](df: DataFrame) {
       }
     
       private def insertInto(tableIdent: TableIdentifier): Unit = {
    -    val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
    +    val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
         val overwrite = mode == SaveMode.Overwrite
    +
    +    // A partitioned relation's schema can be different from the input logicalPlan, since
    +    // partition columns are all moved after data columns. We Project to adjust the ordering.
    +    // TODO: this belongs to the analyzer.
    +    val input = normalizedParCols.map { parCols =>
    +      val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
    +        parCols.contains(attr.name)
    +      }
    +      Project(inputDataCols ++ inputPartCols, df.logicalPlan)
    --- End diff --
    
    Actually, can we have a check to make sure that the partition columns do appear at the end of the column list?
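
One possible shape for such a check, as a minimal sketch only: it assumes columns are compared by name as plain strings and that the partition columns must appear last in the same order as the given list. `PartitionColumnCheck` and `partitionColsAtEnd` are hypothetical names, not part of the patch or of Spark.

```scala
// Plain-Scala sketch of a trailing-partition-columns check; not Spark API.
object PartitionColumnCheck {
  // True when the last partCols.length entries of cols are exactly partCols,
  // in the same order. An empty partCols list trivially passes.
  def partitionColsAtEnd(cols: Seq[String], partCols: Seq[String]): Boolean =
    cols.takeRight(partCols.length) == partCols

  def main(args: Array[String]): Unit = {
    // <b, a> partitioned by a: partition column is last, so the check passes.
    require(partitionColsAtEnd(Seq("b", "a"), Seq("a")))
    // <a, b> partitioned by a: partition column is first, so the check fails.
    require(!partitionColsAtEnd(Seq("a", "b"), Seq("a")))
  }
}
```

A looser variant could compare the trailing columns as a set if the order among partition columns does not matter; which one is appropriate depends on how the relation's schema is defined.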



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153013229
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44804/
    Test PASSed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153635160
  
    Overall looks good, left some minor comments.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154857396
  
    **[Test build #2015 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2015/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153930348
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153951352
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153932900
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154929534
  
    LGTM. Merging to master and branch 1.6.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153951034
  
    **[Test build #45084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45084/consoleFull)** for PR 9408 at commit [`e682c86`](https://github.com/apache/spark/commit/e682c86244c8b6e7d4035b0cc8e340e303319cf5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153709175
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-153754016
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45015/
    Test PASSed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154875860
  
    **[Test build #45305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45305/consoleFull)** for PR 9408 at commit [`4a61037`](https://github.com/apache/spark/commit/4a610371b0a7c2a22f67a97534c91b0a422f14b0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9408#discussion_r43853577
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -169,11 +169,21 @@ final class DataFrameWriter private[sql](df: DataFrame) {
       private def insertInto(tableIdent: TableIdentifier): Unit = {
         val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
         val overwrite = mode == SaveMode.Overwrite
    +
    +    // A partitioned relation schema's can be different from the input logicalPlan, since
    +    // partition columns are all moved after data column. We Project to adjust the ordering.
    +    // TODO: this belongs in the analyzer.
    --- End diff --
    
    Nit: some typos
    
    - "... relation schema's ..." => "... relation's schema ..."
    - "... after data column." => "... after data columns."
    - "... belongs in ..." => "... belongs to ..."



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-154875910
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9408#issuecomment-152990332
  
    Merged build started.

