Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2015/11/02 12:09:23 UTC
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/9408
[SPARK-11453][SQL] append data to partitioned table will mess up the result
The reason is:
1. For a partitioned Hive table, we move the partition columns after the data columns (e.g. `<a: Int, b: Int>` partitioned by `a` becomes `<b: Int, a: Int>`).
2. When appending data to a table, we match input columns to the table's columns by position.
So when we append data to a partitioned table, input columns are matched to the wrong table columns. A solution is to reorder the input columns before matching by position.
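The mismatch can be reproduced without Spark. The following is only a minimal sketch under simplifying assumptions: columns are plain strings, rows are sequences of integers, and the names `AppendByPosition`, `tableLayout`, `matchByPosition` and `reorder` are hypothetical and do not come from the patch.

```scala
// Hypothetical, Spark-free model of the problem; not the code from this pull request.
object AppendByPosition {
  // Table layout once partition columns are moved behind the data columns.
  def tableLayout(columns: Seq[String], partitionCols: Set[String]): Seq[String] =
    columns.filterNot(partitionCols.contains) ++ columns.filter(partitionCols.contains)

  // Pair table columns with input values purely by position.
  def matchByPosition(table: Seq[String], row: Seq[Int]): Map[String, Int] =
    table.zip(row).toMap

  // Reorder a row given in `inputCols` order so that it follows the table layout.
  def reorder(inputCols: Seq[String], row: Seq[Int], table: Seq[String]): Seq[Int] = {
    val byName = inputCols.zip(row).toMap
    table.map(byName)
  }
}
```

With input columns `a, b` partitioned by `a`, the table layout is `b, a`. Matching the row `(1, 2)` by position then assigns `1` to `b`, while reordering the row first assigns `2` to `b` and `1` to `a` as intended.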
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark append
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9408.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9408
----
commit b1512b0bcb80d5621f43954989403a85fdab0960
Author: Wenchen Fan <we...@databricks.com>
Date: 2015-11-02T10:53:30Z
fix bug of appending data to partitioned table
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153930370
Merged build started.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153932901
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45082/
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153709196
Merged build started.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154827497
retest this please
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-152990097
cc @yhuai
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153931159
**[Test build #45084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45084/consoleFull)** for PR 9408 at commit [`e682c86`](https://github.com/apache/spark/commit/e682c86244c8b6e7d4035b0cc8e340e303319cf5).
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/9408#discussion_r43853474
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -169,11 +169,21 @@ final class DataFrameWriter private[sql](df: DataFrame) {
private def insertInto(tableIdent: TableIdentifier): Unit = {
val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
val overwrite = mode == SaveMode.Overwrite
+
+ // A partitioned relation schema's can be different from the input logicalPlan, since
+ // partition columns are all moved after data column. We Project to adjust the ordering.
+ // TODO: this belongs in the analyzer.
+ val input = partitioningColumns.map { parCols =>
+ val projectList = df.logicalPlan.output.filterNot(c => parCols.contains(c.name)) ++
--- End diff --
Do we need to consider case sensitivity for partition column names here?
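To illustrate the concern, here is a small sketch of how a plain `contains` on names differs from a resolver-based comparison. It assumes a resolver is simply a function of two names returning a Boolean; the object and method names are hypothetical.

```scala
// Hypothetical sketch; only the (String, String) => Boolean shape is assumed.
object ColumnNameMatching {
  // Decides whether two column names refer to the same column.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // True if `name` matches any partition column under the given resolver.
  def isPartitionColumn(name: String, parCols: Seq[String], resolver: Resolver): Boolean =
    parCols.exists(resolver(_, name))
}
```

Under the case-sensitive resolver, an input column `A` is not recognized as the partition column `a`, so it would stay among the data columns; under the case-insensitive one it is recognized.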
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153013113
**[Test build #44804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44804/consoleFull)** for PR 9408 at commit [`b1512b0`](https://github.com/apache/spark/commit/b1512b0bcb80d5621f43954989403a85fdab0960).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Corr(`
  * `case class Corr(left: Expression, right: Expression)`
  * `case class RepartitionByExpression(`
  * `logInfo(s"Hive class not found $e")`
  * `logDebug("Hive class not found", e)`
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154837760
**[Test build #2013 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2013/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/9408#discussion_r43853585
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -169,11 +169,21 @@ final class DataFrameWriter private[sql](df: DataFrame) {
private def insertInto(tableIdent: TableIdentifier): Unit = {
val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
val overwrite = mode == SaveMode.Overwrite
+
+ // A partitioned relation schema's can be different from the input logicalPlan, since
+ // partition columns are all moved after data column. We Project to adjust the ordering.
+ // TODO: this belongs in the analyzer.
+ val input = partitioningColumns.map { parCols =>
+ val projectList = df.logicalPlan.output.filterNot(c => parCols.contains(c.name)) ++
+ parCols.map(UnresolvedAttribute(_))
+ Project(projectList, df.logicalPlan)
--- End diff --
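I found this block a little hard to understand. The following version may be more readable:
```scala
// `partition` returns (elements matching the predicate, the rest),
// so the partition columns come first in the resulting tuple.
val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition {
  c => parCols.contains(c.name)
}
// Data columns first, then partition columns, matching the table's layout.
val projectList = inputDataCols ++ inputPartCols.map(c => UnresolvedAttribute(c.name))
Project(projectList, df.logicalPlan)
```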
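Since the order of the tuple returned by `partition` is easy to mix up, a tiny standalone check may help. This is only an illustrative sketch on plain strings; `PartitionOrder` and `split` are hypothetical names.

```scala
// Hypothetical helper showing the tuple order of Scala's `partition`.
object PartitionOrder {
  // Returns (partition columns, data columns): matching elements come first.
  def split(cols: Seq[String], parCols: Set[String]): (Seq[String], Seq[String]) =
    cols.partition(parCols.contains)
}
```

For columns `a, b, c` with partition column `a`, the first component holds `a` and the second holds `b, c`.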
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154870121
**[Test build #2015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2015/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154858795
Merged build started.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/9408#discussion_r44092650
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -167,17 +167,38 @@ final class DataFrameWriter private[sql](df: DataFrame) {
}
private def insertInto(tableIdent: TableIdentifier): Unit = {
- val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
+ val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
val overwrite = mode == SaveMode.Overwrite
+
+ // A partitioned relation's schema can be different from the input logicalPlan, since
+ // partition columns are all moved after data columns. We Project to adjust the ordering.
+ // TODO: this belongs to the analyzer.
+ val input = normalizedParCols.map { parCols =>
+ val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
+ parCols.contains(attr.name)
+ }
+ Project(inputDataCols ++ inputPartCols, df.logicalPlan)
--- End diff --
hmmm, I think `normalizedParCols` already did this check?
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153951355
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45084/
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154839612
**[Test build #2013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2013/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153930956
Merged build started.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153753781
**[Test build #45015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45015/consoleFull)** for PR 9408 at commit [`06b96ed`](https://github.com/apache/spark/commit/06b96ed704dfaeb795575312e1a092f0a8aea6d5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9408#discussion_r43939051
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -167,17 +167,39 @@ final class DataFrameWriter private[sql](df: DataFrame) {
}
private def insertInto(tableIdent: TableIdentifier): Unit = {
- val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
+ val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
val overwrite = mode == SaveMode.Overwrite
+
+ // A partitioned relation's schema can be different from the input logicalPlan, since
+ // partition columns are all moved after data columns. We Project to adjust the ordering.
+ // TODO: this belongs to the analyzer.
+ val input = normalizedParCols.map { parCols =>
+ val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
+ parCols.contains(attr.name)
+ }
+ val projectList = inputDataCols ++ inputPartCols.map(c => UnresolvedAttribute(c.name))
+ Project(projectList, df.logicalPlan)
+ }.getOrElse(df.logicalPlan)
+
df.sqlContext.executePlan(
InsertIntoTable(
UnresolvedRelation(tableIdent),
partitions.getOrElse(Map.empty[String, Option[String]]),
- df.logicalPlan,
+ input,
overwrite,
ifNotExists = false)).toRdd
}
+ private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { parCols =>
+ parCols.map { col =>
+ df.logicalPlan.output
+ .map(_.name)
+ .find(df.queryExecution.analyzer.resolver(_, col))
+ .getOrElse(throw new AnalysisException(
+ s"Partition column $col not found in schema ${df.logicalPlan.schema}"))
--- End diff --
We only need to print column names, right?
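One possible shape for such a message is sketched below on plain strings rather than on the DataFrame schema. The names `NormalizePartitionColumns` and `normalize`, the use of `IllegalArgumentException` in place of `AnalysisException`, and the exact wording of the message are all assumptions made for illustration.

```scala
// Hypothetical, Spark-free sketch of normalizing partition column names.
object NormalizePartitionColumns {
  type Resolver = (String, String) => Boolean

  // Maps each requested partition column to the matching name in the schema,
  // failing with a message that lists only the column names.
  def normalize(schemaCols: Seq[String], parCols: Seq[String], resolver: Resolver): Seq[String] =
    parCols.map { col =>
      schemaCols.find(resolver(_, col)).getOrElse(
        throw new IllegalArgumentException(
          s"Partition column $col not found in existing columns (${schemaCols.mkString(", ")})"))
    }
}
```

Listing names with `mkString` keeps the message short compared with printing a full schema including types and nullability.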
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153711906
**[Test build #45015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45015/consoleFull)** for PR 9408 at commit [`06b96ed`](https://github.com/apache/spark/commit/06b96ed704dfaeb795575312e1a092f0a8aea6d5).
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153013228
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153930936
Merged build triggered.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-152990881
**[Test build #44804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44804/consoleFull)** for PR 9408 at commit [`b1512b0`](https://github.com/apache/spark/commit/b1512b0bcb80d5621f43954989403a85fdab0960).
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/9408
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153222624
cc @liancheng
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9408#discussion_r43939622
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -167,17 +167,39 @@ final class DataFrameWriter private[sql](df: DataFrame) {
}
private def insertInto(tableIdent: TableIdentifier): Unit = {
- val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
+ val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
val overwrite = mode == SaveMode.Overwrite
+
+ // A partitioned relation's schema can be different from the input logicalPlan, since
+ // partition columns are all moved after data columns. We Project to adjust the ordering.
+ // TODO: this belongs to the analyzer.
+ val input = normalizedParCols.map { parCols =>
+ val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
+ parCols.contains(attr.name)
+ }
+ val projectList = inputDataCols ++ inputPartCols.map(c => UnresolvedAttribute(c.name))
--- End diff --
Do we need to make them unresolved?
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154858788
Merged build triggered.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153754013
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-152990301
Merged build triggered.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154859299
**[Test build #45305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45305/consoleFull)** for PR 9408 at commit [`4a61037`](https://github.com/apache/spark/commit/4a610371b0a7c2a22f67a97534c91b0a422f14b0).
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9408#discussion_r44084011
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -167,17 +167,38 @@ final class DataFrameWriter private[sql](df: DataFrame) {
   }
   private def insertInto(tableIdent: TableIdentifier): Unit = {
-    val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
+    val partitions = normalizedParCols.map(_.map(col => col -> (None: Option[String])).toMap)
     val overwrite = mode == SaveMode.Overwrite
+
+    // A partitioned relation's schema can be different from the input logicalPlan, since
+    // partition columns are all moved after data columns. We Project to adjust the ordering.
+    // TODO: this belongs to the analyzer.
+    val input = normalizedParCols.map { parCols =>
+      val (inputPartCols, inputDataCols) = df.logicalPlan.output.partition { attr =>
+        parCols.contains(attr.name)
+      }
+      Project(inputDataCols ++ inputPartCols, df.logicalPlan)
--- End diff --
Actually, can we add a check to make sure that the partition columns appear at the end of the column list?
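For illustration only, the reordering in the diff and the check requested above might be sketched in plain Scala as follows. This is not the code from the pull request: `Col`, `PartitionColumnOrder`, `reorder`, and `partitionColsAtEnd` are hypothetical names, `Col` merely stands in for an attribute of the logical plan, and the partition column names are assumed to be distinct.

```scala
// Hypothetical stand-in for a plan attribute; only the column name matters here.
final case class Col(name: String)

object PartitionColumnOrder {
  // Move partition columns after data columns, mirroring the
  // `partition { attr => parCols.contains(attr.name) }` step in the diff.
  // Relative order inside each group follows the input order.
  def reorder(input: Seq[Col], partCols: Seq[String]): Seq[Col] = {
    val (inputPartCols, inputDataCols) = input.partition(c => partCols.contains(c.name))
    inputDataCols ++ inputPartCols
  }

  // True when the last `partCols.size` columns are exactly the partition
  // columns (in any order). Assumes `partCols` has no duplicates.
  def partitionColsAtEnd(cols: Seq[Col], partCols: Seq[String]): Boolean =
    cols.takeRight(partCols.size).map(_.name).toSet == partCols.toSet
}
```

With the example from the pull request description, an input `<a, b>` partitioned by `a` would be reordered to `<b, a>`, and the check would accept the reordered list but reject the original one.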
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153013229
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44804/
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153635160
Overall looks good, left some minor comments.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154857396
**[Test build #2015 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2015/consoleFull)** for PR 9408 at commit [`186d281`](https://github.com/apache/spark/commit/186d2812ec8b931b479a84dfa065149f6d905ecb).
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153930348
Merged build triggered.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153951352
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153932900
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154929534
LGTM. Merging to master and branch 1.6.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153951034
**[Test build #45084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45084/consoleFull)** for PR 9408 at commit [`e682c86`](https://github.com/apache/spark/commit/e682c86244c8b6e7d4035b0cc8e340e303319cf5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153709175
Merged build triggered.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-153754016
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45015/
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154875860
**[Test build #45305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45305/consoleFull)** for PR 9408 at commit [`4a61037`](https://github.com/apache/spark/commit/4a610371b0a7c2a22f67a97534c91b0a422f14b0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/9408#discussion_r43853577
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -169,11 +169,21 @@ final class DataFrameWriter private[sql](df: DataFrame) {
   private def insertInto(tableIdent: TableIdentifier): Unit = {
     val partitions = partitioningColumns.map(_.map(col => col -> (None: Option[String])).toMap)
     val overwrite = mode == SaveMode.Overwrite
+
+    // A partitioned relation schema's can be different from the input logicalPlan, since
+    // partition columns are all moved after data column. We Project to adjust the ordering.
+    // TODO: this belongs in the analyzer.
--- End diff --
Nit: some typos
- "... relation schema's ..." => "... relation's schema ..."
- "... after data column." => "... after data columns"
- "... belongs in ..." => "... belongs to ..."
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-154875910
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11453][SQL] append data to partitioned ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9408#issuecomment-152990332
Merged build started.