Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/09/03 07:18:25 UTC
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/22320
[SPARK-25313][SQL]Fix regression in FileFormatWriter output names
## What changes were proposed in this pull request?
Consider the following example:
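```scala
// Reproduces the issue in spark-shell, where `spark` is predefined.
val location = "/tmp/t"
val df = spark.range(10).toDF("id")
df.write.format("parquet").saveAsTable("tbl")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
// The target table declares the column in upper case: `ID`.
spark.sql(s"CREATE TABLE tbl2(ID long) USING parquet LOCATION '$location'")
spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
// Inspect the schema of the Parquet files that were actually written.
println(spark.read.parquet(location).schema)
spark.table("tbl2").show()
```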
The column name in the written schema is `id` instead of `ID`, so the last query shows nothing from `tbl2`.
With debug logging enabled, we can see that the output name changes from `ID` to `id`: the `outputColumns` of `InsertIntoHadoopFsRelationCommand` is rewritten by the optimizer rule `RemoveRedundantAliases`.
![wechatimg5](https://user-images.githubusercontent.com/1097932/44947871-6299f200-ae46-11e8-9c96-d45fe368206c.jpeg)
![wechatimg4](https://user-images.githubusercontent.com/1097932/44947866-56ae3000-ae46-11e8-8923-8b3bbe060075.jpeg)
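The failure mode can be sketched with a deliberately simplified toy model. The types `Attr`, `Ref`, `Alias`, and the function `removeRedundantAliases` below are hypothetical stand-ins, not Spark's classes, and the assumption that the analyzed projection carries the user's spelling `ID` as an alias over the view's `id` attribute is a simplification; Spark's real `RemoveRedundantAliases` rule has additional conditions.

```scala
// Hypothetical toy model, not Spark's classes: an attribute is identified by
// its exprId, and its name is only a label.
final case class Attr(name: String, exprId: Long)

// A projection entry is either a bare attribute reference or an alias of one.
sealed trait NamedExpr { def toAttr: Attr }
final case class Ref(attr: Attr) extends NamedExpr {
  def toAttr: Attr = attr
}
final case class Alias(child: Attr, name: String, exprId: Long) extends NamedExpr {
  def toAttr: Attr = Attr(name, exprId)
}

object RedundantAliasDemo {
  // Simplified stand-in for RemoveRedundantAliases: every alias of a bare
  // attribute is treated as redundant and replaced by that attribute. This
  // toy version does not check names, metadata, or exprIds.
  def removeRedundantAliases(projectList: Seq[NamedExpr]): Seq[NamedExpr] =
    projectList.map {
      case Alias(child, _, _) => Ref(child)
      case other              => other
    }

  def main(args: Array[String]): Unit = {
    val id = Attr("id", 1L)
    // Assumed analyzed projection for `SELECT ID FROM view1`: `id` aliased as `ID`.
    val analyzed: Seq[NamedExpr] = Seq(Alias(id, "ID", 2L))
    val optimized = removeRedundantAliases(analyzed)
    // Anything that reads names from the optimized projection now sees `id`.
    println(analyzed.map(_.toAttr.name))  // List(ID)
    println(optimized.map(_.toAttr.name)) // List(id)
  }
}
```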
**To guarantee correctness**, we should change the output columns from `Seq[Attribute]` to `Seq[String]` so that the optimizer cannot replace their names.
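The idea behind the proposed change can be illustrated with a similar toy model: a command that stores attributes inherits any renaming applied to them, while a command that stores plain names can re-apply them positionally to whatever the optimized query outputs. The names `WriteWithAttrs`, `WriteWithNames`, `namedOutput`, and `rewrite` are hypothetical; this is a sketch of the idea, not the patch's code.

```scala
// Hypothetical toy types for illustration; Spark's real classes differ.
final case class Attr(name: String, exprId: Long) {
  def withName(newName: String): Attr = copy(name = newName)
}

// Before the fix (toy): the command stores attributes, so any rewrite of
// those attributes also changes the names it writes.
final case class WriteWithAttrs(outputColumns: Seq[Attr])

// After the fix (toy): the command stores plain names fixed at analysis time.
final case class WriteWithNames(outputColumnNames: Seq[String]) {
  // Re-apply the saved names, position by position, to whatever attributes
  // the (possibly optimized) query produces.
  def namedOutput(queryOutput: Seq[Attr]): Seq[Attr] = {
    require(queryOutput.length == outputColumnNames.length,
      "The length of provided names doesn't match the length of output attributes.")
    queryOutput.zip(outputColumnNames).map { case (attr, name) => attr.withName(name) }
  }
}

object OutputNamesDemo {
  // Simulated optimizer rewrite: swaps in a same-exprId attribute named `id`.
  def rewrite(a: Attr): Attr = if (a.exprId == 1L) Attr("id", 1L) else a

  def main(args: Array[String]): Unit = {
    val before = WriteWithAttrs(Seq(Attr("ID", 1L)))
    val rewritten = before.copy(outputColumns = before.outputColumns.map(rewrite))
    println(rewritten.outputColumns.map(_.name)) // List(id): user's spelling lost

    val cmd = WriteWithNames(Seq("ID"))
    val optimizedOutput = Seq(rewrite(Attr("ID", 1L)))
    println(cmd.namedOutput(optimizedOutput).map(_.name)) // List(ID): spelling kept
  }
}
```

Storing names also makes the length check explicit: if the query's output arity ever diverges from the saved names, the `require` fails loudly instead of writing a mislabeled schema.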
I will fix the rules related to project elimination in https://github.com/apache/spark/pull/22311 after this one.
## How was this patch tested?
Unit test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark fixOutputSchema
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22320.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22320
----
commit bbd572c1fe542c6b2fd642212f927ba384c882e4
Author: Gengliang Wang <ge...@...>
Date: 2018-08-31T16:07:00Z
Fix regression in FileFormatWriter output schema
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2872/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22320
retest this please
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/22320
(This is a test comment to test a GitHub Integration; please ignore)
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95609/testReport)** for PR 22320 at commit [`bbd572c`](https://github.com/apache/spark/commit/bbd572c1fe542c6b2fd642212f927ba384c882e4).
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215106921
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -56,7 +56,7 @@ case class InsertIntoHadoopFsRelationCommand(
mode: SaveMode,
catalogTable: Option[CatalogTable],
fileIndex: Option[FileIndex],
- outputColumns: Seq[Attribute])
+ outputColumnNames: Seq[String])
extends DataWritingCommand {
import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils.escapePathName
--- End diff --
Line 66: `query.schema` should be `DataWritingCommand.logicalPlanSchemaWithNames(query, outputColumnNames)`.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95711/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c).
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95610/testReport)** for PR 22320 at commit [`bbd572c`](https://github.com/apache/spark/commit/bbd572c1fe542c6b2fd642212f927ba384c882e4).
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2791/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214751930
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,47 @@ class HiveDDLSuite
}
}
+ test("Insert overwrite Hive table should output correct schema") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ spark.sql("CREATE TABLE tbl(id long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ checkAnswer(spark.table("tbl2"), Seq(Row(4)))
+ }
+ }
+ }
+
+ test("Insert into Hive directory should output correct schema") {
+ withTable("tbl") {
+ withView("view1") {
+ withTempPath { path =>
+ spark.sql("CREATE TABLE tbl(id long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
--- End diff --
`s/SELECT/VALUES` as it could be a bit more Spark-idiomatic?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95620/testReport)** for PR 22320 at commit [`16bb457`](https://github.com/apache/spark/commit/16bb457828ff0284456ef4ef36a23384b4a74b6e).
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214722030
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,81 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Insert overwrite table command should output correct schema: complex") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+ spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+ "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 " +
+ "FROM view1 CLUSTER BY COL3")
--- End diff --
is it legal to put `CLUSTER BY` in the INSERT statement?
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215479502
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -82,7 +83,7 @@ case class CreateHiveTableAsSelectCommand(
query,
overwrite = true,
ifPartitionNotExists = false,
- outputColumns = outputColumns).run(sparkSession, child)
+ outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --
I feel it's better to specify parameters by name if the previous parameter is already specified by name, e.g. `ifPartitionNotExists = false`
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215247634
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,54 @@ class HiveDDLSuite
}
}
+ test("Insert overwrite Hive table should output correct schema") {
+ withSQLConf(CONVERT_METASTORE_PARQUET.key -> "false") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ spark.sql("CREATE TABLE tbl(id long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl VALUES 4")
--- End diff --
We can, but it's important to keep the code style consistent with the existing code in the same file. In this test suite, SQL statements seem to be preferred.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215248202
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -82,7 +83,7 @@ case class CreateHiveTableAsSelectCommand(
query,
overwrite = true,
ifPartitionNotExists = false,
- outputColumns = outputColumns).run(sparkSession, child)
+ outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --
what's the duplication?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2863/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215128076
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -56,7 +56,7 @@ case class InsertIntoHadoopFsRelationCommand(
mode: SaveMode,
catalogTable: Option[CatalogTable],
fileIndex: Option[FileIndex],
- outputColumns: Seq[Attribute])
+ outputColumnNames: Seq[String])
extends DataWritingCommand {
import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils.escapePathName
--- End diff --
Oh, then we can use this method instead.
```
def checkColumnNameDuplication(
columnNames: Seq[String], colType: String, caseSensitiveAnalysis: Boolean): Unit
```
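The method quoted above takes plain names, which fits a command that now stores `Seq[String]`. As a rough, dependency-free sketch of what such a check does (not Spark's implementation; the object name `NameDuplicationCheck` is hypothetical, and this version throws `IllegalArgumentException` rather than Spark's own exception type):

```scala
import java.util.Locale

// Hypothetical sketch of a column-name duplication check; it only mirrors
// the signature quoted above and is not Spark's implementation.
object NameDuplicationCheck {
  def checkColumnNameDuplication(
      columnNames: Seq[String],
      colType: String,
      caseSensitiveAnalysis: Boolean): Unit = {
    // Under case-insensitive analysis, `ID` and `id` count as the same column.
    val normalized =
      if (caseSensitiveAnalysis) columnNames
      else columnNames.map(_.toLowerCase(Locale.ROOT))
    // Collect every normalized name that occurs more than once.
    val duplicates = normalized.groupBy(identity).collect {
      case (name, occurrences) if occurrences.size > 1 => name
    }
    if (duplicates.nonEmpty) {
      throw new IllegalArgumentException(
        s"Found duplicate column(s) $colType: ${duplicates.toSeq.sorted.mkString(", ")}")
    }
  }
}
```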
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2781/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95702/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95609/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214655750
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -38,6 +38,20 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
*/
def outputSet: AttributeSet = AttributeSet(output)
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[output]].
+ */
+ def outputWithNames(names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = output
+ assert(outputAttributes.length == names.length,
+ "The length of provided names doesn't match the length of output attributes.")
+ outputAttributes.zipWithIndex.map { case (element, index) =>
+ element.withName(names(index))
+ }
+ }
+
--- End diff --
@maropu Thanks! I have created an object `DataWritingCommand` for this.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214645372
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -495,7 +496,9 @@ case class DataSource(
s"Unable to resolve $name given [${data.output.map(_.name).mkString(", ")}]")
}
}
- val resolved = cmd.copy(partitionColumns = resolvedPartCols, outputColumns = outputColumns)
+ val resolved = cmd.copy(
+ partitionColumns = resolvedPartCols,
+ outputColumnNames = outputColumns.map(_.name))
--- End diff --
why can't we use `outputColumnNames` directly here?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95657/testReport)** for PR 22320 at commit [`538fea9`](https://github.com/apache/spark/commit/538fea99ed2158316d89f64ce397c4791fbed1f3).
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95702/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95633/testReport)** for PR 22320 at commit [`98bf027`](https://github.com/apache/spark/commit/98bf027df9c4467adc2673097c6762a3ec5210ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95649/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214778690
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Insert overwrite table command should output correct schema: complex") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+ spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+ "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(
+ StructField("COL1", LongType, true),
--- End diff --
Keeping it should be OK.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214735437
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,47 @@ class HiveDDLSuite
}
}
+ test("Insert overwrite Hive table should output correct schema") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ spark.sql("CREATE TABLE tbl(id long)")
--- End diff --
I am not familiar with Hive. But looking at the debug message for this logical plan, the top-level node is ``InsertIntoHiveTable `default`.`tbl2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, true, false, [ID]``. It should not be related to this configuration, right?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95610/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2797/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95702/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c).
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214828496
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,47 @@ class HiveDDLSuite
}
}
+ test("Insert overwrite Hive table should output correct schema") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ spark.sql("CREATE TABLE tbl(id long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ checkAnswer(spark.table("tbl2"), Seq(Row(4)))
--- End diff --
Good point. After adding a new test case, I found that `CreateHiveTableAsSelectCommand` outputs the wrong schema.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214694881
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -69,7 +69,7 @@ case class InsertIntoHiveTable(
query: LogicalPlan,
overwrite: Boolean,
ifPartitionNotExists: Boolean,
- outputColumns: Seq[Attribute]) extends SaveAsHiveFile {
+ outputColumnNames: Seq[String]) extends SaveAsHiveFile {
--- End diff --
No problem 👍
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2815/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214630233
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -38,6 +38,20 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
*/
def outputSet: AttributeSet = AttributeSet(output)
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[output]].
+ */
+ def outputWithNames(names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = output
+ assert(outputAttributes.length == names.length,
+ "The length of provided names doesn't match the length of output attributes.")
+ outputAttributes.zipWithIndex.map { case (element, index) =>
+ element.withName(names(index))
+ }
+ }
+
--- End diff --
If #22311 is merged, we won't need this function anymore, right? If so, IMHO it'd be better to fix this issue on the `FileFormatWriter` side as a workaround.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95610/testReport)** for PR 22320 at commit [`bbd572c`](https://github.com/apache/spark/commit/bbd572c1fe542c6b2fd642212f927ba384c882e4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95657/testReport)** for PR 22320 at commit [`538fea9`](https://github.com/apache/spark/commit/538fea99ed2158316d89f64ce397c4791fbed1f3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95627/testReport)** for PR 22320 at commit [`3c282ef`](https://github.com/apache/spark/commit/3c282ef85acf80b1fb2507d75c1a2ad585efe115).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95663/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214653309
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -38,6 +38,20 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
*/
def outputSet: AttributeSet = AttributeSet(output)
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[output]].
+ */
+ def outputWithNames(names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = output
+ assert(outputAttributes.length == names.length,
+ "The length of provided names doesn't match the length of output attributes.")
+ outputAttributes.zipWithIndex.map { case (element, index) =>
+ element.withName(names(index))
+ }
+ }
+
--- End diff --
I was thinking...
```
object FileFormatWriter {
...
// workaround: a helper function...
def outputWithNames(outputAttributes: Seq[Attribute], names: Seq[String]): Seq[Attribute] = {
assert(outputAttributes.length == names.length,
"The length of provided names doesn't match the length of output attributes.")
outputAttributes.zipWithIndex.map { case (element, index) =>
element.withName(names(index))
}
}
}
```
Then, at each call site, just call `FileFormatWriter.outputWithNames(logicalPlan.output, names)`?
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214609005
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -460,9 +460,9 @@ case class DataSource(
* @param mode The save mode for this writing.
* @param data The input query plan that produces the data to be written. Note that this plan
* is analyzed and optimized.
- * @param outputColumns The original output columns of the input query plan. The optimizer may not
- * preserve the output column's names' case, so we need this parameter
- * instead of `data.output`.
+ * @param outputColumnNames The original output column names of the input query plan. The
+ * optimizer may not preserve the output column's names' case, so we need
+ * this parameter instead of `data.output`.
--- End diff --
nit:
```
* @param outputColumnNames The original output column names of the input query plan. The
* optimizer may not preserve the output column's names' case, so we need
* this parameter instead of `data.output`.
```
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95627/testReport)** for PR 22320 at commit [`3c282ef`](https://github.com/apache/spark/commit/3c282ef85acf80b1fb2507d75c1a2ad585efe115).
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214646343
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -38,6 +38,20 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
*/
def outputSet: AttributeSet = AttributeSet(output)
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[output]].
+ */
+ def outputWithNames(names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = output
+ assert(outputAttributes.length == names.length,
+ "The length of provided names doesn't match the length of output attributes.")
+ outputAttributes.zipWithIndex.map { case (element, index) =>
+ element.withName(names(index))
+ }
+ }
+
--- End diff --
It seems like overkill to add a function here, but in `FileFormatWriter` we can't access the `LogicalPlan` to get the attributes.
Another option is to put this method in a util object.
Do you have a better suggestion?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95663/testReport)** for PR 22320 at commit [`3ca072d`](https://github.com/apache/spark/commit/3ca072d18474d1536c3ac729fe1e0b79cd855cca).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214761843
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -69,7 +69,7 @@ case class InsertIntoHiveTable(
query: LogicalPlan,
overwrite: Boolean,
ifPartitionNotExists: Boolean,
- outputColumns: Seq[Attribute]) extends SaveAsHiveFile {
+ outputColumnNames: Seq[String]) extends SaveAsHiveFile {
--- End diff --
thanks!
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22320
thanks, merging to master!
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95619/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214655488
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -38,6 +38,20 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
*/
def outputSet: AttributeSet = AttributeSet(output)
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[output]].
+ */
+ def outputWithNames(names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = output
+ assert(outputAttributes.length == names.length,
+ "The length of provided names doesn't match the length of output attributes.")
+ outputAttributes.zipWithIndex.map { case (element, index) =>
+ element.withName(names(index))
--- End diff --
`outputAttributes.zip(names).map { case (attr, outputName) => attr.withName(outputName) }`?
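A minimal, Spark-free sketch of the `zip`-based variant suggested above. `Attr` and `OutputNames` are hypothetical stand-ins (Catalyst's real `Attribute` carries more fields, such as data type, nullability and qualifier); only the renaming logic is illustrated. Note that `zip` silently stops at the shorter sequence, so the explicit length check is still needed.

```scala
// Hypothetical stand-in for Catalyst's Attribute: a name plus a stable expression id.
final case class Attr(name: String, exprId: Long) {
  // Returns a copy that keeps the same id but carries a different name.
  def withName(newName: String): Attr = copy(name = newName)
}

object OutputNames {
  // Renames each output attribute, by position, to the user-facing column name.
  def outputWithNames(output: Seq[Attr], names: Seq[String]): Seq[Attr] = {
    // zip truncates to the shorter input, so mismatched lengths must be rejected explicitly.
    assert(output.length == names.length,
      "The length of provided names doesn't match the length of output attributes.")
    output.zip(names).map { case (attr, outputName) => attr.withName(outputName) }
  }
}
```

For example, `OutputNames.outputWithNames(Seq(Attr("id", 1L)), Seq("ID"))` returns `Seq(Attr("ID", 1L))`: the id is preserved and only the name changes.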
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214751169
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Insert overwrite table command should output correct schema: complex") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+ spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+ "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
--- End diff --
Same as above.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214644907
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -38,6 +38,20 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
*/
def outputSet: AttributeSet = AttributeSet(output)
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[output]].
+ */
+ def outputWithNames(names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = output
+ assert(outputAttributes.length == names.length,
+ "The length of provided names doesn't match the length of output attributes.")
+ outputAttributes.zipWithIndex.map { case (element, index) =>
+ element.withName(names(index))
+ }
+ }
+
--- End diff --
or make it a util function
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215213849
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
--- End diff --
"case sensitive"? How is that so, given that Spark SQL is case-insensitive by default?
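A simplified, Spark-free illustration of the point under discussion. With the default `spark.sql.caseSensitive=false`, a lookup for `ID` succeeds against a child column named `id`, but what comes back is the child's stored spelling, not the user's. `CaseInsensitiveResolution.resolve` is a hypothetical helper; only the resolver's shape, `(String, String) => Boolean`, mirrors Catalyst's, and real name resolution involves much more.

```scala
object CaseInsensitiveResolution {
  // Same shape as a Catalyst resolver: decides whether two column names match.
  val caseInsensitive: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)

  // Returns the matching column as stored in the child's output, not as the user spelled it.
  def resolve(requested: String, childOutput: Seq[String]): Option[String] =
    childOutput.find(name => caseInsensitive(name, requested))
}
```

Here `resolve("ID", Seq("id"))` yields `Some("id")`. The lookup is case-insensitive, but the name that survives is the stored one. That is roughly why this PR keeps the user-specified column names as a separate `Seq[String]` instead of reading them from attributes that the optimizer may rewrite.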
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214644583
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -38,6 +38,20 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
*/
def outputSet: AttributeSet = AttributeSet(output)
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[output]].
+ */
+ def outputWithNames(names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = output
+ assert(outputAttributes.length == names.length,
+ "The length of provided names doesn't match the length of output attributes.")
+ outputAttributes.zipWithIndex.map { case (element, index) =>
+ element.withName(names(index))
+ }
+ }
+
--- End diff --
+1
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214828936
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -63,13 +63,14 @@ case class CreateHiveTableAsSelectCommand(
query,
overwrite = false,
ifPartitionNotExists = false,
- outputColumns = outputColumns).run(sparkSession, child)
+ outputColumnNames = outputColumnNames).run(sparkSession, child)
} else {
// TODO ideally, we should get the output data ready first and then
// add the relation into catalog, just in case of failure occurs while data
// processing.
assert(tableDesc.schema.isEmpty)
- catalog.createTable(tableDesc.copy(schema = query.schema), ignoreIfExists = false)
+ val schema = DataWritingCommand.logicalPlanSchemaWithNames(query, outputColumnNames)
+ catalog.createTable(tableDesc.copy(schema = schema), ignoreIfExists = false)
--- End diff --
The schema's column names need to be consistent with `outputColumnNames` here.
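A minimal sketch of what keeping the created table's schema consistent with `outputColumnNames` amounts to: keep each field's type and position from the query, and take its name from the saved column names. `Field` and `SchemaNames.schemaWithNames` are hypothetical stand-ins for `StructField` and for the `DataWritingCommand.logicalPlanSchemaWithNames` helper called in the diff above, whose actual implementation is not shown here.

```scala
// Hypothetical stand-in for a StructField: a column name plus a type label.
final case class Field(name: String, dataType: String)

object SchemaNames {
  // Keeps each field's type and position, but renames it to the matching output column name.
  def schemaWithNames(querySchema: Seq[Field], outputColumnNames: Seq[String]): Seq[Field] = {
    // zip would silently drop extra entries, so reject mismatched lengths up front.
    assert(querySchema.length == outputColumnNames.length,
      "The number of output column names doesn't match the query schema.")
    querySchema.zip(outputColumnNames).map { case (field, name) => field.copy(name = name) }
  }
}
```

With a query schema of `Field("id", "bigint")` and `outputColumnNames = Seq("ID")`, the table would be created with `Field("ID", "bigint")`, matching what the insert path writes.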
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215214259
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -82,7 +83,7 @@ case class CreateHiveTableAsSelectCommand(
query,
overwrite = true,
ifPartitionNotExists = false,
- outputColumns = outputColumns).run(sparkSession, child)
+ outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --
Why is this duplication needed here?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2853/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95633/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95657/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214671722
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -69,7 +69,7 @@ case class InsertIntoHiveTable(
query: LogicalPlan,
overwrite: Boolean,
ifPartitionNotExists: Boolean,
- outputColumns: Seq[Attribute]) extends SaveAsHiveFile {
+ outputColumnNames: Seq[String]) extends SaveAsHiveFile {
--- End diff --
For better test coverage, can you add tests for Hive tables?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214751748
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -63,7 +63,7 @@ case class CreateHiveTableAsSelectCommand(
query,
overwrite = false,
ifPartitionNotExists = false,
- outputColumns = outputColumns).run(sparkSession, child)
+ outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --
Can you remove one `outputColumnNames`?
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214751023
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
--- End diff --
`default` is the default database name, isn't it? I'd remove it from the test or use `spark.catalog.currentDatabase`.
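The reviewer's point can be illustrated with a simplified model in which an identifier without a database falls back to the session's current database, so spelling out `Some("default")` adds nothing when the current database is `default`. `TableIdent` and `qualify` are hypothetical names, not Spark's `TableIdentifier` or catalog API:

```scala
// Hypothetical, simplified model of a table identifier with an optional database.
final case class TableIdent(table: String, database: Option[String] = None)

// An unqualified identifier resolves against the current database.
def qualify(id: TableIdent, currentDatabase: String): TableIdent =
  id.copy(database = Some(id.database.getOrElse(currentDatabase)))

val current = "default"
val explicit = qualify(TableIdent("tbl2", Some("default")), current)
val implicitDb = qualify(TableIdent("tbl2"), current)

// Both resolve to the same fully qualified identifier.
println(explicit == implicitDb)
```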
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95633/testReport)** for PR 22320 at commit [`98bf027`](https://github.com/apache/spark/commit/98bf027df9c4467adc2673097c6762a3ec5210ce).
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95692/
Test FAILed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95627/
Test PASSed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215246692
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
--- End diff --
I think @gengliangwang meant case preserving, which is the behavior we are testing against.
`spark.range(10).toDF("id")` is the same as `spark.range(10)`; it's just clearer to people who don't know that `spark.range` outputs a single column named "id".
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214658233
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ---
@@ -53,3 +57,21 @@ trait DataWritingCommand extends Command {
def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row]
}
+
+object DataWritingCommand {
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[LogicalPlan.output]].
+ */
+ def logicalPlanOutputWithNames(
+ query: LogicalPlan,
+ names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = query.output
--- End diff --
Change the parameter `query: LogicalPlan` to `outputAttributes: Seq[Attribute]` and then drop the line above?
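A minimal sketch of the suggested signature, assuming a simplified `Attr` type in place of Catalyst's `Attribute` (`Attr` and `outputWithNames` are hypothetical names for illustration). The helper zips the output attributes with the names recorded by the command and renames each attribute, so a name rewritten by an optimizer rule such as `RemoveRedundantAliases` is restored while type and identity are kept:

```scala
// Hypothetical stand-in for Catalyst's Attribute: a name, a type, and an
// identity (exprId) that must survive renaming.
final case class Attr(name: String, dataType: String, exprId: Long) {
  def withName(newName: String): Attr = copy(name = newName)
}

// Takes the attributes directly, as the reviewer suggests, instead of a plan.
def outputWithNames(outputAttributes: Seq[Attr], names: Seq[String]): Seq[Attr] = {
  require(outputAttributes.length == names.length,
    s"expected ${outputAttributes.length} names but got ${names.length}")
  // Pair each attribute with its recorded name and rename it.
  outputAttributes.zip(names).map { case (attr, name) => attr.withName(name) }
}

// The optimizer may have rewritten the name to "id"; the command recorded "ID".
val optimizedOutput = Seq(Attr("id", "bigint", 1L))
val restored = outputWithNames(optimizedOutput, Seq("ID"))
println(restored)
```

Passing `Seq[Attribute]` rather than the plan also means the caller evaluates `query.output` once, which is what the local variable in the diff was trying to achieve.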
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2822/
Test PASSed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214722461
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,47 @@ class HiveDDLSuite
}
}
+ test("Insert overwrite Hive table should output correct schema") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ spark.sql("CREATE TABLE tbl(id long)")
--- End diff --
please run this test within `withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET -> false)`
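`withSQLConf` follows a "set, run, restore" pattern: it applies the given settings for the duration of the block and puts the previous values back afterwards, so one test's configuration does not leak into others. Disabling `CONVERT_METASTORE_PARQUET` here is presumably meant to keep the write on the Hive insertion path instead of the converted data source path. The sketch below is a simplified model of the pattern using a plain mutable map; `conf` and `withConf` are hypothetical, not Spark's `SQLConf` or test helpers:

```scala
import scala.collection.mutable

// Hypothetical in-memory configuration standing in for a session's SQLConf.
val conf = mutable.Map[String, String]()

// Simplified sketch of the pattern behind helpers like withSQLConf.
def withConf[T](pairs: (String, String)*)(body: => T): T = {
  // Remember previous values; None means the key was unset.
  val previous = pairs.map { case (k, _) => k -> conf.get(k) }
  pairs.foreach { case (k, v) => conf(k) = v }
  try body
  finally previous.foreach {
    case (k, Some(old)) => conf(k) = old
    case (k, None)      => conf.remove(k)
  }
}

val key = "spark.sql.hive.convertMetastoreParquet"
val seenInside = withConf(key -> "false") { conf.get(key) }
println(seenInside)    // Some(false)
println(conf.get(key)) // None: the key is unset again after the block
```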
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95663/testReport)** for PR 22320 at commit [`3ca072d`](https://github.com/apache/spark/commit/3ca072d18474d1536c3ac729fe1e0b79cd855cca).
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22320
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95619/testReport)** for PR 22320 at commit [`5bce8a0`](https://github.com/apache/spark/commit/5bce8a0f325eed4c37687dab98b707c46ee4f50e).
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214786494
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,47 @@ class HiveDDLSuite
}
}
+ test("Insert overwrite Hive table should output correct schema") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ spark.sql("CREATE TABLE tbl(id long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ checkAnswer(spark.table("tbl2"), Seq(Row(4)))
--- End diff --
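Please add a schema assertion as well. Since [SPARK-25132](https://issues.apache.org/jira/browse/SPARK-25132) we can read the data back even when the column-name case differs, so `checkAnswer` alone would not catch this regression.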
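Why a data check alone is not enough can be sketched in plain Scala: if the reader resolves columns case-insensitively, a file written with `id` still answers queries on `ID`, and only a direct schema comparison exposes the wrong case. The column lists and both functions below are hypothetical simplifications, not Spark's Parquet reader:

```scala
// Hypothetical schemas reduced to column-name lists.
val writtenSchema = Seq("id") // what the buggy write produced
val tableSchema = Seq("ID")   // what the table declares

// Case-insensitive resolution finds every table column in the file...
def resolvesCaseInsensitively(file: Seq[String], table: Seq[String]): Boolean =
  table.forall(col => file.exists(_.equalsIgnoreCase(col)))

// ...while an exact schema comparison reveals the mismatch.
def schemaMatchesExactly(file: Seq[String], table: Seq[String]): Boolean =
  file == table

println(resolvesCaseInsensitively(writtenSchema, tableSchema)) // true
println(schemaMatchesExactly(writtenSchema, tableSchema))      // false
```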
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215215098
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,54 @@ class HiveDDLSuite
}
}
+ test("Insert overwrite Hive table should output correct schema") {
+ withSQLConf(CONVERT_METASTORE_PARQUET.key -> "false") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ spark.sql("CREATE TABLE tbl(id long)")
+ spark.sql("INSERT OVERWRITE TABLE tbl VALUES 4")
--- End diff --
I might be missing something, but why does this test use SQL statements rather than the DataFrameWriter API, e.g. `Seq(4).toDF("id").write.mode(SaveMode.Overwrite).saveAsTable("tbl")`?
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214778523
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
--- End diff --
This is trivial... Since the column name `id` is case-sensitive here and is used below, I would like to show it explicitly.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2784/
Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95649/testReport)** for PR 22320 at commit [`538fea9`](https://github.com/apache/spark/commit/538fea99ed2158316d89f64ce397c4791fbed1f3).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95620/
Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22320
@gengliangwang We need to backport this PR to branch-2.3.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95606/testReport)** for PR 22320 at commit [`bbd572c`](https://github.com/apache/spark/commit/bbd572c1fe542c6b2fd642212f927ba384c882e4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214751219
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Insert overwrite table command should output correct schema: complex") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+ spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+ "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(
+ StructField("COL1", LongType, true),
--- End diff --
`nullable` is `true` by default.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95620/testReport)** for PR 22320 at commit [`16bb457`](https://github.com/apache/spark/commit/16bb457828ff0284456ef4ef36a23384b4a74b6e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214751309
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Insert overwrite table command should output correct schema: complex") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+ spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+ "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(
+ StructField("COL1", LongType, true),
+ StructField("COL3", IntegerType, true),
--- End diff --
You could use a little magic here: `$"COL1".int`
```
scala> $"COL1".int
res1: org.apache.spark.sql.types.StructField = StructField(COL1,IntegerType,true)
```
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95711/
Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95711/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22320
LGTM except for some minor comments.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95606/testReport)** for PR 22320 at commit [`bbd572c`](https://github.com/apache/spark/commit/bbd572c1fe542c6b2fd642212f927ba384c882e4).
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214671466
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2853,6 +2854,81 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Insert overwrite table command should output correct schema: complex") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+ spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+ "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+ spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 " +
+ "FROM view1 CLUSTER BY COL3")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(
+ StructField("COL1", LongType, true),
+ StructField("COL3", IntegerType, true),
+ StructField("COL2", IntegerType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Create table as select command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+ spark.sql("CREATE TABLE tbl2 USING parquet AS SELECT ID FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
+ test("Create table as select command should output correct schema: complex") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+ df.write.format("parquet").saveAsTable("tbl")
+ spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+ spark.sql("CREATE TABLE tbl2 USING parquet PARTITIONED BY (COL2) " +
+ "CLUSTERED BY (COL3) INTO 3 BUCKETS AS SELECT COL1, COL2, COL3 FROM view1")
+ val identifier = TableIdentifier("tbl2", Some("default"))
+ val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+ val expectedSchema = StructType(Seq(
+ StructField("COL1", LongType, true),
+ StructField("COL3", IntegerType, true),
+ StructField("COL2", IntegerType, true)))
+ assert(spark.read.parquet(location).schema == expectedSchema)
+ checkAnswer(spark.table("tbl2"), df)
+ }
+ }
+ }
+
--- End diff --
Would it be better to move these tests into `DataFrameReaderWriterSuite`?
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2800/
Test PASSed.
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214750815
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}
+ test("Insert overwrite table command should output correct schema: basic") {
+ withTable("tbl", "tbl2") {
+ withView("view1") {
+ val df = spark.range(10).toDF("id")
--- End diff --
Why is `toDF("id")` required? Why not `spark.range(10)` alone?
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214721624
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -24,6 +24,7 @@ import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.{AccumulatorSuite, SparkException}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
+import org.apache.spark.sql.catalyst.TableIdentifier
--- End diff --
Unnecessary change.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2830/
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215376132
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -82,7 +83,7 @@ case class CreateHiveTableAsSelectCommand(
query,
overwrite = true,
ifPartitionNotExists = false,
- outputColumns = outputColumns).run(sparkSession, child)
+ outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --
`outputColumnNames` themselves. Specifying `outputColumnNames` as the name of the parameter to set from `outputColumnNames` does nothing but introduce duplication. If you removed one `outputColumnNames`, comprehension would not suffer at all, would it?
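A minimal sketch of the two call styles being discussed, assuming a hypothetical `WriteCommand` case class that merely mimics a constructor with an `outputColumnNames` parameter (it is not Spark's `InsertIntoHiveTable` or any other real class). Both calls build the same value; the named form only repeats the identifier.

```scala
// Hypothetical stand-in for a command whose constructor takes a parameter
// called `outputColumnNames`; not a real Spark class.
final case class WriteCommand(
    query: String,
    overwrite: Boolean,
    outputColumnNames: Seq[String])

object NamedArgsDemo {
  // A local value that happens to share the parameter's name.
  val outputColumnNames: Seq[String] = Seq("ID")

  // Named argument whose name matches the local value: compiles, but says it twice.
  val named: WriteCommand =
    WriteCommand("SELECT ID FROM view1", true, outputColumnNames = outputColumnNames)

  // Positional argument: the value's own name already documents the intent.
  val positional: WriteCommand =
    WriteCommand("SELECT ID FROM view1", true, outputColumnNames)
}
```

In the real call site the argument follows other named arguments such as `overwrite = true`; whether a positional argument is accepted after those depends on the Scala version and on the named arguments sitting in their declared positions, so it is worth compiling the change before dropping the name.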
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/22320
retest this please.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95692/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95692/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c).
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95609/testReport)** for PR 22320 at commit [`bbd572c`](https://github.com/apache/spark/commit/bbd572c1fe542c6b2fd642212f927ba384c882e4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22320
retest this please
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95606/
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95649/testReport)** for PR 22320 at commit [`538fea9`](https://github.com/apache/spark/commit/538fea99ed2158316d89f64ce397c4791fbed1f3).
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22320
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2792/
Test PASSed.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22320
**[Test build #95619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95619/testReport)** for PR 22320 at commit [`5bce8a0`](https://github.com/apache/spark/commit/5bce8a0f325eed4c37687dab98b707c46ee4f50e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/22320
@wangyum @cloud-fan @maropu
---
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22320
retest this please
---
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r214697039
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ---
@@ -53,3 +57,21 @@ trait DataWritingCommand extends Command {
def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row]
}
+
+object DataWritingCommand {
+ /**
+ * Returns output attributes with provided names.
+ * The length of provided names should be the same of the length of [[LogicalPlan.output]].
+ */
+ def logicalPlanOutputWithNames(
+ query: LogicalPlan,
+ names: Seq[String]): Seq[Attribute] = {
+ // Save the output attributes to a variable to avoid duplicated function calls.
+ val outputAttributes = query.output
--- End diff --
I think both are OK. The current way makes this util function easier to call, while the way you suggest makes the argument carry minimal information.
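To make the trade-off concrete, here is a minimal, self-contained sketch of a helper in the spirit of `logicalPlanOutputWithNames`. `Attr`, `Plan`, `attributesWithNames`, and `outputWithNames` are hypothetical simplified stand-ins, not Catalyst's `Attribute`/`LogicalPlan` API, and the length check plus rename are assumptions drawn from the Scaladoc in the (truncated) diff. The attributes-only variant is presumably close to the reviewer's suggestion; the plan-taking variant mirrors the current signature. Either way, the user-facing names travel as plain strings, so an optimizer rewrite that surfaces `id` instead of `ID` cannot change what gets written.

```scala
// Hypothetical, simplified stand-ins for Catalyst types; not Spark's API.
final case class Attr(name: String, exprId: Long) {
  // Same expression id, new name (the idea behind renaming an attribute).
  def withName(newName: String): Attr = copy(name = newName)
}

final case class Plan(output: Seq[Attr])

object OutputNames {
  // Variant taking only the attributes: the argument carries minimal information.
  def attributesWithNames(attrs: Seq[Attr], names: Seq[String]): Seq[Attr] = {
    require(
      attrs.length == names.length,
      s"expected ${attrs.length} names but got ${names.length}")
    attrs.zip(names).map { case (attr, name) => attr.withName(name) }
  }

  // Variant taking the whole plan: easier to call from a write command.
  def outputWithNames(query: Plan, names: Seq[String]): Seq[Attr] = {
    // Read `output` once, as the diff's comment does to avoid repeated calls.
    val outputAttributes = query.output
    attributesWithNames(outputAttributes, names)
  }
}

object Demo {
  // Suppose optimization left the query's only column named `id`...
  val afterOptimizer: Plan = Plan(Seq(Attr("id", 0L)))
  // ...while the names captured at analysis time were kept as strings.
  val names: Seq[String] = Seq("ID")
  // `renamed` holds an attribute named `ID` with the same exprId.
  val renamed: Seq[Attr] = OutputNames.outputWithNames(afterOptimizer, names)
}
```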
---