Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/05/05 15:37:21 UTC
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/12935
[SPARK-15160][SQL] support data source table in InMemoryCatalog
## What changes were proposed in this pull request?
This PR adds a new rule that converts a `SimpleCatalogRelation` to a data source table if its table properties contain data source information.
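As a rough illustration of what such a rule does, here is a minimal sketch in plain Scala. It does not use Spark's classes: `Plan`, `CatalogRelation`, `DataSourceRelation`, `Project`, and `FindDataSourceTableSketch` are hypothetical stand-ins, and only the property key `spark.sql.sources.provider` is taken from the diff under review.

```scala
// Toy stand-ins for Catalyst plan nodes; none of these are Spark's real classes.
sealed trait Plan
final case class CatalogRelation(table: String, properties: Map[String, String]) extends Plan
final case class DataSourceRelation(table: String, provider: String) extends Plan
final case class Project(columns: Seq[String], child: Plan) extends Plan

object FindDataSourceTableSketch {
  // Property key quoted in the review diff.
  val ProviderKey = "spark.sql.sources.provider"

  // Walk the plan recursively and replace every catalog relation whose
  // properties name a data source provider; leave everything else untouched.
  def apply(plan: Plan): Plan = plan match {
    case CatalogRelation(table, props) if props.contains(ProviderKey) =>
      DataSourceRelation(table, props(ProviderKey))
    case Project(columns, child) =>
      Project(columns, apply(child))
    case other =>
      other
  }
}
```

A relation without the provider property passes through unchanged, which is presumably how plain catalog tables would be left alone.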
## How was this patch tested?
new test in SQLQuerySuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark ds-table
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12935.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12935
----
commit c22bb5a158717756d8ec63f4b175f65dd2c005b2
Author: Wenchen Fan <we...@databricks.com>
Date: 2016-05-05T15:34:18Z
support data source table in InMemoryCatalog
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62264492
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -77,6 +78,22 @@ private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
}
}
+
+/**
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
+ * source information.
+ */
+private[sql] class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case i @ logical.InsertIntoTable(s: SimpleCatalogRelation, _, _, _, _) =>
--- End diff --
hm actually I can't find an instance of `InsertIntoTable` where we would ever pass in a `SimpleCatalogRelation`. What command triggers this code path?
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-217777835
I'll update it after https://github.com/apache/spark/pull/12949
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-217214701
**[Test build #57895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57895/consoleFull)** for PR 12935 at commit [`53e7d27`](https://github.com/apache/spark/commit/53e7d27351d38b22ac7a0428f6972ac7ad638998).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62799542
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2480,4 +2480,20 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
Row(null) :: Nil
)
}
+
+ test("data source table created in InMemoryCatalog should be able to read/write") {
+ withTable("tbl") {
+ sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+ checkAnswer(sql("SELECT * FROM tbl"), Nil)
+
+ Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl")
--- End diff --
https://github.com/apache/spark/pull/13013 has updated the doc for `insertInto` and `saveAsTable`. The main difference between them is that `insertInto` uses position-based column resolution (like SQL's INSERT INTO statement), while `saveAsTable` uses column-name-based column resolution (like applying a projection on top of the data).
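The two resolution strategies can be modelled without Spark. The sketch below is only an illustration of the idea: `ColumnResolutionSketch`, `byPosition`, and `byName` are hypothetical names, not Spark APIs, and a row is represented as a sequence of (column name, value) pairs in DataFrame order.

```scala
object ColumnResolutionSketch {
  // A row as (column name, value) pairs, in the order the DataFrame declares them.
  type Row = Seq[(String, Any)]

  // Position-based: incoming names are ignored; the n-th value lands in the
  // n-th table column, as in SQL's INSERT INTO.
  def byPosition(tableColumns: Seq[String], row: Row): Map[String, Any] = {
    require(row.length == tableColumns.length, "column count mismatch")
    tableColumns.zip(row.map(_._2)).toMap
  }

  // Name-based: each table column is looked up by name, like a projection
  // applied on top of the incoming data.
  def byName(tableColumns: Seq[String], row: Row): Map[String, Any] = {
    val values = row.toMap
    tableColumns.map(column => column -> values(column)).toMap
  }
}
```

With table columns `Seq("i", "j")` and an incoming row `Seq("j" -> "a", "i" -> 1)`, `byPosition` stores `"a"` under `i` and `1` under `j`, while `byName` stores `1` under `i` and `"a"` under `j`. The results differ only when the DataFrame's column order differs from the table's.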
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-217190030
**[Test build #57895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57895/consoleFull)** for PR 12935 at commit [`53e7d27`](https://github.com/apache/spark/commit/53e7d27351d38b22ac7a0428f6972ac7ad638998).
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-217214962
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62910910
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2480,4 +2480,20 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
Row(null) :: Nil
)
}
+
+ test("data source table created in InMemoryCatalog should be able to read/write") {
+ withTable("tbl") {
+ sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+ checkAnswer(sql("SELECT * FROM tbl"), Nil)
+
+ Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl")
+ checkAnswer(sql("SELECT * FROM tbl"), Row(1, "a") :: Row(2, "b") :: Nil)
--- End diff --
Let's avoid using `select *` in our tests.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-217214964
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57895/
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218659875
LGTM pending jenkins
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218391423
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218655534
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218669521
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58443/
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-217309280
Looks good. The only thing is that the write path seems to be already handled even before your patch. Is that true? I tried removing the `InsertIntoTable` check in `FindDataSourceTable` and things still work. I don't think that code path is used anywhere.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218645401
**[Test build #58434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58434/consoleFull)** for PR 12935 at commit [`1018537`](https://github.com/apache/spark/commit/1018537aae135d3983577becba2f330cf34d42b4).
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218375374
**[Test build #58338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58338/consoleFull)** for PR 12935 at commit [`98cea9a`](https://github.com/apache/spark/commit/98cea9a1eab09d55f2f1bb8488bcbd0ec11dea03).
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218391424
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58338/
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62802229
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -492,33 +493,28 @@ private[sql] object DDLUtils {
table.properties.contains("spark.sql.sources.schema.numPartCols")
}
- def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
--- End diff --
Looks good.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218645973
**[Test build #58436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58436/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218391185
**[Test build #58338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58338/consoleFull)** for PR 12935 at commit [`98cea9a`](https://github.com/apache/spark/commit/98cea9a1eab09d55f2f1bb8488bcbd0ec11dea03).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62252487
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -193,6 +193,10 @@ case class CreateDataSourceTableAsSelectCommand(
sessionState.catalog.lookupRelation(tableIdent)) match {
case l @ LogicalRelation(_: InsertableRelation | _: HadoopFsRelation, _, _) =>
existingSchema = Some(l.schema)
+ case s: SimpleCatalogRelation
+ if s.metadata.properties.contains("spark.sql.sources.provider") =>
--- End diff --
might be good to add a method `CatalogTable.isDatasourceTable` to avoid duplicating this config
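One possible shape for such a helper, sketched in plain Scala. `TableMetadata` is a simplified, hypothetical stand-in for Spark's `CatalogTable` (only its `properties` map is modelled), and `DataSourceTableProps` is an assumed name; the property key is the one quoted in the diff above.

```scala
// Simplified stand-in for a catalog table; only the properties map matters here.
final case class TableMetadata(name: String, properties: Map[String, String])

object DataSourceTableProps {
  // Keep the key in one place so call sites do not repeat the string literal.
  val Provider = "spark.sql.sources.provider"

  // A table counts as a data source table when it records a provider.
  def isDatasourceTable(table: TableMetadata): Boolean =
    table.properties.contains(Provider)
}
```

Call sites such as the pattern guard in the diff could then read `if DataSourceTableProps.isDatasourceTable(s.metadata)` instead of repeating the key.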
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62276570
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -114,6 +110,7 @@ private[sql] class SessionState(sparkSession: SparkSession) {
new Analyzer(catalog, conf) {
override val extendedResolutionRules =
PreInsertCastAndRename ::
+ new FindDataSourceTable(sparkSession) ::
--- End diff --
No, we don't. `HiveSessionCatalog` can already handle data source tables.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218654845
**[Test build #58434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58434/consoleFull)** for PR 12935 at commit [`1018537`](https://github.com/apache/spark/commit/1018537aae135d3983577becba2f330cf34d42b4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218654924
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218659457
**[Test build #58443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58443/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-217318371
Oh, it was used when we called `df.write.insertInto`, but then I found there is a bug and it doesn't work under Hive either. So I changed it to use `df.write.saveAsTable`, and that code path is not used now.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218565649
https://github.com/apache/spark/pull/12935/files#r62910625 is my comment. Other parts look good.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218654926
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58434/
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218659370
test this please
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218669520
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/12935
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62798256
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2480,4 +2480,20 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
Row(null) :: Nil
)
}
+
+ test("data source table created in InMemoryCatalog should be able to read/write") {
+ withTable("tbl") {
+ sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+ checkAnswer(sql("SELECT * FROM tbl"), Nil)
+
+ Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl")
--- End diff --
cc @yhuai, do you know what the difference is between `insertInto` and `saveAsTable`?
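For readers of this thread: the current `DataFrameWriter` documentation describes `insertInto` as requiring an existing table and resolving columns by position (column names are ignored), whereas `saveAsTable`, when appending to an existing table, matches columns by name. Whether that matched the behavior at the time of this PR is not established here. The sketch below is a toy model of the two resolution strategies in plain Scala; `InsertVsSave`, `resolveByPosition` and `resolveByName` are hypothetical names, not Spark classes or methods.

```scala
// Toy model only: these helpers are hypothetical and do not call Spark.
object InsertVsSave {
  type Row = Map[String, Any]

  // Columns of the existing table, in declared order (as in `tbl(i INT, j STRING)`).
  val tableColumns: Seq[String] = Seq("i", "j")

  // insertInto-style: the incoming column names are ignored; values are matched by position.
  def resolveByPosition(names: Seq[String], values: Seq[Any]): Row = {
    require(values.length == tableColumns.length, "column count must match the table")
    tableColumns.zip(values).toMap
  }

  // saveAsTable-style (append to an existing table): each table column is looked up by name.
  def resolveByName(names: Seq[String], values: Seq[Any]): Row = {
    val incoming = names.zip(values).toMap
    tableColumns.map { c =>
      c -> incoming.getOrElse(c, throw new IllegalArgumentException(s"missing column $c"))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // A frame whose columns are declared in the opposite order to the table.
    val names = Seq("j", "i")
    val values = Seq[Any]("a", 1)
    println(resolveByPosition(names, values)) // "a" lands in column i, 1 in column j
    println(resolveByName(names, values))     // 1 lands in column i, "a" in column j
  }
}
```

With a frame whose column order differs from the table's, the positional strategy puts values in the wrong columns while the by-name strategy does not, which is the practical difference to keep in mind when choosing between the two calls.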
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218669394
**[Test build #58443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58443/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62251940
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -77,6 +78,22 @@ private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
}
}
+
+/**
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
+ * source information.
+ */
+private[sql] class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case i @ logical.InsertIntoTable(s: SimpleCatalogRelation, _, _, _, _) =>
--- End diff --
Don't we need to check whether this is a data source table here?
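One way to picture the guard being asked for is sketched below in plain Scala. It assumes a table counts as a data source table when its properties contain the `spark.sql.sources.provider` key, which is the key read in the `HiveMetastoreCatalog` hunk quoted in this thread. `Table`, `CatalogRelation`, `DataSourceRelation`, `isDataSourceTable` and `rewrite` are hypothetical stand-ins defined for this sketch, not the classes touched by the PR.

```scala
object DataSourceGuard {
  // Hypothetical stand-ins for the catalog classes; only the properties map matters here.
  final case class Table(name: String, properties: Map[String, String])
  sealed trait Plan
  final case class CatalogRelation(table: Table) extends Plan
  final case class DataSourceRelation(provider: String, table: Table) extends Plan

  // Property key taken from the diff quoted in this thread.
  val ProviderKey = "spark.sql.sources.provider"

  // Assumption of this sketch: presence of the provider key marks a data source table.
  def isDataSourceTable(t: Table): Boolean = t.properties.contains(ProviderKey)

  // Rewrite only when the guard holds; any other plan is returned unchanged.
  def rewrite(plan: Plan): Plan = plan match {
    case CatalogRelation(t) if isDataSourceTable(t) =>
      DataSourceRelation(t.properties(ProviderKey), t)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val ds = Table("tbl", Map(ProviderKey -> "parquet"))
    val plain = Table("other", Map.empty)
    println(rewrite(CatalogRelation(ds)))    // rewritten to DataSourceRelation
    println(rewrite(CatalogRelation(plain))) // left as CatalogRelation
  }
}
```

Putting the check in the pattern guard means a relation without data source properties falls through untouched instead of failing on a missing key.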
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62798178
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -492,33 +493,28 @@ private[sql] object DDLUtils {
table.properties.contains("spark.sql.sources.schema.numPartCols")
}
- def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
--- End diff --
I simplified these util methods; cc @liancheng to take a look.
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62254434
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -114,6 +110,7 @@ private[sql] class SessionState(sparkSession: SparkSession) {
new Analyzer(catalog, conf) {
override val extendedResolutionRules =
PreInsertCastAndRename ::
+ new FindDataSourceTable(sparkSession) ::
--- End diff --
Don't we also need to add this to `HiveSessionState`?
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12935#discussion_r62910625
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -71,66 +71,7 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
override def load(in: QualifiedTableName): LogicalPlan = {
logDebug(s"Creating new cached data source for $in")
val table = client.getTable(in.database, in.name)
-
- def schemaStringFromParts: Option[String] = {
- table.properties.get("spark.sql.sources.schema.numParts").map { numParts =>
- val parts = (0 until numParts.toInt).map { index =>
- val part = table.properties.get(s"spark.sql.sources.schema.part.$index").orNull
- if (part == null) {
- throw new AnalysisException(
- "Could not read schema from the metastore because it is corrupted " +
- s"(missing part $index of the schema, $numParts parts are expected).")
- }
-
- part
- }
- // Stick all parts back to a single schema string.
- parts.mkString
- }
- }
-
- def getColumnNames(colType: String): Seq[String] = {
- table.properties.get(s"spark.sql.sources.schema.num${colType.capitalize}Cols").map {
- numCols => (0 until numCols.toInt).map { index =>
- table.properties.getOrElse(s"spark.sql.sources.schema.${colType}Col.$index",
- throw new AnalysisException(
- s"Could not read $colType columns from the metastore because it is corrupted " +
- s"(missing part $index of it, $numCols parts are expected)."))
- }
- }.getOrElse(Nil)
- }
-
- // Originally, we used spark.sql.sources.schema to store the schema of a data source table.
- // After SPARK-6024, we removed this flag.
- // Although we are not using spark.sql.sources.schema any more, we need to still support.
- val schemaString =
- table.properties.get("spark.sql.sources.schema").orElse(schemaStringFromParts)
-
- val userSpecifiedSchema =
- schemaString.map(s => DataType.fromJson(s).asInstanceOf[StructType])
-
- // We only need names at here since userSpecifiedSchema we loaded from the metastore
- // contains partition columns. We can always get datatypes of partitioning columns
- // from userSpecifiedSchema.
- val partitionColumns = getColumnNames("part")
-
- val bucketSpec = table.properties.get("spark.sql.sources.schema.numBuckets").map { n =>
- BucketSpec(n.toInt, getColumnNames("bucket"), getColumnNames("sort"))
- }
-
- val options = table.storage.serdeProperties
- val dataSource =
- DataSource(
- sparkSession,
- userSpecifiedSchema = userSpecifiedSchema,
- partitionColumns = partitionColumns,
- bucketSpec = bucketSpec,
- className = table.properties("spark.sql.sources.provider"),
- options = options)
-
- LogicalRelation(
- dataSource.resolveRelation(),
- metastoreTableIdentifier = Some(TableIdentifier(in.name, Some(in.database))))
--- End diff --
I am strongly inclined to revert the changes in this file. This code is on the critical path for resolving a table, while the code we are adding is for a new code path and a command. It would be great if we could keep these two parts isolated for now. (We will remove `HiveMetastoreCatalog` later and can clean up this part then. We can leave a comment here.)
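For context on what the removed lines do, here is a standalone sketch of the schema reassembly step in plain Scala, with no Spark dependency. The property keys (`spark.sql.sources.schema.numParts`, `spark.sql.sources.schema.part.<index>`) come from the quoted diff. The `SchemaParts` object, the `toProperties` writer, and its `chunkSize` parameter are assumptions added so the example round-trips; the diff only shows the reading side, and it throws `AnalysisException` where this sketch uses `IllegalStateException`.

```scala
object SchemaParts {
  private val Prefix = "spark.sql.sources.schema"

  // Hypothetical writer: split the schema string into chunks and store one property per chunk.
  def toProperties(schema: String, chunkSize: Int): Map[String, String] = {
    require(chunkSize > 0, "chunkSize must be positive")
    val parts = schema.grouped(chunkSize).toSeq
    val indexed = parts.zipWithIndex.map { case (p, i) => s"$Prefix.part.$i" -> p }
    indexed.toMap + (s"$Prefix.numParts" -> parts.length.toString)
  }

  // Reader mirroring the quoted logic: None when numParts is absent,
  // an exception when a numbered part is missing, otherwise the concatenated parts.
  def fromProperties(props: Map[String, String]): Option[String] =
    props.get(s"$Prefix.numParts").map { numParts =>
      (0 until numParts.toInt).map { index =>
        props.getOrElse(
          s"$Prefix.part.$index",
          throw new IllegalStateException(
            s"missing part $index of the schema, $numParts parts are expected"))
      }.mkString
    }

  def main(args: Array[String]): Unit = {
    val schema = """{"type":"struct","fields":[]}"""
    val props = toProperties(schema, chunkSize = 8)
    println(fromProperties(props).contains(schema))
  }
}
```

The reader treats a missing `numParts` as "no schema stored" and a missing numbered part as corruption, which is the distinction the quoted code draws.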
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218655535
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58436/
Test FAILed.
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218655464
**[Test build #58436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58436/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/12935#issuecomment-218675018
Thanks. Merging to master and branch 2.0.
---