You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/05/05 15:37:21 UTC

[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/12935

    [SPARK-15160][SQL] support data source table in InMemoryCatalog

    ## What changes were proposed in this pull request?
    
    This PR adds a new rule to convert `SimpleCatalogRelation` to data source table if its table property contains data source information.
    
    
    ## How was this patch tested?
    
    new test in SQLQuerySuite


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark ds-table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12935.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12935
    
----
commit c22bb5a158717756d8ec63f4b175f65dd2c005b2
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-05-05T15:34:18Z

    support data source table in InMemoryCatalog

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62264492
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -77,6 +78,22 @@ private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
       }
     }
     
    +
    +/**
    + * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
    + * source information.
    + */
    +private[sql] class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case i @ logical.InsertIntoTable(s: SimpleCatalogRelation, _, _, _, _) =>
    --- End diff --
    
    hm actually I can't find an instance of `InsertIntoTable` where we would ever pass in a `SimpleCatalogRelation`. What command triggers this code path?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-217777835
  
    I'll update it after https://github.com/apache/spark/pull/12949


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-217214701
  
    **[Test build #57895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57895/consoleFull)** for PR 12935 at commit [`53e7d27`](https://github.com/apache/spark/commit/53e7d27351d38b22ac7a0428f6972ac7ad638998).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62799542
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2480,4 +2480,20 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           Row(null) :: Nil
         )
       }
    +
    +  test("data source table created in InMemoryCatalog should be able to read/write") {
    +    withTable("tbl") {
    +      sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
    +      checkAnswer(sql("SELECT * FROM tbl"), Nil)
    +
    +      Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl")
    --- End diff --
    
    https://github.com/apache/spark/pull/13013 has updated the doc for insertInto and saveAsTable. The main difference between them is that `insertInto` use position-based column resolution (like SQL's insert into statement), while `saveAsTable` uses column-name-based column resolution (like applying a project on top of the data).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-217190030
  
    **[Test build #57895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57895/consoleFull)** for PR 12935 at commit [`53e7d27`](https://github.com/apache/spark/commit/53e7d27351d38b22ac7a0428f6972ac7ad638998).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-217214962
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62910910
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2480,4 +2480,20 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           Row(null) :: Nil
         )
       }
    +
    +  test("data source table created in InMemoryCatalog should be able to read/write") {
    +    withTable("tbl") {
    +      sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
    +      checkAnswer(sql("SELECT * FROM tbl"), Nil)
    +
    +      Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl")
    +      checkAnswer(sql("SELECT * FROM tbl"), Row(1, "a") :: Row(2, "b") :: Nil)
    --- End diff --
    
    Let's avoid of using `select *` in our test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-217214964
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57895/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218659875
  
    LGTM pending jenkins


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218391423
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218655534
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218669521
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58443/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-217309280
  
    Looks good. The only thing is that the write path seems to be already handled even before your patch. Is that true? I tried removing the `InsertIntoTable` check in `FindDataSourceTable` and things still work. I don't think that code path is used anywhere.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218645401
  
    **[Test build #58434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58434/consoleFull)** for PR 12935 at commit [`1018537`](https://github.com/apache/spark/commit/1018537aae135d3983577becba2f330cf34d42b4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218375374
  
    **[Test build #58338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58338/consoleFull)** for PR 12935 at commit [`98cea9a`](https://github.com/apache/spark/commit/98cea9a1eab09d55f2f1bb8488bcbd0ec11dea03).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218391424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58338/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62802229
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -492,33 +493,28 @@ private[sql] object DDLUtils {
           table.properties.contains("spark.sql.sources.schema.numPartCols")
       }
     
    -  def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
    --- End diff --
    
    Looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218645973
  
    **[Test build #58436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58436/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218391185
  
    **[Test build #58338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58338/consoleFull)** for PR 12935 at commit [`98cea9a`](https://github.com/apache/spark/commit/98cea9a1eab09d55f2f1bb8488bcbd0ec11dea03).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62252487
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
    @@ -193,6 +193,10 @@ case class CreateDataSourceTableAsSelectCommand(
                 sessionState.catalog.lookupRelation(tableIdent)) match {
                 case l @ LogicalRelation(_: InsertableRelation | _: HadoopFsRelation, _, _) =>
                   existingSchema = Some(l.schema)
    +            case s: SimpleCatalogRelation
    +                if s.metadata.properties.contains("spark.sql.sources.provider") =>
    --- End diff --
    
    might be good to add a method `CatalogTable.isDatasourceTable` to avoid duplicating this config


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62276570
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
    @@ -114,6 +110,7 @@ private[sql] class SessionState(sparkSession: SparkSession) {
         new Analyzer(catalog, conf) {
           override val extendedResolutionRules =
             PreInsertCastAndRename ::
    +        new FindDataSourceTable(sparkSession) ::
    --- End diff --
    
    no we don't. `HiveSessionCatalog` can handle data source table already.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218654845
  
    **[Test build #58434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58434/consoleFull)** for PR 12935 at commit [`1018537`](https://github.com/apache/spark/commit/1018537aae135d3983577becba2f330cf34d42b4).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218654924
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218659457
  
    **[Test build #58443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58443/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-217318371
  
    Oh, It was used when we call `df.write.insertInto`, but then I found there is a bug and it doesn't work under hive either. So I change to use `df.write.saveAsTable` and that code path is not used now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218565649
  
    https://github.com/apache/spark/pull/12935/files#r62910625  is my comment. Other parts look good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218654926
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58434/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218659370
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218669520
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12935


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62798256
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2480,4 +2480,20 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           Row(null) :: Nil
         )
       }
    +
    +  test("data source table created in InMemoryCatalog should be able to read/write") {
    +    withTable("tbl") {
    +      sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
    +      checkAnswer(sql("SELECT * FROM tbl"), Nil)
    +
    +      Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl")
    --- End diff --
    
    cc @yhuai , do you know what's the difference between `insertInto` and `saveAsTable`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218669394
  
    **[Test build #58443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58443/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62251940
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -77,6 +78,22 @@ private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
       }
     }
     
    +
    +/**
    + * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
    + * source information.
    + */
    +private[sql] class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case i @ logical.InsertIntoTable(s: SimpleCatalogRelation, _, _, _, _) =>
    --- End diff --
    
    don't we need to check whether this is a datasource table here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62798178
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -492,33 +493,28 @@ private[sql] object DDLUtils {
           table.properties.contains("spark.sql.sources.schema.numPartCols")
       }
     
    -  def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
    --- End diff --
    
    did some simplification for this util methods, cc @liancheng to take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62254434
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
    @@ -114,6 +110,7 @@ private[sql] class SessionState(sparkSession: SparkSession) {
         new Analyzer(catalog, conf) {
           override val extendedResolutionRules =
             PreInsertCastAndRename ::
    +        new FindDataSourceTable(sparkSession) ::
    --- End diff --
    
    don't we also need to add this to `HiveSessionState`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62910625
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -71,66 +71,7 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
           override def load(in: QualifiedTableName): LogicalPlan = {
             logDebug(s"Creating new cached data source for $in")
             val table = client.getTable(in.database, in.name)
    -
    -        def schemaStringFromParts: Option[String] = {
    -          table.properties.get("spark.sql.sources.schema.numParts").map { numParts =>
    -            val parts = (0 until numParts.toInt).map { index =>
    -              val part = table.properties.get(s"spark.sql.sources.schema.part.$index").orNull
    -              if (part == null) {
    -                throw new AnalysisException(
    -                  "Could not read schema from the metastore because it is corrupted " +
    -                    s"(missing part $index of the schema, $numParts parts are expected).")
    -              }
    -
    -              part
    -            }
    -            // Stick all parts back to a single schema string.
    -            parts.mkString
    -          }
    -        }
    -
    -        def getColumnNames(colType: String): Seq[String] = {
    -          table.properties.get(s"spark.sql.sources.schema.num${colType.capitalize}Cols").map {
    -            numCols => (0 until numCols.toInt).map { index =>
    -              table.properties.getOrElse(s"spark.sql.sources.schema.${colType}Col.$index",
    -                throw new AnalysisException(
    -                  s"Could not read $colType columns from the metastore because it is corrupted " +
    -                    s"(missing part $index of it, $numCols parts are expected)."))
    -            }
    -          }.getOrElse(Nil)
    -        }
    -
    -        // Originally, we used spark.sql.sources.schema to store the schema of a data source table.
    -        // After SPARK-6024, we removed this flag.
    -        // Although we are not using spark.sql.sources.schema any more, we need to still support.
    -        val schemaString =
    -          table.properties.get("spark.sql.sources.schema").orElse(schemaStringFromParts)
    -
    -        val userSpecifiedSchema =
    -          schemaString.map(s => DataType.fromJson(s).asInstanceOf[StructType])
    -
    -        // We only need names at here since userSpecifiedSchema we loaded from the metastore
    -        // contains partition columns. We can always get datatypes of partitioning columns
    -        // from userSpecifiedSchema.
    -        val partitionColumns = getColumnNames("part")
    -
    -        val bucketSpec = table.properties.get("spark.sql.sources.schema.numBuckets").map { n =>
    -          BucketSpec(n.toInt, getColumnNames("bucket"), getColumnNames("sort"))
    -        }
    -
    -        val options = table.storage.serdeProperties
    -        val dataSource =
    -          DataSource(
    -            sparkSession,
    -            userSpecifiedSchema = userSpecifiedSchema,
    -            partitionColumns = partitionColumns,
    -            bucketSpec = bucketSpec,
    -            className = table.properties("spark.sql.sources.provider"),
    -            options = options)
    -
    -        LogicalRelation(
    -          dataSource.resolveRelation(),
    -          metastoreTableIdentifier = Some(TableIdentifier(in.name, Some(in.database))))
    --- End diff --
    
    I am very inclined to revert the changes in this file. This part of the code is on the critical path of resolving a table and the code we add is for a new code path and a command. It will be great if we can isolate these two parts for now. (we will remove HiveMetastoreCatalog later. We can do the cleanup this part. We can leave a comment at here)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218655535
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58436/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218655464
  
    **[Test build #58436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58436/consoleFull)** for PR 12935 at commit [`ef6399a`](https://github.com/apache/spark/commit/ef6399a14ddca5e02a7b5292960e45a5cc75d6a2).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12935#issuecomment-218675018
  
    Thanks. Merging to master and branch 2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org