Posted to reviews@spark.apache.org by wuxianxingkong <gi...@git.apache.org> on 2016/07/14 02:24:23 UTC

[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

GitHub user wuxianxingkong opened a pull request:

    https://github.com/apache/spark/pull/14191

    [SPARK-16217][SQL] Support SELECT INTO statement

    ## What changes were proposed in this pull request?
    
    This PR implements the *SELECT INTO* statement.
    
    The *SELECT INTO* statement selects data from one table and inserts it into a new table, as follows:
    
    	SELECT column_name(s)
    	INTO newtable
    	FROM table1;
    
    This statement is commonly used in SQL but is not currently supported in Spark SQL.
    We investigated Catalyst and found that the statement can be implemented by extending the grammar and reusing the logical plan of *CTAS*.
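
    For reference, here is an illustrative sketch of the intended equivalence (the table and column names are made up, not taken from this PR):

    ```scala
    // Hypothetical names: newtable, table1, key, value.
    spark.sql("SELECT key, value INTO newtable FROM table1")
    // ...is intended to behave roughly like the CTAS form Spark SQL already supports:
    spark.sql("CREATE TABLE newtable AS SELECT key, value FROM table1")
    ```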
    
    The related JIRA is https://issues.apache.org/jira/browse/SPARK-16217
    
    
    ## How was this patch tested?
    
    SQLQuerySuite.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wuxianxingkong/spark select_into

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14191
    
----
commit 605634deb779a0cf0eaece8420692d9bf44dab64
Author: cuiguangfan <73...@qq.com>
Date:   2016-07-12T13:16:43Z

    SELECT INTO Implements

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70748124
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1755,4 +1755,97 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("select into(check relation)") {
    +    val originalConf = sessionState.conf.convertCTAS
    +
    +    setConf(SQLConf.CONVERT_CTAS, true)
    --- End diff --
    
    ```
    withSQLConf(SQLConf.CONVERT_CTAS.key -> "true") {
    ```
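
    For context, the whole test could take roughly this shape with that helper (a sketch of the suggestion only, not the PR's final code; `withSQLConf` and `withTable` come from `SQLTestUtils`, which the suite already mixes in):

    ```scala
    test("select into(check relation)") {
      withSQLConf(SQLConf.CONVERT_CTAS.key -> "true") {
        val defaultDataSource = sessionState.conf.defaultDataSourceName
        withTable("si1") {
          // The first SELECT INTO creates the table; repeating it must fail.
          sql("SELECT key, value INTO si1 FROM src ORDER BY key, value")
          val message = intercept[AnalysisException] {
            sql("SELECT key, value INTO si1 FROM src ORDER BY key, value")
          }.getMessage
          assert(message.contains("already exists"))
          checkRelation("si1", true, defaultDataSource)
        }
      }
    }
    ```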




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70801792
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -1376,4 +1376,62 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
           reader, writer,
           schemaLess)
       }
    +
    +  /**
    +   * Reuse CTAS: convert SELECT INTO to CTAS,
    +   * returning [[CreateHiveTableAsSelectLogicalPlan]].
    +   * The SELECT INTO statement selects data from one table
    +   * and inserts it into a new table. It is commonly used to
    +   * create a backup copy of a table or of selected records.
    +   *
    +   * Expected format:
    +   * {{{
    +   *   SELECT column_name(s)
    +   *   INTO new_table
    +   *   FROM old_table
    +   *   ...
    +   * }}}
    +   */
    +  override protected def withSelectInto(
    --- End diff --
    
    The code below is duplicated. Why are we not using the existing CTAS code path?




[GitHub] spark issue #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14191
  
    Can one of the admins verify this patch?




[GitHub] spark issue #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14191
  
    @wuxianxingkong Are you still working on this? Thanks!




[GitHub] spark issue #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14191
  
    We are closing it due to inactivity. Please reopen it if you want to push it forward. Thanks!




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by wuxianxingkong <gi...@git.apache.org>.
Github user wuxianxingkong commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r71067768
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -338,7 +338,7 @@ querySpecification
            (RECORDREADER recordReader=STRING)?
            fromClause?
            (WHERE where=booleanExpression)?)
    -    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
    +    | ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
    --- End diff --
    
    At first, I modified the grammar:
    ![wrong select into](https://raw.githubusercontent.com/wuxianxingkong/storage/master/original_select_into.png)
    But it affects the multiInsertQueryBody rule, e.g.:
    ```sql
    FROM OLD_TABLE
    INSERT INTO T1
    SELECT C1
    INSERT INTO T2
    SELECT C2
    ```
    The syntax tree before adding intoClause is:
    ![right tree structure](https://raw.githubusercontent.com/wuxianxingkong/storage/master/original_muliinsert_tree.png)
    After adding intoClause, the tree becomes:
    ![wrong tree structure](https://raw.githubusercontent.com/wuxianxingkong/storage/master/wrong_multiinserttree.png) This happens because INSERT is a non-reserved keyword and because of ANTLR's matching strategy.
    One way I can think of is to change the grammar like this:
    ![one way](https://raw.githubusercontent.com/wuxianxingkong/storage/master/solve_way.png)
    This solves the problem because the ANTLR parser chooses the alternative specified first.
    By the way, the grammar can now support "SELECT 1 INTO newtable".
    But this causes confusion about the querySpecification rule because of the duplication. Is there any way to solve this problem? Thanks.




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by wuxianxingkong <gi...@git.apache.org>.
Github user wuxianxingkong commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70935031
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -338,7 +338,7 @@ querySpecification
            (RECORDREADER recordReader=STRING)?
            fromClause?
            (WHERE where=booleanExpression)?)
    -    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
    +    | ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
    --- End diff --
    
    ```sql
    SELECT 1 
    INTO newtable
    ```
    This won't work because we need the oldtable info to create newtable, so the SQL should be:
    ```sql
    SELECT 1
    INTO newtable 
    FROM oldtable
    ```
    The result from my test: a new table called newtable was created with a single column named 1; it has as many rows as oldtable and every value is 1.
    Did you mean there is no _FROM_?
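
    One quick way to check that behavior (a sketch only; newtable and oldtable are illustrative names, assuming the grammar change is in place):

    ```scala
    // Every row of oldtable should produce one row in newtable holding the constant 1.
    sql("SELECT 1 INTO newtable FROM oldtable")
    assert(sql("SELECT * FROM newtable").count() == sql("SELECT * FROM oldtable").count())
    assert(sql("SELECT * FROM newtable").collect().forall(_.getInt(0) == 1))
    ```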




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70748266
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1755,4 +1755,97 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("select into(check relation)") {
    +    val originalConf = sessionState.conf.convertCTAS
    +
    +    setConf(SQLConf.CONVERT_CTAS, true)
    +
    +    val defaultDataSource = sessionState.conf.defaultDataSourceName
    +    try {
    +      sql("DROP TABLE IF EXISTS si1")
    +      sql("SELECT key, value INTO si1 FROM src ORDER BY key, value")
    +      val message = intercept[AnalysisException] {
    +        sql("SELECT key, value INTO si1 FROM src ORDER BY key, value")
    +      }.getMessage
    +      assert(message.contains("already exists"))
    +      checkRelation("si1", true, defaultDataSource)
    +      sql("DROP TABLE si1")
    +
    +      // Specifying database name for query can be converted to data source write path
    +      // is not allowed right now.
    +      sql("SELECT key, value INTO default.si1 FROM src ORDER BY key, value")
    +      checkRelation("si1", true, defaultDataSource)
    +      sql("DROP TABLE si1")
    +
    +    } finally {
    +      setConf(SQLConf.CONVERT_CTAS, originalConf)
    +      sql("DROP TABLE IF EXISTS si1")
    +    }
    +  }
    +
    +  test("select into(check answer)") {
    +    sql("DROP TABLE IF EXISTS si1")
    +    sql("DROP TABLE IF EXISTS si2")
    +    sql("DROP TABLE IF EXISTS si3")
    --- End diff --
    
    ```
    withTable("si1", "si2", "si3") {
    ```




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70747940
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -338,7 +338,7 @@ querySpecification
            (RECORDREADER recordReader=STRING)?
            fromClause?
            (WHERE where=booleanExpression)?)
    -    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
    +    | ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
    --- End diff --
    
    Hi, @wuxianxingkong .
    Currently, the following does not seem to be supported yet. Could you modify the syntax to support it too?
    ```
    SELECT 1
    INTO newtable
    ```




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r121592866
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -395,6 +414,15 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
       }
     
       /**
    +   * Change to Hive CTAS statement.
    +   */
    +  protected def withSelectInto(
    +      ctx: IntoClauseContext,
    +      query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
    +    throw new ParseException("Script Select Into is not supported", ctx)
    --- End diff --
    
    Why throw a `ParseException`?




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70754573
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -159,7 +159,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
           // Add organization statements.
           optionalMap(ctx.queryOrganization)(withQueryResultClauses).
           // Add insert.
    -      optionalMap(ctx.insertInto())(withInsertInto)
    --- End diff --
    
    We also need to check what this does with multi-insert syntax, i.e.:
    ```sql
    FROM tbl_a
    INSERT INTO tbl_b
    SELECT *
    INSERT INTO tbl_c
    SELECT *
    INTO tbl_c
    ```




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70748362
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1755,4 +1755,97 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("select into(check relation)") {
    +    val originalConf = sessionState.conf.convertCTAS
    +
    +    setConf(SQLConf.CONVERT_CTAS, true)
    +
    +    val defaultDataSource = sessionState.conf.defaultDataSourceName
    +    try {
    +      sql("DROP TABLE IF EXISTS si1")
    +      sql("SELECT key, value INTO si1 FROM src ORDER BY key, value")
    +      val message = intercept[AnalysisException] {
    +        sql("SELECT key, value INTO si1 FROM src ORDER BY key, value")
    +      }.getMessage
    +      assert(message.contains("already exists"))
    +      checkRelation("si1", true, defaultDataSource)
    +      sql("DROP TABLE si1")
    +
    +      // Specifying database name for query can be converted to data source write path
    +      // is not allowed right now.
    +      sql("SELECT key, value INTO default.si1 FROM src ORDER BY key, value")
    +      checkRelation("si1", true, defaultDataSource)
    +      sql("DROP TABLE si1")
    +
    +    } finally {
    +      setConf(SQLConf.CONVERT_CTAS, originalConf)
    +      sql("DROP TABLE IF EXISTS si1")
    +    }
    +  }
    +
    +  test("select into(check answer)") {
    +    sql("DROP TABLE IF EXISTS si1")
    +    sql("DROP TABLE IF EXISTS si2")
    +    sql("DROP TABLE IF EXISTS si3")
    +
    +    sql("SELECT key, value INTO si1 FROM src")
    +    checkAnswer(
    +      sql("SELECT key, value FROM si1 ORDER BY key"),
    +      sql("SELECT key, value FROM src ORDER BY key").collect().toSeq)
    +
    +    sql("SELECT key k, value INTO si2 FROM src ORDER BY k,value").collect()
    +    checkAnswer(
    +      sql("SELECT k, value FROM si2 ORDER BY k, value"),
    +      sql("SELECT key, value FROM src ORDER BY key, value").collect().toSeq)
    +
    +    sql("SELECT 1 AS key,value INTO si3 FROM src LIMIT 1").collect()
    +    intercept[AnalysisException] {
    +      sql("SELECT key, value INTO si3 FROM src ORDER BY key, value").collect()
    +    }
    --- End diff --
    
    Checking the real error message is better.
    ```
    val m = intercept[AnalysisException] {
         sql("SELECT key, value INTO si3 FROM src ORDER BY key, value").collect()
    }.getMessage
    assert(m.contains("your exception message"))
    ```




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70754255
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,10 @@ querySpecification
            windows?)
         ;
     
    +intoClause
    +    : INTO tableIdentifier
    --- End diff --
    
    It is easier to just put this in the `querySpecification` rule. Make sure you give the tableIdentifier a proper name.




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70935956
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -338,7 +338,7 @@ querySpecification
            (RECORDREADER recordReader=STRING)?
            fromClause?
            (WHERE where=booleanExpression)?)
    -    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
    +    | ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
    --- End diff --
    
    In the Spark shell, please run the following.
    ```
    sql("select 1")
    ```




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by wuxianxingkong <gi...@git.apache.org>.
Github user wuxianxingkong commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r71077546
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,10 @@ querySpecification
            windows?)
         ;
     
    +intoClause
    +    : INTO tableIdentifier
    --- End diff --
    
    For example, the JSON data registered as table tbl_b is:
    ![json data](https://raw.githubusercontent.com/wuxianxingkong/storage/master/use_data.png)
    @hvanhovell 
    The logical plan of the SQL "SELECT a INTO tbl_a FROM tbl_b" is:
    ![json plan](https://raw.githubusercontent.com/wuxianxingkong/storage/master/print_json_plan.png)
    The results match the expectations.




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14191




[GitHub] spark issue #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14191
  
    Hi, @wuxianxingkong .
    Although I'm just a contributor like you, I left a few comments for you because I like your PR.
    I hope your PR will be merged soon.




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by wuxianxingkong <gi...@git.apache.org>.
Github user wuxianxingkong commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r71067943
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -1376,4 +1376,62 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
           reader, writer,
           schemaLess)
       }
    +
    +  /**
    +   * Reuse CTAS: convert SELECT INTO to CTAS,
    +   * returning [[CreateHiveTableAsSelectLogicalPlan]].
    +   * The SELECT INTO statement selects data from one table
    +   * and inserts it into a new table. It is commonly used to
    +   * create a backup copy of a table or of selected records.
    +   *
    +   * Expected format:
    +   * {{{
    +   *   SELECT column_name(s)
    +   *   INTO new_table
    +   *   FROM old_table
    +   *   ...
    +   * }}}
    +   */
    +  override protected def withSelectInto(
    --- End diff --
    
    Reusing the CTAS code path means we need to convert _IntoClauseContext_ to _CreateTableContext_ (or construct a new _CreateTableContext_), which might be difficult to achieve. Maybe there is another way?




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70754137
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -159,7 +159,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
           // Add organization statements.
           optionalMap(ctx.queryOrganization)(withQueryResultClauses).
           // Add insert.
    -      optionalMap(ctx.insertInto())(withInsertInto)
    --- End diff --
    
    This allows for the following syntax:
    ```sql
    INSERT INTO tbl_a
    SELECT *
    INTO tbl_a
    FROM tbl_b
    ```
    Make sure that we cannot have both.
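
    One way to enforce that, sketched roughly (the `intoClause()` accessor is an assumption about what ANTLR would generate for the new rule; the `ParseException` form mirrors the one already used in this PR):

    ```scala
    // Reject queries that carry both an INSERT INTO clause and a SELECT ... INTO target.
    if (ctx.insertInto() != null && ctx.intoClause() != null) {
      throw new ParseException("INSERT INTO and SELECT ... INTO cannot be used together", ctx)
    }
    ```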




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70748215
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1755,4 +1755,97 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("select into(check relation)") {
    +    val originalConf = sessionState.conf.convertCTAS
    +
    +    setConf(SQLConf.CONVERT_CTAS, true)
    +
    +    val defaultDataSource = sessionState.conf.defaultDataSourceName
    +    try {
    +      sql("DROP TABLE IF EXISTS si1")
    --- End diff --
    
    Please consider the following convention.
    ```
    withTable("si1") {
    ```




[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14191#discussion_r70754843
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,10 @@ querySpecification
            windows?)
         ;
     
    +intoClause
    +    : INTO tableIdentifier
    --- End diff --
    
    Could you also check what kind of plan the following query produces:
    ```SQL
    SELECT a INTO tbl_a FROM tbl_b
    ```
    We might run into a weird syntax error here. If we do, we need to move the `INTO` keyword from the `nonReserved` rule to the `identifier` rule.

