Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/07/10 07:42:27 UTC

[GitHub] spark pull request #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHi...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/14123

    [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTableAsSelectLogicalPlan [WIP]

    #### What changes were proposed in this pull request?
    `CreateHiveTableAsSelectLogicalPlan` is a Hive-specific logical node. This is not a good design. We need to consolidate it into the corresponding `CreateTableUsingAsSelect`.
    
    The first step is to generalize the signature of `CreateTableUsingAsSelect` by taking a `CatalogTable` as the table metadata input. The logical node will be renamed to `CreateTableAsSelect`. The new interface will look like this:
    ```Scala
    case class CreateTableAsSelect(
        tableDesc: CatalogTable,
        provider: String,
        mode: SaveMode,
        child: LogicalPlan) extends logical.UnaryNode 
    ```
    The second step is to convert `CreateHiveTableAsSelectLogicalPlan` into `CreateTableAsSelect`.
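    
    As a rough illustration of the second step, the sketch below maps the fields of the Hive-specific node onto the generalized one. The types are simplified stand-ins, not Spark's real classes; the `"hive"` provider string and the `Ignore`/`ErrorIfExists` choice for `allowExisting` are assumptions made for the example, not something this PR specifies.
    ```Scala
    // Simplified stand-ins for illustration only; not Spark's actual classes.
    object CtasConsolidationSketch {
      sealed trait SaveMode
      case object Ignore extends SaveMode
      case object ErrorIfExists extends SaveMode

      final case class CatalogTable(identifier: String)
      final case class LogicalPlan(description: String)

      // Shape of the existing Hive-specific node.
      final case class CreateHiveTableAsSelectLogicalPlan(
          tableDesc: CatalogTable,
          child: LogicalPlan,
          allowExisting: Boolean)

      // Shape of the proposed general node.
      final case class CreateTableAsSelect(
          tableDesc: CatalogTable,
          provider: String,
          mode: SaveMode,
          child: LogicalPlan)

      // Assumption: Hive tables are tagged with a "hive" provider, and
      // allowExisting = true is treated as Ignore, false as ErrorIfExists.
      def convert(plan: CreateHiveTableAsSelectLogicalPlan): CreateTableAsSelect =
        CreateTableAsSelect(
          tableDesc = plan.tableDesc,
          provider = "hive",
          mode = if (plan.allowExisting) Ignore else ErrorIfExists,
          child = plan.child)
    }
    ```
    The table metadata and the child plan carry over unchanged; only `allowExisting` needs translating, and `provider` is the one field the Hive node has no counterpart for.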
    
    This PR is based on a comparison of the two interfaces. The details are described below.
    
    Currently, the SQL interface is the only entrance to `CreateHiveTableAsSelectLogicalPlan`. The correspondence between the SQL interface and `CreateHiveTableAsSelectLogicalPlan` is described below:
    ```Scala
    case class CreateHiveTableAsSelectLogicalPlan(
        tableDesc: CatalogTable,
        child: LogicalPlan,
        allowExisting: Boolean)
        extends UnaryNode with Command 
    ```
    ```Scala
    SQL:
    
    When conf.convertCTAS == false || either [ROW FORMAT row_format] or [STORED AS file_format] is specified
    
      CREATE [EXTERNAL] [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
      [(col1[:] data_type [COMMENT col_comment], ...)]
      [COMMENT table_comment]
      [PARTITIONED BY (col2[:] data_type [COMMENT col_comment], ...)]
      [ROW FORMAT row_format]
      [STORED AS file_format]
      [LOCATION path]
      [TBLPROPERTIES (property_name=property_value, ...)]
      [AS select_statement];
      
      -->
      
      [TEMPORARY] is not allowed.
    
      allowExisting: Boolean = [IF NOT EXISTS]
      child: LogicalPlan = select_statement
      tableDesc: CatalogTable = CatalogTable(
        identifier = [db_name.]table_name,
        tableType = [EXTERNAL],
        storage = [ROW FORMAT row_format] +
                  [STORED AS file_format] +
                  [LOCATION path],
        schema = Seq.empty,
        partitionColumnNames = Seq.empty,
        properties = [TBLPROPERTIES (property_name=property_value, ...)],
        comment = [COMMENT table_comment])
    ```
    
    `CreateTableUsingAsSelect` has three entrances. The correspondence for each is shown below:
    ```Scala
    case class CreateTableUsingAsSelect(
        tableIdent: TableIdentifier,
        provider: String,
        partitionColumns: Array[String],
        bucketSpec: Option[BucketSpec],
        mode: SaveMode,
        options: Map[String, String],
        child: LogicalPlan) extends logical.UnaryNode 
    ```
    ```Scala
    SQL Interface I:
    
    When conf.convertCTAS == true && [ROW FORMAT row_format] and [STORED AS file_format] are not specified
    
      CREATE [EXTERNAL] [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
      [(col1[:] data_type [COMMENT col_comment], ...)]
      [COMMENT table_comment]
      [PARTITIONED BY (col2[:] data_type [COMMENT col_comment], ...)]
      [ROW FORMAT row_format]
      [STORED AS file_format]
      [LOCATION path]
      [TBLPROPERTIES (property_name=property_value, ...)]
      [AS select_statement];
      
      --> 
      
      tableIdent: TableIdentifier = [db_name.]table_name,
      provider: String = conf.defaultDataSourceName,
      partitionColumns: Array[String] = Seq.empty,
      bucketSpec: Option[BucketSpec] = None,
      mode: SaveMode = [IF NOT EXISTS],
      options: Map[String, String] = [LOCATION path],
      child: LogicalPlan = [AS select_statement]
    ```
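    
    The two `When` conditions above are complementary, so the choice between the two logical nodes can be written as a single predicate. A minimal sketch, where `ParsedCtas` and `usesHiveNode` are hypothetical names introduced for the example and the `convertCTAS` parameter stands in for `conf.convertCTAS`:
    ```Scala
    object CtasRoutingSketch {
      // Hypothetical summary of which optional clauses the statement contained.
      final case class ParsedCtas(hasRowFormat: Boolean, hasStoredAs: Boolean)

      // true  -> CreateHiveTableAsSelectLogicalPlan
      // false -> CreateTableUsingAsSelect (SQL Interface I)
      def usesHiveNode(convertCTAS: Boolean, stmt: ParsedCtas): Boolean =
        !convertCTAS || stmt.hasRowFormat || stmt.hasStoredAs
    }
    ```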
    ```Scala
    SQL Interface II:
    
      CREATE [EXTERNAL] [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
      [(col1[:] data_type [COMMENT col_comment], ...)]
      USING qualifiedName
      [OPTIONS tablePropertyList]
      [PARTITIONED BY (col2[:] data_type [COMMENT col_comment], ...)]
      [CLUSTERED BY (col3, ...) (SORTED BY orderedIdentifierList)? INTO INTEGER_VALUE BUCKETS]
      [AS select_statement];
    
      -->
    
      [EXTERNAL] is not allowed.
      [TEMPORARY] is not allowed.
    
      tableIdent: TableIdentifier = [db_name.]table_name,
      provider: String = USING qualifiedName,
      partitionColumns: Array[String] = [PARTITIONED BY (col2[:] data_type [COMMENT col_comment], ...)],
      bucketSpec: Option[BucketSpec] = [CLUSTERED BY (col3, ...) (SORTED BY orderedIdentifierList)? INTO INTEGER_VALUE BUCKETS],
      mode: SaveMode = [IF NOT EXISTS],
      options: Map[String, String] = [OPTIONS tablePropertyList],
      child: LogicalPlan = [AS select_statement]
    ```
    ```Scala
    DataFrameWriter Interface:
    
      tableIdent: TableIdentifier = tableIdent (from saveAsTable API),
      provider: String = source (from format API),
      partitionColumns: Array[String] = partitioningColumns (from partitionBy API),
      bucketSpec: Option[BucketSpec] = getBucketSpec function (from bucketBy API and sortBy API),
      mode: SaveMode = mode (from mode API),
      options: Map[String, String] = extraOptions (from option and options API),
      child: LogicalPlan = df.logicalPlan (from DataFrameWriter)
    ```
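    
    On the DataFrameWriter path, `bucketSpec` combines the state recorded by `bucketBy` and `sortBy`. The sketch below models that derivation with stand-in types; `WriterState` and the error message are hypothetical, and the real `getBucketSpec` may differ in detail.
    ```Scala
    object BucketSpecSketch {
      // Stand-in for Spark's BucketSpec, for illustration only.
      final case class BucketSpec(
          numBuckets: Int,
          bucketColumnNames: Seq[String],
          sortColumnNames: Seq[String])

      // Hypothetical snapshot of what bucketBy and sortBy have recorded so far.
      final case class WriterState(
          numBuckets: Option[Int] = None,
          bucketColumnNames: Seq[String] = Nil,
          sortColumnNames: Seq[String] = Nil)

      def getBucketSpec(state: WriterState): Option[BucketSpec] = {
        // Sorting is applied within buckets, so reject sortBy without bucketBy.
        require(
          state.numBuckets.isDefined || state.sortColumnNames.isEmpty,
          "sortBy must be used together with bucketBy")
        // No bucketBy call means no bucketing at all.
        state.numBuckets.map { n =>
          BucketSpec(n, state.bucketColumnNames, state.sortColumnNames)
        }
      }
    }
    ```
    Returning an `Option` keeps the unbucketed case explicit, which matches the `Option[BucketSpec]` parameter of `CreateTableUsingAsSelect`.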
    
    #### How was this patch tested?
    The existing test cases cover the code refactoring.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark removeHiveCTASLogicalNode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14123
    
----
commit 55b1a8c4f44611b2f7372acef5b79dad5833d105
Author: gatorsmile <ga...@gmail.com>
Date:   2016-07-09T07:32:57Z

    fix.

commit 568f13352a30412c74b267bad1b339c17653f02c
Author: gatorsmile <ga...@gmail.com>
Date:   2016-07-10T07:09:30Z

    fix1

commit 5bef1e8353d41c7fa333bba78502934211692a15
Author: gatorsmile <ga...@gmail.com>
Date:   2016-07-10T07:20:48Z

    revert

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    **[Test build #62068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62068/consoleFull)** for PR 14123 at commit [`082040f`](https://github.com/apache/spark/commit/082040f64130795593d551647f4d451a0b6a9a7e).




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    Should we wait for https://github.com/apache/spark/pull/14071?




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    This is now part of https://github.com/apache/spark/pull/14482. Closing it now.




[GitHub] spark pull request #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/14123




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    @cloud-fan Yeah! This will stay [WIP] until https://github.com/apache/spark/pull/14071 is merged.




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62047/




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62068/




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    **[Test build #62047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62047/consoleFull)** for PR 14123 at commit [`5bef1e8`](https://github.com/apache/spark/commit/5bef1e8353d41c7fa333bba78502934211692a15).




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    **[Test build #62068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62068/consoleFull)** for PR 14123 at commit [`082040f`](https://github.com/apache/spark/commit/082040f64130795593d551647f4d451a0b6a9a7e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    **[Test build #62047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62047/consoleFull)** for PR 14123 at commit [`5bef1e8`](https://github.com/apache/spark/commit/5bef1e8353d41c7fa333bba78502934211692a15).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14123
  
    Merged build finished. Test PASSed.

