You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by manishgupta88 <gi...@git.apache.org> on 2017/12/15 09:43:32 UTC

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/1665

    [CARBONDATA-1884] Add CTAS support to carbondata

    Implemented CTAS feature in carbondata. This will hep to create a carbon table from other parquet/orc tables.
    
     - [ ] Any interfaces changed?
     New DDL has been introduced.
    Syntax:
    **CREATE TABLE [IF NOT EXISTS] [db_name.]table_name stored by 'carbondata'  AS select_statement**
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
    Yes
     - [ ] Testing done
     Yes added 12 functional test cases covering various scenarios for CTAS test  
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    NA
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata CTAS_implementation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1665
    
----
commit 1dc3df29d92d2cddcd73f6ea0c26140bd4a70ad6
Author: manishgupta88 <to...@gmail.com>
Date:   2017-12-13T16:10:19Z

    Added code for implementing CTAS

----


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356007
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.command.table
    +
    +import scala.util.control.NonFatal
    +
    +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
    +import org.apache.spark.sql.catalyst.TableIdentifier
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.execution.command.MetadataCommand
    +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand
    +
    +import org.apache.carbondata.common.logging.LogServiceFactory
    +import org.apache.carbondata.core.metadata.schema.table.TableInfo
    +
    +/**
    + * Create table and insert the query result into it.
    + *
    + * @param query the query whose result will be insert into the new relation
    + * @param tableInfo the Table Describe, which may contains serde, storage handler etc.
    + * @param ifNotExistsSet allow continue working if it's already exists, otherwise
    + *                      raise exception
    + * @param tableLocation store location where the table need to be created
    + */
    +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan,
    +    tableInfo: TableInfo,
    +    ifNotExistsSet: Boolean = false,
    +    tableLocation: Option[String] = None) extends MetadataCommand {
    --- End diff --
    
    I think it should be AtomicRunnableCommand, since it do both data and metadata modification


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    LGTM, thanks for working on this


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157408233
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.command.table
    +
    +import scala.util.control.NonFatal
    +
    +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
    +import org.apache.spark.sql.catalyst.TableIdentifier
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.execution.command.MetadataCommand
    +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand
    +
    +import org.apache.carbondata.common.logging.LogServiceFactory
    +import org.apache.carbondata.core.metadata.schema.table.TableInfo
    +
    +/**
    + * Create table and insert the query result into it.
    + *
    + * @param query the query whose result will be insert into the new relation
    + * @param tableInfo the Table Describe, which may contains serde, storage handler etc.
    + * @param ifNotExistsSet allow continue working if it's already exists, otherwise
    + *                      raise exception
    + * @param tableLocation store location where the table need to be created
    + */
    +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan,
    +    tableInfo: TableInfo,
    +    ifNotExistsSet: Boolean = false,
    +    tableLocation: Option[String] = None) extends MetadataCommand {
    +
    +  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
    +    val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
    +    val tableName = tableInfo.getFactTable.getTableName
    +    var databaseOpt: Option[String] = None
    +    if (tableInfo.getDatabaseName != null) {
    +      databaseOpt = Some(tableInfo.getDatabaseName)
    +    }
    +    val dbName = CarbonEnv.getDatabaseName(databaseOpt)(sparkSession)
    +    LOGGER.audit(s"Request received for CTAS for $dbName.$tableName")
    +    lazy val carbonDataSourceHadoopRelation = {
    --- End diff --
    
    Why you use lazy here?


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356131
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala ---
    @@ -144,6 +145,15 @@ trait CarbonMetaStore {
       def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: SparkSession): TableInfo
     
       def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable]
    +
    +  def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession,
    +      tableIdentifier: TableIdentifier): CarbonDatasourceHadoopRelation
    +
    +  def getSchemaFromUnresolvedRelation(sparkSession: SparkSession,
    --- End diff --
    
    add comment for interface function


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2098/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2402/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/873/



---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356103
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala ---
    @@ -528,4 +529,46 @@ class CarbonFileMetastore extends CarbonMetaStore {
         val tableMetadataFile = tablePath.getSchemaFilePath
         CarbonUtil.readSchemaFile(tableMetadataFile)
       }
    +
    +  override def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession,
    --- End diff --
    
    please provide comment for this function


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2115/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2321/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/891/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/890/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    retest this please


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157405351
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala ---
    @@ -144,6 +145,15 @@ trait CarbonMetaStore {
       def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: SparkSession): TableInfo
     
       def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable]
    +
    +  def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession,
    +      tableIdentifier: TableIdentifier): CarbonDatasourceHadoopRelation
    +
    +  def getSchemaFromUnresolvedRelation(sparkSession: SparkSession,
    --- End diff --
    
    This method is called from the parser and this logic for retrieving schema from unresolved relation, according to me should not return in parser. Please let me know if you have a different perspective


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2116/



---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356062
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.command.table
    +
    +import scala.util.control.NonFatal
    +
    +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
    +import org.apache.spark.sql.catalyst.TableIdentifier
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.execution.command.MetadataCommand
    +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand
    +
    +import org.apache.carbondata.common.logging.LogServiceFactory
    +import org.apache.carbondata.core.metadata.schema.table.TableInfo
    +
    +/**
    + * Create table and insert the query result into it.
    + *
    + * @param query the query whose result will be insert into the new relation
    + * @param tableInfo the Table Describe, which may contains serde, storage handler etc.
    + * @param ifNotExistsSet allow continue working if it's already exists, otherwise
    + *                      raise exception
    + * @param tableLocation store location where the table need to be created
    + */
    +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan,
    +    tableInfo: TableInfo,
    +    ifNotExistsSet: Boolean = false,
    +    tableLocation: Option[String] = None) extends MetadataCommand {
    +
    +  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
    +    val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
    +    val tableName = tableInfo.getFactTable.getTableName
    +    var databaseOpt: Option[String] = None
    +    if (tableInfo.getDatabaseName != null) {
    +      databaseOpt = Some(tableInfo.getDatabaseName)
    +    }
    +    val dbName = CarbonEnv.getDatabaseName(databaseOpt)(sparkSession)
    +    LOGGER.audit(s"Request received for CTAS for $dbName.$tableName")
    +    lazy val carbonDataSourceHadoopRelation = {
    --- End diff --
    
    This can be moved down to line 61


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356143
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParser.scala ---
    @@ -210,13 +211,40 @@ class CarbonHelperSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser)
           }
         }
     
    -    val fields = parser.getFields(cols ++ partitionByStructFields)
    +    var fields = parser.getFields(cols ++ partitionByStructFields)
         val options = new CarbonOption(properties)
         // validate tblProperties
         val bucketFields = parser.getBucketFields(tableProperties, fields, options)
     
         validateStreamingProperty(options)
     
    +    // validate for create table as select
    --- End diff --
    
    move this logic to a validateSelectQueryForCTAS function


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2018/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/867/



---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356189
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala ---
    @@ -144,6 +145,15 @@ trait CarbonMetaStore {
       def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: SparkSession): TableInfo
     
       def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable]
    +
    +  def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession,
    +      tableIdentifier: TableIdentifier): CarbonDatasourceHadoopRelation
    +
    +  def getSchemaFromUnresolvedRelation(sparkSession: SparkSession,
    --- End diff --
    
    This function is simple logic only, why not do this logic directly in caller?


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157409856
  
    --- Diff: integration/spark2/src/main/spark2.1/CarbonSessionState.scala ---
    @@ -259,25 +259,26 @@ object CarbonOptimizerUtil {
       }
     }
     
    -class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser) extends
    -  SparkSqlAstBuilder(conf) {
    +class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser, sparkSession: SparkSession)
    +  extends SparkSqlAstBuilder(conf) {
     
    -  val helper = new CarbonHelperSqlAstBuilder(conf, parser)
    +  val helper = new CarbonHelperSqlAstBuilder(conf, parser, sparkSession)
     
       override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
         val fileStorage = helper.getFileStorage(ctx.createFileFormat)
     
         if (fileStorage.equalsIgnoreCase("'carbondata'") ||
             fileStorage.equalsIgnoreCase("'org.apache.carbondata.format'")) {
           helper.createCarbonTable(ctx.createTableHeader,
    --- End diff --
    
    move `ctx.createTableHeader` to next line


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356121
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala ---
    @@ -144,6 +145,15 @@ trait CarbonMetaStore {
       def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: SparkSession): TableInfo
     
       def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable]
    +
    +  def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession,
    --- End diff --
    
    move parameter to next line


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157355991
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.command.table
    +
    +import scala.util.control.NonFatal
    +
    +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
    +import org.apache.spark.sql.catalyst.TableIdentifier
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.execution.command.MetadataCommand
    +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand
    +
    +import org.apache.carbondata.common.logging.LogServiceFactory
    +import org.apache.carbondata.core.metadata.schema.table.TableInfo
    +
    +/**
    + * Create table and insert the query result into it.
    + *
    + * @param query the query whose result will be insert into the new relation
    + * @param tableInfo the Table Describe, which may contains serde, storage handler etc.
    + * @param ifNotExistsSet allow continue working if it's already exists, otherwise
    + *                      raise exception
    + * @param tableLocation store location where the table need to be created
    + */
    +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan,
    --- End diff --
    
    move query as the last parameter of this command


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157473214
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParser.scala ---
    @@ -210,13 +208,40 @@ class CarbonHelperSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser)
           }
         }
     
    -    val fields = parser.getFields(cols ++ partitionByStructFields)
    +    var fields = parser.getFields(cols ++ partitionByStructFields)
         val options = new CarbonOption(properties)
         // validate tblProperties
         val bucketFields = parser.getBucketFields(tableProperties, fields, options)
     
         validateStreamingProperty(options)
     
    +    // validate for create table as select
    +    val selectQuery = Option(query).map(plan)
    +    selectQuery match {
    +      case Some(q) =>
    +        // create table as select does not allow creation of partitioned table
    +        if (partitionFields.nonEmpty) {
    +          val errorMessage = "A Create Table As Select (CTAS) statement is not allowed to " +
    +                             "create a partitioned table using Carbondata file formats."
    +          operationNotAllowed(errorMessage, partitionColumns)
    +        }
    +        // create table as select does not allow to explicitly specify schema
    +        if (fields.nonEmpty) {
    +          operationNotAllowed(
    +            "Schema may not be specified in a Create Table As Select (CTAS) statement", columns)
    +        }
    +        // create table as select does not allow specifying any table properties
    +        if (tableProperties.nonEmpty) {
    --- End diff --
    
    I think this can be allowed, it is also in Hive syntax


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/775/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2401/



---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356072
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.command.table
    +
    +import scala.util.control.NonFatal
    +
    +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
    +import org.apache.spark.sql.catalyst.TableIdentifier
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.execution.command.MetadataCommand
    +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand
    +
    +import org.apache.carbondata.common.logging.LogServiceFactory
    +import org.apache.carbondata.core.metadata.schema.table.TableInfo
    +
    +/**
    + * Create table and insert the query result into it.
    + *
    + * @param query the query whose result will be insert into the new relation
    + * @param tableInfo the Table Describe, which may contains serde, storage handler etc.
    + * @param ifNotExistsSet allow continue working if it's already exists, otherwise
    + *                      raise exception
    + * @param tableLocation store location where the table need to be created
    + */
    +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan,
    +    tableInfo: TableInfo,
    +    ifNotExistsSet: Boolean = false,
    +    tableLocation: Option[String] = None) extends MetadataCommand {
    +
    +  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
    +    val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
    +    val tableName = tableInfo.getFactTable.getTableName
    +    var databaseOpt: Option[String] = None
    +    if (tableInfo.getDatabaseName != null) {
    +      databaseOpt = Some(tableInfo.getDatabaseName)
    +    }
    +    val dbName = CarbonEnv.getDatabaseName(databaseOpt)(sparkSession)
    +    LOGGER.audit(s"Request received for CTAS for $dbName.$tableName")
    +    lazy val carbonDataSourceHadoopRelation = {
    +      // execute command to create carbon table
    +      CarbonCreateTableCommand(tableInfo, ifNotExistsSet, tableLocation).run(sparkSession)
    +      CarbonEnv.getInstance(sparkSession).carbonMetastore
    +        .getCarbonDataSourceHadoopRelation(sparkSession, TableIdentifier(tableName, Option(dbName)))
    +    }
    +    // check if table already exists
    +    if (sparkSession.sessionState.catalog.listTables(dbName)
    +      .exists(_.table.equalsIgnoreCase(tableName))) {
    +      if (!ifNotExistsSet) {
    +        LOGGER.audit(
    +          s"Table creation with Database name [$dbName] and Table name [$tableName] failed. " +
    +          s"Table [$tableName] already exists under database [$dbName]")
    +        throw new TableAlreadyExistsException(dbName, tableName)
    +      }
    +    } else {
    +      try {
    +        // execute command to load data into carbon table
    +        CarbonInsertIntoCommand(carbonDataSourceHadoopRelation,
    --- End diff --
    
    move parameter to next line


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1665


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/780/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/785/



---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157410277
  
    --- Diff: integration/spark2/src/main/spark2.2/CarbonSessionState.scala ---
    @@ -280,25 +280,26 @@ class CarbonOptimizer(
       }
     }
     
    -class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser) extends
    -  SparkSqlAstBuilder(conf) {
    +class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser, sparkSession: SparkSession)
    +  extends SparkSqlAstBuilder(conf) {
     
    -  val helper = new CarbonHelperSqlAstBuilder(conf, parser)
    +  val helper = new CarbonHelperSqlAstBuilder(conf, parser, sparkSession)
     
       override def visitCreateHiveTable(ctx: CreateHiveTableContext): LogicalPlan = {
         val fileStorage = helper.getFileStorage(ctx.createFileFormat)
     
         if (fileStorage.equalsIgnoreCase("'carbondata'") ||
             fileStorage.equalsIgnoreCase("'org.apache.carbondata.format'")) {
           helper.createCarbonTable(ctx.createTableHeader,
    -          ctx.skewSpec,
    --- End diff --
    
    @jackylk We should use the right format at the first time, I mean use 2 blanks instead of 4 blanks. We should avoid unrelated modify, so we can keep each MR with less diff. Better to review and better to rebase.


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2006/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    retest this please


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356091
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala ---
    @@ -528,4 +529,46 @@ class CarbonFileMetastore extends CarbonMetaStore {
         val tableMetadataFile = tablePath.getSchemaFilePath
         CarbonUtil.readSchemaFile(tableMetadataFile)
       }
    +
    +  override def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession,
    --- End diff --
    
    move parameter to next line


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356181
  
    --- Diff: integration/spark2/src/main/spark2.1/CarbonSessionState.scala ---
    @@ -259,25 +259,26 @@ object CarbonOptimizerUtil {
       }
     }
     
    -class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser) extends
    -  SparkSqlAstBuilder(conf) {
    +class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser, sparkSession: SparkSession)
    +  extends SparkSqlAstBuilder(conf) {
     
    -  val helper = new CarbonHelperSqlAstBuilder(conf, parser)
    +  val helper = new CarbonHelperSqlAstBuilder(conf, parser, sparkSession)
     
       override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = {
         val fileStorage = helper.getFileStorage(ctx.createFileFormat)
     
         if (fileStorage.equalsIgnoreCase("'carbondata'") ||
             fileStorage.equalsIgnoreCase("'org.apache.carbondata.format'")) {
           helper.createCarbonTable(ctx.createTableHeader,
    --- End diff --
    
    Sine you modify this function, to make it more readable, please add parameter name also, like
    ```
       foo(
          paramA = a,
          paramB = b
          ...)
    ```


---

[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1665#discussion_r157356168
  
    --- Diff: integration/spark2/src/main/spark2.2/CarbonSessionState.scala ---
    @@ -280,25 +280,26 @@ class CarbonOptimizer(
       }
     }
     
    -class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser) extends
    -  SparkSqlAstBuilder(conf) {
    +class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser, sparkSession: SparkSession)
    +  extends SparkSqlAstBuilder(conf) {
     
    -  val helper = new CarbonHelperSqlAstBuilder(conf, parser)
    +  val helper = new CarbonHelperSqlAstBuilder(conf, parser, sparkSession)
     
       override def visitCreateHiveTable(ctx: CreateHiveTableContext): LogicalPlan = {
         val fileStorage = helper.getFileStorage(ctx.createFileFormat)
     
         if (fileStorage.equalsIgnoreCase("'carbondata'") ||
             fileStorage.equalsIgnoreCase("'org.apache.carbondata.format'")) {
           helper.createCarbonTable(ctx.createTableHeader,
    -          ctx.skewSpec,
    -          ctx.bucketSpec,
    -          ctx.partitionColumns,
    -          ctx.columns,
    -          ctx.tablePropertyList,
    -          ctx.locationSpec(),
    -          Option(ctx.STRING()).map(string),
    -          ctx.AS)
    +        ctx.skewSpec,
    --- End diff --
    
    Sine you modify this function, to make it more readable, please add parameter name also, like
    ```
       foo(
          paramA = a,
          paramB = b
       )


---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2093/



---

[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1665
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2391/



---