Posted to issues@carbondata.apache.org by ravipesala <gi...@git.apache.org> on 2017/07/20 10:32:13 UTC

[GitHub] carbondata pull request #1189: [WIP] Insert overwrite support and force clea...

GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1189

    [WIP] Insert overwrite support and force clean up files and clean up in progress files support added

    The following features are added in this PR.
    1. Added support for `LOAD OVERWRITE` and `INSERT OVERWRITE` in carbon load. After the user issues an overwrite command, all old data is replaced with the new data.
     Example:
     ```
    LOAD DATA INPATH 'data.csv' OVERWRITE INTO TABLE carbontable
    ```
    ```
    insert overwrite table carbontable select * from othertable
    ```
    While an overwrite is in progress, no other load is allowed. Any other load that is already in progress is also overwritten.
    
    2. Added support for force-cleaning a table, which forcibly removes the table from disk. This is useful when the table is inconsistent with the Hive metastore. It is for internal use only and is not exposed to users, so it is supported through the Scala API and not through SQL.
    
    3. Clean up the in-progress files while the driver is initializing. If the driver goes down while a load is in progress, the leftover files must be cleaned up when the driver comes back up. This is controlled only through the parameter `spark.carbon.table.loader.driver`, which must be set to true in the driver properties to clean up the in-progress files.
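
The gated cleanup described in item 3 could look roughly like the following Java sketch. Only the property name comes from the description above. The `Segment` and `Status` types, the method name `cleanOnDriverStart`, and the assumption that the property defaults to "false" when unset are hypothetical and are not taken from the CarbonData code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: none of these types are CarbonData classes.
final class InProgressCleanupSketch {

  // Property name quoted in the pull request description.
  static final String TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver";

  // Illustrative segment states; the real status model may differ.
  enum Status { SUCCESS, IN_PROGRESS, MARKED_FOR_DELETE }

  static final class Segment {
    final String id;
    Status status;

    Segment(String id, Status status) {
      this.id = id;
      this.status = status;
    }
  }

  // Marks every in-progress segment for deletion, but only when the
  // property is set to "true". Returns the ids of the segments it marked.
  static List<String> cleanOnDriverStart(Properties conf, List<Segment> segments) {
    List<String> cleaned = new ArrayList<>();
    // Assumption: the property is treated as "false" when it is not set.
    boolean isLoaderDriver =
        Boolean.parseBoolean(conf.getProperty(TABLE_LOADER_DRIVER, "false"));
    if (!isLoaderDriver) {
      return cleaned;
    }
    for (Segment segment : segments) {
      if (segment.status == Status.IN_PROGRESS) {
        segment.status = Status.MARKED_FOR_DELETE;
        cleaned.add(segment.id);
      }
    }
    return cleaned;
  }

  public static void main(String[] args) {
    List<Segment> segments = new ArrayList<>();
    segments.add(new Segment("0", Status.SUCCESS));
    segments.add(new Segment("1", Status.IN_PROGRESS));

    Properties conf = new Properties();
    conf.setProperty(TABLE_LOADER_DRIVER, "true");
    System.out.println(cleanOnDriverStart(conf, segments));
  }
}
```

A driver that only serves queries would leave the property unset, so the method returns an empty list and touches nothing.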


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata insert-overwrite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1189
    
----
commit 1eca780ee69b07cdf2a86df1759dfaa7d0f96fd8
Author: Ravindra Pesala <ra...@gmail.com>
Date:   2017-07-20T09:27:21Z

    Insert overwrite support and force clean up files and clean up in progress files support added

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1189: [CARBONDATA-1322] Insert overwrite support and force...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1189
  
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/558/




[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128481387
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1264,6 +1264,14 @@
     
       public static final String ENABLE_HIVE_SCHEMA_META_STORE_DEFAULT = "false";
     
    +  /**
    +   * There is more often that in production uses different drivers for load and queries. So in case
    +   * of load driver user should set this property to enable loader specific clean up.
    +   */
    +  public static final String TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver";
    --- End diff --
    
    I think this property is not just for loading; any transactional operation should use this driver. Can you rename it?



[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128495694
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1264,6 +1264,14 @@
     
       public static final String ENABLE_HIVE_SCHEMA_META_STORE_DEFAULT = "false";
     
    +  /**
    +   * There is more often that in production uses different drivers for load and queries. So in case
    +   * of load driver user should set this property to enable loader specific clean up.
    +   */
    +  public static final String TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver";
    --- End diff --
    
    ok



[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128487079
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -617,4 +621,75 @@ object CommonUtil {
         AttributeReference("partition", StringType, nullable = false,
           new MetadataBuilder().putString("comment", "partitions info").build())()
       )
    +
    +  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
    --- End diff --
    
    Please add a description for this function.



[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128487481
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -485,8 +487,8 @@ case class LoadTable(
         }
     
         val dbName = databaseNameOp.getOrElse(sparkSession.catalog.currentDatabase)
    -    if (isOverwriteExist) {
    -      sys.error(s"Overwrite is not supported for carbon table with $dbName.$tableName")
    +    if (isOverwriteTable) {
    +      LOGGER.info(s"Overwrite of carbon table with $dbName.$tableName is in progress")
    --- End diff --
    
    ok



[GitHub] carbondata issue #1189: [CARBONDATA-1322] Insert overwrite support and force...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1189
  
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/561/




[GitHub] carbondata issue #1189: [CARBONDATA-1322] Insert overwrite support and force...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1189
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3151/




[GitHub] carbondata issue #1189: [CARBONDATA-1322] Insert overwrite support and force...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1189
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3153/




[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128496772
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/CleanFiles.scala ---
    @@ -29,12 +29,12 @@ import org.apache.carbondata.api.CarbonStore
     object CleanFiles {
     
       def cleanFiles(spark: SparkSession, dbName: String, tableName: String,
    -      storePath: String): Unit = {
    +      storePath: String, forceTableClean: Boolean): Unit = {
    --- End diff --
    
    Add a default value for `forceTableClean` and add a comment for this function.
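
For readers following along, the effect of giving `forceTableClean` a default can be sketched as below. Scala expresses this with a default parameter value in the signature. The sketch uses Java, which has no default parameters, and gets the same effect with an overload. The class `CleanFilesSketch`, the `calls` list, and the omission of the `SparkSession` argument are simplifications for illustration and do not reflect the actual `CleanFiles` object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reviewed function; it only records calls.
final class CleanFilesSketch {

  // Records each request so the sketch runs without Spark or a store.
  static final List<String> calls = new ArrayList<>();

  // Overload for existing callers: behaves as if forceTableClean were false.
  static void cleanFiles(String dbName, String tableName, String storePath) {
    cleanFiles(dbName, tableName, storePath, false);
  }

  // Full form: forceTableClean = true asks for the table to be removed
  // from disk, as described in the pull request.
  static void cleanFiles(String dbName, String tableName, String storePath,
      boolean forceTableClean) {
    calls.add(dbName + "." + tableName + " force=" + forceTableClean);
  }

  public static void main(String[] args) {
    cleanFiles("default", "carbontable", "/tmp/store");
    cleanFiles("default", "carbontable", "/tmp/store", true);
    System.out.println(calls);
  }
}
```

With a default in place, callers written before this change keep compiling and never trigger a force clean by accident.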



[GitHub] carbondata issue #1189: [CARBONDATA-1322] Insert overwrite support and force...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1189
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3154/




[GitHub] carbondata issue #1189: [CARBONDATA-1322] Insert overwrite support and force...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1189
  
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/560/




[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128490892
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -617,4 +621,75 @@ object CommonUtil {
         AttributeReference("partition", StringType, nullable = false,
           new MetadataBuilder().putString("comment", "partitions info").build())()
       )
    +
    +  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
    --- End diff --
    
    ok



[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128486494
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -485,8 +487,8 @@ case class LoadTable(
         }
     
         val dbName = databaseNameOp.getOrElse(sparkSession.catalog.currentDatabase)
    -    if (isOverwriteExist) {
    -      sys.error(s"Overwrite is not supported for carbon table with $dbName.$tableName")
    +    if (isOverwriteTable) {
    +      LOGGER.info(s"Overwrite of carbon table with $dbName.$tableName is in progress")
    --- End diff --
    
    This should first check whether an overwrite is ongoing, and only then write this log.
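
One way to read this suggestion is the check-then-log ordering sketched below in Java. It is an illustration only: `OverwriteGuardSketch`, `tryStartLoad`, and `finishOverwrite` are hypothetical names, and tracking in-flight overwrites in an in-memory set is an assumption, not how CarbonData records load status.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard; table names stand in for real table identifiers.
final class OverwriteGuardSketch {

  // Tables that currently have an overwrite in flight.
  private final Set<String> overwriting = ConcurrentHashMap.newKeySet();

  // Returns true if the load may proceed. The "in progress" message is
  // logged only after the check has succeeded.
  boolean tryStartLoad(String table, boolean isOverwrite) {
    if (isOverwrite) {
      // add() returns false when the table is already present, so the
      // check and the registration happen in a single atomic step.
      boolean acquired = overwriting.add(table);
      if (acquired) {
        System.out.println("Overwrite of carbon table " + table + " is in progress");
      }
      return acquired;
    }
    // A normal load is refused while an overwrite holds the table.
    return !overwriting.contains(table);
  }

  // Releases the table once the overwrite has finished or failed.
  void finishOverwrite(String table) {
    overwriting.remove(table);
  }

  public static void main(String[] args) {
    OverwriteGuardSketch guard = new OverwriteGuardSketch();
    System.out.println(guard.tryStartLoad("default.carbontable", true));
    System.out.println(guard.tryStartLoad("default.carbontable", false));
    guard.finishOverwrite("default.carbontable");
    System.out.println(guard.tryStartLoad("default.carbontable", false));
  }
}
```

The sketch does not block an overwrite when a normal load is already running, which matches the pull request description saying such loads are overwritten as well.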



[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1189#discussion_r128487500
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -617,4 +621,75 @@ object CommonUtil {
         AttributeReference("partition", StringType, nullable = false,
           new MetadataBuilder().putString("comment", "partitions info").build())()
       )
    +
    +  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
    --- End diff --
    
    ok



[GitHub] carbondata pull request #1189: [CARBONDATA-1322] Insert overwrite support an...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1189

