You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by ravipesala <gi...@git.apache.org> on 2018/01/03 04:20:06 UTC

[GitHub] carbondata pull request #1752: [CARBONDATA-1972] Fix compaction after update...

GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1752

    [CARBONDATA-1972] Fix compaction after update of partition table.

    When updation happens on whole data then all old segments needs to be marked as delete. But it is not happening in case of partition table. This PR fixes it.
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [X] Any interfaces changed?
      NO
     - [X] Any backward compatibility impacted?
       NO
     - [X] Document update required?
       NO
     - [X] Testing done
           Tests added
           
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata compaction-fail-after-update-partition-table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1752
    
----
commit aed803fd202f6dfb9598cddf61aa3836f9171213
Author: ravipesala <ra...@...>
Date:   2018-01-03T03:46:36Z

    Fix compaction after update of partition table.

----


---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159657874
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/DeleteExecution.scala ---
    @@ -58,14 +58,15 @@ object DeleteExecution {
           dataRdd: RDD[Row],
           timestamp: String,
           isUpdateOperation: Boolean,
    -      executorErrors: ExecutionErrors): Boolean = {
    +      executorErrors: ExecutionErrors): (Boolean, Seq[String]) = {
    --- End diff --
    
    ok updated


---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972] Fix compaction after update...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159367321
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    +        if (segmentsToBeDeleted != null)
    +        {
    --- End diff --
    
    move to previous line


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2693/



---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    retest this please


---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159594673
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala ---
    @@ -77,7 +77,7 @@ private[sql] case class CarbonProjectForDeleteCommand(
             dataRdd,
             timestamp,
             isUpdateOperation = false,
    -        executorErrors)) {
    +        executorErrors)._1) {
    --- End diff --
    
    It seems always true


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972] Fix compaction after update of par...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2657/



---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1301/



---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2727/



---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159658076
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala ---
    @@ -107,11 +107,19 @@ with Serializable {
         CarbonTableOutputFormat.setOverwrite(conf, options("overwrite").toBoolean)
         // Set the update timestamp if user sets in case of update query. It needs to be updated
         // in load status update time
    -    val updateTimeStamp = options.getOrElse("updatetimestamp", null)
    -    if (updateTimeStamp != null) {
    -      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp)
    -      model.setFactTimeStamp(updateTimeStamp.toLong)
    +    val updateTimeStamp = options.get("updatetimestamp")
    +    if (updateTimeStamp.isDefined) {
    +      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp.get)
    +      model.setFactTimeStamp(updateTimeStamp.get.toLong)
         }
    +    // In case of update query there is chance to remove the older segments, so here we can set
    +    // the to be deleted segments to mark as delete while updating tablestatus
    +    val segemntsTobeDeleted = options.get("segmentsToBeDeleted")
    +    if (segemntsTobeDeleted.isDefined) {
    +      conf.set(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, segemntsTobeDeleted.get)
    +    }
    +
    +
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972] Fix compaction after update of par...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1272/



---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159657961
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala ---
    @@ -77,7 +77,7 @@ private[sql] case class CarbonProjectForDeleteCommand(
             dataRdd,
             timestamp,
             isUpdateOperation = false,
    -        executorErrors)) {
    +        executorErrors)._1) {
    --- End diff --
    
    removed returning status


---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159475886
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    +        if (segmentsToBeDeleted != null)
    +        {
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1337/



---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159479290
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    --- End diff --
    
    Ok


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2747/



---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2574/



---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1752


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    LGTM


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2525/



---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1321/



---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972] Fix compaction after update...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159376105
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    --- End diff --
    
    Let the default value of `SEGMENTS_TO_BE_DELETED` be `""` in line 100 will reduce the following code (line 105~111) to 1 line: Only line 108 is enough.


---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159594614
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/DeleteExecution.scala ---
    @@ -58,14 +58,15 @@ object DeleteExecution {
           dataRdd: RDD[Row],
           timestamp: String,
           isUpdateOperation: Boolean,
    -      executorErrors: ExecutionErrors): Boolean = {
    +      executorErrors: ExecutionErrors): (Boolean, Seq[String]) = {
    --- End diff --
    
    It seems the first return value is always true
    And it is hard to understand this function, can you add comment for it


---

[GitHub] carbondata issue #1752: [CARBONDATA-1972] Fix compaction after update of par...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2497/



---

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2559/



---

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159594689
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala ---
    @@ -107,11 +107,19 @@ with Serializable {
         CarbonTableOutputFormat.setOverwrite(conf, options("overwrite").toBoolean)
         // Set the update timestamp if user sets in case of update query. It needs to be updated
         // in load status update time
    -    val updateTimeStamp = options.getOrElse("updatetimestamp", null)
    -    if (updateTimeStamp != null) {
    -      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp)
    -      model.setFactTimeStamp(updateTimeStamp.toLong)
    +    val updateTimeStamp = options.get("updatetimestamp")
    +    if (updateTimeStamp.isDefined) {
    +      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp.get)
    +      model.setFactTimeStamp(updateTimeStamp.get.toLong)
         }
    +    // In case of update query there is chance to remove the older segments, so here we can set
    +    // the to be deleted segments to mark as delete while updating tablestatus
    +    val segemntsTobeDeleted = options.get("segmentsToBeDeleted")
    +    if (segemntsTobeDeleted.isDefined) {
    +      conf.set(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, segemntsTobeDeleted.get)
    +    }
    +
    +
    --- End diff --
    
    remove empty line


---