Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/10/15 13:24:33 UTC

[GitHub] [carbondata] shenjiayu17 opened a new pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

shenjiayu17 opened a new pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986


    ### Why is this PR needed?
    The horizontal compaction flow becomes very slow when an update involves many segments (or many blocks), so this PR analyzes the flow and optimizes its time-consuming parts.
    
    ### What changes were proposed in this PR?
   In performDeleteDeltaCompaction, optimize the method getSegListIUDCompactionQualified: combine the two segment traversals that perform the same work, and move the listFiles call out of the per-block traversal so each segment directory is listed only once.
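
    As a rough illustration of the "list once per segment" idea (plain Java with hypothetical class and method names, not CarbonData's actual API), the segment directory is listed a single time and the per-block loop then only does in-memory map lookups:

    ```java
    import java.io.File;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ListOncePerSegment {
      // One directory listing per segment; callers look up each block in the map.
      static Map<String, List<File>> groupDeleteDeltaFilesByBlock(File segmentDir) {
        Map<String, List<File>> blockToFiles = new HashMap<>();
        File[] files = segmentDir.listFiles((dir, name) -> name.endsWith(".deletedelta"));
        if (files == null) {
          return blockToFiles;
        }
        for (File file : files) {
          // e.g. "part-0-3-1481084721319.deletedelta" -> block name "part-0-3"
          String name = file.getName();
          String blockName = name.substring(0, name.lastIndexOf('-'));
          blockToFiles.computeIfAbsent(blockName, k -> new ArrayList<>()).add(file);
        }
        return blockToFiles;
      }
    }
    ```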
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509202313



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaStartTimestamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaEndTimeStamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);

Review comment:
       Why are you adding the file twice here, once with the start and once with the end timestamp? This looks wrong, please check.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507834166



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,66 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files mapped to each block of the specified segment.
+   * First list all deletedelta files in the segment dir, then loop the files and find
+   * a map of blocks and .deletedelta files related to each block.
+   *
+   * @param seg the segment which is to find blocks
+   * @return a map of block and its file list
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
-    String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesList(final Segment seg) {
+
+    Map<String, long[]> blockDeltaStartAndEndTimestampMap = new HashMap<>();

Review comment:
       @shenjiayu17 why exactly do we need this change? Introducing the map and all, we are still calling listFiles, which is a costly operation. I have some points, please check:
   
   1. There is no need to create these maps and list files again to fill them: we already have the block name, and every block has exactly one corresponding deletedelta file, so the number of delta files per block is always one.
   2. The updateDetails contains the block name, the actual block name, the timestamps and the delete delta timestamps, so if the timestamp is not empty you can form the delta file name yourself from this info and return it from this method (a rough sketch follows this comment).
   
   With the above approach you avoid the listFiles operation, the timestamp-based filtering and the creation of all these maps, so you can avoid all these changes.
   
   We always keep the horizontal compaction threshold at 1, and we neither change it nor recommend users to change it to get better performance.
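
   A rough sketch of the approach described above, reusing only names already visible in this diff (updateDetails, SegmentUpdateDetails, getEndTimeOfDeltaFile and the path constants); it is an illustration of the idea, not the merged change:

   ```java
   // Sketch: derive the delete delta file path directly from SegmentUpdateDetails,
   // so no listFiles call and no timestamp filtering are needed.
   for (SegmentUpdateDetails block : updateDetails) {
     if (block.getBlockName().equalsIgnoreCase(blockName)
         && block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())
         && !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
       long deltaTimestamp =
           getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
       if (deltaTimestamp > 0) {
         // <segment path>/<block name>-<timestamp>.deletedelta
         deleteDeltaFileList.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR
             + blockName + CarbonCommonConstants.HYPHEN + deltaTimestamp
             + CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
       }
     }
   }
   ```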

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Delete Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       same as above

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,6 +125,9 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Update Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       I think this log does not give any useful info here; if you put a log after line 133 and print `validSegList`, it would be a little more useful.







[GitHub] [carbondata] shenjiayu17 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717024597


   We tested with 10 segments (10 GB in total) and ran more than 20 updates to measure the cost; each update changed 176973 rows.
   The before/after comparison is shown below; the UPDATE time drops by about 30% after improving horizontal compaction.
   
   |   | Time Before Improving | Time After Improving | 
   | ------- | ------------- | ------------- | 
   | Average time <br> from 2nd to 21st UPDATE | **99.19** | **67.55** |
   | UPDATE  |     |     |
   | 1st     | 38.397  |  71.91   |
   | 2nd     | 74.918  | 67.386   |
   | 3rd     | 43.851  | 46.171   |
   | 4th     | 52.133  | 82.974   |
   | 5th     | 86.021  | 44.499   |
   | 6th     | 62.992  | 39.973   |
   | 7th     | 99.077  | 69.292   |
   | 8th     | 77.815  | 60.416   |
   | 9th     | 74.949  | 50.187   |
   | 10th     | 85.874  | 50.973   |
   | 11th     | 103.902  | 58.005   |
   | 12th     | 77.534  | 86.087   |
   | 13th     | 79.154  | 76.283   |
   | 14th     | 116.117  | 72.029   |
   | 15th     | 112.846  | 74.182   |
   | 16th     | 124.707  | 69.282   |
   | 17th     | 124.677  | 67.546   |
   | 18th     | 147.15   | 66.652   |
   | 19th     | 135.127  | 109.385   |
   | 20th     | 133.94   | 80.849   |
   | 21st     | 171.112  | 78.92  |
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380031



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;
+    if (blockNameList.contains(blockName)) {
+      blockAndDeleteDeltaFilesMap =
+          segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+    }
+    if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {
+      List<CarbonFile> deleteDeltaFileList = blockAndDeleteDeltaFilesMap.get(blockName);
+      deleteDeltaFiles = deleteDeltaFileList.toArray(new CarbonFile[deleteDeltaFileList.size()]);
+    }
+
+    // CarbonFile[] deleteDeltaFiles =

Review comment:
       Done







[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508943092



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,66 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files mapped to each block of the specified segment.
+   * First list all deletedelta files in the segment dir, then loop the files and find
+   * a map of blocks and .deletedelta files related to each block.
+   *
+   * @param seg the segment which is to find blocks
+   * @return a map of block and its file list
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
-    String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesList(final Segment seg) {
+
+    Map<String, long[]> blockDeltaStartAndEndTimestampMap = new HashMap<>();

Review comment:
       I modified this method following the approach above.
   The delete delta file name is now generated from updateDetails, without listing files.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509979972



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -130,6 +130,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is ${ validSegList }")

Review comment:
       remove brackets, which are unnecessary







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717880461


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4716/
   





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510001821



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
       ```suggestion
           for (String deleteDeltaFile: deleteDeltaFiles) {
   ```







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509980034



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -177,6 +178,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is ${ deletedBlocksList }")

Review comment:
       remove brackets, which are unnecessary







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709984578


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4484/
   





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509221753



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1039,22 +1039,24 @@ private static boolean isSegmentValid(LoadMetadataDetails seg) {
     if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
       int numberDeleteDeltaFilesThreshold =
           CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-      List<Segment> deleteSegments = new ArrayList<>();
-      for (Segment seg : segments) {
-        if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
-            numberDeleteDeltaFilesThreshold)) {
-          deleteSegments.add(seg);
+
+      // firstly find the valid segments which are updated from SegmentUpdateDetails,
+      // in order to reduce the segment list for behind traversal
+      List<String> segmentsPresentInSegmentUpdateDetails = new ArrayList<>();

Review comment:
       You don't need to delete the old code; it only goes into that logic when the `updateDetails` are present. So you can just put the `getDeleteDeltaFilesInSeg` method inside `checkDeleteDeltaFilesInSeg` instead of completely modifying `getSegListIUDCompactionQualified`; it will be cleaner.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713264301


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4555/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509943740



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaStartTimestamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaEndTimeStamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);

Review comment:
       I modified this method to call `getDeltaFileStamps` as suggested in the comment below.
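
   For readers following the thread, a sketch of what "calling `getDeltaFileStamps`" could look like; the accessor name comes from this comment, but its return type and the surrounding code are assumptions rather than the merged diff:

   ```java
   // Assumed sketch: one delete delta file path per timestamp recorded
   // in SegmentUpdateDetails for this block.
   if (block.getDeltaFileStamps() != null) {
     for (String deltaFileStamp : block.getDeltaFileStamps()) {
       deleteDeltaFileList.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR
           + blockName + CarbonCommonConstants.HYPHEN + deltaFileStamp
           + CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
     }
   }
   ```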







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-712853350


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2785/
   





[GitHub] [carbondata] marchpure commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
marchpure commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r505670979



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1210,6 +1198,39 @@ private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
     return blockLists;
   }
 
+  private static List<String> checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
+
+    List<String> blockLists = new ArrayList<>();
+
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap =
+      segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+    List<String> blockNameList =
+      segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+    Set<String> uniqueBlocks = new HashSet<String>();
+    for (final String blockName : blockNameList) {
+
+      List<CarbonFile> deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
+
+      if (null != deleteDeltaFiles) {
+        for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
       if (deleteDeltaFiles.size() < threshold) continue

##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1039,22 +1039,10 @@ private static boolean isSegmentValid(LoadMetadataDetails seg) {
     if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
       int numberDeleteDeltaFilesThreshold =
           CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-      List<Segment> deleteSegments = new ArrayList<>();
       for (Segment seg : segments) {
-        if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,

Review comment:
       remove checkDeleteDeltaFilesInSeg function

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
     return null;
   }
 
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesForSegment(final Segment seg) {

Review comment:
       remove getDeleteDeltaFilesList function

##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1210,6 +1198,39 @@ private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
     return blockLists;
   }
 
+  private static List<String> checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
+
+    List<String> blockLists = new ArrayList<>();
+
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap =
+      segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+    List<String> blockNameList =
+      segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+    Set<String> uniqueBlocks = new HashSet<String>();
+    for (final String blockName : blockNameList) {
+
+      List<CarbonFile> deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
+
+      if (null != deleteDeltaFiles) {
+        for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
       blocks -> block

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
     return null;
   }
 
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesForSegment(final Segment seg) {
+    String segmentPath = CarbonTablePath.getSegmentPath(
+      identifier.getTablePath(), seg.getSegmentNo());
+    CarbonFile segDir = FileFactory.getCarbonFile(segmentPath);
+    CarbonFile[] allDeleteDeltaFilesOfSegment = segDir.listFiles(new CarbonFileFilter() {
+      @Override
+      public boolean accept(CarbonFile pathName) {
+        String fileName = pathName.getName();
+        return (pathName.getSize() > 0) &&

Review comment:
       getSize() will trigger one S3 IO per file.
   Remove the getSize() check.
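
   A hedged sketch of the filter without the getSize() call, mirroring the anonymous CarbonFileFilter in the diff above (an illustration, not the merged code):

   ```java
   // Sketch: filter by file name only; dropping getSize() avoids one
   // metadata round trip per file on object stores such as S3.
   CarbonFile[] allDeleteDeltaFilesOfSegment = segDir.listFiles(new CarbonFileFilter() {
     @Override
     public boolean accept(CarbonFile pathName) {
       return pathName.getName().endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
     }
   });
   ```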

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
     return null;
   }
 
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesForSegment(final Segment seg) {
+    String segmentPath = CarbonTablePath.getSegmentPath(
+      identifier.getTablePath(), seg.getSegmentNo());

Review comment:
       If SegmentUpdateDetails does not contain the segment, we should return an empty result directly,
   which can save a lot of IO overhead.
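
   A minimal sketch of such a guard, assuming it sits at the top of getDeleteDeltaFilesForSegment and reuses the existing getBlockNameFromSegment helper (illustration only, not the merged code):

   ```java
   // Sketch: if no block of this segment appears in SegmentUpdateDetails,
   // there can be no delete delta files, so skip the directory listing.
   List<String> blockNames = getBlockNameFromSegment(seg.getSegmentNo());
   if (blockNames.isEmpty()) {
     return new HashMap<>();
   }
   ```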

##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1210,6 +1198,39 @@ private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
     return blockLists;
   }
 
+  private static List<String> checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
+
+    List<String> blockLists = new ArrayList<>();
+
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap =
+      segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+    List<String> blockNameList =
+      segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+    Set<String> uniqueBlocks = new HashSet<String>();
+    for (final String blockName : blockNameList) {
+
+      List<CarbonFile> deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
+
+      if (null != deleteDeltaFiles) {
+        for (CarbonFile blocks : deleteDeltaFiles) {
+          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
+          String timestamp =
+            CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
+          String taskAndTimeStamp = task + "-" + timestamp;
+          uniqueBlocks.add(taskAndTimeStamp);

Review comment:
       if (uniqueBlocks.size() > numberDeltaFilesThreshold) {
         blockLists.add(seg.getSegmentNo() + "/" + blockName);
         break;
       }







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-714490521


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2873/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709785844


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4476/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-715196563


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4660/
   





[GitHub] [carbondata] akashrn5 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717054658


   > We tested with 10 segments total 10G and update more than 20 times to see the cost, update row count of each time is 176973.
   > The result and comparison before and after improving is shown below, we can see that time-consuming of UPDATE reduces by about 30% after improving horizontal compaction.
   > 
   > Time Before Improving	Time after Improving
   > Average time
   > from 2nd to 21st UPDATE	**99.19**	**67.55**
   > UPDATE		
   > 1st	38.397	71.91
   > 2nd	74.918	67.386
   > 3rd	43.851	46.171
   > 4th	52.133	82.974
   > 5th	86.021	44.499
   > 6th	62.992	39.973
   > 7th	99.077	69.292
   > 8th	77.815	60.416
   > 9th	74.949	50.187
   > 10th	85.874	50.973
   > 11th	103.902	58.005
   > 12th	77.534	86.087
   > 13th	79.154	76.283
   > 14th	116.117	72.029
   > 15th	112.846	74.182
   > 16th	124.707	69.282
   > 17th	124.677	67.546
   > 18th	147.15	66.652
   > 19th	135.127	109.385
   > 20th	133.94	80.849
   > 21st	171.112	78.92
   
   Cool





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711484706


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4502/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506203367



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
     return null;
   }
 
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesForSegment(final Segment seg) {
+    String segmentPath = CarbonTablePath.getSegmentPath(
+      identifier.getTablePath(), seg.getSegmentNo());

Review comment:
       Done.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709712170


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2721/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933445



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,6 +125,9 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Update Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       This log has been modified; it now prints the validSegList and the time taken to get it.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510001358



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over

Review comment:
       Please do not delete the existing comments in the code; add them back.







[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020858



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1210,6 +1198,39 @@ private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
     return blockLists;
   }
 
+  private static List<String> checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
+
+    List<String> blockLists = new ArrayList<>();
+
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap =
+      segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+    List<String> blockNameList =
+      segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+    Set<String> uniqueBlocks = new HashSet<String>();
+    for (final String blockName : blockNameList) {
+
+      List<CarbonFile> deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
+
+      if (null != deleteDeltaFiles) {
+        for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
       Done







[GitHub] [carbondata] Zhangshunyu commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
Zhangshunyu commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717027548


   LGTM





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717880662


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2959/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711817892


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2753/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-714352138


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2864/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933755



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Delete Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       I have modified the log; it now only prints the validSegList and is logged at LOG.debug level.







[GitHub] [carbondata] asfgit closed pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986


   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509948458



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+

Review comment:
       Done







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-714213749









[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020781



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1210,6 +1198,39 @@ private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
     return blockLists;
   }
 
+  private static List<String> checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
+
+    List<String> blockLists = new ArrayList<>();
+
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap =
+      segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+    List<String> blockNameList =
+      segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+    Set<String> uniqueBlocks = new HashSet<String>();
+    for (final String blockName : blockNameList) {
+
+      List<CarbonFile> deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
+
+      if (null != deleteDeltaFiles) {
+        for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
       Done.
   Added the check:
   if (deleteDeltaFiles.size() <= numberDeltaFilesThreshold) continue







[GitHub] [carbondata] Zhangshunyu commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
Zhangshunyu commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709698583


   Have you tested this optimization? Could you please give a comparison result for this change?





[GitHub] [carbondata] akashrn5 edited a comment on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 edited a comment on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713522116


   > Have you ever tested this optimization? Could you pls give a comparison result for this change?
   
   as @Zhangshunyu said, it's better if you can get the readings and update them in the PR; that would be great. Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380824



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;
+    if (blockNameList.contains(blockName)) {
+      blockAndDeleteDeltaFilesMap =
+          segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+    }
+    if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510063450



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+        } else {
+          final long deltaEndTimeStamp =

Review comment:
       OK, understood. Added a comment here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713522116


   > Have you ever tested this optimization? Could you pls give a comparison result for this change?
   as @Zhangshunyu said, it's better if you can get the readings and update them in the PR; that would be great. Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] marchpure commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
marchpure commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709459684


   Checkstyle fails. You can run a checkstyle check in your local env by running:
   mvn clean install -DskipTests
   Note: don't use mvn clean package -DskipTests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709457343


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2712/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711479262


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2747/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506202461



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
     return null;
   }
 
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesForSegment(final Segment seg) {

Review comment:
       Done.
   Keep function name getDeleteDeltaFilesList and change the return type to Map<String, List<CarbonFile>>
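
   As a rough sketch of what a Map<String, List<...>> return shape looks like (plain Java; the file names and the block-name extraction rule below are simplified assumptions, not the real SegmentUpdateStatusManager code):

   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   public class GroupDeltaFilesSketch {
     // group delete delta file names of one segment by block name, assuming
     // names of the form "<blockName>-<timestamp>.deletedelta"
     static Map<String, List<String>> groupByBlock(List<String> deltaFileNames) {
       Map<String, List<String>> blockToFiles = new HashMap<>();
       for (String fileName : deltaFileNames) {
         String blockName = fileName.substring(0, fileName.lastIndexOf('-'));
         blockToFiles.computeIfAbsent(blockName, k -> new ArrayList<>()).add(fileName);
       }
       return blockToFiles;
     }

     public static void main(String[] args) {
       List<String> files = Arrays.asList(
           "part-0-0-1481084721319.deletedelta",
           "part-0-0-1481084725000.deletedelta",
           "part-0-1-1481084721319.deletedelta");
       // one listFiles pass over the segment, grouped per block
       System.out.println(groupByBlock(files));
     }
   }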




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709323928


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713265280


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2803/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509190649



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,12 +125,18 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+    val startTime = System.nanoTime()
+
     // get the valid segments qualified for update compaction.
     val validSegList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
 
+    val endTime = System.nanoTime()
+    LOG.info(s"time taken to get segment list for Horizontal Update Compaction is" +

Review comment:
       I don't think the time in this log will help us; just the segment list can help in some way, not the time




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380663



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;
+    if (blockNameList.contains(blockName)) {

Review comment:
       Done. Combined the two judgements.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-712840146


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4539/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709454528


   Add to whitelist


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020565



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1210,6 +1198,39 @@ private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
     return blockLists;
   }
 
+  private static List<String> checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
+
+    List<String> blockLists = new ArrayList<>();
+
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap =
+      segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+    List<String> blockNameList =
+      segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+    Set<String> uniqueBlocks = new HashSet<String>();
+    for (final String blockName : blockNameList) {
+
+      List<CarbonFile> deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
+
+      if (null != deleteDeltaFiles) {
+        for (CarbonFile blocks : deleteDeltaFiles) {
+          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
+          String timestamp =
+            CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
+          String taskAndTimeStamp = task + "-" + timestamp;
+          uniqueBlocks.add(taskAndTimeStamp);

Review comment:
       Done
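
   For context, a minimal standalone sketch of the unique task-and-timestamp deduplication discussed here (plain Java; the file names and the position-based parsing are simplified assumptions for the example):

   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   public class SpillOverDedupSketch {
     // a name such as "part-0-3-1481084721319" carries a task no ("3") and a
     // timestamp ("1481084721319"); spill-over parts of the same block share
     // that pair, so counting unique "task-timestamp" keys counts each logical
     // block only once
     static int countUniqueBlocks(List<String> fileNames) {
       Set<String> uniqueBlocks = new HashSet<>();
       for (String name : fileNames) {
         String[] parts = name.split("-");
         uniqueBlocks.add(parts[2] + "-" + parts[3]);
       }
       return uniqueBlocks.size();
     }

     public static void main(String[] args) {
       List<String> files = Arrays.asList(
           "part-0-3-1481084721319", "part-1-3-1481084721319", "part-0-4-1481084721319");
       // prints 2: the first two entries are spill-over parts of the same block
       System.out.println(countUniqueBlocks(files));
     }
   }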




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509190746



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,11 +179,17 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+    val startTime = System.nanoTime()
+
     val deletedBlocksList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
 
+    val endTime = System.nanoTime()
+    LOG.info(s"time taken to get deleted block list for Horizontal Delete Compaction is" +

Review comment:
       same as above




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-714364595


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4619/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-715336355






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510677675



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
       Formatted the code and modified the variable name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380251



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509947651



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1039,22 +1039,24 @@ private static boolean isSegmentValid(LoadMetadataDetails seg) {
     if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
       int numberDeleteDeltaFilesThreshold =
           CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-      List<Segment> deleteSegments = new ArrayList<>();
-      for (Segment seg : segments) {
-        if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
-            numberDeleteDeltaFilesThreshold)) {
-          deleteSegments.add(seg);
+
+      // firstly find the valid segments which are updated from SegmentUpdateDetails,
+      // in order to reduce the segment list for behind traversal
+      List<String> segmentsPresentInSegmentUpdateDetails = new ArrayList<>();

Review comment:
       Combined `getDeleteDeltaFilesInSeg` and `checkDeleteDeltaFilesInSeg`, kept the method name `checkDeleteDeltaFilesInSeg`, and changed it to return `List<String>`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509920810



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+        } else {
+          final long deltaEndTimeStamp =

Review comment:
       Add a comment here saying that when deltaFileTimestamps is null, there is only one delta file and the update details will have the same start and end timestamps.
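
   For illustration, a minimal standalone sketch of the two naming paths this branch covers (plain Java; the prefix, extension string, and timestamps are placeholders, not the actual CarbonData constants):

   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Set;

   public class DeltaFileNameSketch {
     static final String DELETE_DELTA_EXT = ".deletedelta";

     // build delete delta file names for one block: either from the recorded
     // per-file timestamps, or, when that set is absent, from the block's end
     // timestamp alone (only one delta file exists in that case, so start and
     // end timestamps coincide)
     static List<String> deltaFileNames(String prefix, Set<String> deltaFileTimestamps,
         long deltaEndTimeStamp) {
       List<String> names = new ArrayList<>();
       if (deltaFileTimestamps != null && !deltaFileTimestamps.isEmpty()) {
         deltaFileTimestamps.forEach(ts -> names.add(prefix + ts + DELETE_DELTA_EXT));
       } else {
         names.add(prefix + deltaEndTimeStamp + DELETE_DELTA_EXT);
       }
       return names;
     }

     public static void main(String[] args) {
       String prefix = "/store/db/tbl/Fact/Part0/Segment_0/part-0-0-";
       System.out.println(deltaFileNames(prefix,
           new LinkedHashSet<>(Arrays.asList("100", "200")), 200L));
       System.out.println(deltaFileNames(prefix, null, 200L));
     }
   }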




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513248146



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {

Review comment:
       Added the test case "test IUD Horizontal Compaction after updating without compaction" to HorizontalCompactionTestCase.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709459321


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4466/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933445



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,6 +125,9 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Update Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       I have modified the log; it now prints only the validSegList and is logged at LOG.debug level.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020950



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
     return null;
   }
 
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesForSegment(final Segment seg) {
+    String segmentPath = CarbonTablePath.getSegmentPath(
+      identifier.getTablePath(), seg.getSegmentNo());
+    CarbonFile segDir = FileFactory.getCarbonFile(segmentPath);
+    CarbonFile[] allDeleteDeltaFilesOfSegment = segDir.listFiles(new CarbonFileFilter() {
+      @Override
+      public boolean accept(CarbonFile pathName) {
+        String fileName = pathName.getName();
+        return (pathName.getSize() > 0) &&

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510002204



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
       please do the proper formatting




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] marchpure commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
marchpure commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506261693



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;
+    if (blockNameList.contains(blockName)) {
+      blockAndDeleteDeltaFilesMap =
+          segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+    }
+    if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {
+      List<CarbonFile> deleteDeltaFileList = blockAndDeleteDeltaFilesMap.get(blockName);
+      deleteDeltaFiles = deleteDeltaFileList.toArray(new CarbonFile[deleteDeltaFileList.size()]);
+    }
+
+    // CarbonFile[] deleteDeltaFiles =

Review comment:
       delete these 2 lines

##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;

Review comment:
       = new CarbonFile[0]

##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;
+    if (blockNameList.contains(blockName)) {

Review comment:
       if (blockNameList.contains(blockName)) {
         blockAndDeleteDeltaFilesMap =
             segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
         if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {
         }
       }
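
   Put together, the merged shape would look roughly like this (a plain-Java stand-in; the map population below replaces the real getDeleteDeltaFilesList call just to keep the snippet runnable):

   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   public class NestedLookupSketch {
     public static void main(String[] args) {
       List<String> blockNameList = Arrays.asList("part-0-0", "part-0-1");
       String blockName = "part-0-0";
       Map<String, List<String>> blockAndDeleteDeltaFilesMap = new HashMap<>();
       List<String> deleteDeltaFiles = null;

       // do the per-segment lookup only when the block is known to the update
       // details, and consume the map inside that same branch
       if (blockNameList.contains(blockName)) {
         // stands in for segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg))
         blockAndDeleteDeltaFilesMap.put("part-0-0",
             Arrays.asList("part-0-0-100.deletedelta"));
         if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {
           deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
         }
       }
       System.out.println(deleteDeltaFiles);
     }
   }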

##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
-        segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+    // only when SegmentUpdateDetails contain the specified block
+    // will the method getDeleteDeltaFilesList be executed
+    List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+    CarbonFile[] deleteDeltaFiles = null;
+    if (blockNameList.contains(blockName)) {
+      blockAndDeleteDeltaFilesMap =
+          segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+    }
+    if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {

Review comment:
       No need to convert to an array.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933755



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Delete Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       This log has been modified; it prints the deletedBlocksList and the time taken to get it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510018289



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,20 +1197,16 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
+    List<String> deleteFilePathList =
         segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
 
     String destFileName =
         blockName + "-" + timestamp.toString() + CarbonCommonConstants.DELETE_DELTA_FILE_EXT;
-    List<String> deleteFilePathList = new ArrayList<>();
-    if (null != deleteDeltaFiles && deleteDeltaFiles.length > 0 && null != deleteDeltaFiles[0]
-        .getParentFile()) {
-      String fullBlockFilePath = deleteDeltaFiles[0].getParentFile().getCanonicalPath()
-          + CarbonCommonConstants.FILE_SEPARATOR + destFileName;
-
-      for (CarbonFile cFile : deleteDeltaFiles) {
-        deleteFilePathList.add(cFile.getCanonicalPath());
-      }
+    if (deleteFilePathList.size() > 0) {
+      String deleteDeltaFilePath = deleteFilePathList.get(0);
+      String fullBlockFilePath =
+          deleteDeltaFilePath.substring(0, deleteDeltaFilePath.lastIndexOf("/")) +

Review comment:
       use constant for "/"
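
   Something along these lines (a minimal sketch; the local FILE_SEPARATOR constant and the path below only stand in for CarbonCommonConstants.FILE_SEPARATOR and a real delete delta path):

   public class SeparatorConstantSketch {
     // stands in for CarbonCommonConstants.FILE_SEPARATOR from the CarbonData code base
     private static final String FILE_SEPARATOR = "/";

     public static void main(String[] args) {
       String deleteDeltaFilePath =
           "/store/db/tbl/Fact/Part0/Segment_0/part-0-0-100.deletedelta";
       // take the parent directory by cutting at the last separator,
       // using the named constant instead of a hard-coded "/"
       String parentDir =
           deleteDeltaFilePath.substring(0, deleteDeltaFilePath.lastIndexOf(FILE_SEPARATOR));
       System.out.println(parentDir);
     }
   }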




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711484392


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2748/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510065323



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1246,20 +1197,16 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
+    List<String> deleteFilePathList =
         segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
 
     String destFileName =
         blockName + "-" + timestamp.toString() + CarbonCommonConstants.DELETE_DELTA_FILE_EXT;
-    List<String> deleteFilePathList = new ArrayList<>();
-    if (null != deleteDeltaFiles && deleteDeltaFiles.length > 0 && null != deleteDeltaFiles[0]
-        .getParentFile()) {
-      String fullBlockFilePath = deleteDeltaFiles[0].getParentFile().getCanonicalPath()
-          + CarbonCommonConstants.FILE_SEPARATOR + destFileName;
-
-      for (CarbonFile cFile : deleteDeltaFiles) {
-        deleteFilePathList.add(cFile.getCanonicalPath());
-      }
+    if (deleteFilePathList.size() > 0) {
+      String deleteDeltaFilePath = deleteFilePathList.get(0);
+      String fullBlockFilePath =
+          deleteDeltaFilePath.substring(0, deleteDeltaFilePath.lastIndexOf("/")) +

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709711202


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4475/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-710848203


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2741/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510064286



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over

Review comment:
       Restored the existing comments. I will pay attention to this.







[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509947651



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1039,22 +1039,24 @@ private static boolean isSegmentValid(LoadMetadataDetails seg) {
     if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
       int numberDeleteDeltaFilesThreshold =
           CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-      List<Segment> deleteSegments = new ArrayList<>();
-      for (Segment seg : segments) {
-        if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
-            numberDeleteDeltaFilesThreshold)) {
-          deleteSegments.add(seg);
+
+      // firstly find the valid segments which are updated from SegmentUpdateDetails,
+      // in order to reduce the segment list for behind traversal
+      List<String> segmentsPresentInSegmentUpdateDetails = new ArrayList<>();

Review comment:
       Combined `getDeleteDeltaFilesInSeg` and `checkDeleteDeltaFilesInSeg`, kept the method name `checkDeleteDeltaFilesInSeg`, and made it return `List<String>`.
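
   As a rough illustration of what "check and collect in one pass" means here, the following self-contained Java sketch qualifies a block only when its delete delta file count exceeds the threshold and gathers the qualifying files in the same loop. It is a simplified stand-in, not the actual CarbonDataMergerUtil code; the map argument represents what the status manager would return per block.

   import java.util.ArrayList;
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   public class DeleteDeltaThresholdCheck {

     /**
      * Returns the delete delta files of every block whose file count is above the
      * threshold, de-duplicated, in a single pass over the blocks of one segment.
      */
     static List<String> collectQualifiedDeltaFiles(
         Map<String, List<String>> deltaFilesPerBlock, int numberDeltaFilesThreshold) {
       Set<String> unique = new LinkedHashSet<>();
       for (List<String> deltaFiles : deltaFilesPerBlock.values()) {
         // a block qualifies only when it has more delete delta files than the threshold
         if (deltaFiles != null && deltaFiles.size() > numberDeltaFilesThreshold) {
           unique.addAll(deltaFiles);
         }
       }
       return new ArrayList<>(unique);
     }
   }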







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509920920



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+

Review comment:
       remove empty lines







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509204221



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();

Review comment:
       Instead of this, you can follow the approach below (a rough sketch of it is added after this list):
   1. From `SegmentUpdateDetails`, call the `getDeltaFileStamps` method and check that the result is not null and its size is > 0. If so, take the list of delta timestamps directly and build the final list with the block name, which is already known.
   2. If `getDeltaFileStamps` returns null, there is only one valid delta timestamp. In this case `SegmentUpdateDetails` has the same delta start and end timestamp, so you can take either one, form the single delta file name, and return only that; there is no need to create a list just to hold one entry, so the list is only needed in the first case.
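
   To make the two cases concrete, here is a minimal, self-contained Java sketch of that approach. It is not the actual SegmentUpdateStatusManager code: the separator, hyphen and ".deletedelta" extension are hard-coded stand-ins for the CarbonTablePath / CarbonCommonConstants values, and the method simply builds file names from the inputs it is given.

   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.Set;

   public class DeleteDeltaFileNameBuilder {

     // stand-ins for the real CarbonCommonConstants values (assumed here)
     private static final String SEPARATOR = "/";
     private static final String HYPHEN = "-";
     private static final String DELETE_DELTA_EXT = ".deletedelta";

     /**
      * Case 1: deltaFileTimestamps is non-empty -> build one file name per timestamp.
      * Case 2: deltaFileTimestamps is null or empty -> only one delta file exists, and its
      *         timestamp equals the (identical) start/end timestamp of the block details.
      */
     static List<String> buildDeleteDeltaFileNames(String segmentPath, String blockName,
         Set<String> deltaFileTimestamps, long endTimestamp) {
       String prefix = segmentPath + SEPARATOR + blockName + HYPHEN;
       if (deltaFileTimestamps != null && !deltaFileTimestamps.isEmpty()) {
         List<String> files = new ArrayList<>();
         for (String timestamp : deltaFileTimestamps) {
           files.add(prefix + timestamp + DELETE_DELTA_EXT);
         }
         return files;
       }
       // single delta file: no directory listing is needed, the name is fully determined
       return Collections.singletonList(prefix + endTimestamp + DELETE_DELTA_EXT);
     }
   }

   Either way, the file names are derived purely from the update status metadata, which is what removes the per-block listFiles call.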







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709707004


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4474/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509956278



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+        } else {
+          final long deltaEndTimeStamp =

Review comment:
       The else branch handles the case where deltaFileTimestamps is null.
   In that case the updateDetails entry has the same start and end timestamp, so I use the end timestamp to form the delete delta file name.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709976082


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2730/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-714520638


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4629/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510065227



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
       It was formatted







[GitHub] [carbondata] QiangCai commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513147277



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {

Review comment:
       Please add a test case (see the outline sketch after this list):
   1. Set CarbonCommonConstants.CARBON_HORIZONTAL_COMPACTION_ENABLE to false.
   2. Update a different row in the same file three times (this should generate three delete delta files for the same block name).
   3. Set CarbonCommonConstants.CARBON_HORIZONTAL_COMPACTION_ENABLE to true.
   4. Update a different row in the same file again.
   5. Query the table to check the result.
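
   A hypothetical Java outline of those steps, as a hedged sketch only: it assumes the usual CarbonProperties.getInstance().addProperty API, the actual test was later added to the Scala suite HorizontalCompactionTestCase, and the UPDATE/SELECT statements are left as comments because they depend on the test table.

   import org.apache.carbondata.core.constants.CarbonCommonConstants;
   import org.apache.carbondata.core.util.CarbonProperties;

   public class HorizontalCompactionTestOutline {
     public static void main(String[] args) {
       CarbonProperties props = CarbonProperties.getInstance();
       // 1. disable horizontal compaction
       props.addProperty(CarbonCommonConstants.CARBON_HORIZONTAL_COMPACTION_ENABLE, "false");
       // 2. run three UPDATEs touching different rows of the same file
       //    -> three delete delta files are written for the same block name
       // 3. re-enable horizontal compaction
       props.addProperty(CarbonCommonConstants.CARBON_HORIZONTAL_COMPACTION_ENABLE, "true");
       // 4. run one more UPDATE on a different row of the same file
       // 5. query the table and verify the result
     }
   }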

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));

Review comment:
       better to reuse CarbonUpdateUtil.getDeleteDeltaFilePath
   

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -177,6 +178,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is $deletedBlocksList")

Review comment:
       Same as above.

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {

Review comment:
       change seg to segment

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -130,6 +130,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is $validSegList")

Review comment:
       Avoid converting the list to a string when debug logging is disabled; guard it like this:
   if (LOG.isDebugEnabled) {
     ...
   }
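
   For illustration only, the same guard pattern in plain Java with SLF4J (not the project's own logger): the list-to-string conversion happens only when debug logging is actually enabled.

   import java.util.List;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   public class GuardedDebugLogExample {

     private static final Logger LOG = LoggerFactory.getLogger(GuardedDebugLogExample.class);

     static void logSegments(List<String> validSegments) {
       // building the string is skipped entirely when debug is disabled
       if (LOG.isDebugEnabled()) {
         LOG.debug("The segment list for Horizontal Update Compaction is " + validSegments);
       }
     }
   }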

##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,43 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {

Review comment:
       Add a test case for the case when CarbonCommonConstants.DELETE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION >= 3.







[GitHub] [carbondata] ydvpankaj99 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
ydvpankaj99 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713228255


   retest this please





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020781



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1210,6 +1198,39 @@ private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
     return blockLists;
   }
 
+  private static List<String> checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
+
+    List<String> blockLists = new ArrayList<>();
+
+    Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap =
+      segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+    List<String> blockNameList =
+      segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+    Set<String> uniqueBlocks = new HashSet<String>();
+    for (final String blockName : blockNameList) {
+
+      List<CarbonFile> deleteDeltaFiles = blockAndDeleteDeltaFilesMap.get(blockName);
+
+      if (null != deleteDeltaFiles) {
+        for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
       Done







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709805701


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2722/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513249713



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,43 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {

Review comment:
       Added the test case "test IUD Horizontal Compaction when CarbonCommonConstants.DELETE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION >= 3" to HorizontalCompactionTestCase.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509979593



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+        } else {
+          final long deltaEndTimeStamp =

Review comment:
       Yes, what I mean is: add a comment in the code so that whoever works on it next can understand it easily from the comments. In general, add comments wherever you feel they are important.







[GitHub] [carbondata] marchpure commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
marchpure commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709674242


   add some logs and comments.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711838443


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4507/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509945109



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();

Review comment:
       Modified the code to follow this approach.







[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509941524



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,12 +125,18 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+    val startTime = System.nanoTime()
+
     // get the valid segments qualified for update compaction.
     val validSegList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
 
+    val endTime = System.nanoTime()
+    LOG.info(s"time taken to get segment list for Horizontal Update Compaction is" +

Review comment:
       I have modified the log: it now only prints the validSegList and is logged at LOG.debug level.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,11 +179,17 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+    val startTime = System.nanoTime()
+
     val deletedBlocksList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
 
+    val endTime = System.nanoTime()
+    LOG.info(s"time taken to get deleted block list for Horizontal Delete Compaction is" +

Review comment:
       I have modified the log: it now only prints the validSegList and is logged at LOG.debug level.







[GitHub] [carbondata] QiangCai commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-718729381


   LGTM, great job





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-709707683


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2720/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513249194



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -130,6 +130,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is $validSegList")

Review comment:
       Done

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -177,6 +178,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is $deletedBlocksList")

Review comment:
       Done

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {

Review comment:
       Done







[GitHub] [carbondata] marchpure commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
marchpure commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-710780197


   retest this please





[GitHub] [carbondata] marchpure commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
marchpure commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711459338


   retest this please





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510123816



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<String>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
       I had commented to change the variable name and to remove the spaces before and after the ':'; please check.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711479603


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4501/
   





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020436



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1039,22 +1039,10 @@ private static boolean isSegmentValid(LoadMetadataDetails seg) {
     if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
       int numberDeleteDeltaFilesThreshold =
           CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-      List<Segment> deleteSegments = new ArrayList<>();
       for (Segment seg : segments) {
-        if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,

Review comment:
       Done.
   Combined checkDeleteDeltaFilesInSeg and getDeleteDeltaFilesInSeg into a new function, checkAndGetDeleteDeltaFilesInSeg.







[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510063577



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -130,6 +130,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is ${ validSegList }")

Review comment:
       Done

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -177,6 +178,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is ${ deletedBlocksList }")

Review comment:
       Done







[GitHub] [carbondata] Zhangshunyu commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
Zhangshunyu commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717025630


   > We tested with 10 segments (about 10 GB in total) and ran more than 20 UPDATEs to measure the cost; each UPDATE modified 176973 rows.
   > The results before and after the improvement are shown below. The UPDATE time drops by about 30% after improving horizontal compaction.
   > 
   > | UPDATE | Time before improving | Time after improving |
   > | --- | --- | --- |
   > | Average (2nd to 21st) | **99.19** | **67.55** |
   > | 1st | 38.397 | 71.91 |
   > | 2nd | 74.918 | 67.386 |
   > | 3rd | 43.851 | 46.171 |
   > | 4th | 52.133 | 82.974 |
   > | 5th | 86.021 | 44.499 |
   > | 6th | 62.992 | 39.973 |
   > | 7th | 99.077 | 69.292 |
   > | 8th | 77.815 | 60.416 |
   > | 9th | 74.949 | 50.187 |
   > | 10th | 85.874 | 50.973 |
   > | 11th | 103.902 | 58.005 |
   > | 12th | 77.534 | 86.087 |
   > | 13th | 79.154 | 76.283 |
   > | 14th | 116.117 | 72.029 |
   > | 15th | 112.846 | 74.182 |
   > | 16th | 124.707 | 69.282 |
   > | 17th | 124.677 | 67.546 |
   > | 18th | 147.15 | 66.652 |
   > | 19th | 135.127 | 109.385 |
   > | 20th | 133.94 | 80.849 |
   > | 21st | 171.112 | 78.92 |
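   
   (For reference, the quoted averages work out to (99.19 - 67.55) / 99.19 ≈ 0.32, consistent with the "about 30%" reduction stated above.)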
   
   great!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-715197754


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2905/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-710849261


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4495/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

Posted by GitBox <gi...@apache.org>.
shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513248715



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));

Review comment:
       Done
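
   For context, the change in the diff above builds each delete delta file path from the timestamps recorded in the update metadata (SegmentUpdateDetails) instead of listing the segment directory for every block. A minimal, self-contained sketch of that name construction is below; the "-" separator and the ".deletedelta" extension are assumed literal values of CarbonCommonConstants.HYPHEN and DELETE_DELTA_FILE_EXT as used in the diff, and the method shape is an illustrative assumption, not the exact PR code.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;

   // Sketch only: derive delete delta file paths from metadata timestamps so that no
   // filesystem listFiles() call is needed per block.
   public final class DeleteDeltaNameSketch {

     static List<String> buildDeleteDeltaPaths(String segmentPath, String blockName,
         Set<String> deltaFileTimestamps) {
       List<String> paths = new ArrayList<>(deltaFileTimestamps.size());
       String prefix = segmentPath + "/" + blockName + "-";
       for (String timestamp : deltaFileTimestamps) {
         // Assumed pattern, following the diff: <segmentPath>/<blockName>-<timestamp>.deletedelta
         paths.add(prefix + timestamp + ".deletedelta");
       }
       return paths;
     }
   }
   ```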




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org