Posted to commits@carbondata.apache.org by ja...@apache.org on 2020/07/12 17:07:44 UTC

[carbondata] branch master updated: [CARBONDATA-3894] [IUD]decrease the size of tableupdatestaus file by remove the invalid segments not exist in tablestatus

This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 063d9b2  [CARBONDATA-3894] [IUD]decrease the size of tableupdatestaus file by remove the invalid segments not exist in tablestatus
063d9b2 is described below

commit 063d9b2aff86f66f22ce75bc6905affc8a4bd8df
Author: Zhangshunyu <zh...@126.com>
AuthorDate: Thu Jul 9 11:23:39 2020 +0800

    [CARBONDATA-3894] [IUD]decrease the size of tableupdatestaus file by remove the invalid segments not exist in tablestatus
    
    Why is this PR needed?
    The tableupdatestatus file keeps the info of every segment, even after a compacted segment has already been deleted. This makes the file grow quickly, which is bad for performance.
    After this change, the tableupdatestatus file size can decrease from ~MB to ~KB.
    
    What changes were proposed in this PR?
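    Before writing the tableupdatestatus file, remove the update details of invalid segments, i.e. segments that no longer exist in tablestatus.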
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No
    
    This closes #3833
---
 .../apache/carbondata/core/mutate/CarbonUpdateUtil.java  | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
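
Note: the filtering step in the hunk below can be sketched in isolation as follows. This is only an illustration under stated assumptions, not CarbonData code: the class SegmentFilterSketch, the type UpdateEntry (a simplified stand-in for SegmentUpdateDetails), the helper keepValidSegments, and the sample segment and block names are all hypothetical. It shows the same idea as the patch: collect the segment names known to the table status into a set, then keep only the update entries whose segment is in that set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch of the filtering idea; not CarbonData's API.
class SegmentFilterSketch {

  // Hypothetical simplification of SegmentUpdateDetails: one entry per updated block.
  static final class UpdateEntry {
    final String segmentName;
    final String blockName;

    UpdateEntry(String segmentName, String blockName) {
      this.segmentName = segmentName;
      this.blockName = blockName;
    }
  }

  // Keep only the entries whose segment is still listed in the table status.
  static List<UpdateEntry> keepValidSegments(
      List<UpdateEntry> updateEntries, List<String> tableStatusSegments) {
    // A HashSet gives average constant-time membership checks.
    Set<String> known = new HashSet<>(tableStatusSegments);
    List<UpdateEntry> valid = new ArrayList<>();
    for (UpdateEntry entry : updateEntries) {
      if (known.contains(entry.segmentName)) {
        valid.add(entry);
      }
    }
    return valid;
  }

  public static void main(String[] args) {
    // Illustrative data: update entries exist for three segments.
    List<UpdateEntry> entries = Arrays.asList(
        new UpdateEntry("0", "block-a"),
        new UpdateEntry("1", "block-b"),
        new UpdateEntry("0.1", "block-c"));
    // Assume only segment "0.1" is still present in the table status.
    List<String> tableStatus = Arrays.asList("0.1");
    for (UpdateEntry kept : keepValidSegments(entries, tableStatus)) {
      System.out.println(kept.segmentName + " " + kept.blockName);
    }
  }
}
```

Building the set once keeps the pass linear in the number of update entries, which matters when the status file has grown large.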

diff --git a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
index e915c66..77ebf3e 100644
--- a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
@@ -148,7 +148,21 @@ public class CarbonUpdateUtil {
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();
+        Set<String> loadDetailsSet = new HashSet<>();
+        for (LoadMetadataDetails details : segmentUpdateStatusManager.getLoadMetadataDetails()) {
+          loadDetailsSet.add(details.getLoadName());
+        }
+        for (SegmentUpdateDetails updateDetails : oldList) {
+          if (loadDetailsSet.contains(updateDetails.getSegmentName())) {
+            // we should only keep the update info of segments in table status, especially after
+            // compaction and clean files some compacted segments will be removed. It can keep
+            // tableupdatestatus file in small size which is good for performance.
+            updateDetailsValidSeg.add(updateDetails);
+          }
+        }
+        segmentUpdateStatusManager
+            .writeLoadDetailsIntoFile(updateDetailsValidSeg, updateStatusFileIdentifier);
         status = true;
       } else {
         LOGGER.error("Not able to acquire the segment update lock.");