Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2019/12/24 14:33:29 UTC

[GitHub] [incubator-doris] morningman opened a new pull request #2558: [Compaction] Support compact only one rowset

morningman opened a new pull request #2558: [Compaction] Support compact only one rowset
URL: https://github.com/apache/incubator-doris/pull/2558
 
 
   Allow a compaction to run on a single rowset.
   With this change, the last rowset of the tablet is
   also eligible for compaction.
   
   This change also adds a `segments_overlap_pb` field to the
   rowset meta, which describes whether the segment data in the
   rowset overlaps. The field is set by `rowset_writer` and
   defaults to UNKNOWN for compatibility with existing data.
   
   In addition, the version hash of the rowset generated by
   compaction is set directly to the version hash of the last
   rowset participating in the compaction, so that the tablet's
   version hash remains unchanged after compaction.
   
   ISSUE #2551
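
   The version-hash rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in this pull request: `RowsetInfo` and `make_output_info` are hypothetical names, and the input rowsets are assumed to be non-empty and already sorted by version.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a rowset: a version range plus its hash.
struct RowsetInfo {
    int64_t start_version;
    int64_t end_version;
    uint64_t version_hash;
};

// Builds the metadata of the rowset produced by compacting `inputs`.
// Assumption: `inputs` is non-empty and sorted by version.
RowsetInfo make_output_info(const std::vector<RowsetInfo>& inputs) {
    RowsetInfo out;
    out.start_version = inputs.front().start_version;
    out.end_version = inputs.back().end_version;
    // Reuse the hash of the last input instead of recomputing one, so the
    // hash seen for the tablet's latest version does not change.
    out.version_hash = inputs.back().version_hash;
    return out;
}
```

   With a single input rowset the output keeps that rowset's version range and hash, which is what makes it safe to include the last rowset of the tablet.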

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] chaoyli commented on a change in pull request #2558: [Compaction] Support compact only one rowset

Posted by GitBox <gi...@apache.org>.
chaoyli commented on a change in pull request #2558: [Compaction] Support compact only one rowset
URL: https://github.com/apache/incubator-doris/pull/2558#discussion_r361385750
 
 

 ##########
 File path: be/src/olap/rowset/rowset_meta.h
 ##########
 @@ -309,6 +309,49 @@ class RowsetMeta {
         return _is_removed_from_rowset_meta;
     }
 
+    SegmentsOverlapPB segments_overlap() const {
+        return _rowset_meta_pb.segments_overlap_pb();
+    }
+
+    void set_segments_overlap(SegmentsOverlapPB segments_overlap) {
+        _rowset_meta_pb.set_segments_overlap_pb(segments_overlap);
+    }
+
+    // return if segments in this rowset has overlapping data.
+    // this is not same as `segments_overlap()` method.
+    // `segments_overlap()` only return the value of "segments_overlap" field in rowset meta,
+    // but "segments_overlap" may be UNKNOWN.
+    // so segments overlapping is defined as
+    // 1. if end version > start version, which means this rowset is generated by compaction process,
+    //    so the segments in it are non overlapping.
+    // 2. the segments_overlap() flag in rowset meta is set to NONOVERLAPPING, explicitly.
+    bool is_segments_overlapping() const {
+        if (num_segments() == 0) {
+            // specially for delete version
+            return false;
+        }
+        if (end_version() > start_version() || segments_overlap() == NONOVERLAPPING) {
+            return false;
+        }
+        return true;
+    }
+
+    // get the compaction score of this rowset.
+    // if segments are overlapping, the score equals to the number of segments,
+    // otherwise, score is 1.
+    uint32_t get_compaction_score() const {
+        uint32_t score = 0;
+        if (!is_segments_overlapping()) {
+            score = 1;
+        } else {
+            // if this is a delete version, num_segments() will be 0.
+            // so set at least 1 to avoid return 0.
+            score = num_segments() == 0 ? 1 : num_segments();
 
 Review comment:
   is_segments_overlapping() already checks num_segments() == 0, so the check here is redundant.
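
   One way the redundancy could be removed is sketched below. This is a self-contained model, not the actual `RowsetMeta` class: `RowsetMetaSketch`, `SegmentsOverlap`, and `OVERLAP_UNKNOWN` are hypothetical stand-ins, and the accessors from the diff are replaced by plain fields.

```cpp
#include <cstdint>

// Hypothetical stand-in for the SegmentsOverlapPB enum used in the diff.
enum SegmentsOverlap { OVERLAP_UNKNOWN, OVERLAPPING, NONOVERLAPPING };

// Hypothetical, simplified model of RowsetMeta with only the needed fields.
struct RowsetMetaSketch {
    int64_t start_version;
    int64_t end_version;
    uint32_t num_segments;
    SegmentsOverlap segments_overlap;

    bool is_segments_overlapping() const {
        if (num_segments == 0) {
            // delete version: no segments, so nothing can overlap
            return false;
        }
        if (end_version > start_version || segments_overlap == NONOVERLAPPING) {
            // produced by compaction, or explicitly marked as non-overlapping
            return false;
        }
        return true;
    }

    uint32_t get_compaction_score() const {
        // When is_segments_overlapping() returns true, num_segments is
        // already known to be non-zero, so no extra zero check is needed.
        return is_segments_overlapping() ? num_segments : 1;
    }
};
```

   A delete version (zero segments) still scores 1 because it takes the non-overlapping branch.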



[GitHub] [incubator-doris] morningman merged pull request #2558: [Compaction] Support compact only one rowset

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #2558: [Compaction] Support compact only one rowset
URL: https://github.com/apache/incubator-doris/pull/2558
 
 
   



[GitHub] [incubator-doris] chaoyli commented on a change in pull request #2558: [Compaction] Support compact only one rowset

Posted by GitBox <gi...@apache.org>.
chaoyli commented on a change in pull request #2558: [Compaction] Support compact only one rowset
URL: https://github.com/apache/incubator-doris/pull/2558#discussion_r361355813
 
 

 ##########
 File path: be/src/olap/cumulative_compaction.cpp
 ##########
 @@ -67,57 +67,54 @@ OLAPStatus CumulativeCompaction::pick_rowsets_to_compact() {
     std::vector<RowsetSharedPtr> candidate_rowsets;
     _tablet->pick_candicate_rowsets_to_cumulative_compaction(&candidate_rowsets);
 
-    if (candidate_rowsets.size() <= 1) {
+    if (candidate_rowsets.empty()) {
         return OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS;
     }
 
     std::sort(candidate_rowsets.begin(), candidate_rowsets.end(), Rowset::comparator);
     RETURN_NOT_OK(check_version_continuity(candidate_rowsets));
 
     std::vector<RowsetSharedPtr> transient_rowsets;
-    size_t num_overlapping_segments = 0;
+    size_t compaction_score = 0;
     // the last delete version we meet when traversing candidate_rowsets
     Version last_delete_version { -1, -1 };
 
-    // traverse rowsets from begin to penultimate rowset.
-    // Because VersionHash will calculated from chosen rowsets.
-    // If ultimate singleton rowset is chosen, VersionHash
-    // will be different from the value recorded in FE.
-    // So the ultimate singleton rowset is revserved.
-    for (size_t i = 0; i < candidate_rowsets.size() - 1; ++i) {
+    for (size_t i = 0; i < candidate_rowsets.size(); ++i) {
         RowsetSharedPtr rowset = candidate_rowsets[i];
         if (_tablet->version_for_delete_predicate(rowset->version())) {
             last_delete_version = rowset->version();
-            if (num_overlapping_segments >= config::min_cumulative_compaction_num_singleton_deltas) {
+            if (!transient_rowsets.empty()) {
+                // we meet a delete version, and there were other versions before.
+                // we should compact those version before handling them over to base compaction
                 _input_rowsets = transient_rowsets;
                 break;
             }
+
+            // we meet a delete version, and no othher versions before, skip it and continue
 
 Review comment:
   other
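
   The delete-version handling in the hunk above can be modeled roughly as follows. This is a hedged sketch, not the real `pick_rowsets_to_compact()`: `RowsetSketch` and `pick_before_delete` are hypothetical names, and because the hunk does not show what happens to non-delete rowsets, the sketch simply assumes they are appended to the transient list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified rowset: a version number and a delete flag.
struct RowsetSketch {
    int version;
    bool is_delete;
};

// Walks all candidates (including the last one) and returns the rowsets
// collected before the first delete version that has predecessors.
std::vector<RowsetSketch> pick_before_delete(const std::vector<RowsetSketch>& candidates) {
    std::vector<RowsetSketch> transient;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const RowsetSketch& rs = candidates[i];
        if (rs.is_delete) {
            if (!transient.empty()) {
                // earlier versions exist: compact them first and stop here
                break;
            }
            // no earlier versions: skip the delete version and continue
            continue;
        }
        // Assumption: non-delete rowsets are simply collected.
        transient.push_back(rs);
    }
    return transient;
}
```

   Because the loop no longer stops at the penultimate rowset, a candidate list holding a single non-delete rowset yields that rowset as input.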



[GitHub] [incubator-doris] morningman commented on a change in pull request #2558: [Compaction] Support compact only one rowset

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2558: [Compaction] Support compact only one rowset
URL: https://github.com/apache/incubator-doris/pull/2558#discussion_r361415178
 
 

 ##########
 File path: be/src/olap/rowset/rowset_meta.h
 ##########
 @@ -309,6 +309,49 @@ class RowsetMeta {
         return _is_removed_from_rowset_meta;
     }
 
+    SegmentsOverlapPB segments_overlap() const {
+        return _rowset_meta_pb.segments_overlap_pb();
+    }
+
+    void set_segments_overlap(SegmentsOverlapPB segments_overlap) {
+        _rowset_meta_pb.set_segments_overlap_pb(segments_overlap);
+    }
+
+    // return if segments in this rowset has overlapping data.
+    // this is not same as `segments_overlap()` method.
+    // `segments_overlap()` only return the value of "segments_overlap" field in rowset meta,
+    // but "segments_overlap" may be UNKNOWN.
+    // so segments overlapping is defined as
+    // 1. if end version > start version, which means this rowset is generated by compaction process,
+    //    so the segments in it are non overlapping.
+    // 2. the segments_overlap() flag in rowset meta is set to NONOVERLAPPING, explicitly.
+    bool is_segments_overlapping() const {
+        if (num_segments() == 0) {
+            // specially for delete version
+            return false;
+        }
+        if (end_version() > start_version() || segments_overlap() == NONOVERLAPPING) {
+            return false;
+        }
+        return true;
+    }
+
+    // get the compaction score of this rowset.
+    // if segments are overlapping, the score equals to the number of segments,
+    // otherwise, score is 1.
+    uint32_t get_compaction_score() const {
+        uint32_t score = 0;
+        if (!is_segments_overlapping()) {
+            score = 1;
+        } else {
+            // if this is a delete version, num_segments() will be 0.
+            // so set at least 1 to avoid return 0.
+            score = num_segments() == 0 ? 1 : num_segments();
 
 Review comment:
   ok
