Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/26 02:54:16 UTC

[GitHub] [incubator-doris] ZhangYu0123 opened a new pull request #4454: Persistence stale rowsets meta

ZhangYu0123 opened a new pull request #4454:
URL: https://github.com/apache/incubator-doris/pull/4454


   ## Proposed changes
   
   Persist the stale rowsets' meta. When a BE reboots, the stale rowset metas are restored, so stale versions remain readable until the stale GC time is reached.
   
   ISSUE: #4453
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   - [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [] Documentation Update (if none of the other choices apply)
   - [] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have created an issue (Fix #4453) and described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [] I have added tests that prove my fix is effective or that my feature works
   - [] If this change needs a document change, I have updated the document
   - [] Any dependent changes have been merged
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] kangkaisen commented on pull request #4454: Persistence stale rowsets meta

Posted by GitBox <gi...@apache.org>.
kangkaisen commented on pull request #4454:
URL: https://github.com/apache/incubator-doris/pull/4454#issuecomment-680713852


   > > @ZhangYu0123 Hi, what concrete issue does this PR want to fix? What is the precise definition of `stale rowsets`?
   > 
   > #4017
   
   I see. Thanks. 
   But I think the name `stale rowsets` isn't very appropriate. Semantically, stale means the data isn't right and we shouldn't read it.




[GitHub] [incubator-doris] kangkaisen commented on pull request #4454: Persistence stale rowsets meta

Posted by GitBox <gi...@apache.org>.
kangkaisen commented on pull request #4454:
URL: https://github.com/apache/incubator-doris/pull/4454#issuecomment-680639493


   @ZhangYu0123 Hi, what concrete issue does this PR want to fix? What is the precise definition of `stale rowsets`?




[GitHub] [incubator-doris] morningman commented on pull request #4454: Persistence stale rowsets meta

Posted by GitBox <gi...@apache.org>.
morningman commented on pull request #4454:
URL: https://github.com/apache/incubator-doris/pull/4454#issuecomment-683251113


   > > > @ZhangYu0123 Hi, what concrete issue does this PR want to fix? What is the precise definition of `stale rowsets`?
   > > 
   > > 
   > > #4017
   > 
   > I see. Thanks.
   > But I think the name `stale rowsets` isn't very appropriate. Semantically, stale means the data isn't right and we shouldn't read it.
   
   Hi @kangkaisen, I think `stale` means `not fresh` rather than `incorrect`, so it looks good to me.




[GitHub] [incubator-doris] morningman merged pull request #4454: Persistence stale rowsets meta

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #4454:
URL: https://github.com/apache/incubator-doris/pull/4454


   




[GitHub] [incubator-doris] morningman commented on a change in pull request #4454: Persistence stale rowsets meta

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4454:
URL: https://github.com/apache/incubator-doris/pull/4454#discussion_r477260820



##########
File path: be/src/olap/tablet_meta.h
##########
@@ -171,8 +171,9 @@ class TabletMeta {
                          const std::vector<RowsetMetaSharedPtr>& to_delete);
     void revise_rs_metas(std::vector<RowsetMetaSharedPtr>&& rs_metas);
 
-
     void revise_inc_rs_metas(std::vector<RowsetMetaSharedPtr>&& rs_metas);
+    void revise_stale_rs_metas(std::vector<RowsetMetaSharedPtr>&& rs_metas);

Review comment:
       This function seems to be unused?


##########
File path: be/src/olap/tablet.cpp
##########
@@ -108,6 +109,22 @@ OLAPStatus Tablet::_init_once_action() {
         _inc_rs_version_map[version] = std::move(rowset);
     }
 
+    // init stale rowset
+    for (auto& stale_rs_meta : _tablet_meta->all_stale_rs_metas()) {
+        Version version = stale_rs_meta->version();
+        RowsetSharedPtr rowset = get_stale_rowset_by_version(version);
+        if (rowset == nullptr) {

Review comment:
       Under what circumstances will it not be found?

##########
File path: be/src/olap/version_graph.cpp
##########
@@ -43,6 +43,167 @@ void TimestampedVersionTracker::construct_versioned_tracker(const std::vector<Ro
     _construct_versioned_tracker(rs_metas);
 }
 
+void TimestampedVersionTracker::construct_versioned_tracker(
+        const std::vector<RowsetMetaSharedPtr>& rs_metas,
+        const std::vector<RowsetMetaSharedPtr>& stale_metas) {
+
+    if (rs_metas.empty()) {
+        VLOG(3) << "there is no version in the header.";
+        return;
+    }
+    _stale_version_path_map.clear();
+    _next_path_id = 1;
+    _construct_versioned_tracker(rs_metas);
+
+    // init _stale_version_path_map
+    _init_stale_version_path_map(rs_metas, stale_metas);
+}
+
+void TimestampedVersionTracker::_init_stale_version_path_map(
+        const std::vector<RowsetMetaSharedPtr>& rs_metas,
+        const std::vector<RowsetMetaSharedPtr>& stale_metas) {
+
+    if (stale_metas.empty()) {
+        return;
+    }
+
+    // sort stale meta by version diff (second version - first version)
+    std::list<RowsetMetaSharedPtr> sorted_stale_metas;
+    for (auto& rs : stale_metas) {
+        sorted_stale_metas.emplace_back(rs);
+    }
+
+    // 1. Sort the existing rowsets by version in ascending order
+    sorted_stale_metas.sort([](const RowsetMetaSharedPtr& a, const RowsetMetaSharedPtr& b) {
+        // compare by version diff between version.first and version.second
+        int64_t a_diff = a->version().second - a->version().first;
+        int64_t b_diff = b->version().second - b->version().first;
+
+        int diff = a_diff - b_diff;
+        if (diff < 0) {
+            return true;
+        }
+        else if (diff > 0) {
+            return false;
+        }
+        // when the version diff is equal, compare rowset createtime
+        return a->creation_time() < b->creation_time();
+    });
+
+    // first_version -> (second_version -> rowset_meta)
+    std::unordered_map<int64_t, std::unordered_map<int64_t, RowsetMetaSharedPtr>> stale_map;
+
+    // 2. generate stale path from stale_metas. traverse sorted_stale_metas and each time add stale_meta to stale_map.
+    // when a stale path in stale_map can replace stale_meta in sorted_stale_metas, stale_map remove rowset_metas of a stale path
+    // and add the path to _stale_version_path_map.
+    for(auto& stale_meta:sorted_stale_metas) {
+        std::vector<RowsetMetaSharedPtr> stale_path;
+        // 2.1 find a path in stale_map can replace current stale_meta version
+        bool r = _find_path_from_stale_map(stale_map, stale_meta->start_version(), stale_meta->end_version(), &stale_path);
+
+        // 2.2 add stale_meta to stale_map
+        auto start_iter = stale_map.find(stale_meta->start_version());
+        if (start_iter != stale_map.end()) {
+            start_iter->second[stale_meta->end_version()] = stale_meta;
+        } else {
+            std::unordered_map<int64_t, RowsetMetaSharedPtr> item;
+            item[stale_meta->end_version()] = stale_meta;
+            stale_map[stale_meta->start_version()] = std::move(item);
+        }
+        // 2.3 add version to version_graph
+        Version stale_meta_version = stale_meta->version();
+        add_version(stale_meta_version);
+        // 2.4 find the path
+        if (r) {

Review comment:
       I didn't get this logic...
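
For readers following this review thread, the loop in the hunk above can be read as a grouping step: stale metas are visited from the narrowest version range to the widest, and whenever the metas collected so far can be chained to cover the current meta's range exactly, that chain is pulled out as one stale path. The sketch below is a simplified, standalone illustration of that reading, not the code from the PR. `StaleMeta`, `narrower_first`, `find_path`, and `group_stale_paths` are hypothetical names, and the sketch assumes that consecutive rowsets chain as `[a, b]` followed by `[b + 1, c]`. What the PR does with metas left in the map after the loop is not visible in the excerpt, so the sketch simply leaves them there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rowset meta: a closed version range plus a creation time.
struct StaleMeta {
    int64_t start;
    int64_t end;
    int64_t creation_time;
};

// start_version -> (end_version -> meta), mirroring the shape of stale_map in the hunk.
using StaleMap = std::unordered_map<int64_t, std::unordered_map<int64_t, StaleMeta>>;

// Order by version span (end - start), narrowest first; ties are broken by creation time.
// Comparing the 64-bit spans directly avoids narrowing their difference to int.
inline bool narrower_first(const StaleMeta& a, const StaleMeta& b) {
    int64_t a_span = a.end - a.start;
    int64_t b_span = b.end - b.start;
    if (a_span != b_span) {
        return a_span < b_span;
    }
    return a.creation_time < b.creation_time;
}

// Depth-first search for a chain of metas in the map that covers [start, end] exactly.
// Assumption: a meta ending at version v is followed by one starting at v + 1.
inline bool find_path(const StaleMap& stale_map, int64_t start, int64_t end,
                      std::vector<StaleMeta>* path) {
    auto it = stale_map.find(start);
    if (it == stale_map.end()) {
        return false;
    }
    for (const auto& entry : it->second) {
        int64_t segment_end = entry.first;
        if (segment_end > end) {
            continue;  // this segment overshoots the target range
        }
        path->push_back(entry.second);
        if (segment_end == end || find_path(stale_map, segment_end + 1, end, path)) {
            return true;
        }
        path->pop_back();  // dead end, try another segment
    }
    return false;
}

// Returns the stale paths found while visiting metas from narrowest to widest.
inline std::vector<std::vector<StaleMeta>> group_stale_paths(std::vector<StaleMeta> metas) {
    std::sort(metas.begin(), metas.end(), narrower_first);
    StaleMap stale_map;
    std::vector<std::vector<StaleMeta>> paths;
    for (const StaleMeta& meta : metas) {
        // Look for narrower metas that this meta replaced (step 2.1 in the hunk).
        std::vector<StaleMeta> path;
        bool found = find_path(stale_map, meta.start, meta.end, &path);
        // Record the current meta so wider metas can chain through it (step 2.2).
        stale_map[meta.start][meta.end] = meta;
        if (found) {
            // The replaced metas form one stale path; take them out of the map (step 2.4).
            for (const StaleMeta& replaced : path) {
                auto it = stale_map.find(replaced.start);
                assert(it != stale_map.end());
                it->second.erase(replaced.end);
                if (it->second.empty()) {
                    stale_map.erase(it);
                }
            }
            paths.push_back(std::move(path));
        }
    }
    return paths;
}
```

As an example under these assumptions, given stale metas `[0,0]`, `[1,1]`, `[2,2]`, and `[0,1]`, the three single-version metas are visited first and none of them has anything to chain through. When `[0,1]` is visited, `[0,0]` and `[1,1]` cover it exactly, so they are extracted together as one path, while `[2,2]` and `[0,1]` stay in the map.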







[GitHub] [incubator-doris] ZhangYu0123 commented on pull request #4454: Persistence stale rowsets meta

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 commented on pull request #4454:
URL: https://github.com/apache/incubator-doris/pull/4454#issuecomment-680670974


   > @ZhangYu0123 Hi, what concrete issue does this PR want to fix? What is the precise definition of `stale rowsets`?
   
   https://github.com/apache/incubator-doris/issues/4017

