Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/08 03:20:23 UTC

[GitHub] [doris] luozenglin opened a new pull request, #14920: [enhancement](remote) support local cache GC at the granularity of cache files

luozenglin opened a new pull request, #14920:
URL: https://github.com/apache/doris/pull/14920

   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #14920:
URL: https://github.com/apache/doris/pull/14920#issuecomment-1342036607

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.9 seconds
    load time: 453 seconds
    storage size: 17123356347 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221208045718_clickbench_pr_59901.html




[GitHub] [doris] github-actions[bot] commented on pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14920:
URL: https://github.com/apache/doris/pull/14920#issuecomment-1341945517

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14920:
URL: https://github.com/apache/doris/pull/14920#issuecomment-1344003440

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] luozenglin commented on a diff in pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
luozenglin commented on code in PR #14920:
URL: https://github.com/apache/doris/pull/14920#discussion_r1043054888


##########
be/src/io/cache/sub_file_cache.cpp:
##########
@@ -199,23 +208,29 @@ Status SubFileCache::_get_need_cache_offsets(size_t offset, size_t req_size,
 }
 
 Status SubFileCache::clean_timeout_cache() {
+    SubGcQueue gc_queue;
+    _gc_lru_queue.swap(gc_queue);
     std::vector<size_t> timeout_keys;
     {
         std::shared_lock<std::shared_mutex> rlock(_cache_map_lock);
         for (std::map<size_t, int64_t>::const_iterator iter = _last_match_times.cbegin();
              iter != _last_match_times.cend(); ++iter) {
             if (time(nullptr) - iter->second > _alive_time_sec) {
                 timeout_keys.emplace_back(iter->first);
+            } else {
+                auto [cache_file, done_file] = _cache_path(iter->first);
+                _gc_lru_queue.push({iter->first, iter->second});

Review Comment:
   _gc_lru_queue is used to GC cache files that have not yet timed out when the cache files exceed the disk limit.
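
A minimal sketch of the idea described in this comment, not the code from the PR: entries that have not timed out are kept in a queue ordered by last match time, and the least recently matched ones are evicted first once the cache is over its disk limit. It assumes the queue behaves like a `std::priority_queue`; the names `GcEntry`, `OlderFirst`, `GcQueue`, and `pick_victims`, as well as the fixed per-file size, are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical entry: offset key of a cached sub-file and its last match time (seconds).
struct GcEntry {
    size_t offset;
    int64_t last_match_time;
};

// Comparator that puts the entry with the oldest last_match_time on top of the heap.
struct OlderFirst {
    bool operator()(const GcEntry& a, const GcEntry& b) const {
        return a.last_match_time > b.last_match_time;
    }
};

using GcQueue = std::priority_queue<GcEntry, std::vector<GcEntry>, OlderFirst>;

// Pops least-recently-matched entries until used_bytes fits within limit_bytes.
// Assumption: every cache file occupies file_size bytes.
std::vector<size_t> pick_victims(GcQueue& queue, size_t used_bytes, size_t limit_bytes,
                                 size_t file_size) {
    std::vector<size_t> victims;
    while (used_bytes > limit_bytes && !queue.empty()) {
        victims.push_back(queue.top().offset);
        queue.pop();
        used_bytes -= std::min(used_bytes, file_size);
    }
    return victims;
}
```

The caller would then delete the files behind the returned offsets; entries left in the queue stay cached.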





[GitHub] [doris] github-actions[bot] commented on pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14920:
URL: https://github.com/apache/doris/pull/14920#issuecomment-1344003469

   PR approved by anyone and no changes requested.




[GitHub] [doris] pengxiangyu commented on a diff in pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on code in PR #14920:
URL: https://github.com/apache/doris/pull/14920#discussion_r1042980176


##########
be/src/io/cache/file_cache_manager.cpp:
##########
@@ -175,16 +193,21 @@ void FileCacheManager::gc_file_caches() {
     // policy2: GC file cache by disk size
     if (gc_conf_size > 0) {
         for (size_t i = 0; i < contexts.size(); ++i) {
-            std::list<FileCachePtr> gc_file_list;
-            contexts[i].get_gc_file_caches(gc_file_list);
-            for (auto item : gc_file_list) {
-                std::shared_lock<std::shared_mutex> rdlock(_cache_map_lock);
-                // for dummy file cache, check already used or not again
-                if (item->is_dummy_file_cache() &&
-                    _file_cache_map.find(item->cache_dir().native()) != _file_cache_map.end()) {
-                    continue;
+            auto context = contexts[i];

Review Comment:
   auto context = contexts[i]; is not needed; it creates a copy of contexts[i].
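
To illustrate the point about the copy, here is a small standalone sketch. `GcContext` and both helper functions are hypothetical stand-ins, not types from the PR: binding with `auto` copies the element, so any change made through the local is lost, while `auto&` refers to the element stored in the vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GC context; only the counter matters here.
struct GcContext {
    int processed = 0;
};

// `auto` copies each element, so the increment only touches the local copy.
void bump_by_copy(std::vector<GcContext>& contexts) {
    for (size_t i = 0; i < contexts.size(); ++i) {
        auto context = contexts[i];
        ++context.processed;
    }
}

// `auto&` binds to the element itself, so the increment is visible to the caller.
void bump_by_reference(std::vector<GcContext>& contexts) {
    for (size_t i = 0; i < contexts.size(); ++i) {
        auto& context = contexts[i];
        ++context.processed;
    }
}
```

Besides the lost updates, the copy also costs a copy construction per iteration, which matters if the context owns containers.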



##########
be/src/io/cache/sub_file_cache.cpp:
##########
@@ -199,23 +208,29 @@ Status SubFileCache::_get_need_cache_offsets(size_t offset, size_t req_size,
 }
 
 Status SubFileCache::clean_timeout_cache() {
+    SubGcQueue gc_queue;
+    _gc_lru_queue.swap(gc_queue);
     std::vector<size_t> timeout_keys;
     {
         std::shared_lock<std::shared_mutex> rlock(_cache_map_lock);
         for (std::map<size_t, int64_t>::const_iterator iter = _last_match_times.cbegin();
              iter != _last_match_times.cend(); ++iter) {
             if (time(nullptr) - iter->second > _alive_time_sec) {
                 timeout_keys.emplace_back(iter->first);
+            } else {
+                auto [cache_file, done_file] = _cache_path(iter->first);
+                _gc_lru_queue.push({iter->first, iter->second});

Review Comment:
   Don't delete it via _gc_lru_queue: this file is still in the cache and may be in use while it is being deleted.
   The check time(nullptr) - iter->second > _alive_time_sec already covers all cases, so _gc_lru_queue is not needed.
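
For reference, the timeout check mentioned above can be sketched in isolation as follows. This is an illustration under assumptions, not the PR's implementation: `collect_timeout_keys` is a hypothetical helper, the current time is passed in as `now` instead of calling `time(nullptr)`, and locking is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Returns the offset keys whose last match time is more than alive_time_sec
// seconds before `now`. Keys come back in ascending order because std::map
// iterates in key order.
std::vector<size_t> collect_timeout_keys(const std::map<size_t, int64_t>& last_match_times,
                                         int64_t now, int64_t alive_time_sec) {
    std::vector<size_t> timeout_keys;
    for (const auto& entry : last_match_times) {
        if (now - entry.second > alive_time_sec) {
            timeout_keys.emplace_back(entry.first);
        }
    }
    return timeout_keys;
}
```

Entries that fail the check are exactly the ones the diff pushes into _gc_lru_queue, which is where the two comments on this hunk disagree.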





[GitHub] [doris] github-actions[bot] commented on pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14920:
URL: https://github.com/apache/doris/pull/14920#issuecomment-1341935973

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] pengxiangyu merged pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
pengxiangyu merged PR #14920:
URL: https://github.com/apache/doris/pull/14920




[GitHub] [doris] github-actions[bot] commented on pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14920:
URL: https://github.com/apache/doris/pull/14920#issuecomment-1342120280

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] luozenglin commented on a diff in pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
luozenglin commented on code in PR #14920:
URL: https://github.com/apache/doris/pull/14920#discussion_r1043052867


##########
be/src/io/cache/file_cache_manager.cpp:
##########
@@ -175,16 +193,21 @@ void FileCacheManager::gc_file_caches() {
     // policy2: GC file cache by disk size
     if (gc_conf_size > 0) {
         for (size_t i = 0; i < contexts.size(); ++i) {
-            std::list<FileCachePtr> gc_file_list;
-            contexts[i].get_gc_file_caches(gc_file_list);
-            for (auto item : gc_file_list) {
-                std::shared_lock<std::shared_mutex> rdlock(_cache_map_lock);
-                // for dummy file cache, check already used or not again
-                if (item->is_dummy_file_cache() &&
-                    _file_cache_map.find(item->cache_dir().native()) != _file_cache_map.end()) {
-                    continue;
+            auto context = contexts[i];

Review Comment:
   Replaced with auto&. Thanks.





[GitHub] [doris] github-actions[bot] commented on pull request #14920: [enhancement](remote) support local cache GC at the granularity of cache files

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14920:
URL: https://github.com/apache/doris/pull/14920#issuecomment-1342258070

   clang-tidy review says "All clean, LGTM! :+1:"

