You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by "eldenmoon (via GitHub)" <gi...@apache.org> on 2023/01/30 12:13:39 UTC

[GitHub] [doris] eldenmoon opened a new pull request, #16263: [Improve](row-store) support row cache

eldenmoon opened a new pull request, #16263:
URL: https://github.com/apache/doris/pull/16263

   Currently only support PageCache of columns, but for row store we need a seperate cache to increase cache hit rate
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #16263: [Improve](row-store) support row cache

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #16263:
URL: https://github.com/apache/doris/pull/16263#issuecomment-1418445731

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #16263: [Improve](row-store) support row cache

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #16263:
URL: https://github.com/apache/doris/pull/16263#issuecomment-1408934046

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.41 seconds
    load time: 510 seconds
    storage size: 17123073238 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230130162356_clickbench_pr_87287.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #16263: [Improve](row-store) support row cache

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #16263:
URL: https://github.com/apache/doris/pull/16263#issuecomment-1414818558

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #16263: [Improve](row-store) support row cache

Posted by github-actions.
github-actions[bot] commented on code in PR #16263:
URL: https://github.com/apache/doris/pull/16263#discussion_r1090553753


##########
be/src/service/point_query_executor.h:
##########
@@ -63,10 +66,93 @@ class Reusable {
     int64_t _create_timestamp = 0;
 };
 
+// RowCache is a LRU cache for row store
+class RowCache {
+public:
+    // The cache key for row lru cache
+    struct RowCacheKey {
+        RowCacheKey(int64_t tablet_id, const Slice& key) : tablet_id(tablet_id), key(key) {}
+        int64_t tablet_id;
+        Slice key;
+
+        // Encode to a flat binary which can be used as LRUCache's key
+        std::string encode() const {
+            std::string full_key;
+            full_key.resize(sizeof(int64_t) + key.size);
+            int8store(&full_key.front(), tablet_id);
+            memcpy((&full_key.front()) + sizeof(tablet_id), key.data, key.size);
+            return full_key;
+        }
+    };
+
+    // A handle for RowCache entry. This class make it easy to handle
+    // Cache entry. Users don't need to release the obtained cache entry. This
+    // class will release the cache entry when it is destroyed.
+    class CacheHandle {
+    public:
+        CacheHandle() {}

Review Comment:
   warning: use '= default' to define a trivial default constructor [modernize-use-equals-default]
   
   ```suggestion
           CacheHandle() = default;
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] cambyzju commented on a diff in pull request #16263: [Improve](row-store) support row cache

Posted by "cambyzju (via GitHub)" <gi...@apache.org>.
cambyzju commented on code in PR #16263:
URL: https://github.com/apache/doris/pull/16263#discussion_r1091422198


##########
be/src/common/config.h:
##########
@@ -257,6 +258,8 @@ CONF_Int32(storage_page_cache_shard_size, "16");
 CONF_Int32(index_page_cache_percentage, "10");
 // whether to disable page cache feature in storage
 CONF_Bool(disable_storage_page_cache, "false");
+// whether to disable row cache feature in storage
+CONF_Bool(disable_storage_row_cache, "false");

Review Comment:
   If row cache do not used in most cases, please disable it by default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 merged pull request #16263: [Improve](row-store) support row cache

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 merged PR #16263:
URL: https://github.com/apache/doris/pull/16263


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] eldenmoon commented on a diff in pull request #16263: [Improve](row-store) support row cache

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #16263:
URL: https://github.com/apache/doris/pull/16263#discussion_r1095308774


##########
be/src/olap/rowset/segment_v2/segment_writer.cpp:
##########
@@ -252,8 +253,13 @@ Status SegmentWriter::append_block(const vectorized::Block* block, size_t row_po
         if (_tablet_schema->keys_type() == UNIQUE_KEYS && _opts.enable_unique_key_merge_on_write) {
             // create primary indexes
             for (size_t pos = 0; pos < num_rows; pos++) {
-                RETURN_IF_ERROR(
-                        _primary_key_index_builder->add_item(_full_encode_keys(key_columns, pos)));
+                const std::string& key = _full_encode_keys(key_columns, pos);
+                RETURN_IF_ERROR(_primary_key_index_builder->add_item(key));
+                if (!config::disable_storage_row_cache && _tablet_schema->store_row_column() &&
+                    _opts.is_direct_write) {
+                    // invalidate cache
+                    RowCache::instance()->erase({_opts.rowset_ctx->tablet_id, key});

Review Comment:
   can not insert value because  load may be failed, and rowset is not visible



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] eldenmoon commented on a diff in pull request #16263: [Improve](row-store) support row cache

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #16263:
URL: https://github.com/apache/doris/pull/16263#discussion_r1095300215


##########
be/src/common/config.h:
##########
@@ -246,6 +246,7 @@ CONF_mBool(row_nums_check, "true");
 // modify them upon necessity
 CONF_Int32(min_file_descriptor_number, "60000");
 CONF_Int64(index_stream_cache_capacity, "10737418240");
+CONF_String(row_cache_mem_limit, "20%");

Review Comment:
   high concurrent point query need memory to increase cache hit rate, I think 20% is OK for such work load



##########
be/src/vec/jsonb/serialize.cpp:
##########
@@ -319,4 +319,22 @@ void JsonbSerializeUtil::jsonb_to_block(const TupleDescriptor& desc,
     }
 }
 
+// single row
+void JsonbSerializeUtil::jsonb_to_block(const TupleDescriptor& desc, const Slice& data,

Review Comment:
   ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] eldenmoon commented on a diff in pull request #16263: [Improve](row-store) support row cache

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #16263:
URL: https://github.com/apache/doris/pull/16263#discussion_r1091423632


##########
be/src/common/config.h:
##########
@@ -257,6 +258,8 @@ CONF_Int32(storage_page_cache_shard_size, "16");
 CONF_Int32(index_page_cache_percentage, "10");
 // whether to disable page cache feature in storage
 CONF_Bool(disable_storage_page_cache, "false");
+// whether to disable row cache feature in storage
+CONF_Bool(disable_storage_row_cache, "false");

Review Comment:
   row cache is used for row store, row store  is disabled by default, so for most cases this flag is unused



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #16263: [Improve](row-store) support row cache

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #16263:
URL: https://github.com/apache/doris/pull/16263#issuecomment-1415120926

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xiaokang commented on a diff in pull request #16263: [Improve](row-store) support row cache

Posted by "xiaokang (via GitHub)" <gi...@apache.org>.
xiaokang commented on code in PR #16263:
URL: https://github.com/apache/doris/pull/16263#discussion_r1094425555


##########
be/src/common/config.h:
##########
@@ -246,6 +246,7 @@ CONF_mBool(row_nums_check, "true");
 // modify them upon necessity
 CONF_Int32(min_file_descriptor_number, "60000");
 CONF_Int64(index_stream_cache_capacity, "10737418240");
+CONF_String(row_cache_mem_limit, "20%");

Review Comment:
   default 20% is a little large compared to page_cache



##########
be/src/vec/jsonb/serialize.cpp:
##########
@@ -319,4 +319,22 @@ void JsonbSerializeUtil::jsonb_to_block(const TupleDescriptor& desc,
     }
 }
 
+// single row
+void JsonbSerializeUtil::jsonb_to_block(const TupleDescriptor& desc, const Slice& data,

Review Comment:
   jsonb_to_block for single row can be reused by jsonb_to_block for multiple rows



##########
be/src/olap/rowset/segment_v2/segment_writer.cpp:
##########
@@ -252,8 +253,13 @@ Status SegmentWriter::append_block(const vectorized::Block* block, size_t row_po
         if (_tablet_schema->keys_type() == UNIQUE_KEYS && _opts.enable_unique_key_merge_on_write) {
             // create primary indexes
             for (size_t pos = 0; pos < num_rows; pos++) {
-                RETURN_IF_ERROR(
-                        _primary_key_index_builder->add_item(_full_encode_keys(key_columns, pos)));
+                const std::string& key = _full_encode_keys(key_columns, pos);
+                RETURN_IF_ERROR(_primary_key_index_builder->add_item(key));
+                if (!config::disable_storage_row_cache && _tablet_schema->store_row_column() &&
+                    _opts.is_direct_write) {
+                    // invalidate cache
+                    RowCache::instance()->erase({_opts.rowset_ctx->tablet_id, key});

Review Comment:
   can insert new value to cache



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org