Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/22 10:55:34 UTC

[GitHub] [doris] Lchangliang opened a new pull request, #11131: [Improvement] support tablet schema cache

Lchangliang opened a new pull request, #11131:
URL: https://github.com/apache/doris/pull/11131

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   This PR follows up on #10136. With that change, every rowset persists a tablet schema and keeps it in memory through RowsetMetaPB. When there are many tablets, each with many rowsets, this puts a lot of pressure on memory, so we need a way to avoid it.
   
   This PR implements a global tablet schema cache. Every rowset holds a tablet schema shared pointer (sptr) obtained from the cache. That means that even with tens of thousands of rowsets, if their schemas are the same, there will be only one schema in memory.
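
The idea above can be sketched as follows. This is a minimal illustration, not the code in the PR: `SchemaSketch`, `SchemaCacheSketch`, and the string key are hypothetical stand-ins, and the only parts taken from the thread are the `insert(const std::string& key)` shape and the mutex guard visible in the reviewed header.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for doris::TabletSchema.
struct SchemaSketch {
    std::string serialized;
};
using SchemaSketchSPtr = std::shared_ptr<SchemaSketch>;

// Hypothetical cache: one shared schema object per distinct key.
class SchemaCacheSketch {
public:
    // Returns the cached schema for `key`, creating it on first use.
    SchemaSketchSPtr insert(const std::string& key) {
        std::lock_guard<std::mutex> guard(_mtx);
        auto it = _cache.find(key);
        if (it != _cache.end()) {
            return it->second;
        }
        auto schema = std::make_shared<SchemaSketch>(SchemaSketch{key});
        _cache.emplace(key, schema);
        return schema;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(_mtx);
        return _cache.size();
    }

private:
    mutable std::mutex _mtx;
    std::unordered_map<std::string, SchemaSketchSPtr> _cache;
};

// Two rowsets with the same serialized schema end up sharing one object.
inline bool rowsets_share_schema() {
    SchemaCacheSketch cache;
    SchemaSketchSPtr a = cache.insert("k1:INT,v1:BIGINT");
    SchemaSketchSPtr b = cache.insert("k1:INT,v1:BIGINT");
    assert(a != nullptr && b != nullptr);
    return a.get() == b.get() && cache.size() == 1;
}
```

With this shape, the number of schema objects in memory depends on how many distinct schemas exist rather than on how many rowsets exist.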
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930668475


##########
be/src/olap/rowset/rowset.cpp:
##########
@@ -32,7 +32,7 @@ Rowset::Rowset(const TabletSchema* schema, const std::string& tablet_path,
         _is_cumulative = version.first != version.second;
     }
     // build schema from RowsetMeta.tablet_schema or Tablet.tablet_schema
-    _schema = _rowset_meta->tablet_schema() != nullptr ? _rowset_meta->tablet_schema() : schema;
+    _schema = _rowset_meta->tablet_schema() ? _rowset_meta->tablet_schema().get() : schema;

Review Comment:
   Should also change _schema in rowset.h to TabletSchemaSPtr. There may be a segmentation fault if we add clean logic to the tablet schema cache.
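
A small sketch of the lifetime concern, under assumptions: `SchemaSketch` and `RowsetSketch` are hypothetical types, and `cache.clear()` stands in for whatever clean logic might be added later. A rowset that co-owns its schema through a shared pointer keeps it alive after the cache lets go, which a raw pointer obtained with `.get()` would not.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a tablet schema.
struct SchemaSketch {
    int num_columns;
};
using SchemaSketchSPtr = std::shared_ptr<SchemaSketch>;

// Hypothetical rowset that co-owns its schema instead of storing a raw pointer.
struct RowsetSketch {
    explicit RowsetSketch(SchemaSketchSPtr schema) : _schema(std::move(schema)) {}
    SchemaSketchSPtr _schema;
};

inline bool schema_survives_cache_clear() {
    std::unordered_map<std::string, SchemaSketchSPtr> cache;
    cache["a"] = std::make_shared<SchemaSketch>(SchemaSketch{3});
    cache["b"] = std::make_shared<SchemaSketch>(SchemaSketch{5});

    RowsetSketch rowset(cache["a"]);                  // rowset shares ownership of "a"
    std::weak_ptr<SchemaSketch> unowned = cache["b"]; // nobody else owns "b"

    cache.clear(); // stands in for a future clean step in the cache

    // "b" was destroyed; a raw pointer to it would now dangle.
    assert(unowned.expired());
    // "a" is still alive because the rowset holds a shared pointer.
    return rowset._schema->num_columns == 3;
}
```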





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r931882433


##########
be/src/olap/schema_change.cpp:
##########
@@ -1809,7 +1811,7 @@ Status SchemaChangeHandler::_do_process_alter_tablet_v2(const TAlterTabletReqV2&
     // delete handlers for new tablet
     DeleteHandler delete_handler;
     std::vector<ColumnId> return_columns;
-    auto base_tablet_schema = base_tablet->tablet_schema();
+    auto base_tablet_schema = *base_tablet->tablet_schema();

Review Comment:
   Use copy_from here.





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930672800


##########
be/src/olap/tablet.cpp:
##########
@@ -116,7 +116,7 @@ Status Tablet::_init_once_action() {
     for (const auto& rs_meta : _tablet_meta->all_rs_metas()) {
         Version version = rs_meta->version();
         RowsetSharedPtr rowset;
-        res = RowsetFactory::create_rowset(&_schema, _tablet_path, rs_meta, &rowset);
+        res = RowsetFactory::create_rowset(_schema.get(), _tablet_path, rs_meta, &rowset);

Review Comment:
   Change the RowsetFactory::create_rowset method to take TabletSchemaSPtr as its input parameter.





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930672090


##########
be/src/olap/rowset/rowset_writer_context.h:
##########
@@ -55,7 +55,7 @@ struct RowsetWriterContext {
         context.tablet_schema_hash = new_tablet->schema_hash();
         context.rowset_type = new_rowset_type;
         context.tablet_path = new_tablet->tablet_path();
-        context.tablet_schema = &(new_tablet->tablet_schema());

Review Comment:
   Please modify RowsetWriterContext to use TabletSchemaSPtr.





[GitHub] [doris] xinyiZzz commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r927723767


##########
be/src/olap/tablet_schema_cache.h:
##########
@@ -0,0 +1,62 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <gen_cpp/olap_file.pb.h>
+
+#include <memory>
+#include <mutex>
+#include <unordered_map>
+
+#include "olap/tablet_schema.h"
+
+namespace doris {
+
+class TabletSchemaCache {
+public:
+    static void create_global_schema_cache() {
+        DCHECK(_s_instance == nullptr);
+        static TabletSchemaCache instance;
+        _s_instance = &instance;
+    }
+
+    static TabletSchemaCache* instance() { return _s_instance; }
+
+    std::shared_ptr<TabletSchema> insert(const std::string& key) {
+        std::lock_guard guard(_mtx);

Review Comment:
   Consider using `parallel-hashmap`, which may have better performance.
   
   It also provides `lazy_emplace_l` and `if_contains` to handle concurrency, which avoids taking locks manually. The code looks more concise, although it is harder to understand.
   
   Be careful not to use `try_emplace_l`; its behavior is not as expected, see https://github.com/greg7mdp/parallel-hashmap/issues/161





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930666599


##########
be/src/olap/delta_writer.cpp:
##########
@@ -120,7 +120,8 @@ Status DeltaWriter::init() {
                                                                   _req.txn_id, _req.load_id));
     }
     // build tablet schema in request level
-    _build_current_tablet_schema(_req.index_id, _req.ptable_schema_param, _tablet->tablet_schema());
+    _build_current_tablet_schema(_req.index_id, _req.ptable_schema_param,

Review Comment:
   _build_current_tablet_schema will also copy tablet schema. 





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930661945


##########
be/src/olap/rowset/rowset_meta.h:
##########
@@ -324,24 +337,38 @@ class RowsetMeta {
 
     int64_t newest_write_timestamp() const { return _rowset_meta_pb.newest_write_timestamp(); }
     void set_tablet_schema(const TabletSchema* tablet_schema) {

Review Comment:
   Do not pass in a raw pointer here; pass TabletSchemaSPtr if possible.





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930670691


##########
be/src/olap/rowset/rowset_meta.h:
##########
@@ -245,7 +251,14 @@ class RowsetMeta {
 
     void set_num_segments(int64_t num_segments) { _rowset_meta_pb.set_num_segments(num_segments); }
 
-    void to_rowset_pb(RowsetMetaPB* rs_meta_pb) const { *rs_meta_pb = _rowset_meta_pb; }
+    void to_rowset_pb(RowsetMetaPB* rs_meta_pb) const {
+        *rs_meta_pb = _rowset_meta_pb;
+        if (_schema) {
+            _schema->to_schema_pb(rs_meta_pb->mutable_tablet_schema());
+        }
+    }
+
+    // warning!don't use tablet_schema in rowset_meta_pb
     const RowsetMetaPB& get_rowset_pb() { return _rowset_meta_pb; }

Review Comment:
   Change to const RowsetMetaPB get_rowset_pb() and create a new pb here.
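
One way to read this suggestion, sketched with hypothetical stand-ins (`RowsetMetaPBSketch`, `SchemaSketch`, `RowsetMetaSketch`, and `stored_pb` are not Doris types or methods): the getter returns a copy by value and fills the schema into that copy, so the stored pb can stay schema-free while callers still see a complete message.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the RowsetMetaPB protobuf message.
struct RowsetMetaPBSketch {
    bool has_tablet_schema = false;
    std::string tablet_schema;
};

// Hypothetical stand-in for TabletSchema with a to_schema_pb-like method.
struct SchemaSketch {
    std::string columns;
    void to_schema_pb(std::string* out) const { *out = columns; }
};

class RowsetMetaSketch {
public:
    void set_tablet_schema(std::shared_ptr<SchemaSketch> schema) { _schema = std::move(schema); }

    // Returns a fresh pb with the schema filled in; the stored pb is not modified.
    RowsetMetaPBSketch get_rowset_pb() const {
        RowsetMetaPBSketch pb = _rowset_meta_pb;
        if (_schema) {
            _schema->to_schema_pb(&pb.tablet_schema);
            pb.has_tablet_schema = true;
        }
        return pb;
    }

    // Exposed only so the sketch can show that the stored pb stays schema-free.
    const RowsetMetaPBSketch& stored_pb() const { return _rowset_meta_pb; }

private:
    RowsetMetaPBSketch _rowset_meta_pb;
    std::shared_ptr<SchemaSketch> _schema;
};

inline bool returned_pb_carries_schema() {
    RowsetMetaSketch meta;
    meta.set_tablet_schema(std::make_shared<SchemaSketch>(SchemaSketch{"k1,v1"}));
    RowsetMetaPBSketch pb = meta.get_rowset_pb();
    assert(!meta.stored_pb().has_tablet_schema);
    return pb.has_tablet_schema && pb.tablet_schema == "k1,v1";
}
```

The trade-off in this shape is an extra copy on every call in exchange for callers never seeing a pb without its schema.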





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930657541


##########
be/src/exec/olap_scanner.cpp:
##########
@@ -83,7 +83,7 @@ Status OlapScanner::prepare(
             LOG(WARNING) << ss.str();
             return Status::InternalError(ss.str());
         }
-        _tablet_schema = _tablet->tablet_schema();

Review Comment:
   Better not to copy tablet_schema, because TabletSchema does not define a copy constructor. There is a field TabletColumn* _parent = nullptr; in tablet column; does it need to change to a shared ptr?





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930666928


##########
be/src/exec/olap_scanner.cpp:
##########
@@ -83,7 +83,7 @@ Status OlapScanner::prepare(
             LOG(WARNING) << ss.str();
             return Status::InternalError(ss.str());
         }
-        _tablet_schema = _tablet->tablet_schema();

Review Comment:
   Maybe it is better to serialize to pb and use init_from_pb to build a new tablet schema.
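
A sketch of why a serialize-and-rebuild round trip can be safer than a memberwise copy when columns hold a raw `_parent`-style back-pointer. Everything here is an assumption for illustration: `ColumnSketch`, `SchemaSketch`, and the flat `SchemaPBSketch` form are hypothetical, and the real `TabletColumn` and `TabletSchemaPB` are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column with a raw back-pointer, similar in spirit to _parent.
struct ColumnSketch {
    std::string name;
    const ColumnSketch* parent;
};

// Hypothetical flat serialized form: (name, parent index or -1).
using SchemaPBSketch = std::vector<std::pair<std::string, int>>;

struct SchemaSketch {
    std::vector<ColumnSketch> columns;

    SchemaPBSketch to_schema_pb() const {
        SchemaPBSketch pb;
        for (const ColumnSketch& col : columns) {
            int parent_index =
                    col.parent ? static_cast<int>(col.parent - columns.data()) : -1;
            pb.emplace_back(col.name, parent_index);
        }
        return pb;
    }

    void init_from_pb(const SchemaPBSketch& pb) {
        columns.clear();
        columns.reserve(pb.size()); // no reallocation while pointers are being set
        for (const auto& entry : pb) {
            columns.push_back(ColumnSketch{entry.first, nullptr});
        }
        for (std::size_t i = 0; i < pb.size(); ++i) {
            if (pb[i].second >= 0) {
                columns[i].parent = &columns[pb[i].second];
            }
        }
    }
};

inline bool round_trip_rebinds_parent() {
    SchemaSketch src;
    src.columns.reserve(2);
    src.columns.push_back(ColumnSketch{"arr", nullptr});
    src.columns.push_back(ColumnSketch{"arr.item", &src.columns[0]});

    // Memberwise copy: the child's parent still points into the source schema.
    SchemaSketch memberwise = src;
    assert(memberwise.columns[1].parent == &src.columns[0]);

    // Round trip: the parent pointer is rebuilt inside the new schema.
    SchemaSketch rebuilt;
    rebuilt.init_from_pb(src.to_schema_pb());
    return rebuilt.columns[1].parent == &rebuilt.columns[0];
}
```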





[GitHub] [doris] yiguolei merged pull request #11131: [Improvement](light-schema-change) support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #11131:
URL: https://github.com/apache/doris/pull/11131




[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r933031064


##########
be/src/olap/schema_change.cpp:
##########
@@ -2365,7 +2364,7 @@ Status SchemaChangeHandler::_parse_request(
     }
 
     const TabletSchema& ref_tablet_schema = *base_tablet_schema;
-    const TabletSchema& new_tablet_schema = new_tablet->tablet_schema();
+    const TabletSchema& new_tablet_schema = *new_tablet->tablet_schema();

Review Comment:
   use copy_from here





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r933031064


##########
be/src/olap/schema_change.cpp:
##########
@@ -2365,7 +2364,7 @@ Status SchemaChangeHandler::_parse_request(
     }
 
     const TabletSchema& ref_tablet_schema = *base_tablet_schema;
-    const TabletSchema& new_tablet_schema = new_tablet->tablet_schema();
+    const TabletSchema& new_tablet_schema = *new_tablet->tablet_schema();

Review Comment:
   Use copy_from here if you will modify the tablet schema. If you will not modify the tablet schema, then just use the shared ptr here.
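
The rule in this comment can be sketched as below. The types and helper functions are hypothetical, and `copy_from` here is only an analogue of the method named in the review, not its actual signature: read-only callers share the cached object, while a caller that needs to change the schema works on a private copy so the cached one is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet schema.
struct SchemaSketch {
    std::vector<std::string> columns;
    // Hypothetical analogue of the copy_from mentioned in the review.
    void copy_from(const SchemaSketch& other) { columns = other.columns; }
};
using SchemaSketchSPtr = std::shared_ptr<SchemaSketch>;

// Read-only use: share the cached object, no copy.
inline std::size_t count_columns(const SchemaSketchSPtr& shared) {
    return shared->columns.size();
}

// Modifying use: copy first, then change only the private copy.
inline SchemaSketchSPtr with_extra_column(const SchemaSketchSPtr& shared,
                                          const std::string& name) {
    auto copy = std::make_shared<SchemaSketch>();
    copy->copy_from(*shared);
    copy->columns.push_back(name);
    return copy;
}

inline bool cached_schema_unchanged() {
    auto cached = std::make_shared<SchemaSketch>();
    cached->columns = {"k1", "v1"};
    SchemaSketchSPtr modified = with_extra_column(cached, "v2");
    assert(modified->columns.size() == 3);
    return count_columns(cached) == 2;
}
```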





[GitHub] [doris] yiguolei commented on a diff in pull request #11131: [Improvement] support tablet schema cache

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #11131:
URL: https://github.com/apache/doris/pull/11131#discussion_r930664845


##########
be/src/olap/data_dir.cpp:
##########
@@ -474,10 +475,12 @@ Status DataDir::load() {
         }
         if (rowset_meta->rowset_state() == RowsetStatePB::COMMITTED &&
             rowset_meta->tablet_uid() == tablet->tablet_uid()) {
-            if (!rowset_meta->get_rowset_pb().has_tablet_schema()) {
-                rowset_meta->set_tablet_schema(&tablet->tablet_schema());
+            if (!rowset_meta->tablet_schema()) {

Review Comment:
   There are many places that call get_rowset_pb, for example in the txn manager. If you erase the schema from the pb, then you have to add the schema back to the pb every time.
   I think it is better to return a new pb and add the tablet schema to the new pb in get_rowset_pb.


