You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/12 08:53:03 UTC

[GitHub] [doris] nextdreamblue opened a new pull request, #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

nextdreamblue opened a new pull request, #15018:
URL: https://github.com/apache/doris/pull/15018

   when some case(need modify be.conf), a memtable may flush many segments and then calc delete bitmap with new data. but now, it just only load one segment with max sgement id and this bug will not cala delte bitmap with all data of all segment of one memtable, and will get many rows with same key from merge-on-write table.
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [x] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [x] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [x] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [x] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [x] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] nextdreamblue commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
nextdreamblue commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1047185051


##########
be/src/olap/memtable.h:
##########
@@ -224,6 +224,8 @@ class MemTable {
     DeleteBitmapPtr _delete_bitmap;
     RowsetIdUnorderedSet _rowset_ids;
     int64_t _cur_max_version;
+    // seg ids in one _do_flush for calc delete bitmap
+    std::vector<int64_t> _seg_ids;

Review Comment:
   it is better, i will fix code by this way



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15018:
URL: https://github.com/apache/doris/pull/15018#issuecomment-1346253830

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 36.1 seconds
    load time: 481 seconds
    storage size: 17123356926 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221212104202_clickbench_pr_62080.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] liaoxin01 commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
liaoxin01 commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1045617277


##########
be/src/olap/rowset/rowset_writer.h:
##########
@@ -62,10 +62,12 @@ class RowsetWriter {
     virtual Status flush_columns() { return Status::Error<ErrorCode::NOT_IMPLEMENTED_ERROR>(); }
     virtual Status final_flush() { return Status::Error<ErrorCode::NOT_IMPLEMENTED_ERROR>(); }
 
-    virtual Status flush_single_memtable(MemTable* memtable, int64_t* flush_size) {
+    virtual Status flush_single_memtable(MemTable* memtable, int64_t* flush_size,
+                                         std::vector<int64_t>* seg_ids_for_delete_bitmap) {

Review Comment:
   seg_ids_for_delete_bitmap -> seg_ids may be more generic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15018:
URL: https://github.com/apache/doris/pull/15018#issuecomment-1352735985

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
zhannngchen commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1047075452


##########
be/src/olap/memtable.h:
##########
@@ -224,6 +224,8 @@ class MemTable {
     DeleteBitmapPtr _delete_bitmap;
     RowsetIdUnorderedSet _rowset_ids;
     int64_t _cur_max_version;
+    // seg ids in one _do_flush for calc delete bitmap
+    std::vector<int64_t> _seg_ids;

Review Comment:
   We don't need to record a vector, for MoW table, memtable is flushed one by one (`should_serial = true` when `create_flush_token`), and `segment_id` is monotonically increasing, so we just need to record the segment id before flushing, such as `num_segs_before_flush`, and we just need to load segments between `num_seg_before_flush` and `num_segments`
   BTW. in this way, we don't need to change the interface definition of method `add_block`, which looks a little wired in current solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] nextdreamblue commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
nextdreamblue commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1049284511


##########
be/src/olap/memtable.h:
##########
@@ -224,6 +224,8 @@ class MemTable {
     DeleteBitmapPtr _delete_bitmap;
     RowsetIdUnorderedSet _rowset_ids;
     int64_t _cur_max_version;
+    // seg ids in one _do_flush for calc delete bitmap
+    std::vector<int64_t> _seg_ids;

Review Comment:
   找到解决办法了,新的segment是依据writer里的_num_segment来单调递增的,加了一个获取这个值的办法,flush前后就是那些需要load的segment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
zhannngchen commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1049209184


##########
be/src/olap/memtable.h:
##########
@@ -224,6 +224,8 @@ class MemTable {
     DeleteBitmapPtr _delete_bitmap;
     RowsetIdUnorderedSet _rowset_ids;
     int64_t _cur_max_version;
+    // seg ids in one _do_flush for calc delete bitmap
+    std::vector<int64_t> _seg_ids;

Review Comment:
   @nextdreamblue segment compaction 完成之前,_num_segment 这个变量是单调递增的,问题主要出在rowset_meta 里记录的 segment 数量这里。由于更新 delete bitmap时用的是由 build_tmp()构建的临时的 rowset,这时候不可能做完 segment compaction,所以可以加一个特殊的处理来在 build_tmp()时获取准确的 num_segments
   ```
   void BetaRowsetWriter::_build_rowset_meta(std::shared_ptr<RowsetMeta> rowset_meta, bool temporary) {
       int64_t num_seg = _is_segcompacted() && !temporary ? _num_segcompacted : _num_segment;
   ```
   @freemandealer pls also help to review this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
zhannngchen commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1047075452


##########
be/src/olap/memtable.h:
##########
@@ -224,6 +224,8 @@ class MemTable {
     DeleteBitmapPtr _delete_bitmap;
     RowsetIdUnorderedSet _rowset_ids;
     int64_t _cur_max_version;
+    // seg ids in one _do_flush for calc delete bitmap
+    std::vector<int64_t> _seg_ids;

Review Comment:
   We don't need to record a vector, for MoW table, memtable is flushed one by one (`should_serial = true` when `create_flush_token`), and `segment_id` is monotonically increasing, so we just need to record the segment id before flushing.
   BTW. in this way, we don't need to change the interface definition of method `add_block`, which looks a little wired in current solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] nextdreamblue commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
nextdreamblue commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1049187325


##########
be/src/olap/memtable.h:
##########
@@ -224,6 +224,8 @@ class MemTable {
     DeleteBitmapPtr _delete_bitmap;
     RowsetIdUnorderedSet _rowset_ids;
     int64_t _cur_max_version;
+    // seg ids in one _do_flush for calc delete bitmap
+    std::vector<int64_t> _seg_ids;

Review Comment:
   @zhannngchen 
   貌似通过segment num来判断不大行,如果打开segment compaction,segment个数其实是一直上下浮动的。
   flush_single_memtable(block)这个地方一开始就会判断是否可以进行segment compaction,如果可以就会触发一组segment compaction,会导致num_segments()不准,比如可能10个合并成1个,num_segements()返回1,继续新插入的segement的id是继续递增的,11/12/13...,必须使用这个真实的序号去获取segment的文件去load_segment才能读到正确的data文件,否则可能会读到已经被合并的不存在的文件,或者重复读取老segment导致仍有遗漏。
   所以这个地方必须精确拿到每次执行时新增的,未被compaction的segement的真实id才行



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] nextdreamblue commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
nextdreamblue commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1049187325


##########
be/src/olap/memtable.h:
##########
@@ -224,6 +224,8 @@ class MemTable {
     DeleteBitmapPtr _delete_bitmap;
     RowsetIdUnorderedSet _rowset_ids;
     int64_t _cur_max_version;
+    // seg ids in one _do_flush for calc delete bitmap
+    std::vector<int64_t> _seg_ids;

Review Comment:
   @zhannngchen 
   貌似通过segment num来判断不大行,如果打开segment compaction,segment个数其实是一直上下浮动的。
   flush_single_memtable(block)这个地方一开始就会判断是否可以进行segment compaction,如果可以就会触发一组segment compaction,会导致num_segments()不准,比如可能10个合并成1个,num_segements()返回1,继续新插入的segement的id是继续递增的,11/12/13...,必须使用这个真实的序号去获取segment的文件去load_segment才能读到正确的data文件,否则可能会读到已经被合并的不存在的文件,或者读到已有的老segment。
   所以这个地方必须精确拿到每次执行时新增的,未被compaction的segement的真实id才行



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15018:
URL: https://github.com/apache/doris/pull/15018#issuecomment-1346127100

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15018:
URL: https://github.com/apache/doris/pull/15018#issuecomment-1346285047

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15018:
URL: https://github.com/apache/doris/pull/15018#issuecomment-1352710896

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #15018:
URL: https://github.com/apache/doris/pull/15018


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15018:
URL: https://github.com/apache/doris/pull/15018#issuecomment-1352715684

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15018:
URL: https://github.com/apache/doris/pull/15018#issuecomment-1346114651

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] nextdreamblue commented on a diff in pull request #15018: [fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable

Posted by GitBox <gi...@apache.org>.
nextdreamblue commented on code in PR #15018:
URL: https://github.com/apache/doris/pull/15018#discussion_r1045667912


##########
be/src/olap/rowset/rowset_writer.h:
##########
@@ -62,10 +62,12 @@ class RowsetWriter {
     virtual Status flush_columns() { return Status::Error<ErrorCode::NOT_IMPLEMENTED_ERROR>(); }
     virtual Status final_flush() { return Status::Error<ErrorCode::NOT_IMPLEMENTED_ERROR>(); }
 
-    virtual Status flush_single_memtable(MemTable* memtable, int64_t* flush_size) {
+    virtual Status flush_single_memtable(MemTable* memtable, int64_t* flush_size,
+                                         std::vector<int64_t>* seg_ids_for_delete_bitmap) {

Review Comment:
   ok
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org