You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/01 10:00:51 UTC

[GitHub] [doris] compiletheworld opened a new pull request, #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

compiletheworld opened a new pull request, #10548:
URL: https://github.com/apache/doris/pull/10548

   # Proposed changes
   
   Add delete bitmap for 
   [DSIP-018: Support Merge-On-Write implementation for UNIQUE KEY data model](https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model)
   
   ## Problem Summary:
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (No)
   2. Has unit tests been added: (Yes)
   3. Has the document been added or modified: (No Need)
   4. Does it need to update dependencies: (No)
   5. Are there any changes that cannot be rolled back: (No)
   
   ## Further comments
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] compiletheworld commented on a diff in pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
compiletheworld commented on code in PR #10548:
URL: https://github.com/apache/doris/pull/10548#discussion_r913537806


##########
be/src/olap/tablet_meta.h:
##########
@@ -214,9 +217,120 @@ class TabletMeta {
     RowsetTypePB _preferred_rowset_type = BETA_ROWSET;
     std::string _remote_storage_name;
     StorageMediumPB _storage_medium = StorageMediumPB::HDD;
+    std::unique_ptr<DeleteBitmap> _delete_bitmap;
     std::shared_mutex _meta_lock;
 };
 
+/**
+ * Wraps multiple bitmaps for recording rows (row id) that are deleted or
+ * overwritten.
+ *
+ * RowsetId and SegmentId are for locating segment, Version here is a single
+ * uint32_t means that at which "version" of the load causes the delete or
+ * overwrite.
+ *
+ * The start and end version of a load is the same, it's ok and straightforward
+ * to use a single uint32_t.
+ *
+ * e.g.
+ * There is a key "key1" in rowset id 1, version [1,1], segment id 1, row id 1.
+ * A new load also contains "key1", the rowset id 2, version [2,2], segment id 1
+ * the delete bitmap will be `{1,1,2} -> 1`, which means the "row id 1" in
+ * "rowset id 1, segment id 1" is deleted/overitten by some loads at "version 2"
+ */
+class DeleteBitmap {
+    public:

Review Comment:
   Done.



##########
gensrc/proto/olap_file.proto:
##########
@@ -283,3 +284,10 @@ message OLAPDataHeaderMessage {
 message OLAPRawDeltaHeaderMessage {
     required int32 schema_hash = 2;
 }
+
+message DeleteBitmapPB {
+    repeated string rowset_ids = 1;
+    repeated uint32 segment_ids = 2;
+    repeated int64 versions = 3;
+    repeated bytes segment_delete_bitmaps = 4;

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] spaces-X commented on a diff in pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #10548:
URL: https://github.com/apache/doris/pull/10548#discussion_r912741730


##########
be/src/olap/tablet_meta.cpp:
##########
@@ -710,4 +742,106 @@ bool operator!=(const TabletMeta& a, const TabletMeta& b) {
     return !(a == b);
 }
 
+DeleteBitmap::DeleteBitmap() {
+}
+
+DeleteBitmap::DeleteBitmap(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+}
+
+DeleteBitmap& DeleteBitmap::operator=(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+    return *this;
+}
+
+DeleteBitmap::DeleteBitmap(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+}
+
+DeleteBitmap& DeleteBitmap::operator=(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+    return *this;
+}
+
+DeleteBitmap DeleteBitmap::snapshot() const {
+    std::shared_lock l(lock);
+    return DeleteBitmap(*this);
+}
+
+void DeleteBitmap::add(const BitmapKey& bitmap, uint32_t row_id) {
+    std::lock_guard l(lock);

Review Comment:
   It looks like the granularity of the lock is tablet level.
   
   I wonder will there be a performance penalty for this part of serial update when loading concurrently that need to update multiple versions of bitmaps?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman commented on pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
morningman commented on PR #10548:
URL: https://github.com/apache/doris/pull/10548#issuecomment-1177924418

   hi @compiletheworld , please fix the BE code format


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] kpfly commented on a diff in pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
kpfly commented on code in PR #10548:
URL: https://github.com/apache/doris/pull/10548#discussion_r911822999


##########
gensrc/proto/olap_file.proto:
##########
@@ -283,3 +284,10 @@ message OLAPDataHeaderMessage {
 message OLAPRawDeltaHeaderMessage {
     required int32 schema_hash = 2;
 }
+
+message DeleteBitmapPB {
+    repeated string rowset_ids = 1;
+    repeated uint32 segment_ids = 2;
+    repeated int64 versions = 3;
+    repeated bytes segment_delete_bitmaps = 4;

Review Comment:
   add some comments please.



##########
be/src/olap/tablet_meta.h:
##########
@@ -214,9 +217,120 @@ class TabletMeta {
     RowsetTypePB _preferred_rowset_type = BETA_ROWSET;
     std::string _remote_storage_name;
     StorageMediumPB _storage_medium = StorageMediumPB::HDD;
+    std::unique_ptr<DeleteBitmap> _delete_bitmap;
     std::shared_mutex _meta_lock;
 };
 
+/**
+ * Wraps multiple bitmaps for recording rows (row id) that are deleted or
+ * overwritten.
+ *
+ * RowsetId and SegmentId are for locating segment, Version here is a single
+ * uint32_t means that at which "version" of the load causes the delete or
+ * overwrite.
+ *
+ * The start and end version of a load is the same, it's ok and straightforward
+ * to use a single uint32_t.
+ *
+ * e.g.
+ * There is a key "key1" in rowset id 1, version [1,1], segment id 1, row id 1.
+ * A new load also contains "key1", the rowset id 2, version [2,2], segment id 1
+ * the delete bitmap will be `{1,1,2} -> 1`, which means the "row id 1" in
+ * "rowset id 1, segment id 1" is deleted/overitten by some loads at "version 2"
+ */
+class DeleteBitmap {
+    public:

Review Comment:
   indentation 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] spaces-X commented on a diff in pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #10548:
URL: https://github.com/apache/doris/pull/10548#discussion_r912741730


##########
be/src/olap/tablet_meta.cpp:
##########
@@ -710,4 +742,106 @@ bool operator!=(const TabletMeta& a, const TabletMeta& b) {
     return !(a == b);
 }
 
+DeleteBitmap::DeleteBitmap() {
+}
+
+DeleteBitmap::DeleteBitmap(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+}
+
+DeleteBitmap& DeleteBitmap::operator=(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+    return *this;
+}
+
+DeleteBitmap::DeleteBitmap(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+}
+
+DeleteBitmap& DeleteBitmap::operator=(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+    return *this;
+}
+
+DeleteBitmap DeleteBitmap::snapshot() const {
+    std::shared_lock l(lock);
+    return DeleteBitmap(*this);
+}
+
+void DeleteBitmap::add(const BitmapKey& bitmap, uint32_t row_id) {
+    std::lock_guard l(lock);

Review Comment:
   It looks like the granularity of the lock is tablet level.
   
   I wonder will there be a performance penalty for this part of serial update when loading concurrently that need to update multiple versions of delete-bitmaps?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
zhannngchen commented on code in PR #10548:
URL: https://github.com/apache/doris/pull/10548#discussion_r916508357


##########
be/src/olap/tablet_meta.h:
##########
@@ -214,9 +217,120 @@ class TabletMeta {
     RowsetTypePB _preferred_rowset_type = BETA_ROWSET;
     std::string _remote_storage_name;
     StorageMediumPB _storage_medium = StorageMediumPB::HDD;
+    std::unique_ptr<DeleteBitmap> _delete_bitmap;
     std::shared_mutex _meta_lock;
 };
 
+/**
+ * Wraps multiple bitmaps for recording rows (row id) that are deleted or
+ * overwritten.
+ *
+ * RowsetId and SegmentId are for locating segment, Version here is a single
+ * uint32_t means that at which "version" of the load causes the delete or
+ * overwrite.
+ *
+ * The start and end version of a load is the same, it's ok and straightforward
+ * to use a single uint32_t.
+ *
+ * e.g.
+ * There is a key "key1" in rowset id 1, version [1,1], segment id 1, row id 1.
+ * A new load also contains "key1", the rowset id 2, version [2,2], segment id 1
+ * the delete bitmap will be `{1,1,2} -> 1`, which means the "row id 1" in
+ * "rowset id 1, segment id 1" is deleted/overitten by some loads at "version 2"
+ */
+class DeleteBitmap {
+public:
+    mutable std::shared_mutex lock;
+    using SegmentId = uint32_t;
+    using Version = uint32_t;
+    using BitmapKey = std::tuple<RowsetId, SegmentId, Version>;
+    std::map<BitmapKey, roaring::Roaring> delete_bitmap; // Ordered map
+
+    DeleteBitmap();
+
+    /**
+     * Copy c-tor for making delete bitmap snapshot on read path
+     */
+    DeleteBitmap(const DeleteBitmap& r);
+    DeleteBitmap& operator=(const DeleteBitmap& r);
+    /**
+     * Move c-tor for making delete bitmap snapshot on read path
+     */
+    DeleteBitmap(DeleteBitmap&& r);
+    DeleteBitmap& operator=(DeleteBitmap&& r);
+
+    /**
+     * Makes a snapshot of delete bimap, read lock will be acquired in this
+     * process
+     */
+    DeleteBitmap snapshot() const;
+
+    /**
+     * Marks the specific row deleted
+     */
+    void add(const BitmapKey& bitmap, uint32_t row_id);

Review Comment:
   We should change all the BitmapKey parameter from `bitmap` to `bmk`, the name is confusing, since it's not bitmap, it's just a key.



##########
be/src/olap/tablet_meta.cpp:
##########
@@ -710,4 +742,106 @@ bool operator!=(const TabletMeta& a, const TabletMeta& b) {
     return !(a == b);
 }
 
+DeleteBitmap::DeleteBitmap() {
+}
+
+DeleteBitmap::DeleteBitmap(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+}
+
+DeleteBitmap& DeleteBitmap::operator=(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+    return *this;
+}
+
+DeleteBitmap::DeleteBitmap(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+}
+
+DeleteBitmap& DeleteBitmap::operator=(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+    return *this;
+}
+
+DeleteBitmap DeleteBitmap::snapshot() const {
+    std::shared_lock l(lock);
+    return DeleteBitmap(*this);
+}
+
+void DeleteBitmap::add(const BitmapKey& bitmap, uint32_t row_id) {
+    std::lock_guard l(lock);

Review Comment:
   > It looks like the granularity of the lock is tablet level.
   > 
   > I wonder will there be a performance penalty for this part of serial update when loading concurrently that need to update multiple versions of delete-bitmaps?
   
   We can't update the delete bitmap concurrently, at least in current design.
   A load will first lookup a row key before it update the delete map, but before that, it MUST see all previous versions data. If we can't guarantee that, data inconsistency will happen.
   
   e.g. 
   - current rowset layout : [0-5][6-6][7-7]
   - inflight load job: load1, load2, load3
   - load1 committed first, load2 in second, load3 committed in third. We CAN NOT update delete bitmap at this time, because load2 may overwrite some data in load1, but it can't see load1
   - load2 published first, with version [9-9](which is determined at commit stage), load1 published in second, with version [8-8], load 3 published in third, with version[10-10]. The publish sequence is not acceptable, since load2 didn't update the delete bitmap on rowset[8-8] correctly.
   - So we have to guarantee that load1 published first, the load2, then load3



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10548:
URL: https://github.com/apache/doris/pull/10548#issuecomment-1181475605

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10548:
URL: https://github.com/apache/doris/pull/10548#issuecomment-1177924816

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10548:
URL: https://github.com/apache/doris/pull/10548#issuecomment-1174786495

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] compiletheworld commented on a diff in pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
compiletheworld commented on code in PR #10548:
URL: https://github.com/apache/doris/pull/10548#discussion_r913537582


##########
be/src/olap/tablet_meta.cpp:
##########
@@ -710,4 +742,106 @@ bool operator!=(const TabletMeta& a, const TabletMeta& b) {
     return !(a == b);
 }
 
+DeleteBitmap::DeleteBitmap() {
+}
+
+DeleteBitmap::DeleteBitmap(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+}
+
+DeleteBitmap& DeleteBitmap::operator=(const DeleteBitmap& o) {
+    delete_bitmap = o.delete_bitmap; // just copy data
+    return *this;
+}
+
+DeleteBitmap::DeleteBitmap(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+}
+
+DeleteBitmap& DeleteBitmap::operator=(DeleteBitmap&& o) {
+    delete_bitmap = std::move(o.delete_bitmap);
+    return *this;
+}
+
+DeleteBitmap DeleteBitmap::snapshot() const {
+    std::shared_lock l(lock);
+    return DeleteBitmap(*this);
+}
+
+void DeleteBitmap::add(const BitmapKey& bitmap, uint32_t row_id) {
+    std::lock_guard l(lock);

Review Comment:
   Your concern makes sense, however, 
   1. This `add` operation is in-memory and fairly fast enough, and contention on this lock is not significant because there won't be thousands of concurrent loads on a single tablet.
   2. The read lock is acquired when the tablet meta is being saved (called by `TabletManager::do_tablet_meta_checkpoint`), serialized bitmap will go to RocksDB, more precisely -- page cache of WAL, it's also kind of memory operation.
   
   It is OK that we don't optimize the lock granularity for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman merged pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
morningman merged PR #10548:
URL: https://github.com/apache/doris/pull/10548


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on pull request #10548: [WIP](unique-key-merge-on-write) Add delete bitmap for DSIP-018

Posted by GitBox <gi...@apache.org>.
zhannngchen commented on PR #10548:
URL: https://github.com/apache/doris/pull/10548#issuecomment-1178629366

   @compiletheworld  the formatter failed, pls fix it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org