Posted to dev@pulsar.apache.org by zhangao <ga...@qq.com.INVALID> on 2022/01/14 03:30:03 UTC

Re: [VOTE] PIP-129: Introduce intermediate state for ledger deletion

+1 (non-binding)




------------------ Original Message ------------------
From: "dev" <mattisonchao@gmail.com>;
Sent: Friday, January 14, 2022, 11:23 AM
To: "dev" <dev@pulsar.apache.org>;

Subject: Re: [VOTE] PIP-129: Introduce intermediate state for ledger deletion



+1 (non-binding)

Best,
Mattison

On Fri, 14 Jan 2022 at 11:19, Hang Chen <chenhang@apache.org> wrote:

> +1 (binding)
>
> Best,
> Hang
>
> Zhanpeng Wu <wuzhanpeng.will@gmail.com> wrote on Fri, 14 Jan 2022 at 10:37:
> >
> > This is the voting thread for PIP-129. It will stay open for at least 48
> > hours. Pasted below for quoting convenience.
> >
> > ----
> >
> > https://github.com/apache/pulsar/issues/13526
> >
> > ----
> >
> > ## Motivation
> >
> > Under the current ledger-trimming design in
> > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`,
> > we first collect the ledgers that need to be deleted and then delete them
> > asynchronously and concurrently, but we never check whether those
> > deletions actually complete. If the metadata update succeeds but an error
> > occurs during the asynchronous deletion, the ledger may not be deleted
> > even though, at the logical level, we consider the deletion complete. That
> > data then remains in the storage layer (such as BookKeeper) forever, and
> > the longer the cluster runs, the more residual data accumulates that can
> > never be deleted.
> >
> > To solve this problem, we can separate the metadata update from the
> > ledger deletion. During trimming, we first mark which ledgers are
> > deletable and persist the markers to the metadata store. The marked
> > ledgers are then deleted asynchronously in the callback of the metadata
> > update, so the original logic is preserved. As a result, during a rolling
> > upgrade or a rollback, the only difference is whether a deleted ledger
> > was marked for deletion first.
> >
> > To be more specific:
> > 1. For an upgrade, only the ledger marker information is added; the
> > logical order of deletion does not change.
> > 2. For a rollback, some ledgers that have been marked for deletion may
> > not be deleted because the broker restarts. This behavior is consistent
> > with the original version.
> >
> > In addition, if a marked ledger is not deleted successfully, its marker
> > is not removed. Every subsequent trim therefore tries to delete that
> > ledger again, which effectively acts as a check-and-retry mechanism.
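
A minimal, synchronous Java sketch of this check-and-retry behavior follows. It is not Pulsar code: `RetryingTrimmer`, its in-memory `markedForDeletion` set (a stand-in for the markers persisted in the metadata store), and the `LongPredicate` storage callback are hypothetical simplifications chosen only to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongPredicate;

/**
 * Hypothetical, simplified model of the check-and-retry behavior.
 * Not Pulsar's ManagedLedgerImpl: markers live in an in-memory set here.
 */
final class RetryingTrimmer {
    // Stand-in for the deletion markers persisted in the metadata store.
    final Set<Long> markedForDeletion = new HashSet<>();

    // Stand-in for the storage layer; returns true if the delete succeeded.
    private final LongPredicate storageDelete;

    RetryingTrimmer(LongPredicate storageDelete) {
        this.storageDelete = storageDelete;
    }

    /** One trim run: try every marked ledger and keep the marker on failure. */
    void trim() {
        // Iterate over a copy so markers can be removed inside the loop.
        for (Long ledgerId : new ArrayList<>(markedForDeletion)) {
            if (storageDelete.test(ledgerId)) {
                markedForDeletion.remove(ledgerId);
            }
        }
    }
}
```

Because a marker is dropped only after a successful delete, a ledger whose first delete fails is simply picked up again by the next `trim()` call.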
> >
> > ## Goal
> >
> > We need to modify some logic in
> > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`
> > so that ledger deletion during trimming is split into two stages: marking
> > and deleting. Once the markers are persisted to the metadata store, every
> > trim will try to delete the marked ledgers until all deletable ledgers
> > have been successfully deleted.
> >
> > ## Implementation
> >
> > This proposal separates the deletion logic in ledger trimming so that
> > `ManagedLedgerImpl#internalTrimLedgers` is responsible for marking the
> > deletable ledgers and then performing the actual ledger deletion based on
> > the metadata store.
> >
> > The entire trimming process is therefore broken down into the following
> > steps:
> >
> > 1. Mark deletable ledgers and update the ledger metadata.
> > 2. Perform the actual ledger deletion after the metadata is updated.
> >
> > For step 1, we can store the deletion markers in
> > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#propertiesMap`. To
> > find the ledgers marked for deletion, we can simply iterate over
> > `propertiesMap`. If this solution is not accepted, we could instead
> > create a new znode to store this information, but that approach would
> > not be able to reuse the current design.
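
One way the step-1 markers could be encoded in a string-to-string properties map is sketched below. The `deletable-ledger-` key prefix and the `DeletableLedgerMarkers` helper are hypothetical: the proposal does not specify a key format, and this is not the actual `ManagedLedgerImpl` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper that encodes "ledger X is deletable" markers as entries
 * in a String-to-String properties map. The key format is an assumption; the
 * proposal only says the markers are stored in ManagedLedgerImpl#propertiesMap.
 */
final class DeletableLedgerMarkers {
    // Hypothetical key prefix for marker entries.
    static final String MARKER_PREFIX = "deletable-ledger-";

    private DeletableLedgerMarkers() {}

    /** Step 1: mark a ledger as deletable by adding a property entry. */
    static void mark(Map<String, String> properties, long ledgerId) {
        properties.put(MARKER_PREFIX + ledgerId, "true");
    }

    /** Drop the marker once the ledger has actually been deleted. */
    static void unmark(Map<String, String> properties, long ledgerId) {
        properties.remove(MARKER_PREFIX + ledgerId);
    }

    /** Collect marked ledger IDs by iterating over the properties map. */
    static List<Long> markedLedgers(Map<String, String> properties) {
        List<Long> ids = new ArrayList<>();
        for (String key : properties.keySet()) {
            if (!key.startsWith(MARKER_PREFIX)) {
                continue; // ordinary user property, not a marker
            }
            try {
                ids.add(Long.parseLong(key.substring(MARKER_PREFIX.length())));
            } catch (NumberFormatException e) {
                // A key that merely shares the prefix is not a valid marker; skip it.
            }
        }
        Collections.sort(ids);
        return ids;
    }
}
```

A dedicated key prefix keeps the markers apart from any user-defined properties in the same map, and the scan skips keys that do not match it.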
> >
> > For step 2, we can delete the marked ledgers asynchronously in the
> > callback of the metadata update. Every trim will also check for and
> > delete any remaining deletable ledgers.
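
The step-2 flow can be sketched with `CompletableFuture`, under simplified, hypothetical interfaces: `TwoPhaseLedgerTrimmer`, `persistMarkers`, and the injected `asyncDeleteLedger` function are illustrative names rather than Pulsar APIs, and the in-memory marker set stands in for the metadata store. Deletion is chained with `thenCompose`, so it starts only after the marker update completes normally, and all concurrent deletions are awaited with `CompletableFuture.allOf`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/**
 * Hypothetical sketch of step 2: deletion runs in the callback of the marker
 * update, and a marker is cleared only after its ledger is really deleted.
 * The in-memory marker set stands in for the metadata store.
 */
final class TwoPhaseLedgerTrimmer {
    final Set<Long> persistedMarkers = ConcurrentHashMap.newKeySet();
    private final LongFunction<CompletableFuture<Void>> asyncDeleteLedger;

    TwoPhaseLedgerTrimmer(LongFunction<CompletableFuture<Void>> asyncDeleteLedger) {
        this.asyncDeleteLedger = asyncDeleteLedger;
    }

    /** Step 1 stand-in: a real metadata store would complete asynchronously. */
    CompletableFuture<Void> persistMarkers(List<Long> newlyDeletable) {
        persistedMarkers.addAll(newlyDeletable);
        return CompletableFuture.completedFuture(null);
    }

    /** One trim run: persist new markers, then delete every marked ledger. */
    CompletableFuture<Void> trim(List<Long> newlyDeletable) {
        // thenCompose: deletion starts only if the marker update succeeded.
        return persistMarkers(newlyDeletable).thenCompose(ignored -> {
            List<CompletableFuture<Void>> deletions = new ArrayList<>();
            // Includes ledgers whose deletion failed in earlier trim runs.
            for (Long ledgerId : new ArrayList<>(persistedMarkers)) {
                deletions.add(asyncDeleteLedger.apply(ledgerId)
                        // Clear the marker only after a successful delete.
                        .thenRun(() -> persistedMarkers.remove(ledgerId))
                        // Swallow the failure; the marker stays for the next trim.
                        .exceptionally(ex -> null));
            }
            // Wait for all concurrent deletions instead of firing and forgetting.
            return CompletableFuture.allOf(deletions.toArray(new CompletableFuture<?>[0]));
        });
    }
}
```

Unlike a fire-and-forget deletion, every delete here is observed: a failure leaves its marker in place, so the ledger is retried on the next trim instead of being silently left behind in storage.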
> >
> > Related PR: https://github.com/apache/pulsar/pull/13575
>