Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/21 04:17:43 UTC

[GitHub] [pulsar] Jason918 opened a new issue, #16153: PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Jason918 opened a new issue, #16153:
URL: https://github.com/apache/pulsar/issues/16153

   ## Motivation
   
   The motivation is the same as [PIP-63](https://github.com/apache/pulsar/wiki/PIP-63%3A-Readonly-Topic-Ownership-Support), plus a new use case: broadcasting to 100K subscriptions on a single topic.
   1. The bandwidth of a broker limits the number of subscriptions for a single topic.
   2. Subscriptions compete for network bandwidth on brokers, and different subscriptions might have different levels of severity.
   3. When synchronizing cross-city message reading, cross-city access needs to be minimized.
   4. [New] Broadcast with 100K subscriptions. The number of subscriptions on a single topic is limited in practice. In a test with 40K subscriptions on a single topic, the client needed about 20 minutes to start all the connections, and at a producer rate of 1 msg/s the average end-to-end latency was about 3 s.
   
   However, the original PIP-63 proposal is too complicated to implement: the changed code is already over 3K lines (see #11960), and some problems remain:
   1. The LAC in the readonly topic is updated by polling, which increases the load on bookies.
   2. The message data of the readonly topic is not cached in the broker, which increases network usage between broker and bookie when more than one subscriber is tail-reading.
   3. All subscriptions are managed in the original writable topic, so the maximum number of supported subscriptions is not scalable.
   
   This PIP tries to come up with a simpler solution to support readonly topic ownership and solve the problems the previous PR left.
   
   
   ## Goal
   
   The goal is to implement the Shadow Topic feature to support readonly topic ownership.
   A shadow topic is the shadow of a normal persistent topic (called the source topic here). The source topic and the shadow topic must have the same number of partitions, or both be non-partitioned. Multiple shadow topics can be created from one source topic.
   
   A shadow topic shares the same ledger ID list as its source topic. Users can't produce messages to a shadow topic directly, and a shadow topic doesn't create any new ledgers for messages; all messages in a shadow topic come from its source topic.
   
   A shadow topic has its own subscriptions and doesn't share them with its source topic. This means the shadow topic has its own cursor ledgers to store persistent mark-delete info for each persistent subscription.
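   
   The relationship described above can be sketched in Java as follows. This is only an illustration: `SourceTopicModel` and `ShadowTopicModel` are hypothetical classes, not Pulsar types, and the sketch assumes a mark-delete position can be reduced to a single number, which simplifies a real cursor position.
   
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   
   // Hypothetical model classes for illustration only; not Pulsar's real types.
   final class SourceTopicModel {
       private final List<Long> ledgerIds = new ArrayList<>();
   
       // Only the source topic ever adds ledgers.
       void rollOverToLedger(long ledgerId) {
           ledgerIds.add(ledgerId);
       }
   
       // Read-only live view: later rollovers are visible through it.
       List<Long> ledgerIdsView() {
           return Collections.unmodifiableList(ledgerIds);
       }
   }
   
   final class ShadowTopicModel {
       // Same ledger ID list as the source; nothing is copied.
       private final List<Long> sharedLedgerIds;
       // Own subscriptions. Simplified: one number per subscription
       // instead of a (ledger, entry) mark-delete position.
       private final Map<String, Long> markDeleteBySubscription = new HashMap<>();
   
       ShadowTopicModel(SourceTopicModel source) {
           this.sharedLedgerIds = source.ledgerIdsView();
       }
   
       List<Long> ledgerIds() {
           return sharedLedgerIds;
       }
   
       void subscribe(String subscription) {
           markDeleteBySubscription.putIfAbsent(subscription, -1L);
       }
   
       // Mark-delete only moves forward.
       void acknowledge(String subscription, long entryNumber) {
           markDeleteBySubscription.merge(subscription, entryNumber, Long::max);
       }
   
       Set<String> subscriptions() {
           return markDeleteBySubscription.keySet();
       }
   
       long markDelete(String subscription) {
           return markDeleteBySubscription.get(subscription);
       }
   
       // Direct produce is rejected; data only arrives from the source topic.
       void publish(byte[] payload) {
           throw new UnsupportedOperationException("shadow topics do not accept direct produce");
       }
   }
   ```
   
   Two `ShadowTopicModel` instances built from the same source see the same ledger IDs, while a subscription created on one is invisible to the other.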
   
   The idea comes from geo-replication, but a shadow topic shares the source topic's ledgers to avoid duplicating message storage. This is supported by shadow replication, which is very similar to geo-replication, with these differences:
   1. Geo-replication only works between topics of the same name in different broker clusters, whereas shadow topics have no naming limitation and can be in the same cluster as the source topic.
   2. Geo-replication duplicates data storage; shadow topics don't.
   3. Geo-replication is bidirectional, with clusters replicating data to each other; shadow replication has a one-way data flow.
   
   
   ## API Changes
   
   1. PulsarApi.proto
   A shadow topic needs to know the original message ID of each replicated message in order to track new ledgers and update the LAC.
   So we need to add a `shadow_message_id` field to `CommandSend` for the replicator.
   ```
   message CommandSend {
      // ...
      // message id for shadow topic
      optional MessageIdData shadow_message_id = 9;
   }
   ```
   2. Admin API for creating a shadow topic from a source topic
   `admin.topics().createShadowTopic(source-topic-name, shadow-topic-name)`
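   
   To make the intended bookkeeping concrete, here is a small in-memory sketch of what such a call might record. `InMemoryShadowTopicAdmin` is a hypothetical class, not the Pulsar admin client. It keeps the argument order proposed above (source first, shadow second), so check the released Admin API before relying on that order. Using `0` partitions to mean non-partitioned is also an assumption of this sketch.
   
   ```java
   import java.util.Collections;
   import java.util.HashMap;
   import java.util.LinkedHashSet;
   import java.util.Map;
   import java.util.Set;
   
   // Hypothetical in-memory stand-in for the admin call; not Pulsar's client.
   final class InMemoryShadowTopicAdmin {
       // Assumption of this sketch: 0 partitions means non-partitioned.
       private final Map<String, Integer> partitionsByTopic = new HashMap<>();
       // Models the shadow topic list kept in the source topic's policy.
       private final Map<String, Set<String>> shadowsBySource = new HashMap<>();
       // Models the source topic name kept in the shadow topic's policy.
       private final Map<String, String> sourceByShadow = new HashMap<>();
   
       void createTopic(String topic, int partitions) {
           partitionsByTopic.put(topic, partitions);
       }
   
       void createShadowTopic(String sourceTopic, String shadowTopic) {
           Integer partitions = partitionsByTopic.get(sourceTopic);
           if (partitions == null) {
               throw new IllegalArgumentException("Unknown source topic: " + sourceTopic);
           }
           if (partitionsByTopic.containsKey(shadowTopic)) {
               throw new IllegalStateException("Topic already exists: " + shadowTopic);
           }
           // The shadow topic always gets the same partition count as its source.
           partitionsByTopic.put(shadowTopic, partitions);
           shadowsBySource.computeIfAbsent(sourceTopic, k -> new LinkedHashSet<>()).add(shadowTopic);
           sourceByShadow.put(shadowTopic, sourceTopic);
       }
   
       int partitionsOf(String topic) {
           return partitionsByTopic.get(topic);
       }
   
       Set<String> shadowsOf(String sourceTopic) {
           return shadowsBySource.getOrDefault(sourceTopic, Collections.emptySet());
       }
   
       String sourceOf(String shadowTopic) {
           return sourceByShadow.get(shadowTopic);
       }
   }
   ```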
   
   
   ## Implementation
   <img width="1286" alt="image" src="https://user-images.githubusercontent.com/2770146/174714660-81a87b86-137f-4235-a988-da424768a615.png">
   
   
   The implementation has two key parts:
   1. How to replicate messages to shadow topics.
   2. How a shadow topic manages shared ledger info.
   
   How to replicate messages to shadow topics.
   This part is mostly implemented by ShadowReplicator, which extends the PersistentReplicator introduced for geo-replication. The shadow topic list is added to the topic policy of the source topic, and the source topic manages the lifecycle of all the replicators. The key is to add `shadow_message_id` when producing messages to shadow topics.
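   
   A rough sketch of that key step, assuming the replicator simply forwards each entry together with its position in the source topic. `SourcePosition`, `SendCommandSketch` and `ShadowReplicatorSketch` are hypothetical stand-ins for `MessageIdData`, `CommandSend` and the real replicator. They only show where `shadow_message_id` would be filled in, not how the actual classes work.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical stand-in for MessageIdData: where the entry lives in the source topic.
   final class SourcePosition {
       final long ledgerId;
       final long entryId;
   
       SourcePosition(long ledgerId, long entryId) {
           this.ledgerId = ledgerId;
           this.entryId = entryId;
       }
   }
   
   // Hypothetical stand-in for CommandSend with the proposed optional field.
   final class SendCommandSketch {
       final byte[] payload;
       // Models shadow_message_id; an ordinary producer would leave it null.
       final SourcePosition shadowMessageId;
   
       SendCommandSketch(byte[] payload, SourcePosition shadowMessageId) {
           this.payload = payload;
           this.shadowMessageId = shadowMessageId;
       }
   }
   
   // Hypothetical replicator: one-way flow from source topic to shadow topic.
   final class ShadowReplicatorSketch {
       private final List<SendCommandSketch> outbox = new ArrayList<>();
   
       // Called for each entry read from the source topic, in order.
       SendCommandSketch replicate(SourcePosition position, byte[] payload) {
           // The source position travels with the message so the shadow
           // topic can record it instead of allocating its own.
           SendCommandSketch command = new SendCommandSketch(payload, position);
           outbox.add(command);
           return command;
       }
   
       List<SendCommandSketch> outbox() {
           return outbox;
       }
   }
   ```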
   
   How a shadow topic manages shared ledger info.
   This part is mostly implemented by ShadowManagedLedger, which extends the current `ManagedLedgerImpl` with two key overrides.
   1. `initialize(..)`: 
   	a. Fetch the ManagedLedgerInfo of the source topic instead of the current shadow topic. The source topic name is stored in the topic policy of the shadow topic.
   	b. Open the last ledger and read the explicit LAC from the bookie, instead of creating a new ledger. Reading the LAC here requires the source topic to enable the explicit LAC feature by setting `bookkeeperExplicitLacIntervalInMills` to a non-zero value in broker.conf.
   	c. Do not start checkLedgerRollTask, which periodically tries to roll over the ledger.
   2. `internalAsyncAddEntry()`: instead of writing data to the bookie, it only updates ledger metadata, such as currentLedger and lastConfirmedEntry, and puts the replicated message into the cache.
   
   Besides, some other problems need to be taken care of. For example:
   - LedgerInfos may be updated in the source topic by the offloader or by ledger deletion. The shadow topic needs to watch for ledger info updates in the metadata store and apply them in time.
   - Refresh LastAddConfirmed when a managed cursor requests entries beyond the known LAC.
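   
   The add-entry override and the LAC refresh can be sketched together. This is a hedged illustration only: `ShadowLedgerSketch` is a hypothetical class that does not extend `ManagedLedgerImpl`, the `LongSupplier` stands in for reading the explicit LAC of the current ledger from the bookie, and the string-keyed map stands in for the broker's entry cache.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.LongSupplier;
   
   // Hypothetical sketch; the real ShadowManagedLedger would extend ManagedLedgerImpl.
   final class ShadowLedgerSketch {
       private long currentLedgerId = -1L;
       private long lastConfirmedEntryId = -1L;
       // Stand-in for the broker entry cache, keyed by "ledgerId:entryId".
       private final Map<String, byte[]> cache = new HashMap<>();
       // Stand-in for reading the explicit LAC of the current ledger from the bookie.
       private final LongSupplier explicitLacReader;
   
       ShadowLedgerSketch(LongSupplier explicitLacReader) {
           this.explicitLacReader = explicitLacReader;
       }
   
       // Counterpart of internalAsyncAddEntry(): no write to the bookie,
       // only metadata and cache are updated.
       void addReplicatedEntry(long ledgerId, long entryId, byte[] payload) {
           if (ledgerId != currentLedgerId) {
               // The source topic rolled over to a new ledger.
               currentLedgerId = ledgerId;
               lastConfirmedEntryId = entryId;
           } else {
               lastConfirmedEntryId = Math.max(lastConfirmedEntryId, entryId);
           }
           cache.put(ledgerId + ":" + entryId, payload);
       }
   
       // A cursor asking for an entry beyond the known LAC triggers one refresh.
       boolean canReadFromCurrentLedger(long entryId) {
           if (entryId > lastConfirmedEntryId) {
               lastConfirmedEntryId = Math.max(lastConfirmedEntryId, explicitLacReader.getAsLong());
           }
           return entryId <= lastConfirmedEntryId;
       }
   
       byte[] cached(long ledgerId, long entryId) {
           return cache.get(ledgerId + ":" + entryId);
       }
   
       long currentLedgerId() {
           return currentLedgerId;
       }
   
       long lastConfirmedEntryId() {
           return lastConfirmedEntryId;
       }
   }
   ```
   
   Refreshing only on demand avoids the periodic LAC polling that the Motivation section lists as a problem of the earlier approach.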
   
   ## Rejected Alternatives
   
   See PIP-63
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Jason918 commented on issue #16153: PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by "Jason918 (via GitHub)" <gi...@apache.org>.
Jason918 commented on issue #16153:
URL: https://github.com/apache/pulsar/issues/16153#issuecomment-1502859987

   > Hi, @Jason918. What is the status of this PIP? I see [that](https://github.com/apache/pulsar/pull/18265#issue-1429430819) the basic function is ready for this PIP. So my understanding is that some features are still left to be completed, right?
   > 
   > I move it to the 3.1.0 milestone first. Feel free to ping me if there are any problems.
   
   OK, let's move it to 3.1 first. We need to add docs and more tests for this. @StevenLuMT will continue the work.




[GitHub] [pulsar] github-actions[bot] commented on issue #16153: PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #16153:
URL: https://github.com/apache/pulsar/issues/16153#issuecomment-1276939022

   The issue had no activity for 30 days, mark with Stale label.




[GitHub] [pulsar] github-actions[bot] commented on issue #16153: PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #16153:
URL: https://github.com/apache/pulsar/issues/16153#issuecomment-1242850338

   The issue had no activity for 30 days, mark with Stale label.




[GitHub] [pulsar] github-actions[bot] commented on issue #16153: PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #16153:
URL: https://github.com/apache/pulsar/issues/16153#issuecomment-1312619723

   The issue had no activity for 30 days, mark with Stale label.




[GitHub] [pulsar] RobertIndie commented on issue #16153: PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by "RobertIndie (via GitHub)" <gi...@apache.org>.
RobertIndie commented on issue #16153:
URL: https://github.com/apache/pulsar/issues/16153#issuecomment-1502617598

   Hi, @Jason918. What is the status of this PIP? I see [that](https://github.com/apache/pulsar/pull/18265#issue-1429430819) the basic function is ready for this PIP. So my understanding is that some features are still left to be completed, right?
   
   I move it to the 3.1.0 milestone first. Feel free to ping me if there are any problems.

