You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by Enrico Olivelli <eo...@gmail.com> on 2022/07/05 08:03:43 UTC

Re: [DISCUSS] PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

I have a couple of additional questions.

1. Security
What about security permissions about the shadow topic ?
We are reading from another topic.
I think we must clarify the decisions in the PIP

2. Truncation and deletion
What happens when you truncate or delete the source topic ?
please add a paragraph on the PIP

3. Offloaders
We are talking about BK metadata, how do Shadow Topics work with
Offloaded ledgers ?
Please clarify in the PIP

4. Changes in the number of partitions
the PIP says that the number of partitions must match the source topic.
Are we preventing changes to the number of partitions in the source topic ?

5. Topic stats
We should add information on the source topic and on the shadow topic.
Please clarify or draft your intentions in the PIP

6. GeoReplication
I guess that GeoReplication will not be possible for shadow topics.
Please clarify on the PIP

I believe that this feature is very powerful, but we must design it
carefully and discuss
about all the edge cases. Otherwise we will end up in something that
is half-baked
and we will have to resolve edge cases while developing or after going
to production.

Every feature must be fully integrated with the rest of Pulsar

Enrico

Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
<ji...@apache.org> ha scritto:
>
> Hi Penghui
>
> On 2022/06/29 04:07:35 PengHui Li wrote:
> > Hi Haiting,
> >
> > Thanks for the explanation. I'm clear for now.
> >
> > Pulsar functions also can do such things by connecting data from one topic
> > to another topic.
> > But the difference is this proposal only copies the data to the cache of
> > another topic, and the data not
> > in the cache is also available by reading from ledgers.
> >
> > And this approach also follows benefits compared with replicating data to
> > multiple "real" topics.
> >
> > - reuse the topic metadata
> > - the same message ID which easy for troubleshooting
> >
> > Just one question
> >
> > >>>>>>>
> > ```
> > message CommandSend { // ... // message id for shadow topic optional
> >    MessageIdData shadow_message_id = 9; }
> > ```
> >
> > Can we get the message ID from the replicated data to avoid introducing a
> > new command?
> > Or use a marker message to avoid broker-to-broker directly protobuf command
> > interaction.
> >
> Sorry for not wrote it clearly. CommandSend is not a new command. It's exactly the main
> command producer used to send message to broker. The only change is add a new field in it.
> The whole command proto would be like this:
> ```
> message CommandSend {
>     required uint64 producer_id = 1;
>     required uint64 sequence_id = 2;
>     optional int32 num_messages = 3 [default = 1];
>     optional uint64 txnid_least_bits = 4 [default = 0];
>     optional uint64 txnid_most_bits = 5 [default = 0];
>
>     /// Add highest sequence id to support batch message with external sequence id
>     optional uint64 highest_sequence_id = 6 [default = 0];
>     optional bool is_chunk     =7 [default = false];
>
>     // Specify if the message being published is a Pulsar marker or not
>     optional bool marker = 8 [default = false];
>
>     // message id for shadow topic
>     optional MessageIdData shadow_message_id = 9;
> }
> ```
> So there won't be any broker-to-broker directly protobuf command interactions.
>
> Thanks,
> Haiting
>
> > Thanks,
> > Penghui
> >
> > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <ji...@apache.org>
> > wrote:
> >
> > > Hi Penghui & Asaf:
> > >
> > > Please allow me to provide some more detailes about **metadata**
> > > synchronization
> > > between source topic and shadow topic.
> > >
> > > 1.When shadow topic initializes, it will read from metadata store path
> > > "/managed-ledgers/{source_topic_ledger_name}", which contains all the
> > > managed ledger info. We don't
> > > need to read the  ledger information from source topic broker.
> > >
> > > 2. When shadow topic received new message from replicator, if the ledger
> > > id of the message
> > >  is the same as the last ledger, it just updates the LAC. If not, it will
> > > update ledger list from metadata,
> > > and then open the new ledger handle and update the LAC.
> > >
> > > As for the copy itself and add shadow message id in CommandSend, it mostly
> > > serves the purpose
> > > of filling the EntryCache.
> > >
> > > Thanks,
> > > Haiting
> > >
> > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > One question comes to mind here: Why not simply read the ledger
> > > information
> > > > from original topic, without copy?
> > > >
> > > > I think this is a good idea.
> > > >
> > > > Penghui
> > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org, wrote:
> > > > >
> > > > > One question comes to mind here: Why not simply read the ledger
> > > information
> > > > > from original topic, without copy?
> > > >
> > >
> >

Re: [DISCUSS] PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by Asaf Mesika <as...@gmail.com>.
Hi Haiting,

I've noticed Shadow Topic is *not* covered in the documentation.

Can you please add documentation for it?

It's such a great feature, it's almost a waste not having it documented.

Once it is, I can it to the proud list of features Pulsar have in this
lovely page we recently launched: https://pulsar.apache.org/features/


On Fri, Jul 29, 2022 at 12:29 PM Haiting Jiang <ji...@gmail.com>
wrote:

> > I think that there are still some references to shadow_message_id but
> > IIUC we don't need to add that field anymore because we are going to
> > use
>
>
> > optional MessageIdData message_id = 9;
>
>
> > is this correct ?
>
>
> Yes, no more `shadow_message_id`. Sorry I missed these references, already
> updated.
>
> Thanks,
> Haiting
>
> On Fri, Jul 29, 2022 at 5:15 PM Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Il giorno ven 29 lug 2022 alle ore 09:05 Haiting Jiang
> > <ji...@apache.org> ha scritto:
> > >
> > > Hi Enrico,
> > >
> > > Any further suggestion on this PIP?
> > > If not, I would like to raise a  revote on this in a few days.
> >
> > now the PIP LGTM
> >
> > I think that there are still some references to shadow_message_id but
> > IIUC we don't need to add that field anymore because we are going to
> > use
> >
> > optional MessageIdData message_id = 9;
> >
> > is this correct ?
> >
> > Enrico
> >
> > >
> > > Thanks,
> > > Haiting
> > >
> > > On 2022/07/07 11:30:59 Haiting Jiang wrote:
> > > > Hi Enrico,
> > > >
> > > > Thanks for your feedback.
> > > >
> > > > On 2022/07/05 08:03:43 Enrico Olivelli wrote:
> > > > > I have a couple of additional questions.
> > > > >
> > > > > 1. Security
> > > > > What about security permissions about the shadow topic ?
> > > > > We are reading from another topic.
> > > > > I think we must clarify the decisions in the PIP
> > > >
> > > > As shadow topic is usually in another namespace, it would have its
> own
> > > > independent permission settings, and we can configure different
> > permissions
> > > > for source topic and shadow topic. So there would be no guarantee
> that
> > you are
> > > > allowed to consume shadow topic if you have permission to consume
> > source
> > > > topic.
> > > >
> > > > On the other hand, we uses topic policy to store shadow topic
> > settings, so a
> > > > new policy permission item needs be added as PolicyName.SHADOW_TOPIC,
> > and user
> > > > must have PolicyOperation.WRITE to this policy to create/delete
> shadow
> > topics.
> > > >
> > > > >
> > > > > 2. Truncation and deletion
> > > > > What happens when you truncate or delete the source topic ?
> > > > > please add a paragraph on the PIP
> > > > >
> > > >
> > > > 1. Truncation, from command `bin/pulsar-admin topics truncate
> > source-topic`.
> > > > For source topic truncation, nothing changes. It still move all
> > cursors to the
> > > > end of the topic and delete all inactive ledgers.
> > > > As shadow topic will watch `ManagedLedgerInfo` in metadata store,
> once
> > it
> > > > knows ledgers deleted, all cursors will skip all deleted ledgers.
> > > >
> > > > 2. Deletion, from command `bin/pulsar-admin topics delete
> > source-topic`.
> > > > Like geo-replication, topic deletion is forbidden if topic have
> shadow
> > > > replicators, users have to delete shadow topics first. Here is the
> new
> > admin
> > > > API for managing shadow topics with source topic in
> > > > `org.apache.pulsar.client.admin.Topics` :
> > > > ```
> > > > void createShadowTopic(String sourceTopicName, String
> shadowTopicName);
> > > > void deleteShadowTopic(String sourceTopicName, String
> shadowTopicName);
> > > > List<String> admin.topics().getShadowTopics(String sourceTopicName);
> > > >
> > > > //And their async version methods.
> > > > ```
> > > > And this requires new REST interfaces in admin server, where
> > > > ```
> > > > PATH = "/{tenant}/{namespace}/{topic}/shadowTopics";
> > > > METHOD = POST/DELETE/GET;
> > > > ```
> > > >
> > > > > 3. Offloaders
> > > > > We are talking about BK metadata, how do Shadow Topics work with
> > > > > Offloaded ledgers ?
> > > > > Please clarify in the PIP
> > > >
> > > > Offloading a ledger is a kind of writing operation to topic's
> > metadata, so
> > > > shadow topic can't offload ledgers to other long term storage.
> > However, for
> > > > ledgers thats are already offloaded by source topic, it's expected to
> > support
> > > > reading from offload ledgers in shadow topic, just like read from
> > source
> > > > topic.
> > > >
> > > > The implementation depends on shadow topic watching
> > `ManagedLedgerInfo` in
> > > > metadata store, and if LedgerInfo.offloadContext is updated by source
> > topic
> > > > offloader, shadow topic can get fully information to get a readHandle
> > from
> > > > ledgerOffload. And of course, the pre-condition is the shadow topic
> > must have
> > > > the same offload driver settings.
> > > >
> > > > >
> > > > > 4. Changes in the number of partitions
> > > > > the PIP says that the number of partitions must match the source
> > topic.
> > > > > Are we preventing changes to the number of partitions in the source
> > topic ?
> > > > >
> > > >
> > > > No, the updates on partition number will be synced to the shadow
> topic.
> > > > A source topic or partition will be responsible for the creation and
> > deletion
> > > > of its corresponding shadow topic partitions.
> > > >
> > > > > 5. Topic stats
> > > > > We should add information on the source topic and on the shadow
> > topic.
> > > > > Please clarify or draft your intentions in the PIP
> > > > >
> > > >
> > > > For topic stats on source topic, as shadow replicator will reuse most
> > of current
> > > > PersistentReplicator, the ReplicatorStatsImpl also can be applied to
> > shadow replicators.
> > > > And we need to add a new field in `TopicStatsImpl` like
> > geo-replication:
> > > > ```
> > > > Map<String /*shadow topic name*/, ReplicatorStatsImpl>
> > shadowReplication;
> > > > ```
> > > >
> > > > As for topic stats on shadow topic, previous `TopicStatsImpl` still
> > applies.
> > > > And I don't see any other stats need to be added at this point.
> > > >
> > > >
> > > > > 6. GeoReplication
> > > > > I guess that GeoReplication will not be possible for shadow topics.
> > > > > Please clarify on the PIP
> > > > >
> > > >
> > > > Yes, this is decided by the nature of shadow topic that it don't have
> > the write access to BK.
> > > > And I don't see the necessary of supporting GeoReplication for shadow
> > topics.
> > > > We can make source topic geo-replicated and create the same shadow
> > topic in each clusters.
> > > >
> > > > > I believe that this feature is very powerful, but we must design it
> > > > > carefully and discuss
> > > > > about all the edge cases. Otherwise we will end up in something
> that
> > > > > is half-baked
> > > > > and we will have to resolve edge cases while developing or after
> > going
> > > > > to production.
> > > > >
> > > > > Every feature must be fully integrated with the rest of Pulsar
> > > > >
> > > > > Enrico
> > > > >
> > > > > Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
> > > > > <ji...@apache.org> ha scritto:
> > > > > >
> > > > > > Hi Penghui
> > > > > >
> > > > > > On 2022/06/29 04:07:35 PengHui Li wrote:
> > > > > > > Hi Haiting,
> > > > > > >
> > > > > > > Thanks for the explanation. I'm clear for now.
> > > > > > >
> > > > > > > Pulsar functions also can do such things by connecting data
> from
> > one topic
> > > > > > > to another topic.
> > > > > > > But the difference is this proposal only copies the data to the
> > cache of
> > > > > > > another topic, and the data not
> > > > > > > in the cache is also available by reading from ledgers.
> > > > > > >
> > > > > > > And this approach also follows benefits compared with
> > replicating data to
> > > > > > > multiple "real" topics.
> > > > > > >
> > > > > > > - reuse the topic metadata
> > > > > > > - the same message ID which easy for troubleshooting
> > > > > > >
> > > > > > > Just one question
> > > > > > >
> > > > > > > >>>>>>>
> > > > > > > ```
> > > > > > > message CommandSend { // ... // message id for shadow topic
> > optional
> > > > > > >    MessageIdData shadow_message_id = 9; }
> > > > > > > ```
> > > > > > >
> > > > > > > Can we get the message ID from the replicated data to avoid
> > introducing a
> > > > > > > new command?
> > > > > > > Or use a marker message to avoid broker-to-broker directly
> > protobuf command
> > > > > > > interaction.
> > > > > > >
> > > > > > Sorry for not wrote it clearly. CommandSend is not a new command.
> > It's exactly the main
> > > > > > command producer used to send message to broker. The only change
> > is add a new field in it.
> > > > > > The whole command proto would be like this:
> > > > > > ```
> > > > > > message CommandSend {
> > > > > >     required uint64 producer_id = 1;
> > > > > >     required uint64 sequence_id = 2;
> > > > > >     optional int32 num_messages = 3 [default = 1];
> > > > > >     optional uint64 txnid_least_bits = 4 [default = 0];
> > > > > >     optional uint64 txnid_most_bits = 5 [default = 0];
> > > > > >
> > > > > >     /// Add highest sequence id to support batch message with
> > external sequence id
> > > > > >     optional uint64 highest_sequence_id = 6 [default = 0];
> > > > > >     optional bool is_chunk     =7 [default = false];
> > > > > >
> > > > > >     // Specify if the message being published is a Pulsar marker
> > or not
> > > > > >     optional bool marker = 8 [default = false];
> > > > > >
> > > > > >     // message id for shadow topic
> > > > > >     optional MessageIdData shadow_message_id = 9;
> > > > > > }
> > > > > > ```
> > > > > > So there won't be any broker-to-broker directly protobuf command
> > interactions.
> > > > > >
> > > > > > Thanks,
> > > > > > Haiting
> > > > > >
> > > > > > > Thanks,
> > > > > > > Penghui
> > > > > > >
> > > > > > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <
> > jianghaiting@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Penghui & Asaf:
> > > > > > > >
> > > > > > > > Please allow me to provide some more detailes about
> > **metadata**
> > > > > > > > synchronization
> > > > > > > > between source topic and shadow topic.
> > > > > > > >
> > > > > > > > 1.When shadow topic initializes, it will read from metadata
> > store path
> > > > > > > > "/managed-ledgers/{source_topic_ledger_name}", which contains
> > all the
> > > > > > > > managed ledger info. We don't
> > > > > > > > need to read the  ledger information from source topic
> broker.
> > > > > > > >
> > > > > > > > 2. When shadow topic received new message from replicator, if
> > the ledger
> > > > > > > > id of the message
> > > > > > > >  is the same as the last ledger, it just updates the LAC. If
> > not, it will
> > > > > > > > update ledger list from metadata,
> > > > > > > > and then open the new ledger handle and update the LAC.
> > > > > > > >
> > > > > > > > As for the copy itself and add shadow message id in
> > CommandSend, it mostly
> > > > > > > > serves the purpose
> > > > > > > > of filling the EntryCache.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Haiting
> > > > > > > >
> > > > > > > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > > > > > > One question comes to mind here: Why not simply read the
> > ledger
> > > > > > > > information
> > > > > > > > > from original topic, without copy?
> > > > > > > > >
> > > > > > > > > I think this is a good idea.
> > > > > > > > >
> > > > > > > > > Penghui
> > > > > > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org,
> wrote:
> > > > > > > > > >
> > > > > > > > > > One question comes to mind here: Why not simply read the
> > ledger
> > > > > > > > information
> > > > > > > > > > from original topic, without copy?
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > > > BR,
> > > > Haiting
> > > >
> >
>

Re: [DISCUSS] PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by Haiting Jiang <ji...@gmail.com>.
> I think that there are still some references to shadow_message_id but
> IIUC we don't need to add that field anymore because we are going to
> use


> optional MessageIdData message_id = 9;


> is this correct ?


Yes, no more `shadow_message_id`. Sorry I missed these references, already
updated.

Thanks,
Haiting

On Fri, Jul 29, 2022 at 5:15 PM Enrico Olivelli <eo...@gmail.com> wrote:

> Il giorno ven 29 lug 2022 alle ore 09:05 Haiting Jiang
> <ji...@apache.org> ha scritto:
> >
> > Hi Enrico,
> >
> > Any further suggestion on this PIP?
> > If not, I would like to raise a  revote on this in a few days.
>
> now the PIP LGTM
>
> I think that there are still some references to shadow_message_id but
> IIUC we don't need to add that field anymore because we are going to
> use
>
> optional MessageIdData message_id = 9;
>
> is this correct ?
>
> Enrico
>
> >
> > Thanks,
> > Haiting
> >
> > On 2022/07/07 11:30:59 Haiting Jiang wrote:
> > > Hi Enrico,
> > >
> > > Thanks for your feedback.
> > >
> > > On 2022/07/05 08:03:43 Enrico Olivelli wrote:
> > > > I have a couple of additional questions.
> > > >
> > > > 1. Security
> > > > What about security permissions about the shadow topic ?
> > > > We are reading from another topic.
> > > > I think we must clarify the decisions in the PIP
> > >
> > > As shadow topic is usually in another namespace, it would have its own
> > > independent permission settings, and we can configure different
> permissions
> > > for source topic and shadow topic. So there would be no guarantee that
> you are
> > > allowed to consume shadow topic if you have permission to consume
> source
> > > topic.
> > >
> > > On the other hand, we uses topic policy to store shadow topic
> settings, so a
> > > new policy permission item needs be added as PolicyName.SHADOW_TOPIC,
> and user
> > > must have PolicyOperation.WRITE to this policy to create/delete shadow
> topics.
> > >
> > > >
> > > > 2. Truncation and deletion
> > > > What happens when you truncate or delete the source topic ?
> > > > please add a paragraph on the PIP
> > > >
> > >
> > > 1. Truncation, from command `bin/pulsar-admin topics truncate
> source-topic`.
> > > For source topic truncation, nothing changes. It still move all
> cursors to the
> > > end of the topic and delete all inactive ledgers.
> > > As shadow topic will watch `ManagedLedgerInfo` in metadata store, once
> it
> > > knows ledgers deleted, all cursors will skip all deleted ledgers.
> > >
> > > 2. Deletion, from command `bin/pulsar-admin topics delete
> source-topic`.
> > > Like geo-replication, topic deletion is forbidden if topic have shadow
> > > replicators, users have to delete shadow topics first. Here is the new
> admin
> > > API for managing shadow topics with source topic in
> > > `org.apache.pulsar.client.admin.Topics` :
> > > ```
> > > void createShadowTopic(String sourceTopicName, String shadowTopicName);
> > > void deleteShadowTopic(String sourceTopicName, String shadowTopicName);
> > > List<String> admin.topics().getShadowTopics(String sourceTopicName);
> > >
> > > //And their async version methods.
> > > ```
> > > And this requires new REST interfaces in admin server, where
> > > ```
> > > PATH = "/{tenant}/{namespace}/{topic}/shadowTopics";
> > > METHOD = POST/DELETE/GET;
> > > ```
> > >
> > > > 3. Offloaders
> > > > We are talking about BK metadata, how do Shadow Topics work with
> > > > Offloaded ledgers ?
> > > > Please clarify in the PIP
> > >
> > > Offloading a ledger is a kind of writing operation to topic's
> metadata, so
> > > shadow topic can't offload ledgers to other long term storage.
> However, for
> > > ledgers thats are already offloaded by source topic, it's expected to
> support
> > > reading from offload ledgers in shadow topic, just like read from
> source
> > > topic.
> > >
> > > The implementation depends on shadow topic watching
> `ManagedLedgerInfo` in
> > > metadata store, and if LedgerInfo.offloadContext is updated by source
> topic
> > > offloader, shadow topic can get fully information to get a readHandle
> from
> > > ledgerOffload. And of course, the pre-condition is the shadow topic
> must have
> > > the same offload driver settings.
> > >
> > > >
> > > > 4. Changes in the number of partitions
> > > > the PIP says that the number of partitions must match the source
> topic.
> > > > Are we preventing changes to the number of partitions in the source
> topic ?
> > > >
> > >
> > > No, the updates on partition number will be synced to the shadow topic.
> > > A source topic or partition will be responsible for the creation and
> deletion
> > > of its corresponding shadow topic partitions.
> > >
> > > > 5. Topic stats
> > > > We should add information on the source topic and on the shadow
> topic.
> > > > Please clarify or draft your intentions in the PIP
> > > >
> > >
> > > For topic stats on source topic, as shadow replicator will reuse most
> of current
> > > PersistentReplicator, the ReplicatorStatsImpl also can be applied to
> shadow replicators.
> > > And we need to add a new field in `TopicStatsImpl` like
> geo-replication:
> > > ```
> > > Map<String /*shadow topic name*/, ReplicatorStatsImpl>
> shadowReplication;
> > > ```
> > >
> > > As for topic stats on shadow topic, previous `TopicStatsImpl` still
> applies.
> > > And I don't see any other stats need to be added at this point.
> > >
> > >
> > > > 6. GeoReplication
> > > > I guess that GeoReplication will not be possible for shadow topics.
> > > > Please clarify on the PIP
> > > >
> > >
> > > Yes, this is decided by the nature of shadow topic that it don't have
> the write access to BK.
> > > And I don't see the necessary of supporting GeoReplication for shadow
> topics.
> > > We can make source topic geo-replicated and create the same shadow
> topic in each clusters.
> > >
> > > > I believe that this feature is very powerful, but we must design it
> > > > carefully and discuss
> > > > about all the edge cases. Otherwise we will end up in something that
> > > > is half-baked
> > > > and we will have to resolve edge cases while developing or after
> going
> > > > to production.
> > > >
> > > > Every feature must be fully integrated with the rest of Pulsar
> > > >
> > > > Enrico
> > > >
> > > > Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
> > > > <ji...@apache.org> ha scritto:
> > > > >
> > > > > Hi Penghui
> > > > >
> > > > > On 2022/06/29 04:07:35 PengHui Li wrote:
> > > > > > Hi Haiting,
> > > > > >
> > > > > > Thanks for the explanation. I'm clear for now.
> > > > > >
> > > > > > Pulsar functions also can do such things by connecting data from
> one topic
> > > > > > to another topic.
> > > > > > But the difference is this proposal only copies the data to the
> cache of
> > > > > > another topic, and the data not
> > > > > > in the cache is also available by reading from ledgers.
> > > > > >
> > > > > > And this approach also follows benefits compared with
> replicating data to
> > > > > > multiple "real" topics.
> > > > > >
> > > > > > - reuse the topic metadata
> > > > > > - the same message ID which easy for troubleshooting
> > > > > >
> > > > > > Just one question
> > > > > >
> > > > > > >>>>>>>
> > > > > > ```
> > > > > > message CommandSend { // ... // message id for shadow topic
> optional
> > > > > >    MessageIdData shadow_message_id = 9; }
> > > > > > ```
> > > > > >
> > > > > > Can we get the message ID from the replicated data to avoid
> introducing a
> > > > > > new command?
> > > > > > Or use a marker message to avoid broker-to-broker directly
> protobuf command
> > > > > > interaction.
> > > > > >
> > > > > Sorry for not wrote it clearly. CommandSend is not a new command.
> It's exactly the main
> > > > > command producer used to send message to broker. The only change
> is add a new field in it.
> > > > > The whole command proto would be like this:
> > > > > ```
> > > > > message CommandSend {
> > > > >     required uint64 producer_id = 1;
> > > > >     required uint64 sequence_id = 2;
> > > > >     optional int32 num_messages = 3 [default = 1];
> > > > >     optional uint64 txnid_least_bits = 4 [default = 0];
> > > > >     optional uint64 txnid_most_bits = 5 [default = 0];
> > > > >
> > > > >     /// Add highest sequence id to support batch message with
> external sequence id
> > > > >     optional uint64 highest_sequence_id = 6 [default = 0];
> > > > >     optional bool is_chunk     =7 [default = false];
> > > > >
> > > > >     // Specify if the message being published is a Pulsar marker
> or not
> > > > >     optional bool marker = 8 [default = false];
> > > > >
> > > > >     // message id for shadow topic
> > > > >     optional MessageIdData shadow_message_id = 9;
> > > > > }
> > > > > ```
> > > > > So there won't be any broker-to-broker directly protobuf command
> interactions.
> > > > >
> > > > > Thanks,
> > > > > Haiting
> > > > >
> > > > > > Thanks,
> > > > > > Penghui
> > > > > >
> > > > > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <
> jianghaiting@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Penghui & Asaf:
> > > > > > >
> > > > > > > Please allow me to provide some more detailes about
> **metadata**
> > > > > > > synchronization
> > > > > > > between source topic and shadow topic.
> > > > > > >
> > > > > > > 1.When shadow topic initializes, it will read from metadata
> store path
> > > > > > > "/managed-ledgers/{source_topic_ledger_name}", which contains
> all the
> > > > > > > managed ledger info. We don't
> > > > > > > need to read the  ledger information from source topic broker.
> > > > > > >
> > > > > > > 2. When shadow topic received new message from replicator, if
> the ledger
> > > > > > > id of the message
> > > > > > >  is the same as the last ledger, it just updates the LAC. If
> not, it will
> > > > > > > update ledger list from metadata,
> > > > > > > and then open the new ledger handle and update the LAC.
> > > > > > >
> > > > > > > As for the copy itself and add shadow message id in
> CommandSend, it mostly
> > > > > > > serves the purpose
> > > > > > > of filling the EntryCache.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Haiting
> > > > > > >
> > > > > > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > > > > > One question comes to mind here: Why not simply read the
> ledger
> > > > > > > information
> > > > > > > > from original topic, without copy?
> > > > > > > >
> > > > > > > > I think this is a good idea.
> > > > > > > >
> > > > > > > > Penghui
> > > > > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org, wrote:
> > > > > > > > >
> > > > > > > > > One question comes to mind here: Why not simply read the
> ledger
> > > > > > > information
> > > > > > > > > from original topic, without copy?
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > > BR,
> > > Haiting
> > >
>

Re: [DISCUSS] PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by Enrico Olivelli <eo...@gmail.com>.
Il giorno ven 29 lug 2022 alle ore 09:05 Haiting Jiang
<ji...@apache.org> ha scritto:
>
> Hi Enrico,
>
> Any further suggestion on this PIP?
> If not, I would like to raise a  revote on this in a few days.

now the PIP LGTM

I think that there are still some references to shadow_message_id but
IIUC we don't need to add that field anymore because we are going to
use

optional MessageIdData message_id = 9;

is this correct ?

Enrico

>
> Thanks,
> Haiting
>
> On 2022/07/07 11:30:59 Haiting Jiang wrote:
> > Hi Enrico,
> >
> > Thanks for your feedback.
> >
> > On 2022/07/05 08:03:43 Enrico Olivelli wrote:
> > > I have a couple of additional questions.
> > >
> > > 1. Security
> > > What about security permissions about the shadow topic ?
> > > We are reading from another topic.
> > > I think we must clarify the decisions in the PIP
> >
> > As shadow topic is usually in another namespace, it would have its own
> > independent permission settings, and we can configure different permissions
> > for source topic and shadow topic. So there would be no guarantee that you are
> > allowed to consume shadow topic if you have permission to consume source
> > topic.
> >
> > On the other hand, we uses topic policy to store shadow topic settings, so a
> > new policy permission item needs be added as PolicyName.SHADOW_TOPIC, and user
> > must have PolicyOperation.WRITE to this policy to create/delete shadow topics.
> >
> > >
> > > 2. Truncation and deletion
> > > What happens when you truncate or delete the source topic ?
> > > please add a paragraph on the PIP
> > >
> >
> > 1. Truncation, from command `bin/pulsar-admin topics truncate source-topic`.
> > For source topic truncation, nothing changes. It still move all cursors to the
> > end of the topic and delete all inactive ledgers.
> > As shadow topic will watch `ManagedLedgerInfo` in metadata store, once it
> > knows ledgers deleted, all cursors will skip all deleted ledgers.
> >
> > 2. Deletion, from command `bin/pulsar-admin topics delete source-topic`.
> > Like geo-replication, topic deletion is forbidden if topic have shadow
> > replicators, users have to delete shadow topics first. Here is the new admin
> > API for managing shadow topics with source topic in
> > `org.apache.pulsar.client.admin.Topics` :
> > ```
> > void createShadowTopic(String sourceTopicName, String shadowTopicName);
> > void deleteShadowTopic(String sourceTopicName, String shadowTopicName);
> > List<String> admin.topics().getShadowTopics(String sourceTopicName);
> >
> > //And their async version methods.
> > ```
> > And this requires new REST interfaces in admin server, where
> > ```
> > PATH = "/{tenant}/{namespace}/{topic}/shadowTopics";
> > METHOD = POST/DELETE/GET;
> > ```
> >
> > > 3. Offloaders
> > > We are talking about BK metadata, how do Shadow Topics work with
> > > Offloaded ledgers ?
> > > Please clarify in the PIP
> >
> > Offloading a ledger is a kind of writing operation to topic's metadata, so
> > shadow topic can't offload ledgers to other long term storage. However, for
> > ledgers thats are already offloaded by source topic, it's expected to support
> > reading from offload ledgers in shadow topic, just like read from source
> > topic.
> >
> > The implementation depends on shadow topic watching `ManagedLedgerInfo` in
> > metadata store, and if LedgerInfo.offloadContext is updated by source topic
> > offloader, shadow topic can get fully information to get a readHandle from
> > ledgerOffload. And of course, the pre-condition is the shadow topic must have
> > the same offload driver settings.
> >
> > >
> > > 4. Changes in the number of partitions
> > > the PIP says that the number of partitions must match the source topic.
> > > Are we preventing changes to the number of partitions in the source topic ?
> > >
> >
> > No, the updates on partition number will be synced to the shadow topic.
> > A source topic or partition will be responsible for the creation and deletion
> > of its corresponding shadow topic partitions.
> >
> > > 5. Topic stats
> > > We should add information on the source topic and on the shadow topic.
> > > Please clarify or draft your intentions in the PIP
> > >
> >
> > For topic stats on source topic, as shadow replicator will reuse most of current
> > PersistentReplicator, the ReplicatorStatsImpl also can be applied to shadow replicators.
> > And we need to add a new field in `TopicStatsImpl` like geo-replication:
> > ```
> > Map<String /*shadow topic name*/, ReplicatorStatsImpl> shadowReplication;
> > ```
> >
> > As for topic stats on shadow topic, previous `TopicStatsImpl` still applies.
> > And I don't see any other stats need to be added at this point.
> >
> >
> > > 6. GeoReplication
> > > I guess that GeoReplication will not be possible for shadow topics.
> > > Please clarify on the PIP
> > >
> >
> > Yes, this is decided by the nature of shadow topic that it don't have the write access to BK.
> > And I don't see the necessary of supporting GeoReplication for shadow topics.
> > We can make source topic geo-replicated and create the same shadow topic in each clusters.
> >
> > > I believe that this feature is very powerful, but we must design it
> > > carefully and discuss
> > > about all the edge cases. Otherwise we will end up in something that
> > > is half-baked
> > > and we will have to resolve edge cases while developing or after going
> > > to production.
> > >
> > > Every feature must be fully integrated with the rest of Pulsar
> > >
> > > Enrico
> > >
> > > Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
> > > <ji...@apache.org> ha scritto:
> > > >
> > > > Hi Penghui
> > > >
> > > > On 2022/06/29 04:07:35 PengHui Li wrote:
> > > > > Hi Haiting,
> > > > >
> > > > > Thanks for the explanation. I'm clear for now.
> > > > >
> > > > > Pulsar functions also can do such things by connecting data from one topic
> > > > > to another topic.
> > > > > But the difference is this proposal only copies the data to the cache of
> > > > > another topic, and the data not
> > > > > in the cache is also available by reading from ledgers.
> > > > >
> > > > > And this approach also follows benefits compared with replicating data to
> > > > > multiple "real" topics.
> > > > >
> > > > > - reuse the topic metadata
> > > > > - the same message ID which easy for troubleshooting
> > > > >
> > > > > Just one question
> > > > >
> > > > > >>>>>>>
> > > > > ```
> > > > > message CommandSend { // ... // message id for shadow topic optional
> > > > >    MessageIdData shadow_message_id = 9; }
> > > > > ```
> > > > >
> > > > > Can we get the message ID from the replicated data to avoid introducing a
> > > > > new command?
> > > > > Or use a marker message to avoid broker-to-broker directly protobuf command
> > > > > interaction.
> > > > >
> > > > Sorry for not wrote it clearly. CommandSend is not a new command. It's exactly the main
> > > > command producer used to send message to broker. The only change is add a new field in it.
> > > > The whole command proto would be like this:
> > > > ```
> > > > message CommandSend {
> > > >     required uint64 producer_id = 1;
> > > >     required uint64 sequence_id = 2;
> > > >     optional int32 num_messages = 3 [default = 1];
> > > >     optional uint64 txnid_least_bits = 4 [default = 0];
> > > >     optional uint64 txnid_most_bits = 5 [default = 0];
> > > >
> > > >     /// Add highest sequence id to support batch message with external sequence id
> > > >     optional uint64 highest_sequence_id = 6 [default = 0];
> > > >     optional bool is_chunk     =7 [default = false];
> > > >
> > > >     // Specify if the message being published is a Pulsar marker or not
> > > >     optional bool marker = 8 [default = false];
> > > >
> > > >     // message id for shadow topic
> > > >     optional MessageIdData shadow_message_id = 9;
> > > > }
> > > > ```
> > > > So there won't be any broker-to-broker directly protobuf command interactions.
> > > >
> > > > Thanks,
> > > > Haiting
> > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <ji...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Penghui & Asaf:
> > > > > >
> > > > > > Please allow me to provide some more detailes about **metadata**
> > > > > > synchronization
> > > > > > between source topic and shadow topic.
> > > > > >
> > > > > > 1.When shadow topic initializes, it will read from metadata store path
> > > > > > "/managed-ledgers/{source_topic_ledger_name}", which contains all the
> > > > > > managed ledger info. We don't
> > > > > > need to read the  ledger information from source topic broker.
> > > > > >
> > > > > > 2. When shadow topic received new message from replicator, if the ledger
> > > > > > id of the message
> > > > > >  is the same as the last ledger, it just updates the LAC. If not, it will
> > > > > > update ledger list from metadata,
> > > > > > and then open the new ledger handle and update the LAC.
> > > > > >
> > > > > > As for the copy itself and add shadow message id in CommandSend, it mostly
> > > > > > serves the purpose
> > > > > > of filling the EntryCache.
> > > > > >
> > > > > > Thanks,
> > > > > > Haiting
> > > > > >
> > > > > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > > > > One question comes to mind here: Why not simply read the ledger
> > > > > > information
> > > > > > > from original topic, without copy?
> > > > > > >
> > > > > > > I think this is a good idea.
> > > > > > >
> > > > > > > Penghui
> > > > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org, wrote:
> > > > > > > >
> > > > > > > > One question comes to mind here: Why not simply read the ledger
> > > > > > information
> > > > > > > > from original topic, without copy?
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> > BR,
> > Haiting
> >

Re: [DISCUSS] PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by Haiting Jiang <ji...@apache.org>.
Hi Enrico,

Any further suggestion on this PIP?
If not, I would like to raise a  revote on this in a few days.

Thanks,
Haiting

On 2022/07/07 11:30:59 Haiting Jiang wrote:
> Hi Enrico,
> 
> Thanks for your feedback.
> 
> On 2022/07/05 08:03:43 Enrico Olivelli wrote:
> > I have a couple of additional questions.
> > 
> > 1. Security
> > What about security permissions about the shadow topic ?
> > We are reading from another topic.
> > I think we must clarify the decisions in the PIP
> 
> As shadow topic is usually in another namespace, it would have its own
> independent permission settings, and we can configure different permissions
> for source topic and shadow topic. So there would be no guarantee that you are
> allowed to consume shadow topic if you have permission to consume source
> topic.
> 
> On the other hand, we uses topic policy to store shadow topic settings, so a
> new policy permission item needs be added as PolicyName.SHADOW_TOPIC, and user
> must have PolicyOperation.WRITE to this policy to create/delete shadow topics.
> 
> > 
> > 2. Truncation and deletion
> > What happens when you truncate or delete the source topic ?
> > please add a paragraph on the PIP
> > 
> 
> 1. Truncation, from command `bin/pulsar-admin topics truncate source-topic`.
> For source topic truncation, nothing changes. It still move all cursors to the
> end of the topic and delete all inactive ledgers. 
> As shadow topic will watch `ManagedLedgerInfo` in metadata store, once it
> knows ledgers deleted, all cursors will skip all deleted ledgers.
> 
> 2. Deletion, from command `bin/pulsar-admin topics delete source-topic`.
> Like geo-replication, topic deletion is forbidden if topic have shadow
> replicators, users have to delete shadow topics first. Here is the new admin
> API for managing shadow topics with source topic in
> `org.apache.pulsar.client.admin.Topics` :
> ```
> void createShadowTopic(String sourceTopicName, String shadowTopicName);
> void deleteShadowTopic(String sourceTopicName, String shadowTopicName);
> List<String> admin.topics().getShadowTopics(String sourceTopicName);
> 
> //And their async version methods.
> ```
> And this requires new REST interfaces in admin server, where
> ```
> PATH = "/{tenant}/{namespace}/{topic}/shadowTopics";
> METHOD = POST/DELETE/GET;
> ```
> 
> > 3. Offloaders
> > We are talking about BK metadata, how do Shadow Topics work with
> > Offloaded ledgers ?
> > Please clarify in the PIP
> 
> Offloading a ledger is a kind of writing operation to topic's metadata, so
> shadow topic can't offload ledgers to other long term storage. However, for
> ledgers thats are already offloaded by source topic, it's expected to support
> reading from offload ledgers in shadow topic, just like read from source
> topic.
> 
> The implementation depends on shadow topic watching `ManagedLedgerInfo` in
> metadata store, and if LedgerInfo.offloadContext is updated by source topic
> offloader, shadow topic can get fully information to get a readHandle from
> ledgerOffload. And of course, the pre-condition is the shadow topic must have
> the same offload driver settings.
> 
> > 
> > 4. Changes in the number of partitions
> > the PIP says that the number of partitions must match the source topic.
> > Are we preventing changes to the number of partitions in the source topic ?
> > 
> 
> No, the updates on partition number will be synced to the shadow topic. 
> A source topic or partition will be responsible for the creation and deletion 
> of its corresponding shadow topic partitions.
> 
> > 5. Topic stats
> > We should add information on the source topic and on the shadow topic.
> > Please clarify or draft your intentions in the PIP
> > 
> 
> For topic stats on source topic, as shadow replicator will reuse most of current 
> PersistentReplicator, the ReplicatorStatsImpl also can be applied to shadow replicators.
> And we need to add a new field in `TopicStatsImpl` like geo-replication:
> ```
> Map<String /*shadow topic name*/, ReplicatorStatsImpl> shadowReplication;
> ```
> 
> As for topic stats on shadow topic, previous `TopicStatsImpl` still applies. 
> And I don't see any other stats need to be added at this point.
> 
> 
> > 6. GeoReplication
> > I guess that GeoReplication will not be possible for shadow topics.
> > Please clarify on the PIP
> > 
> 
> Yes, this is decided by the nature of shadow topic that it don't have the write access to BK.
> And I don't see the necessary of supporting GeoReplication for shadow topics.
> We can make source topic geo-replicated and create the same shadow topic in each clusters.
> 
> > I believe that this feature is very powerful, but we must design it
> > carefully and discuss
> > about all the edge cases. Otherwise we will end up in something that
> > is half-baked
> > and we will have to resolve edge cases while developing or after going
> > to production.
> > 
> > Every feature must be fully integrated with the rest of Pulsar
> > 
> > Enrico
> > 
> > Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
> > <ji...@apache.org> ha scritto:
> > >
> > > Hi Penghui
> > >
> > > On 2022/06/29 04:07:35 PengHui Li wrote:
> > > > Hi Haiting,
> > > >
> > > > Thanks for the explanation. I'm clear for now.
> > > >
> > > > Pulsar functions also can do such things by connecting data from one topic
> > > > to another topic.
> > > > But the difference is this proposal only copies the data to the cache of
> > > > another topic, and the data not
> > > > in the cache is also available by reading from ledgers.
> > > >
> > > > And this approach also follows benefits compared with replicating data to
> > > > multiple "real" topics.
> > > >
> > > > - reuse the topic metadata
> > > > - the same message ID which easy for troubleshooting
> > > >
> > > > Just one question
> > > >
> > > > >>>>>>>
> > > > ```
> > > > message CommandSend { // ... // message id for shadow topic optional
> > > >    MessageIdData shadow_message_id = 9; }
> > > > ```
> > > >
> > > > Can we get the message ID from the replicated data to avoid introducing a
> > > > new command?
> > > > Or use a marker message to avoid broker-to-broker directly protobuf command
> > > > interaction.
> > > >
> > > Sorry for not wrote it clearly. CommandSend is not a new command. It's exactly the main
> > > command producer used to send message to broker. The only change is add a new field in it.
> > > The whole command proto would be like this:
> > > ```
> > > message CommandSend {
> > >     required uint64 producer_id = 1;
> > >     required uint64 sequence_id = 2;
> > >     optional int32 num_messages = 3 [default = 1];
> > >     optional uint64 txnid_least_bits = 4 [default = 0];
> > >     optional uint64 txnid_most_bits = 5 [default = 0];
> > >
> > >     /// Add highest sequence id to support batch message with external sequence id
> > >     optional uint64 highest_sequence_id = 6 [default = 0];
> > >     optional bool is_chunk     =7 [default = false];
> > >
> > >     // Specify if the message being published is a Pulsar marker or not
> > >     optional bool marker = 8 [default = false];
> > >
> > >     // message id for shadow topic
> > >     optional MessageIdData shadow_message_id = 9;
> > > }
> > > ```
> > > So there won't be any broker-to-broker directly protobuf command interactions.
> > >
> > > Thanks,
> > > Haiting
> > >
> > > > Thanks,
> > > > Penghui
> > > >
> > > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <ji...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi Penghui & Asaf:
> > > > >
> > > > > Please allow me to provide some more detailes about **metadata**
> > > > > synchronization
> > > > > between source topic and shadow topic.
> > > > >
> > > > > 1.When shadow topic initializes, it will read from metadata store path
> > > > > "/managed-ledgers/{source_topic_ledger_name}", which contains all the
> > > > > managed ledger info. We don't
> > > > > need to read the  ledger information from source topic broker.
> > > > >
> > > > > 2. When shadow topic received new message from replicator, if the ledger
> > > > > id of the message
> > > > >  is the same as the last ledger, it just updates the LAC. If not, it will
> > > > > update ledger list from metadata,
> > > > > and then open the new ledger handle and update the LAC.
> > > > >
> > > > > As for the copy itself and add shadow message id in CommandSend, it mostly
> > > > > serves the purpose
> > > > > of filling the EntryCache.
> > > > >
> > > > > Thanks,
> > > > > Haiting
> > > > >
> > > > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > > > One question comes to mind here: Why not simply read the ledger
> > > > > information
> > > > > > from original topic, without copy?
> > > > > >
> > > > > > I think this is a good idea.
> > > > > >
> > > > > > Penghui
> > > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org, wrote:
> > > > > > >
> > > > > > > One question comes to mind here: Why not simply read the ledger
> > > > > information
> > > > > > > from original topic, without copy?
> > > > > >
> > > > >
> > > >
> > 
> 
> BR,
> Haiting
> 

Re: [DISCUSS] PIP-180: Shadow Topic, an alternative way to support readonly topic ownership.

Posted by Haiting Jiang <ji...@apache.org>.
Hi Enrico,

Thanks for your feedback.

On 2022/07/05 08:03:43 Enrico Olivelli wrote:
> I have a couple of additional questions.
> 
> 1. Security
> What about security permissions about the shadow topic ?
> We are reading from another topic.
> I think we must clarify the decisions in the PIP

As shadow topic is usually in another namespace, it would have its own
independent permission settings, and we can configure different permissions
for source topic and shadow topic. So there would be no guarantee that you are
allowed to consume shadow topic if you have permission to consume source
topic.

On the other hand, we uses topic policy to store shadow topic settings, so a
new policy permission item needs be added as PolicyName.SHADOW_TOPIC, and user
must have PolicyOperation.WRITE to this policy to create/delete shadow topics.

> 
> 2. Truncation and deletion
> What happens when you truncate or delete the source topic ?
> please add a paragraph on the PIP
> 

1. Truncation, from command `bin/pulsar-admin topics truncate source-topic`.
For source topic truncation, nothing changes. It still move all cursors to the
end of the topic and delete all inactive ledgers. 
As shadow topic will watch `ManagedLedgerInfo` in metadata store, once it
knows ledgers deleted, all cursors will skip all deleted ledgers.

2. Deletion, from command `bin/pulsar-admin topics delete source-topic`.
Like geo-replication, topic deletion is forbidden if topic have shadow
replicators, users have to delete shadow topics first. Here is the new admin
API for managing shadow topics with source topic in
`org.apache.pulsar.client.admin.Topics` :
```
void createShadowTopic(String sourceTopicName, String shadowTopicName);
void deleteShadowTopic(String sourceTopicName, String shadowTopicName);
List<String> admin.topics().getShadowTopics(String sourceTopicName);

//And their async version methods.
```
And this requires new REST interfaces in admin server, where
```
PATH = "/{tenant}/{namespace}/{topic}/shadowTopics";
METHOD = POST/DELETE/GET;
```

> 3. Offloaders
> We are talking about BK metadata, how do Shadow Topics work with
> Offloaded ledgers ?
> Please clarify in the PIP

Offloading a ledger is a kind of writing operation to topic's metadata, so
shadow topic can't offload ledgers to other long term storage. However, for
ledgers thats are already offloaded by source topic, it's expected to support
reading from offload ledgers in shadow topic, just like read from source
topic.

The implementation depends on shadow topic watching `ManagedLedgerInfo` in
metadata store, and if LedgerInfo.offloadContext is updated by source topic
offloader, shadow topic can get fully information to get a readHandle from
ledgerOffload. And of course, the pre-condition is the shadow topic must have
the same offload driver settings.

> 
> 4. Changes in the number of partitions
> the PIP says that the number of partitions must match the source topic.
> Are we preventing changes to the number of partitions in the source topic ?
> 

No, the updates on partition number will be synced to the shadow topic. 
A source topic or partition will be responsible for the creation and deletion 
of its corresponding shadow topic partitions.

> 5. Topic stats
> We should add information on the source topic and on the shadow topic.
> Please clarify or draft your intentions in the PIP
> 

For topic stats on source topic, as shadow replicator will reuse most of current 
PersistentReplicator, the ReplicatorStatsImpl also can be applied to shadow replicators.
And we need to add a new field in `TopicStatsImpl` like geo-replication:
```
Map<String /*shadow topic name*/, ReplicatorStatsImpl> shadowReplication;
```

As for topic stats on shadow topic, previous `TopicStatsImpl` still applies. 
And I don't see any other stats need to be added at this point.


> 6. GeoReplication
> I guess that GeoReplication will not be possible for shadow topics.
> Please clarify on the PIP
> 

Yes, this is decided by the nature of shadow topic that it don't have the write access to BK.
And I don't see the necessary of supporting GeoReplication for shadow topics.
We can make source topic geo-replicated and create the same shadow topic in each clusters.

> I believe that this feature is very powerful, but we must design it
> carefully and discuss
> about all the edge cases. Otherwise we will end up in something that
> is half-baked
> and we will have to resolve edge cases while developing or after going
> to production.
> 
> Every feature must be fully integrated with the rest of Pulsar
> 
> Enrico
> 
> Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
> <ji...@apache.org> ha scritto:
> >
> > Hi Penghui
> >
> > On 2022/06/29 04:07:35 PengHui Li wrote:
> > > Hi Haiting,
> > >
> > > Thanks for the explanation. I'm clear for now.
> > >
> > > Pulsar functions also can do such things by connecting data from one topic
> > > to another topic.
> > > But the difference is this proposal only copies the data to the cache of
> > > another topic, and the data not
> > > in the cache is also available by reading from ledgers.
> > >
> > > And this approach also follows benefits compared with replicating data to
> > > multiple "real" topics.
> > >
> > > - reuse the topic metadata
> > > - the same message ID which easy for troubleshooting
> > >
> > > Just one question
> > >
> > > >>>>>>>
> > > ```
> > > message CommandSend { // ... // message id for shadow topic optional
> > >    MessageIdData shadow_message_id = 9; }
> > > ```
> > >
> > > Can we get the message ID from the replicated data to avoid introducing a
> > > new command?
> > > Or use a marker message to avoid broker-to-broker directly protobuf command
> > > interaction.
> > >
> > Sorry for not wrote it clearly. CommandSend is not a new command. It's exactly the main
> > command producer used to send message to broker. The only change is add a new field in it.
> > The whole command proto would be like this:
> > ```
> > message CommandSend {
> >     required uint64 producer_id = 1;
> >     required uint64 sequence_id = 2;
> >     optional int32 num_messages = 3 [default = 1];
> >     optional uint64 txnid_least_bits = 4 [default = 0];
> >     optional uint64 txnid_most_bits = 5 [default = 0];
> >
> >     /// Add highest sequence id to support batch message with external sequence id
> >     optional uint64 highest_sequence_id = 6 [default = 0];
> >     optional bool is_chunk     =7 [default = false];
> >
> >     // Specify if the message being published is a Pulsar marker or not
> >     optional bool marker = 8 [default = false];
> >
> >     // message id for shadow topic
> >     optional MessageIdData shadow_message_id = 9;
> > }
> > ```
> > So there won't be any broker-to-broker directly protobuf command interactions.
> >
> > Thanks,
> > Haiting
> >
> > > Thanks,
> > > Penghui
> > >
> > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <ji...@apache.org>
> > > wrote:
> > >
> > > > Hi Penghui & Asaf:
> > > >
> > > > Please allow me to provide some more detailes about **metadata**
> > > > synchronization
> > > > between source topic and shadow topic.
> > > >
> > > > 1.When shadow topic initializes, it will read from metadata store path
> > > > "/managed-ledgers/{source_topic_ledger_name}", which contains all the
> > > > managed ledger info. We don't
> > > > need to read the  ledger information from source topic broker.
> > > >
> > > > 2. When shadow topic received new message from replicator, if the ledger
> > > > id of the message
> > > >  is the same as the last ledger, it just updates the LAC. If not, it will
> > > > update ledger list from metadata,
> > > > and then open the new ledger handle and update the LAC.
> > > >
> > > > As for the copy itself and add shadow message id in CommandSend, it mostly
> > > > serves the purpose
> > > > of filling the EntryCache.
> > > >
> > > > Thanks,
> > > > Haiting
> > > >
> > > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > > One question comes to mind here: Why not simply read the ledger
> > > > information
> > > > > from original topic, without copy?
> > > > >
> > > > > I think this is a good idea.
> > > > >
> > > > > Penghui
> > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org, wrote:
> > > > > >
> > > > > > One question comes to mind here: Why not simply read the ledger
> > > > information
> > > > > > from original topic, without copy?
> > > > >
> > > >
> > >
> 

BR,
Haiting