You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by Michael Marshall <mm...@apache.org> on 2022/02/01 06:26:08 UTC

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

This proposal identifies an important issue that we should definitely
solve. I have some questions.

> When there are no user-created topics under a namespace,
> Namespace should be deleted.

This is supposed to mean that the namespace should be able to be
deleted, correct?

> For this reason, we need to close the system topic reader/producer
> first, then remove the system topic. finally, remove the namespace.

I agree that expanding the protobuf CloseProducer and CloseConsumer
commands could be valuable here. The expansion would ensure that
producers and consumers don't attempt to reconnect. However, I think
we might still have a race condition that could make tenant or
namespace deletion fail. Specifically, if a new producer or consumer
creates a topic after the namespace deletion has started but
before it is complete. Do you agree that the underlying race still exists?

In my view, the fundamental problem here is that deleting certain Pulsar
resources takes time and, in a distributed system, that means race
conditions.

What if we expand our usage of the "terminated" feature to apply to
namespaces (and tenants)? Then, a terminated namespace can have
bundles and topics can be deleted but not created (just as a terminated
topic cannot have any new messages published to it). This would take
care of all topic creation race conditions. We'd probably need to add
new protobuf exceptions for this feature.

Thanks,
Michael


On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
<zk...@streamnative.io.invalid> wrote:
>
> +1
>
>
> Thanks,
> Zike
>
> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <te...@apache.org> wrote:
> >
> > Hi
> >  The PIP link : https://github.com/apache/pulsar/issues/13989
> >
> > Regards
> > Jiwei Guo (Tboy)
> >
> >
> > On Sat, Jan 29, 2022 at 11:46 AM mattison chao <ma...@gmail.com>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I want to start a discussion about PIP-139 : Support Broker send command
> > > to real close producer/consumer.
> > >
> > > This is the PIP document
> > >
> > > https://github.com/apache/pulsar/issues/13989 <
> > > https://github.com/apache/pulsar/issues/13979>
> > >
> > > Please check it out and feel free to share your thoughts.
> > >
> > > Best,
> > > Mattison
> > >
> > >
> > > ———————— Pasted below for quoting convenience.
> > >
> > >
> > >
> > > Relation pull request:  #13337
> > > Authors: @Technoboy-  @mattisonchao
> > >
> > > ## Motivation
> > >
> > > Before we discuss this pip, I'd like to supplement some context to help
> > > contributors who don't want to read the original pull request.
> > >
> > > > When there are no user-created topics under a namespace, Namespace
> > > should be deleted. But currently, the system topic existed and the
> > > reader/producer could auto-create the system which may cause the namespace
> > > deletion to fail.
> > >
> > > For this reason, we need to close the system topic reader/producer first,
> > > then remove the system topic. finally, remove the namespace.
> > >
> > > Following this way, we first want to use ``terminate`` to solve this
> > > problem. then we found producers can disconnect, but consumers are still
> > > alive. So, another PR #13960 wants to add consumers' closing logic.
> > >
> > > After #13960, all things look good, but another problem appears. that is
> > > we need to wait until consumers completely consume all messages (this may
> > > make terminate topic so long and the operation timeout)then get
> > > ``reachedEndOfTopic``. the relative code here :
> > >
> > >
> > > https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > >
> > > In the #13337 case, we need to force close consumers immediately. So we
> > > write this PIP to discuss another way to resolve this problem.
> > >
> > > ## Goal
> > >
> > > We can add a new field(`allow_reconnect`) in command
> > > ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> > > producer/consumers immediately.
> > >
> > > ## API Changes
> > >
> > > - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > >
> > > ```java
> > > **Before**
> > >
> > > message CommandCloseProducer {
> > >     required uint64 producer_id = 1;
> > >     required uint64 request_id = 2;
> > > }
> > >
> > > **After**
> > > message CommandCloseProducer {
> > >     required uint64 producer_id = 1;
> > >     required uint64 request_id = 2;
> > >     optional bool allow_reconnect = 3 [default = true];
> > > }
> > > ```
> > >
> > > - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > > ```java
> > > **Before**
> > >
> > > message CommandCloseConsumer {
> > >     required uint64 consumer_id = 1;
> > >     required uint64 request_id = 2;
> > > }
> > >
> > > **After**
> > > message CommandCloseConsumer {
> > >     required uint64 consumer_id = 1;
> > >     required uint64 request_id = 2;
> > >     optional bool allow_reconnect = 3 [default = true];
> > > }
> > > ```
> > >
> > > ## Implementation
> > >
> > > ### ClientCnx - Producer:
> > >
> > > **Before**
> > > ```java
> > >     @Override
> > >     protected void handleCloseProducer(CommandCloseProducer closeProducer)
> > > {
> > >         log.info("[{}] Broker notification of Closed producer: {}",
> > > remoteAddress, closeProducer.getProducerId());
> > >         final long producerId = closeProducer.getProducerId();
> > >         ProducerImpl<?> producer = producers.get(producerId);
> > >         if (producer != null) {
> > >             producer.connectionClosed(this);
> > >         } else {
> > >             log.warn("Producer with id {} not found while closing producer
> > > ", producerId);
> > >         }
> > >     }
> > > ```
> > > After:
> > > ```java
> > >    @Override
> > >     protected void handleCloseProducer(CommandCloseProducer closeProducer)
> > > {
> > >         log.info("[{}] Broker notification of Closed producer: {}",
> > > remoteAddress, closeProducer.getProducerId());
> > >         final long producerId = closeProducer.getProducerId();
> > >         ProducerImpl<?> producer = producers.get(producerId);
> > >         if (producer != null) {
> > >             if (closeProducer.isAllowReconnect) {
> > >                 producer.connectionClosed(this);
> > >             } else {
> > >                 producer.closeAsync();
> > >             }
> > >         } else {
> > >             log.warn("Producer with id {} not found while closing producer
> > > ", producerId);
> > >         }
> > >     }
> > > ```
> > > ### ClientCnx - Consumer:
> > >
> > > **Before**
> > > ```java
> > >     @Override
> > >     protected void handleCloseConsumer(CommandCloseConsumer closeConsumer)
> > > {
> > >         log.info("[{}] Broker notification of Closed consumer: {}",
> > > remoteAddress, closeConsumer.getConsumerId());
> > >         final long consumerId = closeConsumer.getConsumerId();
> > >         ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >         if (consumer != null) {
> > >             consumer.connectionClosed(this);
> > >         } else {
> > >             log.warn("Consumer with id {} not found while closing consumer
> > > ", consumerId);
> > >         }
> > >     }
> > > ```
> > > **After**
> > > ```java
> > >     @Override
> > >     protected void handleCloseConsumer(CommandCloseConsumer closeConsumer)
> > > {
> > >         log.info("[{}] Broker notification of Closed consumer: {}",
> > > remoteAddress, closeConsumer.getConsumerId());
> > >         final long consumerId = closeConsumer.getConsumerId();
> > >         ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >         if (consumer != null) {
> > >             if (closeConsumer.isAllowReconnect) {
> > >                 consumer.connectionClosed(this);
> > >             } else {
> > >                 consumer.closeAsync();
> > >             }
> > >         } else {
> > >             log.warn("Consumer with id {} not found while closing consumer
> > > ", consumerId);
> > >         }
> > >     }
> > > ```
> > >
> > >
> > > ## Reject Alternatives
> > >
> > > none.
>
>
>
> --
> Zike Yang

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by guo jiwei <te...@apache.org>.
Hi
   As we discussed above, I have posted pr #14730
<https://github.com/apache/pulsar/pull/14730> to solve the deleting
namespace issue.
   Thanks Michael for your idea. And thanks to all the people joining
this.

Regards
Jiwei Guo (Tboy)


On Mon, Mar 7, 2022 at 1:45 PM guo jiwei <te...@apache.org> wrote:

> Hi  Penghui,
>    After test, we can use #12136 to stop the replicator.
>
>
> Regards
> Jiwei Guo (Tboy)
>
>
> On Sat, Mar 5, 2022 at 5:31 PM PengHui Li <pe...@apache.org> wrote:
>
>> > Great point. I was focused on deleting namespaces and missed the case
>> where the user wants to delete a topic from a replicated namespace. I
>> agree that we should make it possible to delete these topics without
>> removing the namespace replication.
>>
>> Oh, sorry. I thought again, after
>> https://github.com/apache/pulsar/pull/12136
>> introduced topic level geo-replication configuration, users can disable
>> for
>> a topic even if the namespace enabled geo-replication.
>>
>> And topic level policy is introduced in 2.6.0. I think It's time to make
>> it
>> one of
>> the features turned on by default.
>>
>> So, we can follow the steps that stop the replication first and then
>> delete
>> the
>> topic from clusters.
>>
>> > Note that it is already possible
>> to delete these topics by deleting with force or by configuring
>> inactive topic deletion.
>>
>> Hmm, I think both of them do not work for now, because the producer of
>> the replicator will always be active.
>>
>> > Considering users can delete replicated topics by deleting with force,
>> are you saying we need to provide a non-force way to delete these
>> topics?
>>
>> Yes, if the topic doesn't have users' producers/consumers, users don't
>> need
>> to force delete the topic, it's a normal deletion.
>>
>>
>> ----------------------------------------------------------------------------------
>> @mattison @jiwei could you please check if we can use
>> https://github.com/apache/pulsar/pull/12136
>> to stop the geo-replication first, and then delete the topic?
>> If it works, we don't need to change the protocol.
>>
>> Thanks,
>> Penghui
>>
>>
>>
>> On Thu, Mar 3, 2022 at 7:12 AM Michael Marshall <mm...@apache.org>
>> wrote:
>>
>> > > The geo-replication's configuration can be centralized by using
>> > > one configuration store. But that doesn't change anything, we should
>> > > provide the same behavior for both centralized and decentralized
>> > > configuration store.
>> >
>> > Perhaps I misused the word decentralized. I meant that the challenge
>> > comes because geo-replication configurations for clusters are
>> > independent from each other.
>> >
>> > > The current challenge is
>> > > users usually set replicated clusters for a namespace, if remove
>> > > the replication configuration, the entire namespace will be affected.
>> > > We have supported setting the replicated configuration for a topic
>> [1],
>> > > only for 2.10.0 or later.
>> >
>> > Great point. I was focused on deleting namespaces and missed the case
>> > where the user wants to delete a topic from a replicated namespace. I
>> > agree that we should make it possible to delete these topics without
>> > removing the namespace replication. Note that it is already possible
>> > to delete these topics by deleting with force or by configuring
>> > inactive topic deletion.
>> >
>> > > we should provided steps for how to delete an unused replicated
>> > > topic and what are the effects of not removing properly
>> >
>> > I agree, and I think documentation coupled with helpful error logs
>> > will be very helpful here.
>> >
>> > Considering users can delete replicated topics by deleting with force,
>> > are you saying we need to provide a non-force way to delete these
>> > topics?
>> >
>> > If we want to provide a path to normal deletion, we already have an
>> > `isRemote` flag on producers, and the inactive topic deletion code
>> > uses this flag to determine if there are non-replication producers
>> > connected to a topic. We could modify the deletion logic for a global
>> > topic so that it can be deleted as long as the only producers
>> > connected are remote producers. My main concern is that normal
>> > deletion could allow users to miss the nuance that they must also
>> > delete the topic in the other cluster. This might be an overly
>> > cautious concern, though.
>> >
>> > Thanks,
>> > Michael
>> >
>> >
>> >
>> > On Tue, Mar 1, 2022 at 1:55 AM PengHui Li <pe...@apache.org> wrote:
>> > >
>> > > > I agree with you that it'd be nice to provide the same deletion
>> > > behavior. However, because geo-replication's configuration is
>> > > decentralized, I think namespace or topic deletion is more complicated
>> > > than unreplicated deletion. Note that users cannot currently delete
>> > > namespaces that are configured with remote replication clusters.
>> > >
>> > > The geo-replication's configuration can be centralized by using
>> > > one configuration store. But that doesn't change anything, we should
>> > > provide the same behavior for both centralized and decentralized
>> > > configuration store.
>> > >
>> > > > Note that while a user cannot explicitly delete a replicated topic,
>> > > they can remove the replication configuration in cluster A and cluster
>> > > B, and then they are left with unreplicated namespaces and topics,
>> > > which can be deleted.
>> > >
>> > > Yes, stopping the replication first is a more elegant approach,
>> > > and it is also very easy to understand for users. The current
>> challenge
>> > is
>> > > users usually set replicated clusters for a namespace, if remove
>> > > the replication configuration, the entire namespace will be affected.
>> > > We have supported setting the replicated configuration for a topic
>> [1],
>> > > only for 2.10.0 or later.
>> > >
>> > > > In this case, the specific problem stems from a remote cluster
>> > > targeting the local cluster. If the user only deletes the local topic,
>> > > without removing the replication configuration, I don't think it
>> > > should be surprising when the topic gets recreated (assuming auto
>> > > topic creation is enabled).
>> > >
>> > > From the user's perspective, they don't have any producers/consumers,
>> > > If the topic only has the producer of the replicator connect to the
>> > remote
>> > > topic
>> > > each other, in this case, I think we should allow users to delete the
>> > > topic.
>> > > Users will be surprised in this case, because they are not able to
>> delete
>> > > an unused topic, they are not able to change the replication
>> > configuration
>> > > for
>> > > a namespace.
>> > >
>> > > > In my view, I think it's okay to make it harder to delete
>> > > geo-replicated namespaces or topics than local namespaces or topics.
>> > > Failing requests with a good, informative error message protects
>> > > users. Additionally, if users are able to delete a namespace or topic
>> > > by force, that gives them a work around--although we should document
>> > > the limitations of this workaround.
>> > >
>> > > I agree with it's okay to make it harder to delete a replicated
>> topic, so
>> > > for
>> > > a replicated topic, users should delete the topic in each cluster, we
>> > should
>> > > provided steps for how to delete an unused replicated topic and what
>> are
>> > > the effects of not removing properly, for example, if a topic has 2
>> > > clusters, cluster-a
>> > > and cluster-b, but only delete the topic from cluster-a, the topic
>> will
>> > be
>> > > created
>> > > again if cluster-b write some new data into this topic, depends on the
>> > topic
>> > > auto-creation.
>> > >
>> > > If we only support set replicated cluster for a topic, not a
>> namespace,
>> > > I like the approach that stops the replication first and then deletes
>> the
>> > > topic.
>> > >
>> > > [1] https://github.com/apache/pulsar/pull/12136
>> > >
>> > > Thanks,
>> > > Penghui
>> > >
>> > > On Tue, Mar 1, 2022 at 2:27 PM Michael Marshall <mmarshall@apache.org
>> >
>> > > wrote:
>> > >
>> > > > > Either replicated topic or non-replicated topic, we should provide
>> > the
>> > > > > same topic deletion behavior, the topic can be deleted if no
>> active
>> > > > > user's producers/consumers, should not be affected by the Pulsar
>> > > > > internal producers/consumers.
>> > > >
>> > > > I agree with you that it'd be nice to provide the same deletion
>> > > > behavior. However, because geo-replication's configuration is
>> > > > decentralized, I think namespace or topic deletion is more
>> complicated
>> > > > than unreplicated deletion. Note that users cannot currently delete
>> > > > namespaces that are configured with remote replication clusters.
>> > > >
>> > > > One challenge I see is the configuration in the remote cluster. If
>> > > > cluster A is replicated to cluster B for namespace C, deleting a
>> topic
>> > > > in cluster B does not affect the configuration in cluster A. If we
>> > > > make it easy to delete the topic in cluster B, we need to decide how
>> > > > to handle failed replication in cluster A. In the same way, deleting
>> > > > namespace C in cluster B doesn't delete it from cluster A. For
>> > > > example, when should cluster A stop attempting to create producers
>> to
>> > > > cluster B for the deleted namespace/topic and how do we inform
>> > > > operators that the replication is no longer working as configured.
>> We
>> > > > avoid several problems by failing the initial user request.
>> > > >
>> > > > Note that while a user cannot explicitly delete a replicated topic,
>> > > > they can remove the replication configuration in cluster A and
>> cluster
>> > > > B, and then they are left with unreplicated namespaces and topics,
>> > > > which can be deleted.
>> > > >
>> > > > > But for now, for a replicated topic, users are not able to delete
>> it
>> > > > because
>> > > > > of the internal active producer, adding force delete option can
>> > delete
>> > > > > the topic, but it will create again if users enabled topic
>> > auto-creation.
>> > > >
>> > > > In this case, the specific problem stems from a remote cluster
>> > > > targeting the local cluster. If the user only deletes the local
>> topic,
>> > > > without removing the replication configuration, I don't think it
>> > > > should be surprising when the topic gets recreated (assuming auto
>> > > > topic creation is enabled).
>> > > >
>> > > > In my view, I think it's okay to make it harder to delete
>> > > > geo-replicated namespaces or topics than local namespaces or topics.
>> > > > Failing requests with a good, informative error message protects
>> > > > users. Additionally, if users are able to delete a namespace or
>> topic
>> > > > by force, that gives them a work around--although we should document
>> > > > the limitations of this workaround.
>> > > >
>> > > > I am not experienced with geo-replication, so please let me know if
>> > > > any of my analysis doesn't align with the actual design.
>> > > >
>> > > > Thanks,
>> > > > Michael
>> > > >
>> > > >
>> > > > On Sun, Feb 27, 2022 at 8:42 PM guo jiwei <te...@apache.org>
>> > wrote:
>> > > > >
>> > > > > I have added the geo-replicator topic case and updated the PIP.
>> > > > >
>> > > > >
>> > > > > Regards
>> > > > > Jiwei Guo (Tboy)
>> > > > >
>> > > > >
>> > > > > On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <pe...@apache.org>
>> > wrote:
>> > > > >
>> > > > > > > To me, the main question is whether we create a custom error
>> or
>> > > > expand
>> > > > > > the `CommandCloseProducer` command. I lean towards adding an
>> error
>> > > > > > because it will automatically be backward compatible based on
>> the
>> > way
>> > > > > > the client determines which errors are retriable. Although, I
>> don't
>> > > > > > have a strong opinion.
>> > > > > >
>> > > > > > I have the same opinion as you. But it looks like not able to
>> fix
>> > the
>> > > > > > geo-replication topic deletion.
>> > > > > >
>> > > > > > Either replicated topic or non-replicated topic, we should
>> provide
>> > the
>> > > > > > same topic deletion behavior, the topic can be deleted if no
>> active
>> > > > > > user's producers/consumers, should not be affected by the Pulsar
>> > > > > > internal producers/consumers.
>> > > > > >
>> > > > > > But for now, for a replicated topic, users are not able to
>> delete
>> > it
>> > > > > > because
>> > > > > > of the internal active producer, adding force delete option can
>> > delete
>> > > > > > the topic, but it will create again if users enabled topic
>> > > > auto-creation.
>> > > > > >
>> > > > > > I think in this case, it seems we don't have a chance to give
>> the
>> > > > producer
>> > > > > > an exception, because the reconnection will connect to a new
>> topic
>> > only
>> > > > > > with
>> > > > > > the same topic name as before.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Penghui
>> > > > > >
>> > > > > > On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <
>> > mmarshall@apache.org
>> > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > > We need to determine the overall plan, not the
>> implementation
>> > > > > > > > at this moment.
>> > > > > > >
>> > > > > > > Great point, I agree.
>> > > > > > >
>> > > > > > > > Looks like we need an option in the
>> > > > > > > > `CommandCloseProducer` to avoid the replication producer
>> > > > > > > > reconnecting again.
>> > > > > > >
>> > > > > > > This is an important point. One of the primary requirements
>> for
>> > > > > > > geo-replication must also be that a replicated namespace or
>> > topic can
>> > > > > > > be deleted and recreated without any problems or leaks in the
>> > broker.
>> > > > > > > That requires that the producer is cleaned up properly when
>> the
>> > > > > > > namespace is deleted. I think we could achieve that with a new
>> > server
>> > > > > > > error or an expansion of the `CommandCloseProducer` command.
>> > > > > > >
>> > > > > > > To me, the main question is whether we create a custom error
>> or
>> > > > expand
>> > > > > > > the `CommandCloseProducer` command. I lean towards adding an
>> > error
>> > > > > > > because it will automatically be backward compatible based on
>> > the way
>> > > > > > > the client determines which errors are retriable. Although, I
>> > don't
>> > > > > > > have a strong opinion.
>> > > > > > >
>> > > > > > > The key code paths that we need to consider are topic lookup
>> and
>> > > > topic
>> > > > > > > creation. Both the http admin endpoint and the binary protocol
>> > > > > > > endpoint interact with these code paths, and we should fail
>> > requests
>> > > > > > > for namespaces which have the "deleted" field set to true.
>> > > > > > >
>> > > > > > > In looking at the code, we could modify the logic in
>> > > > > > > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`.
>> That
>> > > > method
>> > > > > > > is always called for topic lookups via HTTP or via the Pulsar
>> > > > > > > binary protocol. If a namespace doesn't exist, that method
>> > currently
>> > > > > > > returns a NOT_FOUND exception. In the binary protocol case,
>> that
>> > > > > > > exception is converted to a `ServerError.MetadataError`. We
>> could
>> > > > > > > modify the method like I do here [0], or we could add another
>> > case
>> > > > and
>> > > > > > > return a custom error message.
>> > > > > > >
>> > > > > > > We'd also need to prevent topic creation via the admin
>> endpoint
>> > and
>> > > > > > > the auto topic creation via the binary protocol. I believe
>> this
>> > will
>> > > > > > > be just as easy as the lookup case, but I haven't confirmed.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Michael
>> > > > > > >
>> > > > > > > [0]
>> > > > > > >
>> > > > > >
>> > > >
>> >
>> https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <
>> penghui@apache.org>
>> > > > wrote:
>> > > > > > > >
>> > > > > > > > The replicated topic deletion also has a similar problem.
>> > > > > > > > If the topic auto-creation is enabled, we are not able to
>> > delete
>> > > > > > > > a replicated topic, because the producer used for
>> replication
>> > > > > > > > will try to reconnect. Looks like we need an option in the
>> > > > > > > > `CommandCloseProducer` to avoid the replication producer
>> > > > > > > > reconnecting again.
>> > > > > > > >
>> > > > > > > > Penghui
>> > > > > > > >
>> > > > > > > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <
>> penghui@apache.org
>> > >
>> > > > wrote:
>> > > > > > > >
>> > > > > > > > > > Penghui, are you suggesting that we implement the
>> > > > namespace/tenant
>> > > > > > > > > terminated logic after completing this PIP?
>> > > > > > > > >
>> > > > > > > > > I'm ok with both being implemented together or separated.
>> > > > > > > > > We need to determine the overall plan, not the
>> implementation
>> > > > > > > > > at this moment.
>> > > > > > > > >
>> > > > > > > > > > For the sake of discussion, if we implement the
>> namespace
>> > > > > > terminated
>> > > > > > > > > logic first, we could fulfill the underlying requirements
>> for
>> > > > this
>> > > > > > PIP
>> > > > > > > > > by returning a new non-retriable error response when a
>> client
>> > > > tries
>> > > > > > to
>> > > > > > > > > connect a producer or a consumer to a topic in a namespace
>> > that
>> > > > is
>> > > > > > > > > "terminated". If we do add the "namespace terminated"
>> > feature,
>> > > > we'll
>> > > > > > > > > need to add a non-retriable exception for this case,
>> anyway.
>> > The
>> > > > main
>> > > > > > > > > advantage here is that we'd only need one expansion of the
>> > > > protobuf
>> > > > > > > > > instead of two. The downside is that the protocol for
>> > connected
>> > > > > > > > > clients has a couple more roundtrips. The broker would
>> > disconnect
>> > > > > > > > > connected clients and then fail their reconnection attempt
>> > with a
>> > > > > > > > > non-retriable error.
>> > > > > > > > >
>> > > > > > > > > Regarding the namespace "terminated" concept, I just
>> noticed
>> > > > that we
>> > > > > > > > > already have a "deleted" field in a namespace's policies
>> [0].
>> > > > There
>> > > > > > is
>> > > > > > > > > even a comment that says:
>> > > > > > > > >
>> > > > > > > > > Yes, if we can terminate the namespace first, we should
>> > return
>> > > > > > > “namespace
>> > > > > > > > > not found”
>> > > > > > > > > if the producers and consumers try to reconnect, the
>> client
>> > > > should
>> > > > > > stop
>> > > > > > > > > reconnecting
>> > > > > > > > > after getting this error. Looks like following this way,
>> we
>> > don't
>> > > > > > need
>> > > > > > > to
>> > > > > > > > > introduce any protocol changes anymore.
>> > > > > > > > >
>> > > > > > > > > > As I think about it more, I no longer think "terminated"
>> > is the
>> > > > > > right
>> > > > > > > > > term for what I proposed above. Our goal is to briefly
>> > prevent
>> > > > any
>> > > > > > > > > topic creation to ensure we can delete all sub resources
>> for
>> > a
>> > > > > > > > > namespace. On the other hand, a terminated topic isn't
>> > > > necessarily
>> > > > > > > > > short lived. If we want to apply the "terminated" term
>> > > > unequivocally
>> > > > > > > > > to both topics and namespaces, I think a terminated
>> namespace
>> > > > would
>> > > > > > > > > need to be a namespace where all topics are in terminated
>> > state
>> > > > and
>> > > > > > no
>> > > > > > > > > additional topics could be created. That's not the feature
>> > we're
>> > > > > > > > > discussing here, though. Deleted seems like the right term
>> > to me,
>> > > > > > > > > especially since we're already using it to prevent a race
>> > > > condition
>> > > > > > > > >
>> > > > > > > > > Yes, topic termination has different meanings, deletion
>> > should
>> > > > be the
>> > > > > > > right
>> > > > > > > > > term here.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Penghui
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
>> > > > > > > mmarshall@apache.org>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> Regarding the namespace "terminated" concept, I just
>> noticed
>> > > > that we
>> > > > > > > > >> already have a "deleted" field in a namespace's policies
>> > [0].
>> > > > There
>> > > > > > is
>> > > > > > > > >> even a comment that says:
>> > > > > > > > >>
>> > > > > > > > >> > // set the policies to deleted so that somebody else
>> > cannot
>> > > > > > acquire
>> > > > > > > > >> this namespace
>> > > > > > > > >>
>> > > > > > > > >> I am not familiar with this feature, but it seems like
>> this
>> > > > policy
>> > > > > > > > >> field could be checked before creating a topic in a
>> > namespace.
>> > > > That
>> > > > > > > > >> would remove certain races described above.
>> > > > > > > > >>
>> > > > > > > > >> As I think about it more, I no longer think "terminated"
>> is
>> > the
>> > > > > > right
>> > > > > > > > >> term for what I proposed above. Our goal is to briefly
>> > prevent
>> > > > any
>> > > > > > > > >> topic creation to ensure we can delete all sub resources
>> > for a
>> > > > > > > > >> namespace. On the other hand, a terminated topic isn't
>> > > > necessarily
>> > > > > > > > >> short lived. If we want to apply the "terminated" term
>> > > > unequivocally
>> > > > > > > > >> to both topics and namespaces, I think a terminated
>> > namespace
>> > > > would
>> > > > > > > > >> need to be a namespace where all topics are in terminated
>> > state
>> > > > and
>> > > > > > no
>> > > > > > > > >> additional topics could be created. That's not the
>> feature
>> > we're
>> > > > > > > > >> discussing here, though. Deleted seems like the right
>> term
>> > to
>> > > > me,
>> > > > > > > > >> especially since we're already using it to prevent a race
>> > > > condition
>> > > > > > > > >> [0].
>> > > > > > > > >>
>> > > > > > > > >> Thanks,
>> > > > > > > > >> Michael
>> > > > > > > > >>
>> > > > > > > > >> [0]
>> > > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > >
>> >
>> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
>> > > > > > > > >>
>> > > > > > > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
>> > > > > > > mmarshall@apache.org>
>> > > > > > > > >> wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > Hi Dave,
>> > > > > > > > >> >
>> > > > > > > > >> > > automatically delete tenants and namespaces for not
>> > > > containing
>> > > > > > > topics
>> > > > > > > > >> >
>> > > > > > > > >> > I don't think that is what we are discussing. I agree
>> > that the
>> > > > > > > initial
>> > > > > > > > >> > email says just that, though, which is why I asked
>> above:
>> > > > > > > > >> >
>> > > > > > > > >> > >> When there are no user-created topics under a
>> > namespace,
>> > > > > > > > >> > >> Namespace should be deleted.
>> > > > > > > > >> > > This is supposed to mean that the namespace should be
>> > able
>> > > > to be
>> > > > > > > > >> > > deleted, correct?
>> > > > > > > > >> >
>> > > > > > > > >> > Perhaps Mattison can clarify, too.
>> > > > > > > > >> >
>> > > > > > > > >> > My current understanding of the context for the PIP is
>> > that a
>> > > > user
>> > > > > > > > >> > call to delete a namespace without force can fail when
>> a
>> > > > producer
>> > > > > > > > >> > reconnects to a deleted topic. The goal is to remove
>> the
>> > race
>> > > > > > > > >> > condition to ensure namespace deletion can succeed.
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks,
>> > > > > > > > >> > Michael
>> > > > > > > > >> >
>> > > > > > > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <
>> > wave@apache.org>
>> > > > > > > wrote:
>> > > > > > > > >> > >
>> > > > > > > > >> > > Hi -
>> > > > > > > > >> > >
>> > > > > > > > >> > > I hope I’m understanding what’s being discussed.
>> > > > > > > > >> > >
>> > > > > > > > >> > > If we are going to automatically delete tenants and
>> > > > namespaces
>> > > > > > for
>> > > > > > > > >> not containing topics then we need to make both of these
>> > > > automatic
>> > > > > > > actions
>> > > > > > > > >> configurable with a default to NOT do so. Otherwise we
>> break
>> > > > > > existing
>> > > > > > > use
>> > > > > > > > >> cases.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Automatic deletion of namespaces should be
>> configurable
>> > at
>> > > > both
>> > > > > > > the
>> > > > > > > > >> cluster and tenant level.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Regards,
>> > > > > > > > >> > > Dave
>> > > > > > > > >> > >
>> > > > > > > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
>> > > > > > > mmarshall@apache.org>
>> > > > > > > > >> wrote:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > >> The old producer/consumer should be closed after
>> > > > applying the
>> > > > > > > > >> changes from
>> > > > > > > > >> > > >> this proposal.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > Penghui, are you suggesting that we implement the
>> > > > > > > namespace/tenant
>> > > > > > > > >> > > > terminated logic after completing this PIP?
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > For the sake of discussion, if we implement the
>> > namespace
>> > > > > > > terminated
>> > > > > > > > >> > > > logic first, we could fulfill the underlying
>> > requirements
>> > > > for
>> > > > > > > this
>> > > > > > > > >> PIP
>> > > > > > > > >> > > > by returning a new non-retriable error response
>> when a
>> > > > client
>> > > > > > > tries
>> > > > > > > > >> to
>> > > > > > > > >> > > > connect a producer or a consumer to a topic in a
>> > namespace
>> > > > > > that
>> > > > > > > is
>> > > > > > > > >> > > > "terminated". If we do add the "namespace
>> terminated"
>> > > > feature,
>> > > > > > > we'll
>> > > > > > > > >> > > > need to add a non-retriable exception for this
>> case,
>> > > > anyway.
>> > > > > > The
>> > > > > > > > >> main
>> > > > > > > > >> > > > advantage here is that we'd only need one expansion
>> > of the
>> > > > > > > protobuf
>> > > > > > > > >> > > > instead of two. The downside is that the protocol
>> for
>> > > > > > connected
>> > > > > > > > >> > > > clients has a couple more roundtrips. The broker
>> would
>> > > > > > > disconnect
>> > > > > > > > >> > > > connected clients and then fail their reconnection
>> > attempt
>> > > > > > with
>> > > > > > > a
>> > > > > > > > >> > > > non-retriable error.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > Thanks,
>> > > > > > > > >> > > > Michael
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <
>> > > > > > penghui@apache.org>
>> > > > > > > > >> wrote:
>> > > > > > > > >> > > >>
>> > > > > > > > >> > > >>> If we want to solve this problem, we need to add
>> > some
>> > > > sync
>> > > > > > > > >> resources like
>> > > > > > > > >> > > >> lock/state, I think it’s a harm for us, we don’t
>> > need to
>> > > > do
>> > > > > > > that.
>> > > > > > > > >> > > >>
>> > > > > > > > >> > > >> I think we can make the namespace/tenants to the
>> > inactive
>> > > > > > state
>> > > > > > > > >> first so
>> > > > > > > > >> > > >> that we can avoid any new
>> > > > > > > > >> > > >> producer/consumer connect to the topic under the
>> > > > > > > namespace/tenant.
>> > > > > > > > >> > > >>
>> > > > > > > > >> > > >> The old producer/consumer should be closed after
>> > > > applying the
>> > > > > > > > >> changes from
>> > > > > > > > >> > > >> this proposal.
>> > > > > > > > >> > > >>
>> > > > > > > > >> > > >> Thanks,
>> > > > > > > > >> > > >> Penghui
>> > > > > > > > >> > > >>
>> > > > > > > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
>> > > > > > > > >> mattisonchao@gmail.com> wrote:
>> > > > > > > > >> > > >>
>> > > > > > > > >> > > >>>> This is supposed to mean that the namespace
>> should
>> > be
>> > > > able
>> > > > > > > to be
>> > > > > > > > >> > > >>>> deleted, correct?
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>> Yes, the main background is the user doesn’t
>> have an
>> > > > active
>> > > > > > > > >> topic. so,
>> > > > > > > > >> > > >>> they want to delete the namespace.
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>>> However, I think
>> > > > > > > > >> > > >>>> we might still have a race condition that could
>> > make
>> > > > tenant
>> > > > > > > or
>> > > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
>> > > > producer or
>> > > > > > > > >> consumer
>> > > > > > > > >> > > >>>> creates a topic after the namespace deletion has
>> > > > started
>> > > > > > but
>> > > > > > > > >> > > >>>> before it is complete. Do you agree that the
>> > underlying
>> > > > > > race
>> > > > > > > > >> still
>> > > > > > > > >> > > >>> exists?
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>> Yes, this condition exists. I think it’s not a
>> big
>> > > > problem
>> > > > > > > > >> because the
>> > > > > > > > >> > > >>> user doesn’t want to use this namespace anymore.
>> > > > > > > > >> > > >>> If this scenario appears, they will get an error
>> and
>> > > > need to
>> > > > > > > > >> delete it
>> > > > > > > > >> > > >>> again.
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>>> What if we expand our usage of the "terminated"
>> > > > feature to
>> > > > > > > apply
>> > > > > > > > >> to
>> > > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated
>> > namespace
>> > > > can
>> > > > > > > have
>> > > > > > > > >> > > >>>> bundles and topics can be deleted but not
>> created
>> > > > (just as
>> > > > > > a
>> > > > > > > > >> terminated
>> > > > > > > > >> > > >>>> topic cannot have any new messages published to
>> > it).
>> > > > This
>> > > > > > > would
>> > > > > > > > >> take
>> > > > > > > > >> > > >>>> care of all topic creation race conditions. We'd
>> > > > probably
>> > > > > > > need
>> > > > > > > > >> to add
>> > > > > > > > >> > > >>>> new protobuf exceptions for this feature.
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>> If we want to solve this problem, we need to add
>> > some
>> > > > sync
>> > > > > > > > >> resources like
>> > > > > > > > >> > > >>> lock/state, I think it’s a harm for us, we don’t
>> > need
>> > > > to do
>> > > > > > > that.
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>> Thanks for your suggestions, let me know what you
>> > think.
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>> Best,
>> > > > > > > > >> > > >>> Mattison
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
>> > > > > > > > >> mmarshall@apache.org>
>> > > > > > > > >> > > >>> wrote:
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>> This proposal identifies an important issue
>> that we
>> > > > should
>> > > > > > > > >> definitely
>> > > > > > > > >> > > >>>> solve. I have some questions.
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>>> When there are no user-created topics under a
>> > > > namespace,
>> > > > > > > > >> > > >>>>> Namespace should be deleted.
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>> This is supposed to mean that the namespace
>> should
>> > be
>> > > > able
>> > > > > > > to be
>> > > > > > > > >> > > >>>> deleted, correct?
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>>> For this reason, we need to close the system
>> topic
>> > > > > > > > >> reader/producer
>> > > > > > > > >> > > >>>>> first, then remove the system topic. finally,
>> > remove
>> > > > the
>> > > > > > > > >> namespace.
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>> I agree that expanding the protobuf
>> CloseProducer
>> > and
>> > > > > > > > >> CloseConsumer
>> > > > > > > > >> > > >>>> commands could be valuable here. The expansion
>> > would
>> > > > ensure
>> > > > > > > that
>> > > > > > > > >> > > >>>> producers and consumers don't attempt to
>> reconnect.
>> > > > > > However,
>> > > > > > > I
>> > > > > > > > >> think
>> > > > > > > > >> > > >>>> we might still have a race condition that could
>> > make
>> > > > tenant
>> > > > > > > or
>> > > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
>> > > > producer or
>> > > > > > > > >> consumer
>> > > > > > > > >> > > >>>> creates a topic after the namespace deletion has
>> > > > started
>> > > > > > but
>> > > > > > > > >> > > >>>> before it is complete. Do you agree that the
>> > underlying
>> > > > > > race
>> > > > > > > > >> still
>> > > > > > > > >> > > >>> exists?
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>> In my view, the fundamental problem here is that
>> > > > deleting
>> > > > > > > > >> certain Pulsar
>> > > > > > > > >> > > >>>> resources takes time and, in a distributed
>> system,
>> > that
>> > > > > > means
>> > > > > > > > >> race
>> > > > > > > > >> > > >>>> conditions.
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>> What if we expand our usage of the "terminated"
>> > > > feature to
>> > > > > > > apply
>> > > > > > > > >> to
>> > > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated
>> > namespace
>> > > > can
>> > > > > > > have
>> > > > > > > > >> > > >>>> bundles and topics can be deleted but not
>> created
>> > > > (just as
>> > > > > > a
>> > > > > > > > >> terminated
>> > > > > > > > >> > > >>>> topic cannot have any new messages published to
>> > it).
>> > > > This
>> > > > > > > would
>> > > > > > > > >> take
>> > > > > > > > >> > > >>>> care of all topic creation race conditions. We'd
>> > > > probably
>> > > > > > > need
>> > > > > > > > >> to add
>> > > > > > > > >> > > >>>> new protobuf exceptions for this feature.
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>> Thanks,
>> > > > > > > > >> > > >>>> Michael
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
>> > > > > > > > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
>> > > > > > > > >> > > >>>>>
>> > > > > > > > >> > > >>>>> +1
>> > > > > > > > >> > > >>>>>
>> > > > > > > > >> > > >>>>>
>> > > > > > > > >> > > >>>>> Thanks,
>> > > > > > > > >> > > >>>>> Zike
>> > > > > > > > >> > > >>>>>
>> > > > > > > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
>> > > > > > > > >> technoboy@apache.org>
>> > > > > > > > >> > > >>> wrote:
>> > > > > > > > >> > > >>>>>>
>> > > > > > > > >> > > >>>>>> Hi
>> > > > > > > > >> > > >>>>>> The PIP link :
>> > > > > > > https://github.com/apache/pulsar/issues/13989
>> > > > > > > > >> > > >>>>>>
>> > > > > > > > >> > > >>>>>> Regards
>> > > > > > > > >> > > >>>>>> Jiwei Guo (Tboy)
>> > > > > > > > >> > > >>>>>>
>> > > > > > > > >> > > >>>>>>
>> > > > > > > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison
>> chao <
>> > > > > > > > >> mattisonchao@gmail.com
>> > > > > > > > >> > > >>>>
>> > > > > > > > >> > > >>>>>> wrote:
>> > > > > > > > >> > > >>>>>>
>> > > > > > > > >> > > >>>>>>> Hello everyone,
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> I want to start a discussion about PIP-139 :
>> > Support
>> > > > > > > Broker
>> > > > > > > > >> send
>> > > > > > > > >> > > >>> command
>> > > > > > > > >> > > >>>>>>> to real close producer/consumer.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> This is the PIP document
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>
>> https://github.com/apache/pulsar/issues/13989 <
>> > > > > > > > >> > > >>>>>>>
>> https://github.com/apache/pulsar/issues/13979>
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> Please check it out and feel free to share
>> your
>> > > > > > thoughts.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> Best,
>> > > > > > > > >> > > >>>>>>> Mattison
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ———————— Pasted below for quoting
>> convenience.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> Relation pull request:  #13337
>> > > > > > > > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ## Motivation
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> Before we discuss this pip, I'd like to
>> > supplement
>> > > > some
>> > > > > > > > >> context to
>> > > > > > > > >> > > >>> help
>> > > > > > > > >> > > >>>>>>> contributors who don't want to read the
>> original
>> > > > pull
>> > > > > > > request.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>> When there are no user-created topics under
>> a
>> > > > > > namespace,
>> > > > > > > > >> Namespace
>> > > > > > > > >> > > >>>>>>> should be deleted. But currently, the system
>> > topic
>> > > > > > existed
>> > > > > > > > >> and the
>> > > > > > > > >> > > >>>>>>> reader/producer could auto-create the system
>> > which
>> > > > may
>> > > > > > > cause
>> > > > > > > > >> the
>> > > > > > > > >> > > >>> namespace
>> > > > > > > > >> > > >>>>>>> deletion to fail.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> For this reason, we need to close the system
>> > topic
>> > > > > > > > >> reader/producer
>> > > > > > > > >> > > >>> first,
>> > > > > > > > >> > > >>>>>>> then remove the system topic. finally, remove
>> > the
>> > > > > > > namespace.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> Following this way, we first want to use
>> > > > ``terminate``
>> > > > > > to
>> > > > > > > > >> solve this
>> > > > > > > > >> > > >>>>>>> problem. then we found producers can
>> > disconnect, but
>> > > > > > > > >> consumers are
>> > > > > > > > >> > > >>> still
>> > > > > > > > >> > > >>>>>>> alive. So, another PR #13960 wants to add
>> > consumers'
>> > > > > > > closing
>> > > > > > > > >> logic.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> After #13960, all things look good, but
>> another
>> > > > problem
>> > > > > > > > >> appears. that
>> > > > > > > > >> > > >>> is
>> > > > > > > > >> > > >>>>>>> we need to wait until consumers completely
>> > consume
>> > > > all
>> > > > > > > > >> messages (this
>> > > > > > > > >> > > >>> may
>> > > > > > > > >> > > >>>>>>> make terminate topic so long and the
>> operation
>> > > > > > > timeout)then
>> > > > > > > > >> get
>> > > > > > > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code
>> here :
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>
>> > > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > >
>> >
>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> In the #13337 case, we need to force close
>> > consumers
>> > > > > > > > >> immediately. So
>> > > > > > > > >> > > >>> we
>> > > > > > > > >> > > >>>>>>> write this PIP to discuss another way to
>> resolve
>> > > > this
>> > > > > > > problem.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ## Goal
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in
>> > command
>> > > > > > > > >> > > >>>>>>> ``CommandCloseProducer``/
>> > ``CommandCloseConsumer``
>> > > > to
>> > > > > > > close
>> > > > > > > > >> > > >>>>>>> producer/consumers immediately.
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ## API Changes
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
>> > > > ``CommandCloseProducer``;
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ```java
>> > > > > > > > >> > > >>>>>>> **Before**
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> message CommandCloseProducer {
>> > > > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
>> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
>> > > > > > > > >> > > >>>>>>> }
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> **After**
>> > > > > > > > >> > > >>>>>>> message CommandCloseProducer {
>> > > > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
>> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
>> > > > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default
>> =
>> > > > true];
>> > > > > > > > >> > > >>>>>>> }
>> > > > > > > > >> > > >>>>>>> ```
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
>> > > > ``CommandCloseConsumer``
>> > > > > > > > >> > > >>>>>>> ```java
>> > > > > > > > >> > > >>>>>>> **Before**
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> message CommandCloseConsumer {
>> > > > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
>> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
>> > > > > > > > >> > > >>>>>>> }
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> **After**
>> > > > > > > > >> > > >>>>>>> message CommandCloseConsumer {
>> > > > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
>> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
>> > > > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default
>> =
>> > > > true];
>> > > > > > > > >> > > >>>>>>> }
>> > > > > > > > >> > > >>>>>>> ```
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ## Implementation
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ### ClientCnx - Producer:
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> **Before**
>> > > > > > > > >> > > >>>>>>> ```java
>> > > > > > > > >> > > >>>>>>>   @Override
>> > > > > > > > >> > > >>>>>>>   protected void
>> > > > > > handleCloseProducer(CommandCloseProducer
>> > > > > > > > >> > > >>> closeProducer)
>> > > > > > > > >> > > >>>>>>> {
>> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
>> > Closed
>> > > > > > > producer:
>> > > > > > > > >> {}",
>> > > > > > > > >> > > >>>>>>> remoteAddress,
>> closeProducer.getProducerId());
>> > > > > > > > >> > > >>>>>>>       final long producerId =
>> > > > > > > closeProducer.getProducerId();
>> > > > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
>> > > > > > > producers.get(producerId);
>> > > > > > > > >> > > >>>>>>>       if (producer != null) {
>> > > > > > > > >> > > >>>>>>>           producer.connectionClosed(this);
>> > > > > > > > >> > > >>>>>>>       } else {
>> > > > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not
>> > found
>> > > > while
>> > > > > > > > >> closing
>> > > > > > > > >> > > >>> producer
>> > > > > > > > >> > > >>>>>>> ", producerId);
>> > > > > > > > >> > > >>>>>>>       }
>> > > > > > > > >> > > >>>>>>>   }
>> > > > > > > > >> > > >>>>>>> ```
>> > > > > > > > >> > > >>>>>>> After:
>> > > > > > > > >> > > >>>>>>> ```java
>> > > > > > > > >> > > >>>>>>>  @Override
>> > > > > > > > >> > > >>>>>>>   protected void
>> > > > > > handleCloseProducer(CommandCloseProducer
>> > > > > > > > >> > > >>> closeProducer)
>> > > > > > > > >> > > >>>>>>> {
>> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
>> > Closed
>> > > > > > > producer:
>> > > > > > > > >> {}",
>> > > > > > > > >> > > >>>>>>> remoteAddress,
>> closeProducer.getProducerId());
>> > > > > > > > >> > > >>>>>>>       final long producerId =
>> > > > > > > closeProducer.getProducerId();
>> > > > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
>> > > > > > > producers.get(producerId);
>> > > > > > > > >> > > >>>>>>>       if (producer != null) {
>> > > > > > > > >> > > >>>>>>>           if
>> (closeProducer.isAllowReconnect) {
>> > > > > > > > >> > > >>>>>>>
>>  producer.connectionClosed(this);
>> > > > > > > > >> > > >>>>>>>           } else {
>> > > > > > > > >> > > >>>>>>>               producer.closeAsync();
>> > > > > > > > >> > > >>>>>>>           }
>> > > > > > > > >> > > >>>>>>>       } else {
>> > > > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not
>> > found
>> > > > while
>> > > > > > > > >> closing
>> > > > > > > > >> > > >>> producer
>> > > > > > > > >> > > >>>>>>> ", producerId);
>> > > > > > > > >> > > >>>>>>>       }
>> > > > > > > > >> > > >>>>>>>   }
>> > > > > > > > >> > > >>>>>>> ```
>> > > > > > > > >> > > >>>>>>> ### ClientCnx - Consumer:
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> **Before**
>> > > > > > > > >> > > >>>>>>> ```java
>> > > > > > > > >> > > >>>>>>>   @Override
>> > > > > > > > >> > > >>>>>>>   protected void
>> > > > > > handleCloseConsumer(CommandCloseConsumer
>> > > > > > > > >> > > >>> closeConsumer)
>> > > > > > > > >> > > >>>>>>> {
>> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
>> > Closed
>> > > > > > > consumer:
>> > > > > > > > >> {}",
>> > > > > > > > >> > > >>>>>>> remoteAddress,
>> closeConsumer.getConsumerId());
>> > > > > > > > >> > > >>>>>>>       final long consumerId =
>> > > > > > > closeConsumer.getConsumerId();
>> > > > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
>> > > > > > > consumers.get(consumerId);
>> > > > > > > > >> > > >>>>>>>       if (consumer != null) {
>> > > > > > > > >> > > >>>>>>>           consumer.connectionClosed(this);
>> > > > > > > > >> > > >>>>>>>       } else {
>> > > > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not
>> > found
>> > > > while
>> > > > > > > > >> closing
>> > > > > > > > >> > > >>> consumer
>> > > > > > > > >> > > >>>>>>> ", consumerId);
>> > > > > > > > >> > > >>>>>>>       }
>> > > > > > > > >> > > >>>>>>>   }
>> > > > > > > > >> > > >>>>>>> ```
>> > > > > > > > >> > > >>>>>>> **After**
>> > > > > > > > >> > > >>>>>>> ```java
>> > > > > > > > >> > > >>>>>>>   @Override
>> > > > > > > > >> > > >>>>>>>   protected void
>> > > > > > handleCloseConsumer(CommandCloseConsumer
>> > > > > > > > >> > > >>> closeConsumer)
>> > > > > > > > >> > > >>>>>>> {
>> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
>> > Closed
>> > > > > > > consumer:
>> > > > > > > > >> {}",
>> > > > > > > > >> > > >>>>>>> remoteAddress,
>> closeConsumer.getConsumerId());
>> > > > > > > > >> > > >>>>>>>       final long consumerId =
>> > > > > > > closeConsumer.getConsumerId();
>> > > > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
>> > > > > > > consumers.get(consumerId);
>> > > > > > > > >> > > >>>>>>>       if (consumer != null) {
>> > > > > > > > >> > > >>>>>>>           if
>> (closeConsumer.isAllowReconnect) {
>> > > > > > > > >> > > >>>>>>>
>>  consumer.connectionClosed(this);
>> > > > > > > > >> > > >>>>>>>           } else {
>> > > > > > > > >> > > >>>>>>>               consumer.closeAsync();
>> > > > > > > > >> > > >>>>>>>           }
>> > > > > > > > >> > > >>>>>>>       } else {
>> > > > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not
>> > found
>> > > > while
>> > > > > > > > >> closing
>> > > > > > > > >> > > >>> consumer
>> > > > > > > > >> > > >>>>>>> ", consumerId);
>> > > > > > > > >> > > >>>>>>>       }
>> > > > > > > > >> > > >>>>>>>   }
>> > > > > > > > >> > > >>>>>>> ```
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> ## Reject Alternatives
>> > > > > > > > >> > > >>>>>>>
>> > > > > > > > >> > > >>>>>>> none.
>> > > > > > > > >> > > >>>>>
>> > > > > > > > >> > > >>>>>
>> > > > > > > > >> > > >>>>>
>> > > > > > > > >> > > >>>>> --
>> > > > > > > > >> > > >>>>> Zike Yang
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > > >>>
>> > > > > > > > >> > >
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> >
>>
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by guo jiwei <te...@apache.org>.
Hi  Penghui,
   After test, we can use #12136 to stop the replicator.


Regards
Jiwei Guo (Tboy)


On Sat, Mar 5, 2022 at 5:31 PM PengHui Li <pe...@apache.org> wrote:

> > Great point. I was focused on deleting namespaces and missed the case
> where the user wants to delete a topic from a replicated namespace. I
> agree that we should make it possible to delete these topics without
> removing the namespace replication.
>
> Oh, sorry. I thought again, after
> https://github.com/apache/pulsar/pull/12136
> introduced topic level geo-replication configuration, users can disable for
> a topic even if the namespace enabled geo-replication.
>
> And topic level policy is introduced in 2.6.0. I think It's time to make it
> one of
> the features turned on by default.
>
> So, we can follow the steps that stop the replication first and then delete
> the
> topic from clusters.
>
> > Note that it is already possible
> to delete these topics by deleting with force or by configuring
> inactive topic deletion.
>
> Hmm, I think both of them do not work for now, because the producer of
> the replicator will always be active.
>
> > Considering users can delete replicated topics by deleting with force,
> are you saying we need to provide a non-force way to delete these
> topics?
>
> Yes, if the topic doesn't have users' producers/consumers, users don't need
> to force delete the topic, it's a normal deletion.
>
>
> ----------------------------------------------------------------------------------
> @mattison @jiwei could you please check if we can use
> https://github.com/apache/pulsar/pull/12136
> to stop the geo-replication first, and then delete the topic?
> If it works, we don't need to change the protocol.
>
> Thanks,
> Penghui
>
>
>
> On Thu, Mar 3, 2022 at 7:12 AM Michael Marshall <mm...@apache.org>
> wrote:
>
> > > The geo-replication's configuration can be centralized by using
> > > one configuration store. But that doesn't change anything, we should
> > > provide the same behavior for both centralized and decentralized
> > > configuration store.
> >
> > Perhaps I misused the word decentralized. I meant that the challenge
> > comes because geo-replication configurations for clusters are
> > independent from each other.
> >
> > > The current challenge is
> > > users usually set replicated clusters for a namespace, if remove
> > > the replication configuration, the entire namespace will be affected.
> > > We have supported setting the replicated configuration for a topic [1],
> > > only for 2.10.0 or later.
> >
> > Great point. I was focused on deleting namespaces and missed the case
> > where the user wants to delete a topic from a replicated namespace. I
> > agree that we should make it possible to delete these topics without
> > removing the namespace replication. Note that it is already possible
> > to delete these topics by deleting with force or by configuring
> > inactive topic deletion.
> >
> > > we should provided steps for how to delete an unused replicated
> > > topic and what are the effects of not removing properly
> >
> > I agree, and I think documentation coupled with helpful error logs
> > will be very helpful here.
> >
> > Considering users can delete replicated topics by deleting with force,
> > are you saying we need to provide a non-force way to delete these
> > topics?
> >
> > If we want to provide a path to normal deletion, we already have an
> > `isRemote` flag on producers, and the inactive topic deletion code
> > uses this flag to determine if there are non-replication producers
> > connected to a topic. We could modify the deletion logic for a global
> > topic so that it can be deleted as long as the only producers
> > connected are remote producers. My main concern is that normal
> > deletion could allow users to miss the nuance that they must also
> > delete the topic in the other cluster. This might be an overly
> > cautious concern, though.
> >
> > Thanks,
> > Michael
> >
> >
> >
> > On Tue, Mar 1, 2022 at 1:55 AM PengHui Li <pe...@apache.org> wrote:
> > >
> > > > I agree with you that it'd be nice to provide the same deletion
> > > behavior. However, because geo-replication's configuration is
> > > decentralized, I think namespace or topic deletion is more complicated
> > > than unreplicated deletion. Note that users cannot currently delete
> > > namespaces that are configured with remote replication clusters.
> > >
> > > The geo-replication's configuration can be centralized by using
> > > one configuration store. But that doesn't change anything, we should
> > > provide the same behavior for both centralized and decentralized
> > > configuration store.
> > >
> > > > Note that while a user cannot explicitly delete a replicated topic,
> > > they can remove the replication configuration in cluster A and cluster
> > > B, and then they are left with unreplicated namespaces and topics,
> > > which can be deleted.
> > >
> > > Yes, stopping the replication first is a more elegant approach,
> > > and it is also very easy to understand for users. The current challenge
> > is
> > > users usually set replicated clusters for a namespace, if remove
> > > the replication configuration, the entire namespace will be affected.
> > > We have supported setting the replicated configuration for a topic [1],
> > > only for 2.10.0 or later.
> > >
> > > > In this case, the specific problem stems from a remote cluster
> > > targeting the local cluster. If the user only deletes the local topic,
> > > without removing the replication configuration, I don't think it
> > > should be surprising when the topic gets recreated (assuming auto
> > > topic creation is enabled).
> > >
> > > From the user's perspective, they don't have any producers/consumers,
> > > If the topic only has the producer of the replicator connect to the
> > remote
> > > topic
> > > each other, in this case, I think we should allow users to delete the
> > > topic.
> > > Users will be surprised in this case, because they are not able to
> delete
> > > an unused topic, they are not able to change the replication
> > configuration
> > > for
> > > a namespace.
> > >
> > > > In my view, I think it's okay to make it harder to delete
> > > geo-replicated namespaces or topics than local namespaces or topics.
> > > Failing requests with a good, informative error message protects
> > > users. Additionally, if users are able to delete a namespace or topic
> > > by force, that gives them a work around--although we should document
> > > the limitations of this workaround.
> > >
> > > I agree with it's okay to make it harder to delete a replicated topic,
> so
> > > for
> > > a replicated topic, users should delete the topic in each cluster, we
> > should
> > > provided steps for how to delete an unused replicated topic and what
> are
> > > the effects of not removing properly, for example, if a topic has 2
> > > clusters, cluster-a
> > > and cluster-b, but only delete the topic from cluster-a, the topic will
> > be
> > > created
> > > again if cluster-b write some new data into this topic, depends on the
> > topic
> > > auto-creation.
> > >
> > > If we only support set replicated cluster for a topic, not a namespace,
> > > I like the approach that stops the replication first and then deletes
> the
> > > topic.
> > >
> > > [1] https://github.com/apache/pulsar/pull/12136
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Tue, Mar 1, 2022 at 2:27 PM Michael Marshall <mm...@apache.org>
> > > wrote:
> > >
> > > > > Either replicated topic or non-replicated topic, we should provide
> > the
> > > > > same topic deletion behavior, the topic can be deleted if no active
> > > > > user's producers/consumers, should not be affected by the Pulsar
> > > > > internal producers/consumers.
> > > >
> > > > I agree with you that it'd be nice to provide the same deletion
> > > > behavior. However, because geo-replication's configuration is
> > > > decentralized, I think namespace or topic deletion is more
> complicated
> > > > than unreplicated deletion. Note that users cannot currently delete
> > > > namespaces that are configured with remote replication clusters.
> > > >
> > > > One challenge I see is the configuration in the remote cluster. If
> > > > cluster A is replicated to cluster B for namespace C, deleting a
> topic
> > > > in cluster B does not affect the configuration in cluster A. If we
> > > > make it easy to delete the topic in cluster B, we need to decide how
> > > > to handle failed replication in cluster A. In the same way, deleting
> > > > namespace C in cluster B doesn't delete it from cluster A. For
> > > > example, when should cluster A stop attempting to create producers to
> > > > cluster B for the deleted namespace/topic and how do we inform
> > > > operators that the replication is no longer working as configured. We
> > > > avoid several problems by failing the initial user request.
> > > >
> > > > Note that while a user cannot explicitly delete a replicated topic,
> > > > they can remove the replication configuration in cluster A and
> cluster
> > > > B, and then they are left with unreplicated namespaces and topics,
> > > > which can be deleted.
> > > >
> > > > > But for now, for a replicated topic, users are not able to delete
> it
> > > > because
> > > > > of the internal active producer, adding force delete option can
> > delete
> > > > > the topic, but it will create again if users enabled topic
> > auto-creation.
> > > >
> > > > In this case, the specific problem stems from a remote cluster
> > > > targeting the local cluster. If the user only deletes the local
> topic,
> > > > without removing the replication configuration, I don't think it
> > > > should be surprising when the topic gets recreated (assuming auto
> > > > topic creation is enabled).
> > > >
> > > > In my view, I think it's okay to make it harder to delete
> > > > geo-replicated namespaces or topics than local namespaces or topics.
> > > > Failing requests with a good, informative error message protects
> > > > users. Additionally, if users are able to delete a namespace or topic
> > > > by force, that gives them a work around--although we should document
> > > > the limitations of this workaround.
> > > >
> > > > I am not experienced with geo-replication, so please let me know if
> > > > any of my analysis doesn't align with the actual design.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > >
> > > > On Sun, Feb 27, 2022 at 8:42 PM guo jiwei <te...@apache.org>
> > wrote:
> > > > >
> > > > > I have added the geo-replicator topic case and updated the PIP.
> > > > >
> > > > >
> > > > > Regards
> > > > > Jiwei Guo (Tboy)
> > > > >
> > > > >
> > > > > On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <pe...@apache.org>
> > wrote:
> > > > >
> > > > > > > To me, the main question is whether we create a custom error or
> > > > expand
> > > > > > the `CommandCloseProducer` command. I lean towards adding an
> error
> > > > > > because it will automatically be backward compatible based on the
> > way
> > > > > > the client determines which errors are retriable. Although, I
> don't
> > > > > > have a strong opinion.
> > > > > >
> > > > > > I have the same opinion as you. But it looks like not able to fix
> > the
> > > > > > geo-replication topic deletion.
> > > > > >
> > > > > > Either replicated topic or non-replicated topic, we should
> provide
> > the
> > > > > > same topic deletion behavior, the topic can be deleted if no
> active
> > > > > > user's producers/consumers, should not be affected by the Pulsar
> > > > > > internal producers/consumers.
> > > > > >
> > > > > > But for now, for a replicated topic, users are not able to delete
> > it
> > > > > > because
> > > > > > of the internal active producer, adding force delete option can
> > delete
> > > > > > the topic, but it will create again if users enabled topic
> > > > auto-creation.
> > > > > >
> > > > > > I think in this case, it seems we don't have a chance to give the
> > > > producer
> > > > > > an exception, because the reconnection will connect to a new
> topic
> > only
> > > > > > with
> > > > > > the same topic name as before.
> > > > > >
> > > > > > Thanks,
> > > > > > Penghui
> > > > > >
> > > > > > On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <
> > mmarshall@apache.org
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > > We need to determine the overall plan, not the implementation
> > > > > > > > at this moment.
> > > > > > >
> > > > > > > Great point, I agree.
> > > > > > >
> > > > > > > > Looks like we need an option in the
> > > > > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > > > > reconnecting again.
> > > > > > >
> > > > > > > This is an important point. One of the primary requirements for
> > > > > > > geo-replication must also be that a replicated namespace or
> > topic can
> > > > > > > be deleted and recreated without any problems or leaks in the
> > broker.
> > > > > > > That requires that the producer is cleaned up properly when the
> > > > > > > namespace is deleted. I think we could achieve that with a new
> > server
> > > > > > > error or an expansion of the `CommandCloseProducer` command.
> > > > > > >
> > > > > > > To me, the main question is whether we create a custom error or
> > > > expand
> > > > > > > the `CommandCloseProducer` command. I lean towards adding an
> > error
> > > > > > > because it will automatically be backward compatible based on
> > the way
> > > > > > > the client determines which errors are retriable. Although, I
> > don't
> > > > > > > have a strong opinion.
> > > > > > >
> > > > > > > The key code paths that we need to consider are topic lookup
> and
> > > > topic
> > > > > > > creation. Both the http admin endpoint and the binary protocol
> > > > > > > endpoint interact with these code paths, and we should fail
> > requests
> > > > > > > for namespaces which have the "deleted" field set to true.
> > > > > > >
> > > > > > > In looking at the code, we could modify the logic in
> > > > > > > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That
> > > > method
> > > > > > > is always called for topic lookups via HTTP or via the Pulsar
> > > > > > > binary protocol. If a namespace doesn't exist, that method
> > currently
> > > > > > > returns a NOT_FOUND exception. In the binary protocol case,
> that
> > > > > > > exception is converted to a `ServerError.MetadataError`. We
> could
> > > > > > > modify the method like I do here [0], or we could add another
> > case
> > > > and
> > > > > > > return a custom error message.
> > > > > > >
> > > > > > > We'd also need to prevent topic creation via the admin endpoint
> > and
> > > > > > > the auto topic creation via the binary protocol. I believe this
> > will
> > > > > > > be just as easy as the lookup case, but I haven't confirmed.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Michael
> > > > > > >
> > > > > > > [0]
> > > > > > >
> > > > > >
> > > >
> >
> https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <penghui@apache.org
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > The replicated topic deletion also has a similar problem.
> > > > > > > > If the topic auto-creation is enabled, we are not able to
> > delete
> > > > > > > > a replicated topic, because the producer used for replication
> > > > > > > > will try to reconnect. Looks like we need an option in the
> > > > > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > > > > reconnecting again.
> > > > > > > >
> > > > > > > > Penghui
> > > > > > > >
> > > > > > > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <
> penghui@apache.org
> > >
> > > > wrote:
> > > > > > > >
> > > > > > > > > > Penghui, are you suggesting that we implement the
> > > > namespace/tenant
> > > > > > > > > terminated logic after completing this PIP?
> > > > > > > > >
> > > > > > > > > I'm ok with both being implemented together or separated.
> > > > > > > > > We need to determine the overall plan, not the
> implementation
> > > > > > > > > at this moment.
> > > > > > > > >
> > > > > > > > > > For the sake of discussion, if we implement the namespace
> > > > > > terminated
> > > > > > > > > logic first, we could fulfill the underlying requirements
> for
> > > > this
> > > > > > PIP
> > > > > > > > > by returning a new non-retriable error response when a
> client
> > > > tries
> > > > > > to
> > > > > > > > > connect a producer or a consumer to a topic in a namespace
> > that
> > > > is
> > > > > > > > > "terminated". If we do add the "namespace terminated"
> > feature,
> > > > we'll
> > > > > > > > > need to add a non-retriable exception for this case,
> anyway.
> > The
> > > > main
> > > > > > > > > advantage here is that we'd only need one expansion of the
> > > > protobuf
> > > > > > > > > instead of two. The downside is that the protocol for
> > connected
> > > > > > > > > clients has a couple more roundtrips. The broker would
> > disconnect
> > > > > > > > > connected clients and then fail their reconnection attempt
> > with a
> > > > > > > > > non-retriable error.
> > > > > > > > >
> > > > > > > > > Regarding the namespace "terminated" concept, I just
> noticed
> > > > that we
> > > > > > > > > already have a "deleted" field in a namespace's policies
> [0].
> > > > There
> > > > > > is
> > > > > > > > > even a comment that says:
> > > > > > > > >
> > > > > > > > > Yes, if we can terminate the namespace first, we should
> > return
> > > > > > > “namespace
> > > > > > > > > not found”
> > > > > > > > > if the producers and consumers try to reconnect, the client
> > > > should
> > > > > > stop
> > > > > > > > > reconnecting
> > > > > > > > > after getting this error. Looks like following this way, we
> > don't
> > > > > > need
> > > > > > > to
> > > > > > > > > introduce any protocol changes anymore.
> > > > > > > > >
> > > > > > > > > > As I think about it more, I no longer think "terminated"
> > is the
> > > > > > right
> > > > > > > > > term for what I proposed above. Our goal is to briefly
> > prevent
> > > > any
> > > > > > > > > topic creation to ensure we can delete all sub resources
> for
> > a
> > > > > > > > > namespace. On the other hand, a terminated topic isn't
> > > > necessarily
> > > > > > > > > short lived. If we want to apply the "terminated" term
> > > > unequivocally
> > > > > > > > > to both topics and namespaces, I think a terminated
> namespace
> > > > would
> > > > > > > > > need to be a namespace where all topics are in terminated
> > state
> > > > and
> > > > > > no
> > > > > > > > > additional topics could be created. That's not the feature
> > we're
> > > > > > > > > discussing here, though. Deleted seems like the right term
> > to me,
> > > > > > > > > especially since we're already using it to prevent a race
> > > > condition
> > > > > > > > >
> > > > > > > > > Yes, topic termination has different meanings, deletion
> > should
> > > > be the
> > > > > > > right
> > > > > > > > > term here.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Penghui
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
> > > > > > > mmarshall@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Regarding the namespace "terminated" concept, I just
> noticed
> > > > that we
> > > > > > > > >> already have a "deleted" field in a namespace's policies
> > [0].
> > > > There
> > > > > > is
> > > > > > > > >> even a comment that says:
> > > > > > > > >>
> > > > > > > > >> > // set the policies to deleted so that somebody else
> > cannot
> > > > > > acquire
> > > > > > > > >> this namespace
> > > > > > > > >>
> > > > > > > > >> I am not familiar with this feature, but it seems like
> this
> > > > policy
> > > > > > > > >> field could be checked before creating a topic in a
> > namespace.
> > > > That
> > > > > > > > >> would remove certain races described above.
> > > > > > > > >>
> > > > > > > > >> As I think about it more, I no longer think "terminated"
> is
> > the
> > > > > > right
> > > > > > > > >> term for what I proposed above. Our goal is to briefly
> > prevent
> > > > any
> > > > > > > > >> topic creation to ensure we can delete all sub resources
> > for a
> > > > > > > > >> namespace. On the other hand, a terminated topic isn't
> > > > necessarily
> > > > > > > > >> short lived. If we want to apply the "terminated" term
> > > > unequivocally
> > > > > > > > >> to both topics and namespaces, I think a terminated
> > namespace
> > > > would
> > > > > > > > >> need to be a namespace where all topics are in terminated
> > state
> > > > and
> > > > > > no
> > > > > > > > >> additional topics could be created. That's not the feature
> > we're
> > > > > > > > >> discussing here, though. Deleted seems like the right term
> > to
> > > > me,
> > > > > > > > >> especially since we're already using it to prevent a race
> > > > condition
> > > > > > > > >> [0].
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Michael
> > > > > > > > >>
> > > > > > > > >> [0]
> > > > > > > > >>
> > > > > > >
> > > > > >
> > > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> > > > > > > > >>
> > > > > > > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
> > > > > > > mmarshall@apache.org>
> > > > > > > > >> wrote:
> > > > > > > > >> >
> > > > > > > > >> > Hi Dave,
> > > > > > > > >> >
> > > > > > > > >> > > automatically delete tenants and namespaces for not
> > > > containing
> > > > > > > topics
> > > > > > > > >> >
> > > > > > > > >> > I don't think that is what we are discussing. I agree
> > that the
> > > > > > > initial
> > > > > > > > >> > email says just that, though, which is why I asked
> above:
> > > > > > > > >> >
> > > > > > > > >> > >> When there are no user-created topics under a
> > namespace,
> > > > > > > > >> > >> Namespace should be deleted.
> > > > > > > > >> > > This is supposed to mean that the namespace should be
> > able
> > > > to be
> > > > > > > > >> > > deleted, correct?
> > > > > > > > >> >
> > > > > > > > >> > Perhaps Mattison can clarify, too.
> > > > > > > > >> >
> > > > > > > > >> > My current understanding of the context for the PIP is
> > that a
> > > > user
> > > > > > > > >> > call to delete a namespace without force can fail when a
> > > > producer
> > > > > > > > >> > reconnects to a deleted topic. The goal is to remove the
> > race
> > > > > > > > >> > condition to ensure namespace deletion can succeed.
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> > Michael
> > > > > > > > >> >
> > > > > > > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <
> > wave@apache.org>
> > > > > > > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Hi -
> > > > > > > > >> > >
> > > > > > > > >> > > I hope I’m understanding what’s being discussed.
> > > > > > > > >> > >
> > > > > > > > >> > > If we are going to automatically delete tenants and
> > > > namespaces
> > > > > > for
> > > > > > > > >> not containing topics then we need to make both of these
> > > > automatic
> > > > > > > actions
> > > > > > > > >> configurable with a default to NOT do so. Otherwise we
> break
> > > > > > existing
> > > > > > > use
> > > > > > > > >> cases.
> > > > > > > > >> > >
> > > > > > > > >> > > Automatic deletion of namespaces should be
> configurable
> > at
> > > > both
> > > > > > > the
> > > > > > > > >> cluster and tenant level.
> > > > > > > > >> > >
> > > > > > > > >> > > Regards,
> > > > > > > > >> > > Dave
> > > > > > > > >> > >
> > > > > > > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
> > > > > > > mmarshall@apache.org>
> > > > > > > > >> wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > >> The old producer/consumer should be closed after
> > > > applying the
> > > > > > > > >> changes from
> > > > > > > > >> > > >> this proposal.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Penghui, are you suggesting that we implement the
> > > > > > > namespace/tenant
> > > > > > > > >> > > > terminated logic after completing this PIP?
> > > > > > > > >> > > >
> > > > > > > > >> > > > For the sake of discussion, if we implement the
> > namespace
> > > > > > > terminated
> > > > > > > > >> > > > logic first, we could fulfill the underlying
> > requirements
> > > > for
> > > > > > > this
> > > > > > > > >> PIP
> > > > > > > > >> > > > by returning a new non-retriable error response
> when a
> > > > client
> > > > > > > tries
> > > > > > > > >> to
> > > > > > > > >> > > > connect a producer or a consumer to a topic in a
> > namespace
> > > > > > that
> > > > > > > is
> > > > > > > > >> > > > "terminated". If we do add the "namespace
> terminated"
> > > > feature,
> > > > > > > we'll
> > > > > > > > >> > > > need to add a non-retriable exception for this case,
> > > > anyway.
> > > > > > The
> > > > > > > > >> main
> > > > > > > > >> > > > advantage here is that we'd only need one expansion
> > of the
> > > > > > > protobuf
> > > > > > > > >> > > > instead of two. The downside is that the protocol
> for
> > > > > > connected
> > > > > > > > >> > > > clients has a couple more roundtrips. The broker
> would
> > > > > > > disconnect
> > > > > > > > >> > > > connected clients and then fail their reconnection
> > attempt
> > > > > > with
> > > > > > > a
> > > > > > > > >> > > > non-retriable error.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks,
> > > > > > > > >> > > > Michael
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <
> > > > > > penghui@apache.org>
> > > > > > > > >> wrote:
> > > > > > > > >> > > >>
> > > > > > > > >> > > >>> If we want to solve this problem, we need to add
> > some
> > > > sync
> > > > > > > > >> resources like
> > > > > > > > >> > > >> lock/state, I think it’s a harm for us, we don’t
> > need to
> > > > do
> > > > > > > that.
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> I think we can make the namespace/tenants to the
> > inactive
> > > > > > state
> > > > > > > > >> first so
> > > > > > > > >> > > >> that we can avoid any new
> > > > > > > > >> > > >> producer/consumer connect to the topic under the
> > > > > > > namespace/tenant.
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> The old producer/consumer should be closed after
> > > > applying the
> > > > > > > > >> changes from
> > > > > > > > >> > > >> this proposal.
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> Thanks,
> > > > > > > > >> > > >> Penghui
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> > > > > > > > >> mattisonchao@gmail.com> wrote:
> > > > > > > > >> > > >>
> > > > > > > > >> > > >>>> This is supposed to mean that the namespace
> should
> > be
> > > > able
> > > > > > > to be
> > > > > > > > >> > > >>>> deleted, correct?
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>> Yes, the main background is the user doesn’t have
> an
> > > > active
> > > > > > > > >> topic. so,
> > > > > > > > >> > > >>> they want to delete the namespace.
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>>> However, I think
> > > > > > > > >> > > >>>> we might still have a race condition that could
> > make
> > > > tenant
> > > > > > > or
> > > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> > > > producer or
> > > > > > > > >> consumer
> > > > > > > > >> > > >>>> creates a topic after the namespace deletion has
> > > > started
> > > > > > but
> > > > > > > > >> > > >>>> before it is complete. Do you agree that the
> > underlying
> > > > > > race
> > > > > > > > >> still
> > > > > > > > >> > > >>> exists?
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>> Yes, this condition exists. I think it’s not a big
> > > > problem
> > > > > > > > >> because the
> > > > > > > > >> > > >>> user doesn’t want to use this namespace anymore.
> > > > > > > > >> > > >>> If this scenario appears, they will get an error
> and
> > > > need to
> > > > > > > > >> delete it
> > > > > > > > >> > > >>> again.
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>>> What if we expand our usage of the "terminated"
> > > > feature to
> > > > > > > apply
> > > > > > > > >> to
> > > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated
> > namespace
> > > > can
> > > > > > > have
> > > > > > > > >> > > >>>> bundles and topics can be deleted but not created
> > > > (just as
> > > > > > a
> > > > > > > > >> terminated
> > > > > > > > >> > > >>>> topic cannot have any new messages published to
> > it).
> > > > This
> > > > > > > would
> > > > > > > > >> take
> > > > > > > > >> > > >>>> care of all topic creation race conditions. We'd
> > > > probably
> > > > > > > need
> > > > > > > > >> to add
> > > > > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>> If we want to solve this problem, we need to add
> > some
> > > > sync
> > > > > > > > >> resources like
> > > > > > > > >> > > >>> lock/state, I think it’s a harm for us, we don’t
> > need
> > > > to do
> > > > > > > that.
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>> Thanks for your suggestions, let me know what you
> > think.
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>> Best,
> > > > > > > > >> > > >>> Mattison
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> > > > > > > > >> mmarshall@apache.org>
> > > > > > > > >> > > >>> wrote:
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>> This proposal identifies an important issue that
> we
> > > > should
> > > > > > > > >> definitely
> > > > > > > > >> > > >>>> solve. I have some questions.
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>>> When there are no user-created topics under a
> > > > namespace,
> > > > > > > > >> > > >>>>> Namespace should be deleted.
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>> This is supposed to mean that the namespace
> should
> > be
> > > > able
> > > > > > > to be
> > > > > > > > >> > > >>>> deleted, correct?
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>>> For this reason, we need to close the system
> topic
> > > > > > > > >> reader/producer
> > > > > > > > >> > > >>>>> first, then remove the system topic. finally,
> > remove
> > > > the
> > > > > > > > >> namespace.
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>> I agree that expanding the protobuf CloseProducer
> > and
> > > > > > > > >> CloseConsumer
> > > > > > > > >> > > >>>> commands could be valuable here. The expansion
> > would
> > > > ensure
> > > > > > > that
> > > > > > > > >> > > >>>> producers and consumers don't attempt to
> reconnect.
> > > > > > However,
> > > > > > > I
> > > > > > > > >> think
> > > > > > > > >> > > >>>> we might still have a race condition that could
> > make
> > > > tenant
> > > > > > > or
> > > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> > > > producer or
> > > > > > > > >> consumer
> > > > > > > > >> > > >>>> creates a topic after the namespace deletion has
> > > > started
> > > > > > but
> > > > > > > > >> > > >>>> before it is complete. Do you agree that the
> > underlying
> > > > > > race
> > > > > > > > >> still
> > > > > > > > >> > > >>> exists?
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>> In my view, the fundamental problem here is that
> > > > deleting
> > > > > > > > >> certain Pulsar
> > > > > > > > >> > > >>>> resources takes time and, in a distributed
> system,
> > that
> > > > > > means
> > > > > > > > >> race
> > > > > > > > >> > > >>>> conditions.
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>> What if we expand our usage of the "terminated"
> > > > feature to
> > > > > > > apply
> > > > > > > > >> to
> > > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated
> > namespace
> > > > can
> > > > > > > have
> > > > > > > > >> > > >>>> bundles and topics can be deleted but not created
> > > > (just as
> > > > > > a
> > > > > > > > >> terminated
> > > > > > > > >> > > >>>> topic cannot have any new messages published to
> > it).
> > > > This
> > > > > > > would
> > > > > > > > >> take
> > > > > > > > >> > > >>>> care of all topic creation race conditions. We'd
> > > > probably
> > > > > > > need
> > > > > > > > >> to add
> > > > > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>> Thanks,
> > > > > > > > >> > > >>>> Michael
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > > > > > > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > > > > > > > >> > > >>>>>
> > > > > > > > >> > > >>>>> +1
> > > > > > > > >> > > >>>>>
> > > > > > > > >> > > >>>>>
> > > > > > > > >> > > >>>>> Thanks,
> > > > > > > > >> > > >>>>> Zike
> > > > > > > > >> > > >>>>>
> > > > > > > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> > > > > > > > >> technoboy@apache.org>
> > > > > > > > >> > > >>> wrote:
> > > > > > > > >> > > >>>>>>
> > > > > > > > >> > > >>>>>> Hi
> > > > > > > > >> > > >>>>>> The PIP link :
> > > > > > > https://github.com/apache/pulsar/issues/13989
> > > > > > > > >> > > >>>>>>
> > > > > > > > >> > > >>>>>> Regards
> > > > > > > > >> > > >>>>>> Jiwei Guo (Tboy)
> > > > > > > > >> > > >>>>>>
> > > > > > > > >> > > >>>>>>
> > > > > > > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao
> <
> > > > > > > > >> mattisonchao@gmail.com
> > > > > > > > >> > > >>>>
> > > > > > > > >> > > >>>>>> wrote:
> > > > > > > > >> > > >>>>>>
> > > > > > > > >> > > >>>>>>> Hello everyone,
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> I want to start a discussion about PIP-139 :
> > Support
> > > > > > > Broker
> > > > > > > > >> send
> > > > > > > > >> > > >>> command
> > > > > > > > >> > > >>>>>>> to real close producer/consumer.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> This is the PIP document
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989
> <
> > > > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979
> >
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> Please check it out and feel free to share
> your
> > > > > > thoughts.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> Best,
> > > > > > > > >> > > >>>>>>> Mattison
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> Relation pull request:  #13337
> > > > > > > > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ## Motivation
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> Before we discuss this pip, I'd like to
> > supplement
> > > > some
> > > > > > > > >> context to
> > > > > > > > >> > > >>> help
> > > > > > > > >> > > >>>>>>> contributors who don't want to read the
> original
> > > > pull
> > > > > > > request.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>>> When there are no user-created topics under a
> > > > > > namespace,
> > > > > > > > >> Namespace
> > > > > > > > >> > > >>>>>>> should be deleted. But currently, the system
> > topic
> > > > > > existed
> > > > > > > > >> and the
> > > > > > > > >> > > >>>>>>> reader/producer could auto-create the system
> > which
> > > > may
> > > > > > > cause
> > > > > > > > >> the
> > > > > > > > >> > > >>> namespace
> > > > > > > > >> > > >>>>>>> deletion to fail.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> For this reason, we need to close the system
> > topic
> > > > > > > > >> reader/producer
> > > > > > > > >> > > >>> first,
> > > > > > > > >> > > >>>>>>> then remove the system topic. finally, remove
> > the
> > > > > > > namespace.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> Following this way, we first want to use
> > > > ``terminate``
> > > > > > to
> > > > > > > > >> solve this
> > > > > > > > >> > > >>>>>>> problem. then we found producers can
> > disconnect, but
> > > > > > > > >> consumers are
> > > > > > > > >> > > >>> still
> > > > > > > > >> > > >>>>>>> alive. So, another PR #13960 wants to add
> > consumers'
> > > > > > > closing
> > > > > > > > >> logic.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> After #13960, all things look good, but
> another
> > > > problem
> > > > > > > > >> appears. that
> > > > > > > > >> > > >>> is
> > > > > > > > >> > > >>>>>>> we need to wait until consumers completely
> > consume
> > > > all
> > > > > > > > >> messages (this
> > > > > > > > >> > > >>> may
> > > > > > > > >> > > >>>>>>> make terminate topic so long and the operation
> > > > > > > timeout)then
> > > > > > > > >> get
> > > > > > > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here
> :
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>
> > > > > > > > >>
> > > > > > >
> > > > > >
> > > >
> >
> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> In the #13337 case, we need to force close
> > consumers
> > > > > > > > >> immediately. So
> > > > > > > > >> > > >>> we
> > > > > > > > >> > > >>>>>>> write this PIP to discuss another way to
> resolve
> > > > this
> > > > > > > problem.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ## Goal
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in
> > command
> > > > > > > > >> > > >>>>>>> ``CommandCloseProducer``/
> > ``CommandCloseConsumer``
> > > > to
> > > > > > > close
> > > > > > > > >> > > >>>>>>> producer/consumers immediately.
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ## API Changes
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> > > > ``CommandCloseProducer``;
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ```java
> > > > > > > > >> > > >>>>>>> **Before**
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > > >> > > >>>>>>> }
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> **After**
> > > > > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> > > > true];
> > > > > > > > >> > > >>>>>>> }
> > > > > > > > >> > > >>>>>>> ```
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> > > > ``CommandCloseConsumer``
> > > > > > > > >> > > >>>>>>> ```java
> > > > > > > > >> > > >>>>>>> **Before**
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > > >> > > >>>>>>> }
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> **After**
> > > > > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> > > > true];
> > > > > > > > >> > > >>>>>>> }
> > > > > > > > >> > > >>>>>>> ```
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ## Implementation
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ### ClientCnx - Producer:
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> **Before**
> > > > > > > > >> > > >>>>>>> ```java
> > > > > > > > >> > > >>>>>>>   @Override
> > > > > > > > >> > > >>>>>>>   protected void
> > > > > > handleCloseProducer(CommandCloseProducer
> > > > > > > > >> > > >>> closeProducer)
> > > > > > > > >> > > >>>>>>> {
> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> > Closed
> > > > > > > producer:
> > > > > > > > >> {}",
> > > > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > > > > >> > > >>>>>>>       final long producerId =
> > > > > > > closeProducer.getProducerId();
> > > > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > > > > producers.get(producerId);
> > > > > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > > > > >> > > >>>>>>>           producer.connectionClosed(this);
> > > > > > > > >> > > >>>>>>>       } else {
> > > > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not
> > found
> > > > while
> > > > > > > > >> closing
> > > > > > > > >> > > >>> producer
> > > > > > > > >> > > >>>>>>> ", producerId);
> > > > > > > > >> > > >>>>>>>       }
> > > > > > > > >> > > >>>>>>>   }
> > > > > > > > >> > > >>>>>>> ```
> > > > > > > > >> > > >>>>>>> After:
> > > > > > > > >> > > >>>>>>> ```java
> > > > > > > > >> > > >>>>>>>  @Override
> > > > > > > > >> > > >>>>>>>   protected void
> > > > > > handleCloseProducer(CommandCloseProducer
> > > > > > > > >> > > >>> closeProducer)
> > > > > > > > >> > > >>>>>>> {
> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> > Closed
> > > > > > > producer:
> > > > > > > > >> {}",
> > > > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > > > > >> > > >>>>>>>       final long producerId =
> > > > > > > closeProducer.getProducerId();
> > > > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > > > > producers.get(producerId);
> > > > > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > > > > >> > > >>>>>>>           if (closeProducer.isAllowReconnect)
> {
> > > > > > > > >> > > >>>>>>>               producer.connectionClosed(this);
> > > > > > > > >> > > >>>>>>>           } else {
> > > > > > > > >> > > >>>>>>>               producer.closeAsync();
> > > > > > > > >> > > >>>>>>>           }
> > > > > > > > >> > > >>>>>>>       } else {
> > > > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not
> > found
> > > > while
> > > > > > > > >> closing
> > > > > > > > >> > > >>> producer
> > > > > > > > >> > > >>>>>>> ", producerId);
> > > > > > > > >> > > >>>>>>>       }
> > > > > > > > >> > > >>>>>>>   }
> > > > > > > > >> > > >>>>>>> ```
> > > > > > > > >> > > >>>>>>> ### ClientCnx - Consumer:
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> **Before**
> > > > > > > > >> > > >>>>>>> ```java
> > > > > > > > >> > > >>>>>>>   @Override
> > > > > > > > >> > > >>>>>>>   protected void
> > > > > > handleCloseConsumer(CommandCloseConsumer
> > > > > > > > >> > > >>> closeConsumer)
> > > > > > > > >> > > >>>>>>> {
> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> > Closed
> > > > > > > consumer:
> > > > > > > > >> {}",
> > > > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > > > > >> > > >>>>>>>       final long consumerId =
> > > > > > > closeConsumer.getConsumerId();
> > > > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > > > > consumers.get(consumerId);
> > > > > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > > > > >> > > >>>>>>>           consumer.connectionClosed(this);
> > > > > > > > >> > > >>>>>>>       } else {
> > > > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not
> > found
> > > > while
> > > > > > > > >> closing
> > > > > > > > >> > > >>> consumer
> > > > > > > > >> > > >>>>>>> ", consumerId);
> > > > > > > > >> > > >>>>>>>       }
> > > > > > > > >> > > >>>>>>>   }
> > > > > > > > >> > > >>>>>>> ```
> > > > > > > > >> > > >>>>>>> **After**
> > > > > > > > >> > > >>>>>>> ```java
> > > > > > > > >> > > >>>>>>>   @Override
> > > > > > > > >> > > >>>>>>>   protected void
> > > > > > handleCloseConsumer(CommandCloseConsumer
> > > > > > > > >> > > >>> closeConsumer)
> > > > > > > > >> > > >>>>>>> {
> > > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> > Closed
> > > > > > > consumer:
> > > > > > > > >> {}",
> > > > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > > > > >> > > >>>>>>>       final long consumerId =
> > > > > > > closeConsumer.getConsumerId();
> > > > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > > > > consumers.get(consumerId);
> > > > > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > > > > >> > > >>>>>>>           if (closeConsumer.isAllowReconnect)
> {
> > > > > > > > >> > > >>>>>>>               consumer.connectionClosed(this);
> > > > > > > > >> > > >>>>>>>           } else {
> > > > > > > > >> > > >>>>>>>               consumer.closeAsync();
> > > > > > > > >> > > >>>>>>>           }
> > > > > > > > >> > > >>>>>>>       } else {
> > > > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not
> > found
> > > > while
> > > > > > > > >> closing
> > > > > > > > >> > > >>> consumer
> > > > > > > > >> > > >>>>>>> ", consumerId);
> > > > > > > > >> > > >>>>>>>       }
> > > > > > > > >> > > >>>>>>>   }
> > > > > > > > >> > > >>>>>>> ```
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> ## Reject Alternatives
> > > > > > > > >> > > >>>>>>>
> > > > > > > > >> > > >>>>>>> none.
> > > > > > > > >> > > >>>>>
> > > > > > > > >> > > >>>>>
> > > > > > > > >> > > >>>>>
> > > > > > > > >> > > >>>>> --
> > > > > > > > >> > > >>>>> Zike Yang
> > > > > > > > >> > > >>>
> > > > > > > > >> > > >>>
> > > > > > > > >> > >
> > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by PengHui Li <pe...@apache.org>.
> Great point. I was focused on deleting namespaces and missed the case
where the user wants to delete a topic from a replicated namespace. I
agree that we should make it possible to delete these topics without
removing the namespace replication.

Oh, sorry. I thought again, after
https://github.com/apache/pulsar/pull/12136
introduced topic level geo-replication configuration, users can disable for
a topic even if the namespace enabled geo-replication.

And topic level policy is introduced in 2.6.0. I think It's time to make it
one of
the features turned on by default.

So, we can follow the steps that stop the replication first and then delete
the
topic from clusters.

> Note that it is already possible
to delete these topics by deleting with force or by configuring
inactive topic deletion.

Hmm, I think both of them do not work for now, because the producer of
the replicator will always be active.

> Considering users can delete replicated topics by deleting with force,
are you saying we need to provide a non-force way to delete these
topics?

Yes, if the topic doesn't have users' producers/consumers, users don't need
to force delete the topic, it's a normal deletion.

----------------------------------------------------------------------------------
@mattison @jiwei could you please check if we can use
https://github.com/apache/pulsar/pull/12136
to stop the geo-replication first, and then delete the topic?
If it works, we don't need to change the protocol.

Thanks,
Penghui



On Thu, Mar 3, 2022 at 7:12 AM Michael Marshall <mm...@apache.org>
wrote:

> > The geo-replication's configuration can be centralized by using
> > one configuration store. But that doesn't change anything, we should
> > provide the same behavior for both centralized and decentralized
> > configuration store.
>
> Perhaps I misused the word decentralized. I meant that the challenge
> comes because geo-replication configurations for clusters are
> independent from each other.
>
> > The current challenge is
> > users usually set replicated clusters for a namespace, if remove
> > the replication configuration, the entire namespace will be affected.
> > We have supported setting the replicated configuration for a topic [1],
> > only for 2.10.0 or later.
>
> Great point. I was focused on deleting namespaces and missed the case
> where the user wants to delete a topic from a replicated namespace. I
> agree that we should make it possible to delete these topics without
> removing the namespace replication. Note that it is already possible
> to delete these topics by deleting with force or by configuring
> inactive topic deletion.
>
> > we should provided steps for how to delete an unused replicated
> > topic and what are the effects of not removing properly
>
> I agree, and I think documentation coupled with helpful error logs
> will be very helpful here.
>
> Considering users can delete replicated topics by deleting with force,
> are you saying we need to provide a non-force way to delete these
> topics?
>
> If we want to provide a path to normal deletion, we already have an
> `isRemote` flag on producers, and the inactive topic deletion code
> uses this flag to determine if there are non-replication producers
> connected to a topic. We could modify the deletion logic for a global
> topic so that it can be deleted as long as the only producers
> connected are remote producers. My main concern is that normal
> deletion could allow users to miss the nuance that they must also
> delete the topic in the other cluster. This might be an overly
> cautious concern, though.
>
> Thanks,
> Michael
>
>
>
> On Tue, Mar 1, 2022 at 1:55 AM PengHui Li <pe...@apache.org> wrote:
> >
> > > I agree with you that it'd be nice to provide the same deletion
> > behavior. However, because geo-replication's configuration is
> > decentralized, I think namespace or topic deletion is more complicated
> > than unreplicated deletion. Note that users cannot currently delete
> > namespaces that are configured with remote replication clusters.
> >
> > The geo-replication's configuration can be centralized by using
> > one configuration store. But that doesn't change anything, we should
> > provide the same behavior for both centralized and decentralized
> > configuration store.
> >
> > > Note that while a user cannot explicitly delete a replicated topic,
> > they can remove the replication configuration in cluster A and cluster
> > B, and then they are left with unreplicated namespaces and topics,
> > which can be deleted.
> >
> > Yes, stopping the replication first is a more elegant approach,
> > and it is also very easy to understand for users. The current challenge
> is
> > users usually set replicated clusters for a namespace, if remove
> > the replication configuration, the entire namespace will be affected.
> > We have supported setting the replicated configuration for a topic [1],
> > only for 2.10.0 or later.
> >
> > > In this case, the specific problem stems from a remote cluster
> > targeting the local cluster. If the user only deletes the local topic,
> > without removing the replication configuration, I don't think it
> > should be surprising when the topic gets recreated (assuming auto
> > topic creation is enabled).
> >
> > From the user's perspective, they don't have any producers/consumers,
> > If the topic only has the producer of the replicator connect to the
> remote
> > topic
> > each other, in this case, I think we should allow users to delete the
> > topic.
> > Users will be surprised in this case, because they are not able to delete
> > an unused topic, they are not able to change the replication
> configuration
> > for
> > a namespace.
> >
> > > In my view, I think it's okay to make it harder to delete
> > geo-replicated namespaces or topics than local namespaces or topics.
> > Failing requests with a good, informative error message protects
> > users. Additionally, if users are able to delete a namespace or topic
> > by force, that gives them a work around--although we should document
> > the limitations of this workaround.
> >
> > I agree with it's okay to make it harder to delete a replicated topic, so
> > for
> > a replicated topic, users should delete the topic in each cluster, we
> should
> > provided steps for how to delete an unused replicated topic and what are
> > the effects of not removing properly, for example, if a topic has 2
> > clusters, cluster-a
> > and cluster-b, but only delete the topic from cluster-a, the topic will
> be
> > created
> > again if cluster-b write some new data into this topic, depends on the
> topic
> > auto-creation.
> >
> > If we only support set replicated cluster for a topic, not a namespace,
> > I like the approach that stops the replication first and then deletes the
> > topic.
> >
> > [1] https://github.com/apache/pulsar/pull/12136
> >
> > Thanks,
> > Penghui
> >
> > On Tue, Mar 1, 2022 at 2:27 PM Michael Marshall <mm...@apache.org>
> > wrote:
> >
> > > > Either replicated topic or non-replicated topic, we should provide
> the
> > > > same topic deletion behavior, the topic can be deleted if no active
> > > > user's producers/consumers, should not be affected by the Pulsar
> > > > internal producers/consumers.
> > >
> > > I agree with you that it'd be nice to provide the same deletion
> > > behavior. However, because geo-replication's configuration is
> > > decentralized, I think namespace or topic deletion is more complicated
> > > than unreplicated deletion. Note that users cannot currently delete
> > > namespaces that are configured with remote replication clusters.
> > >
> > > One challenge I see is the configuration in the remote cluster. If
> > > cluster A is replicated to cluster B for namespace C, deleting a topic
> > > in cluster B does not affect the configuration in cluster A. If we
> > > make it easy to delete the topic in cluster B, we need to decide how
> > > to handle failed replication in cluster A. In the same way, deleting
> > > namespace C in cluster B doesn't delete it from cluster A. For
> > > example, when should cluster A stop attempting to create producers to
> > > cluster B for the deleted namespace/topic and how do we inform
> > > operators that the replication is no longer working as configured. We
> > > avoid several problems by failing the initial user request.
> > >
> > > Note that while a user cannot explicitly delete a replicated topic,
> > > they can remove the replication configuration in cluster A and cluster
> > > B, and then they are left with unreplicated namespaces and topics,
> > > which can be deleted.
> > >
> > > > But for now, for a replicated topic, users are not able to delete it
> > > because
> > > > of the internal active producer, adding force delete option can
> delete
> > > > the topic, but it will create again if users enabled topic
> auto-creation.
> > >
> > > In this case, the specific problem stems from a remote cluster
> > > targeting the local cluster. If the user only deletes the local topic,
> > > without removing the replication configuration, I don't think it
> > > should be surprising when the topic gets recreated (assuming auto
> > > topic creation is enabled).
> > >
> > > In my view, I think it's okay to make it harder to delete
> > > geo-replicated namespaces or topics than local namespaces or topics.
> > > Failing requests with a good, informative error message protects
> > > users. Additionally, if users are able to delete a namespace or topic
> > > by force, that gives them a work around--although we should document
> > > the limitations of this workaround.
> > >
> > > I am not experienced with geo-replication, so please let me know if
> > > any of my analysis doesn't align with the actual design.
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > > On Sun, Feb 27, 2022 at 8:42 PM guo jiwei <te...@apache.org>
> wrote:
> > > >
> > > > I have added the geo-replicator topic case and updated the PIP.
> > > >
> > > >
> > > > Regards
> > > > Jiwei Guo (Tboy)
> > > >
> > > >
> > > > On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <pe...@apache.org>
> wrote:
> > > >
> > > > > > To me, the main question is whether we create a custom error or
> > > expand
> > > > > the `CommandCloseProducer` command. I lean towards adding an error
> > > > > because it will automatically be backward compatible based on the
> way
> > > > > the client determines which errors are retriable. Although, I don't
> > > > > have a strong opinion.
> > > > >
> > > > > I have the same opinion as you. But it looks like not able to fix
> the
> > > > > geo-replication topic deletion.
> > > > >
> > > > > Either replicated topic or non-replicated topic, we should provide
> the
> > > > > same topic deletion behavior, the topic can be deleted if no active
> > > > > user's producers/consumers, should not be affected by the Pulsar
> > > > > internal producers/consumers.
> > > > >
> > > > > But for now, for a replicated topic, users are not able to delete
> it
> > > > > because
> > > > > of the internal active producer, adding force delete option can
> delete
> > > > > the topic, but it will create again if users enabled topic
> > > auto-creation.
> > > > >
> > > > > I think in this case, it seems we don't have a chance to give the
> > > producer
> > > > > an exception, because the reconnection will connect to a new topic
> only
> > > > > with
> > > > > the same topic name as before.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <
> mmarshall@apache.org
> > > >
> > > > > wrote:
> > > > >
> > > > > > > We need to determine the overall plan, not the implementation
> > > > > > > at this moment.
> > > > > >
> > > > > > Great point, I agree.
> > > > > >
> > > > > > > Looks like we need an option in the
> > > > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > > > reconnecting again.
> > > > > >
> > > > > > This is an important point. One of the primary requirements for
> > > > > > geo-replication must also be that a replicated namespace or
> topic can
> > > > > > be deleted and recreated without any problems or leaks in the
> broker.
> > > > > > That requires that the producer is cleaned up properly when the
> > > > > > namespace is deleted. I think we could achieve that with a new
> server
> > > > > > error or an expansion of the `CommandCloseProducer` command.
> > > > > >
> > > > > > To me, the main question is whether we create a custom error or
> > > expand
> > > > > > the `CommandCloseProducer` command. I lean towards adding an
> error
> > > > > > because it will automatically be backward compatible based on
> the way
> > > > > > the client determines which errors are retriable. Although, I
> don't
> > > > > > have a strong opinion.
> > > > > >
> > > > > > The key code paths that we need to consider are topic lookup and
> > > topic
> > > > > > creation. Both the http admin endpoint and the binary protocol
> > > > > > endpoint interact with these code paths, and we should fail
> requests
> > > > > > for namespaces which have the "deleted" field set to true.
> > > > > >
> > > > > > In looking at the code, we could modify the logic in
> > > > > > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That
> > > method
> > > > > > is always called for topic lookups via HTTP or via the Pulsar
> > > > > > binary protocol. If a namespace doesn't exist, that method
> currently
> > > > > > returns a NOT_FOUND exception. In the binary protocol case, that
> > > > > > exception is converted to a `ServerError.MetadataError`. We could
> > > > > > modify the method like I do here [0], or we could add another
> case
> > > and
> > > > > > return a custom error message.
> > > > > >
> > > > > > We'd also need to prevent topic creation via the admin endpoint
> and
> > > > > > the auto topic creation via the binary protocol. I believe this
> will
> > > > > > be just as easy as the lookup case, but I haven't confirmed.
> > > > > >
> > > > > > Thanks,
> > > > > > Michael
> > > > > >
> > > > > > [0]
> > > > > >
> > > > >
> > >
> https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <pe...@apache.org>
> > > wrote:
> > > > > > >
> > > > > > > The replicated topic deletion also has a similar problem.
> > > > > > > If the topic auto-creation is enabled, we are not able to
> delete
> > > > > > > a replicated topic, because the producer used for replication
> > > > > > > will try to reconnect. Looks like we need an option in the
> > > > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > > > reconnecting again.
> > > > > > >
> > > > > > > Penghui
> > > > > > >
> > > > > > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <penghui@apache.org
> >
> > > wrote:
> > > > > > >
> > > > > > > > > Penghui, are you suggesting that we implement the
> > > namespace/tenant
> > > > > > > > terminated logic after completing this PIP?
> > > > > > > >
> > > > > > > > I'm ok with both being implemented together or separated.
> > > > > > > > We need to determine the overall plan, not the implementation
> > > > > > > > at this moment.
> > > > > > > >
> > > > > > > > > For the sake of discussion, if we implement the namespace
> > > > > terminated
> > > > > > > > logic first, we could fulfill the underlying requirements for
> > > this
> > > > > PIP
> > > > > > > > by returning a new non-retriable error response when a client
> > > tries
> > > > > to
> > > > > > > > connect a producer or a consumer to a topic in a namespace
> that
> > > is
> > > > > > > > "terminated". If we do add the "namespace terminated"
> feature,
> > > we'll
> > > > > > > > need to add a non-retriable exception for this case, anyway.
> The
> > > main
> > > > > > > > advantage here is that we'd only need one expansion of the
> > > protobuf
> > > > > > > > instead of two. The downside is that the protocol for
> connected
> > > > > > > > clients has a couple more roundtrips. The broker would
> disconnect
> > > > > > > > connected clients and then fail their reconnection attempt
> with a
> > > > > > > > non-retriable error.
> > > > > > > >
> > > > > > > > Regarding the namespace "terminated" concept, I just noticed
> > > that we
> > > > > > > > already have a "deleted" field in a namespace's policies [0].
> > > There
> > > > > is
> > > > > > > > even a comment that says:
> > > > > > > >
> > > > > > > > Yes, if we can terminate the namespace first, we should
> return
> > > > > > “namespace
> > > > > > > > not found”
> > > > > > > > if the producers and consumers try to reconnect, the client
> > > should
> > > > > stop
> > > > > > > > reconnecting
> > > > > > > > after getting this error. Looks like following this way, we
> don't
> > > > > need
> > > > > > to
> > > > > > > > introduce any protocol changes anymore.
> > > > > > > >
> > > > > > > > > As I think about it more, I no longer think "terminated"
> is the
> > > > > right
> > > > > > > > term for what I proposed above. Our goal is to briefly
> prevent
> > > any
> > > > > > > > topic creation to ensure we can delete all sub resources for
> a
> > > > > > > > namespace. On the other hand, a terminated topic isn't
> > > necessarily
> > > > > > > > short lived. If we want to apply the "terminated" term
> > > unequivocally
> > > > > > > > to both topics and namespaces, I think a terminated namespace
> > > would
> > > > > > > > need to be a namespace where all topics are in terminated
> state
> > > and
> > > > > no
> > > > > > > > additional topics could be created. That's not the feature
> we're
> > > > > > > > discussing here, though. Deleted seems like the right term
> to me,
> > > > > > > > especially since we're already using it to prevent a race
> > > condition
> > > > > > > >
> > > > > > > > Yes, topic termination has different meanings, deletion
> should
> > > be the
> > > > > > right
> > > > > > > > term here.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Penghui
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
> > > > > > mmarshall@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Regarding the namespace "terminated" concept, I just noticed
> > > that we
> > > > > > > >> already have a "deleted" field in a namespace's policies
> [0].
> > > There
> > > > > is
> > > > > > > >> even a comment that says:
> > > > > > > >>
> > > > > > > >> > // set the policies to deleted so that somebody else
> cannot
> > > > > acquire
> > > > > > > >> this namespace
> > > > > > > >>
> > > > > > > >> I am not familiar with this feature, but it seems like this
> > > policy
> > > > > > > >> field could be checked before creating a topic in a
> namespace.
> > > That
> > > > > > > >> would remove certain races described above.
> > > > > > > >>
> > > > > > > >> As I think about it more, I no longer think "terminated" is
> the
> > > > > right
> > > > > > > >> term for what I proposed above. Our goal is to briefly
> prevent
> > > any
> > > > > > > >> topic creation to ensure we can delete all sub resources
> for a
> > > > > > > >> namespace. On the other hand, a terminated topic isn't
> > > necessarily
> > > > > > > >> short lived. If we want to apply the "terminated" term
> > > unequivocally
> > > > > > > >> to both topics and namespaces, I think a terminated
> namespace
> > > would
> > > > > > > >> need to be a namespace where all topics are in terminated
> state
> > > and
> > > > > no
> > > > > > > >> additional topics could be created. That's not the feature
> we're
> > > > > > > >> discussing here, though. Deleted seems like the right term
> to
> > > me,
> > > > > > > >> especially since we're already using it to prevent a race
> > > condition
> > > > > > > >> [0].
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Michael
> > > > > > > >>
> > > > > > > >> [0]
> > > > > > > >>
> > > > > >
> > > > >
> > >
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> > > > > > > >>
> > > > > > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
> > > > > > mmarshall@apache.org>
> > > > > > > >> wrote:
> > > > > > > >> >
> > > > > > > >> > Hi Dave,
> > > > > > > >> >
> > > > > > > >> > > automatically delete tenants and namespaces for not
> > > containing
> > > > > > topics
> > > > > > > >> >
> > > > > > > >> > I don't think that is what we are discussing. I agree
> that the
> > > > > > initial
> > > > > > > >> > email says just that, though, which is why I asked above:
> > > > > > > >> >
> > > > > > > >> > >> When there are no user-created topics under a
> namespace,
> > > > > > > >> > >> Namespace should be deleted.
> > > > > > > >> > > This is supposed to mean that the namespace should be
> able
> > > to be
> > > > > > > >> > > deleted, correct?
> > > > > > > >> >
> > > > > > > >> > Perhaps Mattison can clarify, too.
> > > > > > > >> >
> > > > > > > >> > My current understanding of the context for the PIP is
> that a
> > > user
> > > > > > > >> > call to delete a namespace without force can fail when a
> > > producer
> > > > > > > >> > reconnects to a deleted topic. The goal is to remove the
> race
> > > > > > > >> > condition to ensure namespace deletion can succeed.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> > Michael
> > > > > > > >> >
> > > > > > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <
> wave@apache.org>
> > > > > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > Hi -
> > > > > > > >> > >
> > > > > > > >> > > I hope I’m understanding what’s being discussed.
> > > > > > > >> > >
> > > > > > > >> > > If we are going to automatically delete tenants and
> > > namespaces
> > > > > for
> > > > > > > >> not containing topics then we need to make both of these
> > > automatic
> > > > > > actions
> > > > > > > >> configurable with a default to NOT do so. Otherwise we break
> > > > > existing
> > > > > > use
> > > > > > > >> cases.
> > > > > > > >> > >
> > > > > > > >> > > Automatic deletion of namespaces should be configurable
> at
> > > both
> > > > > > the
> > > > > > > >> cluster and tenant level.
> > > > > > > >> > >
> > > > > > > >> > > Regards,
> > > > > > > >> > > Dave
> > > > > > > >> > >
> > > > > > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
> > > > > > mmarshall@apache.org>
> > > > > > > >> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > >> The old producer/consumer should be closed after
> > > applying the
> > > > > > > >> changes from
> > > > > > > >> > > >> this proposal.
> > > > > > > >> > > >
> > > > > > > >> > > > Penghui, are you suggesting that we implement the
> > > > > > namespace/tenant
> > > > > > > >> > > > terminated logic after completing this PIP?
> > > > > > > >> > > >
> > > > > > > >> > > > For the sake of discussion, if we implement the
> namespace
> > > > > > terminated
> > > > > > > >> > > > logic first, we could fulfill the underlying
> requirements
> > > for
> > > > > > this
> > > > > > > >> PIP
> > > > > > > >> > > > by returning a new non-retriable error response when a
> > > client
> > > > > > tries
> > > > > > > >> to
> > > > > > > >> > > > connect a producer or a consumer to a topic in a
> namespace
> > > > > that
> > > > > > is
> > > > > > > >> > > > "terminated". If we do add the "namespace terminated"
> > > feature,
> > > > > > we'll
> > > > > > > >> > > > need to add a non-retriable exception for this case,
> > > anyway.
> > > > > The
> > > > > > > >> main
> > > > > > > >> > > > advantage here is that we'd only need one expansion
> of the
> > > > > > protobuf
> > > > > > > >> > > > instead of two. The downside is that the protocol for
> > > > > connected
> > > > > > > >> > > > clients has a couple more roundtrips. The broker would
> > > > > > disconnect
> > > > > > > >> > > > connected clients and then fail their reconnection
> attempt
> > > > > with
> > > > > > a
> > > > > > > >> > > > non-retriable error.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > > Michael
> > > > > > > >> > > >
> > > > > > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <
> > > > > penghui@apache.org>
> > > > > > > >> wrote:
> > > > > > > >> > > >>
> > > > > > > >> > > >>> If we want to solve this problem, we need to add
> some
> > > sync
> > > > > > > >> resources like
> > > > > > > >> > > >> lock/state, I think it’s a harm for us, we don’t
> need to
> > > do
> > > > > > that.
> > > > > > > >> > > >>
> > > > > > > >> > > >> I think we can make the namespace/tenants to the
> inactive
> > > > > state
> > > > > > > >> first so
> > > > > > > >> > > >> that we can avoid any new
> > > > > > > >> > > >> producer/consumer connect to the topic under the
> > > > > > namespace/tenant.
> > > > > > > >> > > >>
> > > > > > > >> > > >> The old producer/consumer should be closed after
> > > applying the
> > > > > > > >> changes from
> > > > > > > >> > > >> this proposal.
> > > > > > > >> > > >>
> > > > > > > >> > > >> Thanks,
> > > > > > > >> > > >> Penghui
> > > > > > > >> > > >>
> > > > > > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> > > > > > > >> mattisonchao@gmail.com> wrote:
> > > > > > > >> > > >>
> > > > > > > >> > > >>>> This is supposed to mean that the namespace should
> be
> > > able
> > > > > > to be
> > > > > > > >> > > >>>> deleted, correct?
> > > > > > > >> > > >>>
> > > > > > > >> > > >>> Yes, the main background is the user doesn’t have an
> > > active
> > > > > > > >> topic. so,
> > > > > > > >> > > >>> they want to delete the namespace.
> > > > > > > >> > > >>>
> > > > > > > >> > > >>>> However, I think
> > > > > > > >> > > >>>> we might still have a race condition that could
> make
> > > tenant
> > > > > > or
> > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> > > producer or
> > > > > > > >> consumer
> > > > > > > >> > > >>>> creates a topic after the namespace deletion has
> > > started
> > > > > but
> > > > > > > >> > > >>>> before it is complete. Do you agree that the
> underlying
> > > > > race
> > > > > > > >> still
> > > > > > > >> > > >>> exists?
> > > > > > > >> > > >>>
> > > > > > > >> > > >>> Yes, this condition exists. I think it’s not a big
> > > problem
> > > > > > > >> because the
> > > > > > > >> > > >>> user doesn’t want to use this namespace anymore.
> > > > > > > >> > > >>> If this scenario appears, they will get an error and
> > > need to
> > > > > > > >> delete it
> > > > > > > >> > > >>> again.
> > > > > > > >> > > >>>
> > > > > > > >> > > >>>> What if we expand our usage of the "terminated"
> > > feature to
> > > > > > apply
> > > > > > > >> to
> > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated
> namespace
> > > can
> > > > > > have
> > > > > > > >> > > >>>> bundles and topics can be deleted but not created
> > > (just as
> > > > > a
> > > > > > > >> terminated
> > > > > > > >> > > >>>> topic cannot have any new messages published to
> it).
> > > This
> > > > > > would
> > > > > > > >> take
> > > > > > > >> > > >>>> care of all topic creation race conditions. We'd
> > > probably
> > > > > > need
> > > > > > > >> to add
> > > > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > > > >> > > >>>
> > > > > > > >> > > >>>
> > > > > > > >> > > >>> If we want to solve this problem, we need to add
> some
> > > sync
> > > > > > > >> resources like
> > > > > > > >> > > >>> lock/state, I think it’s a harm for us, we don’t
> need
> > > to do
> > > > > > that.
> > > > > > > >> > > >>>
> > > > > > > >> > > >>> Thanks for your suggestions, let me know what you
> think.
> > > > > > > >> > > >>>
> > > > > > > >> > > >>> Best,
> > > > > > > >> > > >>> Mattison
> > > > > > > >> > > >>>
> > > > > > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> > > > > > > >> mmarshall@apache.org>
> > > > > > > >> > > >>> wrote:
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>> This proposal identifies an important issue that we
> > > should
> > > > > > > >> definitely
> > > > > > > >> > > >>>> solve. I have some questions.
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>>> When there are no user-created topics under a
> > > namespace,
> > > > > > > >> > > >>>>> Namespace should be deleted.
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>> This is supposed to mean that the namespace should
> be
> > > able
> > > > > > to be
> > > > > > > >> > > >>>> deleted, correct?
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>>> For this reason, we need to close the system topic
> > > > > > > >> reader/producer
> > > > > > > >> > > >>>>> first, then remove the system topic. finally,
> remove
> > > the
> > > > > > > >> namespace.
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>> I agree that expanding the protobuf CloseProducer
> and
> > > > > > > >> CloseConsumer
> > > > > > > >> > > >>>> commands could be valuable here. The expansion
> would
> > > ensure
> > > > > > that
> > > > > > > >> > > >>>> producers and consumers don't attempt to reconnect.
> > > > > However,
> > > > > > I
> > > > > > > >> think
> > > > > > > >> > > >>>> we might still have a race condition that could
> make
> > > tenant
> > > > > > or
> > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> > > producer or
> > > > > > > >> consumer
> > > > > > > >> > > >>>> creates a topic after the namespace deletion has
> > > started
> > > > > but
> > > > > > > >> > > >>>> before it is complete. Do you agree that the
> underlying
> > > > > race
> > > > > > > >> still
> > > > > > > >> > > >>> exists?
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>> In my view, the fundamental problem here is that
> > > deleting
> > > > > > > >> certain Pulsar
> > > > > > > >> > > >>>> resources takes time and, in a distributed system,
> that
> > > > > means
> > > > > > > >> race
> > > > > > > >> > > >>>> conditions.
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>> What if we expand our usage of the "terminated"
> > > feature to
> > > > > > apply
> > > > > > > >> to
> > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated
> namespace
> > > can
> > > > > > have
> > > > > > > >> > > >>>> bundles and topics can be deleted but not created
> > > (just as
> > > > > a
> > > > > > > >> terminated
> > > > > > > >> > > >>>> topic cannot have any new messages published to
> it).
> > > This
> > > > > > would
> > > > > > > >> take
> > > > > > > >> > > >>>> care of all topic creation race conditions. We'd
> > > probably
> > > > > > need
> > > > > > > >> to add
> > > > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>> Thanks,
> > > > > > > >> > > >>>> Michael
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > > > > > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > > > > > > >> > > >>>>>
> > > > > > > >> > > >>>>> +1
> > > > > > > >> > > >>>>>
> > > > > > > >> > > >>>>>
> > > > > > > >> > > >>>>> Thanks,
> > > > > > > >> > > >>>>> Zike
> > > > > > > >> > > >>>>>
> > > > > > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> > > > > > > >> technoboy@apache.org>
> > > > > > > >> > > >>> wrote:
> > > > > > > >> > > >>>>>>
> > > > > > > >> > > >>>>>> Hi
> > > > > > > >> > > >>>>>> The PIP link :
> > > > > > https://github.com/apache/pulsar/issues/13989
> > > > > > > >> > > >>>>>>
> > > > > > > >> > > >>>>>> Regards
> > > > > > > >> > > >>>>>> Jiwei Guo (Tboy)
> > > > > > > >> > > >>>>>>
> > > > > > > >> > > >>>>>>
> > > > > > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> > > > > > > >> mattisonchao@gmail.com
> > > > > > > >> > > >>>>
> > > > > > > >> > > >>>>>> wrote:
> > > > > > > >> > > >>>>>>
> > > > > > > >> > > >>>>>>> Hello everyone,
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> I want to start a discussion about PIP-139 :
> Support
> > > > > > Broker
> > > > > > > >> send
> > > > > > > >> > > >>> command
> > > > > > > >> > > >>>>>>> to real close producer/consumer.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> This is the PIP document
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> Please check it out and feel free to share your
> > > > > thoughts.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> Best,
> > > > > > > >> > > >>>>>>> Mattison
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> Relation pull request:  #13337
> > > > > > > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ## Motivation
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> Before we discuss this pip, I'd like to
> supplement
> > > some
> > > > > > > >> context to
> > > > > > > >> > > >>> help
> > > > > > > >> > > >>>>>>> contributors who don't want to read the original
> > > pull
> > > > > > request.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>>> When there are no user-created topics under a
> > > > > namespace,
> > > > > > > >> Namespace
> > > > > > > >> > > >>>>>>> should be deleted. But currently, the system
> topic
> > > > > existed
> > > > > > > >> and the
> > > > > > > >> > > >>>>>>> reader/producer could auto-create the system
> which
> > > may
> > > > > > cause
> > > > > > > >> the
> > > > > > > >> > > >>> namespace
> > > > > > > >> > > >>>>>>> deletion to fail.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> For this reason, we need to close the system
> topic
> > > > > > > >> reader/producer
> > > > > > > >> > > >>> first,
> > > > > > > >> > > >>>>>>> then remove the system topic. finally, remove
> the
> > > > > > namespace.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> Following this way, we first want to use
> > > ``terminate``
> > > > > to
> > > > > > > >> solve this
> > > > > > > >> > > >>>>>>> problem. then we found producers can
> disconnect, but
> > > > > > > >> consumers are
> > > > > > > >> > > >>> still
> > > > > > > >> > > >>>>>>> alive. So, another PR #13960 wants to add
> consumers'
> > > > > > closing
> > > > > > > >> logic.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> After #13960, all things look good, but another
> > > problem
> > > > > > > >> appears. that
> > > > > > > >> > > >>> is
> > > > > > > >> > > >>>>>>> we need to wait until consumers completely
> consume
> > > all
> > > > > > > >> messages (this
> > > > > > > >> > > >>> may
> > > > > > > >> > > >>>>>>> make terminate topic so long and the operation
> > > > > > timeout)then
> > > > > > > >> get
> > > > > > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>
> > > > > > > >>
> > > > > >
> > > > >
> > >
> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> In the #13337 case, we need to force close
> consumers
> > > > > > > >> immediately. So
> > > > > > > >> > > >>> we
> > > > > > > >> > > >>>>>>> write this PIP to discuss another way to resolve
> > > this
> > > > > > problem.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ## Goal
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in
> command
> > > > > > > >> > > >>>>>>> ``CommandCloseProducer``/
> ``CommandCloseConsumer``
> > > to
> > > > > > close
> > > > > > > >> > > >>>>>>> producer/consumers immediately.
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ## API Changes
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> > > ``CommandCloseProducer``;
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ```java
> > > > > > > >> > > >>>>>>> **Before**
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > >> > > >>>>>>> }
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> **After**
> > > > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> > > true];
> > > > > > > >> > > >>>>>>> }
> > > > > > > >> > > >>>>>>> ```
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> > > ``CommandCloseConsumer``
> > > > > > > >> > > >>>>>>> ```java
> > > > > > > >> > > >>>>>>> **Before**
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > >> > > >>>>>>> }
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> **After**
> > > > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> > > true];
> > > > > > > >> > > >>>>>>> }
> > > > > > > >> > > >>>>>>> ```
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ## Implementation
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ### ClientCnx - Producer:
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> **Before**
> > > > > > > >> > > >>>>>>> ```java
> > > > > > > >> > > >>>>>>>   @Override
> > > > > > > >> > > >>>>>>>   protected void
> > > > > handleCloseProducer(CommandCloseProducer
> > > > > > > >> > > >>> closeProducer)
> > > > > > > >> > > >>>>>>> {
> > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> Closed
> > > > > > producer:
> > > > > > > >> {}",
> > > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > > > >> > > >>>>>>>       final long producerId =
> > > > > > closeProducer.getProducerId();
> > > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > > > producers.get(producerId);
> > > > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > > > >> > > >>>>>>>           producer.connectionClosed(this);
> > > > > > > >> > > >>>>>>>       } else {
> > > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not
> found
> > > while
> > > > > > > >> closing
> > > > > > > >> > > >>> producer
> > > > > > > >> > > >>>>>>> ", producerId);
> > > > > > > >> > > >>>>>>>       }
> > > > > > > >> > > >>>>>>>   }
> > > > > > > >> > > >>>>>>> ```
> > > > > > > >> > > >>>>>>> After:
> > > > > > > >> > > >>>>>>> ```java
> > > > > > > >> > > >>>>>>>  @Override
> > > > > > > >> > > >>>>>>>   protected void
> > > > > handleCloseProducer(CommandCloseProducer
> > > > > > > >> > > >>> closeProducer)
> > > > > > > >> > > >>>>>>> {
> > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> Closed
> > > > > > producer:
> > > > > > > >> {}",
> > > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > > > >> > > >>>>>>>       final long producerId =
> > > > > > closeProducer.getProducerId();
> > > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > > > producers.get(producerId);
> > > > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > > > >> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > > > > > > >> > > >>>>>>>               producer.connectionClosed(this);
> > > > > > > >> > > >>>>>>>           } else {
> > > > > > > >> > > >>>>>>>               producer.closeAsync();
> > > > > > > >> > > >>>>>>>           }
> > > > > > > >> > > >>>>>>>       } else {
> > > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not
> found
> > > while
> > > > > > > >> closing
> > > > > > > >> > > >>> producer
> > > > > > > >> > > >>>>>>> ", producerId);
> > > > > > > >> > > >>>>>>>       }
> > > > > > > >> > > >>>>>>>   }
> > > > > > > >> > > >>>>>>> ```
> > > > > > > >> > > >>>>>>> ### ClientCnx - Consumer:
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> **Before**
> > > > > > > >> > > >>>>>>> ```java
> > > > > > > >> > > >>>>>>>   @Override
> > > > > > > >> > > >>>>>>>   protected void
> > > > > handleCloseConsumer(CommandCloseConsumer
> > > > > > > >> > > >>> closeConsumer)
> > > > > > > >> > > >>>>>>> {
> > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> Closed
> > > > > > consumer:
> > > > > > > >> {}",
> > > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > > > >> > > >>>>>>>       final long consumerId =
> > > > > > closeConsumer.getConsumerId();
> > > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > > > consumers.get(consumerId);
> > > > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > > > >> > > >>>>>>>           consumer.connectionClosed(this);
> > > > > > > >> > > >>>>>>>       } else {
> > > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not
> found
> > > while
> > > > > > > >> closing
> > > > > > > >> > > >>> consumer
> > > > > > > >> > > >>>>>>> ", consumerId);
> > > > > > > >> > > >>>>>>>       }
> > > > > > > >> > > >>>>>>>   }
> > > > > > > >> > > >>>>>>> ```
> > > > > > > >> > > >>>>>>> **After**
> > > > > > > >> > > >>>>>>> ```java
> > > > > > > >> > > >>>>>>>   @Override
> > > > > > > >> > > >>>>>>>   protected void
> > > > > handleCloseConsumer(CommandCloseConsumer
> > > > > > > >> > > >>> closeConsumer)
> > > > > > > >> > > >>>>>>> {
> > > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of
> Closed
> > > > > > consumer:
> > > > > > > >> {}",
> > > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > > > >> > > >>>>>>>       final long consumerId =
> > > > > > closeConsumer.getConsumerId();
> > > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > > > consumers.get(consumerId);
> > > > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > > > >> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > > > > > > >> > > >>>>>>>               consumer.connectionClosed(this);
> > > > > > > >> > > >>>>>>>           } else {
> > > > > > > >> > > >>>>>>>               consumer.closeAsync();
> > > > > > > >> > > >>>>>>>           }
> > > > > > > >> > > >>>>>>>       } else {
> > > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not
> found
> > > while
> > > > > > > >> closing
> > > > > > > >> > > >>> consumer
> > > > > > > >> > > >>>>>>> ", consumerId);
> > > > > > > >> > > >>>>>>>       }
> > > > > > > >> > > >>>>>>>   }
> > > > > > > >> > > >>>>>>> ```
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> ## Reject Alternatives
> > > > > > > >> > > >>>>>>>
> > > > > > > >> > > >>>>>>> none.
> > > > > > > >> > > >>>>>
> > > > > > > >> > > >>>>>
> > > > > > > >> > > >>>>>
> > > > > > > >> > > >>>>> --
> > > > > > > >> > > >>>>> Zike Yang
> > > > > > > >> > > >>>
> > > > > > > >> > > >>>
> > > > > > > >> > >
> > > > > > > >>
> > > > > > > >
> > > > > >
> > > > >
> > >
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by Michael Marshall <mm...@apache.org>.
> The geo-replication's configuration can be centralized by using
> one configuration store. But that doesn't change anything, we should
> provide the same behavior for both centralized and decentralized
> configuration store.

Perhaps I misused the word decentralized. I meant that the challenge
comes because geo-replication configurations for clusters are
independent from each other.

> The current challenge is
> users usually set replicated clusters for a namespace, if remove
> the replication configuration, the entire namespace will be affected.
> We have supported setting the replicated configuration for a topic [1],
> only for 2.10.0 or later.

Great point. I was focused on deleting namespaces and missed the case
where the user wants to delete a topic from a replicated namespace. I
agree that we should make it possible to delete these topics without
removing the namespace replication. Note that it is already possible
to delete these topics by deleting with force or by configuring
inactive topic deletion.

> we should provided steps for how to delete an unused replicated
> topic and what are the effects of not removing properly

I agree, and I think documentation coupled with helpful error logs
will be very helpful here.

Considering users can delete replicated topics by deleting with force,
are you saying we need to provide a non-force way to delete these
topics?

If we want to provide a path to normal deletion, we already have an
`isRemote` flag on producers, and the inactive topic deletion code
uses this flag to determine if there are non-replication producers
connected to a topic. We could modify the deletion logic for a global
topic so that it can be deleted as long as the only producers
connected are remote producers. My main concern is that normal
deletion could allow users to miss the nuance that they must also
delete the topic in the other cluster. This might be an overly
cautious concern, though.

Thanks,
Michael



On Tue, Mar 1, 2022 at 1:55 AM PengHui Li <pe...@apache.org> wrote:
>
> > I agree with you that it'd be nice to provide the same deletion
> behavior. However, because geo-replication's configuration is
> decentralized, I think namespace or topic deletion is more complicated
> than unreplicated deletion. Note that users cannot currently delete
> namespaces that are configured with remote replication clusters.
>
> The geo-replication's configuration can be centralized by using
> one configuration store. But that doesn't change anything, we should
> provide the same behavior for both centralized and decentralized
> configuration store.
>
> > Note that while a user cannot explicitly delete a replicated topic,
> they can remove the replication configuration in cluster A and cluster
> B, and then they are left with unreplicated namespaces and topics,
> which can be deleted.
>
> Yes, stopping the replication first is a more elegant approach,
> and it is also very easy to understand for users. The current challenge is
> users usually set replicated clusters for a namespace, if remove
> the replication configuration, the entire namespace will be affected.
> We have supported setting the replicated configuration for a topic [1],
> only for 2.10.0 or later.
>
> > In this case, the specific problem stems from a remote cluster
> targeting the local cluster. If the user only deletes the local topic,
> without removing the replication configuration, I don't think it
> should be surprising when the topic gets recreated (assuming auto
> topic creation is enabled).
>
> From the user's perspective, they don't have any producers/consumers,
> If the topic only has the producer of the replicator connect to the remote
> topic
> each other, in this case, I think we should allow users to delete the
> topic.
> Users will be surprised in this case, because they are not able to delete
> an unused topic, they are not able to change the replication configuration
> for
> a namespace.
>
> > In my view, I think it's okay to make it harder to delete
> geo-replicated namespaces or topics than local namespaces or topics.
> Failing requests with a good, informative error message protects
> users. Additionally, if users are able to delete a namespace or topic
> by force, that gives them a work around--although we should document
> the limitations of this workaround.
>
> I agree with it's okay to make it harder to delete a replicated topic, so
> for
> a replicated topic, users should delete the topic in each cluster, we should
> provided steps for how to delete an unused replicated topic and what are
> the effects of not removing properly, for example, if a topic has 2
> clusters, cluster-a
> and cluster-b, but only delete the topic from cluster-a, the topic will be
> created
> again if cluster-b write some new data into this topic, depends on the topic
> auto-creation.
>
> If we only support set replicated cluster for a topic, not a namespace,
> I like the approach that stops the replication first and then deletes the
> topic.
>
> [1] https://github.com/apache/pulsar/pull/12136
>
> Thanks,
> Penghui
>
> On Tue, Mar 1, 2022 at 2:27 PM Michael Marshall <mm...@apache.org>
> wrote:
>
> > > Either replicated topic or non-replicated topic, we should provide the
> > > same topic deletion behavior, the topic can be deleted if no active
> > > user's producers/consumers, should not be affected by the Pulsar
> > > internal producers/consumers.
> >
> > I agree with you that it'd be nice to provide the same deletion
> > behavior. However, because geo-replication's configuration is
> > decentralized, I think namespace or topic deletion is more complicated
> > than unreplicated deletion. Note that users cannot currently delete
> > namespaces that are configured with remote replication clusters.
> >
> > One challenge I see is the configuration in the remote cluster. If
> > cluster A is replicated to cluster B for namespace C, deleting a topic
> > in cluster B does not affect the configuration in cluster A. If we
> > make it easy to delete the topic in cluster B, we need to decide how
> > to handle failed replication in cluster A. In the same way, deleting
> > namespace C in cluster B doesn't delete it from cluster A. For
> > example, when should cluster A stop attempting to create producers to
> > cluster B for the deleted namespace/topic and how do we inform
> > operators that the replication is no longer working as configured. We
> > avoid several problems by failing the initial user request.
> >
> > Note that while a user cannot explicitly delete a replicated topic,
> > they can remove the replication configuration in cluster A and cluster
> > B, and then they are left with unreplicated namespaces and topics,
> > which can be deleted.
> >
> > > But for now, for a replicated topic, users are not able to delete it
> > because
> > > of the internal active producer, adding force delete option can delete
> > > the topic, but it will create again if users enabled topic auto-creation.
> >
> > In this case, the specific problem stems from a remote cluster
> > targeting the local cluster. If the user only deletes the local topic,
> > without removing the replication configuration, I don't think it
> > should be surprising when the topic gets recreated (assuming auto
> > topic creation is enabled).
> >
> > In my view, I think it's okay to make it harder to delete
> > geo-replicated namespaces or topics than local namespaces or topics.
> > Failing requests with a good, informative error message protects
> > users. Additionally, if users are able to delete a namespace or topic
> > by force, that gives them a work around--although we should document
> > the limitations of this workaround.
> >
> > I am not experienced with geo-replication, so please let me know if
> > any of my analysis doesn't align with the actual design.
> >
> > Thanks,
> > Michael
> >
> >
> > On Sun, Feb 27, 2022 at 8:42 PM guo jiwei <te...@apache.org> wrote:
> > >
> > > I have added the geo-replicator topic case and updated the PIP.
> > >
> > >
> > > Regards
> > > Jiwei Guo (Tboy)
> > >
> > >
> > > On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <pe...@apache.org> wrote:
> > >
> > > > > To me, the main question is whether we create a custom error or
> > expand
> > > > the `CommandCloseProducer` command. I lean towards adding an error
> > > > because it will automatically be backward compatible based on the way
> > > > the client determines which errors are retriable. Although, I don't
> > > > have a strong opinion.
> > > >
> > > > I have the same opinion as you. But it looks like not able to fix the
> > > > geo-replication topic deletion.
> > > >
> > > > Either replicated topic or non-replicated topic, we should provide the
> > > > same topic deletion behavior, the topic can be deleted if no active
> > > > user's producers/consumers, should not be affected by the Pulsar
> > > > internal producers/consumers.
> > > >
> > > > But for now, for a replicated topic, users are not able to delete it
> > > > because
> > > > of the internal active producer, adding force delete option can delete
> > > > the topic, but it will create again if users enabled topic
> > auto-creation.
> > > >
> > > > I think in this case, it seems we don't have a chance to give the
> > producer
> > > > an exception, because the reconnection will connect to a new topic only
> > > > with
> > > > the same topic name as before.
> > > >
> > > > Thanks,
> > > > Penghui
> > > >
> > > > On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <mmarshall@apache.org
> > >
> > > > wrote:
> > > >
> > > > > > We need to determine the overall plan, not the implementation
> > > > > > at this moment.
> > > > >
> > > > > Great point, I agree.
> > > > >
> > > > > > Looks like we need an option in the
> > > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > > reconnecting again.
> > > > >
> > > > > This is an important point. One of the primary requirements for
> > > > > geo-replication must also be that a replicated namespace or topic can
> > > > > be deleted and recreated without any problems or leaks in the broker.
> > > > > That requires that the producer is cleaned up properly when the
> > > > > namespace is deleted. I think we could achieve that with a new server
> > > > > error or an expansion of the `CommandCloseProducer` command.
> > > > >
> > > > > To me, the main question is whether we create a custom error or
> > expand
> > > > > the `CommandCloseProducer` command. I lean towards adding an error
> > > > > because it will automatically be backward compatible based on the way
> > > > > the client determines which errors are retriable. Although, I don't
> > > > > have a strong opinion.
> > > > >
> > > > > The key code paths that we need to consider are topic lookup and
> > topic
> > > > > creation. Both the http admin endpoint and the binary protocol
> > > > > endpoint interact with these code paths, and we should fail requests
> > > > > for namespaces which have the "deleted" field set to true.
> > > > >
> > > > > In looking at the code, we could modify the logic in
> > > > > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That
> > method
> > > > > is always called for topic lookups via HTTP or via the Pulsar
> > > > > binary protocol. If a namespace doesn't exist, that method currently
> > > > > returns a NOT_FOUND exception. In the binary protocol case, that
> > > > > exception is converted to a `ServerError.MetadataError`. We could
> > > > > modify the method like I do here [0], or we could add another case
> > and
> > > > > return a custom error message.
> > > > >
> > > > > We'd also need to prevent topic creation via the admin endpoint and
> > > > > the auto topic creation via the binary protocol. I believe this will
> > > > > be just as easy as the lookup case, but I haven't confirmed.
> > > > >
> > > > > Thanks,
> > > > > Michael
> > > > >
> > > > > [0]
> > > > >
> > > >
> > https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <pe...@apache.org>
> > wrote:
> > > > > >
> > > > > > The replicated topic deletion also has a similar problem.
> > > > > > If the topic auto-creation is enabled, we are not able to delete
> > > > > > a replicated topic, because the producer used for replication
> > > > > > will try to reconnect. Looks like we need an option in the
> > > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > > reconnecting again.
> > > > > >
> > > > > > Penghui
> > > > > >
> > > > > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <pe...@apache.org>
> > wrote:
> > > > > >
> > > > > > > > Penghui, are you suggesting that we implement the
> > namespace/tenant
> > > > > > > terminated logic after completing this PIP?
> > > > > > >
> > > > > > > I'm ok with both being implemented together or separated.
> > > > > > > We need to determine the overall plan, not the implementation
> > > > > > > at this moment.
> > > > > > >
> > > > > > > > For the sake of discussion, if we implement the namespace
> > > > terminated
> > > > > > > logic first, we could fulfill the underlying requirements for
> > this
> > > > PIP
> > > > > > > by returning a new non-retriable error response when a client
> > tries
> > > > to
> > > > > > > connect a producer or a consumer to a topic in a namespace that
> > is
> > > > > > > "terminated". If we do add the "namespace terminated" feature,
> > we'll
> > > > > > > need to add a non-retriable exception for this case, anyway. The
> > main
> > > > > > > advantage here is that we'd only need one expansion of the
> > protobuf
> > > > > > > instead of two. The downside is that the protocol for connected
> > > > > > > clients has a couple more roundtrips. The broker would disconnect
> > > > > > > connected clients and then fail their reconnection attempt with a
> > > > > > > non-retriable error.
> > > > > > >
> > > > > > > Regarding the namespace "terminated" concept, I just noticed
> > that we
> > > > > > > already have a "deleted" field in a namespace's policies [0].
> > There
> > > > is
> > > > > > > even a comment that says:
> > > > > > >
> > > > > > > Yes, if we can terminate the namespace first, we should return
> > > > > “namespace
> > > > > > > not found”
> > > > > > > if the producers and consumers try to reconnect, the client
> > should
> > > > stop
> > > > > > > reconnecting
> > > > > > > after getting this error. Looks like following this way, we don't
> > > > need
> > > > > to
> > > > > > > introduce any protocol changes anymore.
> > > > > > >
> > > > > > > > As I think about it more, I no longer think "terminated" is the
> > > > right
> > > > > > > term for what I proposed above. Our goal is to briefly prevent
> > any
> > > > > > > topic creation to ensure we can delete all sub resources for a
> > > > > > > namespace. On the other hand, a terminated topic isn't
> > necessarily
> > > > > > > short lived. If we want to apply the "terminated" term
> > unequivocally
> > > > > > > to both topics and namespaces, I think a terminated namespace
> > would
> > > > > > > need to be a namespace where all topics are in terminated state
> > and
> > > > no
> > > > > > > additional topics could be created. That's not the feature we're
> > > > > > > discussing here, though. Deleted seems like the right term to me,
> > > > > > > especially since we're already using it to prevent a race
> > condition
> > > > > > >
> > > > > > > Yes, topic termination has different meanings, deletion should
> > be the
> > > > > right
> > > > > > > term here.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Penghui
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
> > > > > mmarshall@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Regarding the namespace "terminated" concept, I just noticed
> > that we
> > > > > > >> already have a "deleted" field in a namespace's policies [0].
> > There
> > > > is
> > > > > > >> even a comment that says:
> > > > > > >>
> > > > > > >> > // set the policies to deleted so that somebody else cannot
> > > > acquire
> > > > > > >> this namespace
> > > > > > >>
> > > > > > >> I am not familiar with this feature, but it seems like this
> > policy
> > > > > > >> field could be checked before creating a topic in a namespace.
> > That
> > > > > > >> would remove certain races described above.
> > > > > > >>
> > > > > > >> As I think about it more, I no longer think "terminated" is the
> > > > right
> > > > > > >> term for what I proposed above. Our goal is to briefly prevent
> > any
> > > > > > >> topic creation to ensure we can delete all sub resources for a
> > > > > > >> namespace. On the other hand, a terminated topic isn't
> > necessarily
> > > > > > >> short lived. If we want to apply the "terminated" term
> > unequivocally
> > > > > > >> to both topics and namespaces, I think a terminated namespace
> > would
> > > > > > >> need to be a namespace where all topics are in terminated state
> > and
> > > > no
> > > > > > >> additional topics could be created. That's not the feature we're
> > > > > > >> discussing here, though. Deleted seems like the right term to
> > me,
> > > > > > >> especially since we're already using it to prevent a race
> > condition
> > > > > > >> [0].
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Michael
> > > > > > >>
> > > > > > >> [0]
> > > > > > >>
> > > > >
> > > >
> > https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> > > > > > >>
> > > > > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
> > > > > mmarshall@apache.org>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > Hi Dave,
> > > > > > >> >
> > > > > > >> > > automatically delete tenants and namespaces for not
> > containing
> > > > > topics
> > > > > > >> >
> > > > > > >> > I don't think that is what we are discussing. I agree that the
> > > > > initial
> > > > > > >> > email says just that, though, which is why I asked above:
> > > > > > >> >
> > > > > > >> > >> When there are no user-created topics under a namespace,
> > > > > > >> > >> Namespace should be deleted.
> > > > > > >> > > This is supposed to mean that the namespace should be able
> > to be
> > > > > > >> > > deleted, correct?
> > > > > > >> >
> > > > > > >> > Perhaps Mattison can clarify, too.
> > > > > > >> >
> > > > > > >> > My current understanding of the context for the PIP is that a
> > user
> > > > > > >> > call to delete a namespace without force can fail when a
> > producer
> > > > > > >> > reconnects to a deleted topic. The goal is to remove the race
> > > > > > >> > condition to ensure namespace deletion can succeed.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Michael
> > > > > > >> >
> > > > > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org>
> > > > > wrote:
> > > > > > >> > >
> > > > > > >> > > Hi -
> > > > > > >> > >
> > > > > > >> > > I hope I’m understanding what’s being discussed.
> > > > > > >> > >
> > > > > > >> > > If we are going to automatically delete tenants and
> > namespaces
> > > > for
> > > > > > >> not containing topics then we need to make both of these
> > automatic
> > > > > actions
> > > > > > >> configurable with a default to NOT do so. Otherwise we break
> > > > existing
> > > > > use
> > > > > > >> cases.
> > > > > > >> > >
> > > > > > >> > > Automatic deletion of namespaces should be configurable at
> > both
> > > > > the
> > > > > > >> cluster and tenant level.
> > > > > > >> > >
> > > > > > >> > > Regards,
> > > > > > >> > > Dave
> > > > > > >> > >
> > > > > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
> > > > > mmarshall@apache.org>
> > > > > > >> wrote:
> > > > > > >> > > >
> > > > > > >> > > >> The old producer/consumer should be closed after
> > applying the
> > > > > > >> changes from
> > > > > > >> > > >> this proposal.
> > > > > > >> > > >
> > > > > > >> > > > Penghui, are you suggesting that we implement the
> > > > > namespace/tenant
> > > > > > >> > > > terminated logic after completing this PIP?
> > > > > > >> > > >
> > > > > > >> > > > For the sake of discussion, if we implement the namespace
> > > > > terminated
> > > > > > >> > > > logic first, we could fulfill the underlying requirements
> > for
> > > > > this
> > > > > > >> PIP
> > > > > > >> > > > by returning a new non-retriable error response when a
> > client
> > > > > tries
> > > > > > >> to
> > > > > > >> > > > connect a producer or a consumer to a topic in a namespace
> > > > that
> > > > > is
> > > > > > >> > > > "terminated". If we do add the "namespace terminated"
> > feature,
> > > > > we'll
> > > > > > >> > > > need to add a non-retriable exception for this case,
> > anyway.
> > > > The
> > > > > > >> main
> > > > > > >> > > > advantage here is that we'd only need one expansion of the
> > > > > protobuf
> > > > > > >> > > > instead of two. The downside is that the protocol for
> > > > connected
> > > > > > >> > > > clients has a couple more roundtrips. The broker would
> > > > > disconnect
> > > > > > >> > > > connected clients and then fail their reconnection attempt
> > > > with
> > > > > a
> > > > > > >> > > > non-retriable error.
> > > > > > >> > > >
> > > > > > >> > > > Thanks,
> > > > > > >> > > > Michael
> > > > > > >> > > >
> > > > > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <
> > > > penghui@apache.org>
> > > > > > >> wrote:
> > > > > > >> > > >>
> > > > > > >> > > >>> If we want to solve this problem, we need to add some
> > sync
> > > > > > >> resources like
> > > > > > >> > > >> lock/state, I think it’s a harm for us, we don’t need to
> > do
> > > > > that.
> > > > > > >> > > >>
> > > > > > >> > > >> I think we can make the namespace/tenants to the inactive
> > > > state
> > > > > > >> first so
> > > > > > >> > > >> that we can avoid any new
> > > > > > >> > > >> producer/consumer connect to the topic under the
> > > > > namespace/tenant.
> > > > > > >> > > >>
> > > > > > >> > > >> The old producer/consumer should be closed after
> > applying the
> > > > > > >> changes from
> > > > > > >> > > >> this proposal.
> > > > > > >> > > >>
> > > > > > >> > > >> Thanks,
> > > > > > >> > > >> Penghui
> > > > > > >> > > >>
> > > > > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> > > > > > >> mattisonchao@gmail.com> wrote:
> > > > > > >> > > >>
> > > > > > >> > > >>>> This is supposed to mean that the namespace should be
> > able
> > > > > to be
> > > > > > >> > > >>>> deleted, correct?
> > > > > > >> > > >>>
> > > > > > >> > > >>> Yes, the main background is the user doesn’t have an
> > active
> > > > > > >> topic. so,
> > > > > > >> > > >>> they want to delete the namespace.
> > > > > > >> > > >>>
> > > > > > >> > > >>>> However, I think
> > > > > > >> > > >>>> we might still have a race condition that could make
> > tenant
> > > > > or
> > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> > producer or
> > > > > > >> consumer
> > > > > > >> > > >>>> creates a topic after the namespace deletion has
> > started
> > > > but
> > > > > > >> > > >>>> before it is complete. Do you agree that the underlying
> > > > race
> > > > > > >> still
> > > > > > >> > > >>> exists?
> > > > > > >> > > >>>
> > > > > > >> > > >>> Yes, this condition exists. I think it’s not a big
> > problem
> > > > > > >> because the
> > > > > > >> > > >>> user doesn’t want to use this namespace anymore.
> > > > > > >> > > >>> If this scenario appears, they will get an error and
> > need to
> > > > > > >> delete it
> > > > > > >> > > >>> again.
> > > > > > >> > > >>>
> > > > > > >> > > >>>> What if we expand our usage of the "terminated"
> > feature to
> > > > > apply
> > > > > > >> to
> > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace
> > can
> > > > > have
> > > > > > >> > > >>>> bundles and topics can be deleted but not created
> > (just as
> > > > a
> > > > > > >> terminated
> > > > > > >> > > >>>> topic cannot have any new messages published to it).
> > This
> > > > > would
> > > > > > >> take
> > > > > > >> > > >>>> care of all topic creation race conditions. We'd
> > probably
> > > > > need
> > > > > > >> to add
> > > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > > >> > > >>>
> > > > > > >> > > >>>
> > > > > > >> > > >>> If we want to solve this problem, we need to add some
> > sync
> > > > > > >> resources like
> > > > > > >> > > >>> lock/state, I think it’s a harm for us, we don’t need
> > to do
> > > > > that.
> > > > > > >> > > >>>
> > > > > > >> > > >>> Thanks for your suggestions, let me know what you think.
> > > > > > >> > > >>>
> > > > > > >> > > >>> Best,
> > > > > > >> > > >>> Mattison
> > > > > > >> > > >>>
> > > > > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> > > > > > >> mmarshall@apache.org>
> > > > > > >> > > >>> wrote:
> > > > > > >> > > >>>>
> > > > > > >> > > >>>> This proposal identifies an important issue that we
> > should
> > > > > > >> definitely
> > > > > > >> > > >>>> solve. I have some questions.
> > > > > > >> > > >>>>
> > > > > > >> > > >>>>> When there are no user-created topics under a
> > namespace,
> > > > > > >> > > >>>>> Namespace should be deleted.
> > > > > > >> > > >>>>
> > > > > > >> > > >>>> This is supposed to mean that the namespace should be
> > able
> > > > > to be
> > > > > > >> > > >>>> deleted, correct?
> > > > > > >> > > >>>>
> > > > > > >> > > >>>>> For this reason, we need to close the system topic
> > > > > > >> reader/producer
> > > > > > >> > > >>>>> first, then remove the system topic. finally, remove
> > the
> > > > > > >> namespace.
> > > > > > >> > > >>>>
> > > > > > >> > > >>>> I agree that expanding the protobuf CloseProducer and
> > > > > > >> CloseConsumer
> > > > > > >> > > >>>> commands could be valuable here. The expansion would
> > ensure
> > > > > that
> > > > > > >> > > >>>> producers and consumers don't attempt to reconnect.
> > > > However,
> > > > > I
> > > > > > >> think
> > > > > > >> > > >>>> we might still have a race condition that could make
> > tenant
> > > > > or
> > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> > producer or
> > > > > > >> consumer
> > > > > > >> > > >>>> creates a topic after the namespace deletion has
> > started
> > > > but
> > > > > > >> > > >>>> before it is complete. Do you agree that the underlying
> > > > race
> > > > > > >> still
> > > > > > >> > > >>> exists?
> > > > > > >> > > >>>>
> > > > > > >> > > >>>> In my view, the fundamental problem here is that
> > deleting
> > > > > > >> certain Pulsar
> > > > > > >> > > >>>> resources takes time and, in a distributed system, that
> > > > means
> > > > > > >> race
> > > > > > >> > > >>>> conditions.
> > > > > > >> > > >>>>
> > > > > > >> > > >>>> What if we expand our usage of the "terminated"
> > feature to
> > > > > apply
> > > > > > >> to
> > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace
> > can
> > > > > have
> > > > > > >> > > >>>> bundles and topics can be deleted but not created
> > (just as
> > > > a
> > > > > > >> terminated
> > > > > > >> > > >>>> topic cannot have any new messages published to it).
> > This
> > > > > would
> > > > > > >> take
> > > > > > >> > > >>>> care of all topic creation race conditions. We'd
> > probably
> > > > > need
> > > > > > >> to add
> > > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > > >> > > >>>>
> > > > > > >> > > >>>> Thanks,
> > > > > > >> > > >>>> Michael
> > > > > > >> > > >>>>
> > > > > > >> > > >>>>
> > > > > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > > > > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > > > > > >> > > >>>>>
> > > > > > >> > > >>>>> +1
> > > > > > >> > > >>>>>
> > > > > > >> > > >>>>>
> > > > > > >> > > >>>>> Thanks,
> > > > > > >> > > >>>>> Zike
> > > > > > >> > > >>>>>
> > > > > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> > > > > > >> technoboy@apache.org>
> > > > > > >> > > >>> wrote:
> > > > > > >> > > >>>>>>
> > > > > > >> > > >>>>>> Hi
> > > > > > >> > > >>>>>> The PIP link :
> > > > > https://github.com/apache/pulsar/issues/13989
> > > > > > >> > > >>>>>>
> > > > > > >> > > >>>>>> Regards
> > > > > > >> > > >>>>>> Jiwei Guo (Tboy)
> > > > > > >> > > >>>>>>
> > > > > > >> > > >>>>>>
> > > > > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> > > > > > >> mattisonchao@gmail.com
> > > > > > >> > > >>>>
> > > > > > >> > > >>>>>> wrote:
> > > > > > >> > > >>>>>>
> > > > > > >> > > >>>>>>> Hello everyone,
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> I want to start a discussion about PIP-139 : Support
> > > > > Broker
> > > > > > >> send
> > > > > > >> > > >>> command
> > > > > > >> > > >>>>>>> to real close producer/consumer.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> This is the PIP document
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> Please check it out and feel free to share your
> > > > thoughts.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> Best,
> > > > > > >> > > >>>>>>> Mattison
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> Relation pull request:  #13337
> > > > > > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ## Motivation
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> Before we discuss this pip, I'd like to supplement
> > some
> > > > > > >> context to
> > > > > > >> > > >>> help
> > > > > > >> > > >>>>>>> contributors who don't want to read the original
> > pull
> > > > > request.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>>> When there are no user-created topics under a
> > > > namespace,
> > > > > > >> Namespace
> > > > > > >> > > >>>>>>> should be deleted. But currently, the system topic
> > > > existed
> > > > > > >> and the
> > > > > > >> > > >>>>>>> reader/producer could auto-create the system which
> > may
> > > > > cause
> > > > > > >> the
> > > > > > >> > > >>> namespace
> > > > > > >> > > >>>>>>> deletion to fail.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> For this reason, we need to close the system topic
> > > > > > >> reader/producer
> > > > > > >> > > >>> first,
> > > > > > >> > > >>>>>>> then remove the system topic. finally, remove the
> > > > > namespace.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> Following this way, we first want to use
> > ``terminate``
> > > > to
> > > > > > >> solve this
> > > > > > >> > > >>>>>>> problem. then we found producers can disconnect, but
> > > > > > >> consumers are
> > > > > > >> > > >>> still
> > > > > > >> > > >>>>>>> alive. So, another PR #13960 wants to add consumers'
> > > > > closing
> > > > > > >> logic.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> After #13960, all things look good, but another
> > problem
> > > > > > >> appears. that
> > > > > > >> > > >>> is
> > > > > > >> > > >>>>>>> we need to wait until consumers completely consume
> > all
> > > > > > >> messages (this
> > > > > > >> > > >>> may
> > > > > > >> > > >>>>>>> make terminate topic so long and the operation
> > > > > timeout)then
> > > > > > >> get
> > > > > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>
> > > > > > >>
> > > > >
> > > >
> > https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> In the #13337 case, we need to force close consumers
> > > > > > >> immediately. So
> > > > > > >> > > >>> we
> > > > > > >> > > >>>>>>> write this PIP to discuss another way to resolve
> > this
> > > > > problem.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ## Goal
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > > > > > >> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer``
> > to
> > > > > close
> > > > > > >> > > >>>>>>> producer/consumers immediately.
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ## API Changes
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> > ``CommandCloseProducer``;
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ```java
> > > > > > >> > > >>>>>>> **Before**
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > >> > > >>>>>>> }
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> **After**
> > > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> > true];
> > > > > > >> > > >>>>>>> }
> > > > > > >> > > >>>>>>> ```
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> > ``CommandCloseConsumer``
> > > > > > >> > > >>>>>>> ```java
> > > > > > >> > > >>>>>>> **Before**
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > >> > > >>>>>>> }
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> **After**
> > > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> > true];
> > > > > > >> > > >>>>>>> }
> > > > > > >> > > >>>>>>> ```
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ## Implementation
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ### ClientCnx - Producer:
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> **Before**
> > > > > > >> > > >>>>>>> ```java
> > > > > > >> > > >>>>>>>   @Override
> > > > > > >> > > >>>>>>>   protected void
> > > > handleCloseProducer(CommandCloseProducer
> > > > > > >> > > >>> closeProducer)
> > > > > > >> > > >>>>>>> {
> > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > > producer:
> > > > > > >> {}",
> > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > > >> > > >>>>>>>       final long producerId =
> > > > > closeProducer.getProducerId();
> > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > > producers.get(producerId);
> > > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > > >> > > >>>>>>>           producer.connectionClosed(this);
> > > > > > >> > > >>>>>>>       } else {
> > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not found
> > while
> > > > > > >> closing
> > > > > > >> > > >>> producer
> > > > > > >> > > >>>>>>> ", producerId);
> > > > > > >> > > >>>>>>>       }
> > > > > > >> > > >>>>>>>   }
> > > > > > >> > > >>>>>>> ```
> > > > > > >> > > >>>>>>> After:
> > > > > > >> > > >>>>>>> ```java
> > > > > > >> > > >>>>>>>  @Override
> > > > > > >> > > >>>>>>>   protected void
> > > > handleCloseProducer(CommandCloseProducer
> > > > > > >> > > >>> closeProducer)
> > > > > > >> > > >>>>>>> {
> > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > > producer:
> > > > > > >> {}",
> > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > > >> > > >>>>>>>       final long producerId =
> > > > > closeProducer.getProducerId();
> > > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > > producers.get(producerId);
> > > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > > >> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > > > > > >> > > >>>>>>>               producer.connectionClosed(this);
> > > > > > >> > > >>>>>>>           } else {
> > > > > > >> > > >>>>>>>               producer.closeAsync();
> > > > > > >> > > >>>>>>>           }
> > > > > > >> > > >>>>>>>       } else {
> > > > > > >> > > >>>>>>>           log.warn("Producer with id {} not found
> > while
> > > > > > >> closing
> > > > > > >> > > >>> producer
> > > > > > >> > > >>>>>>> ", producerId);
> > > > > > >> > > >>>>>>>       }
> > > > > > >> > > >>>>>>>   }
> > > > > > >> > > >>>>>>> ```
> > > > > > >> > > >>>>>>> ### ClientCnx - Consumer:
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> **Before**
> > > > > > >> > > >>>>>>> ```java
> > > > > > >> > > >>>>>>>   @Override
> > > > > > >> > > >>>>>>>   protected void
> > > > handleCloseConsumer(CommandCloseConsumer
> > > > > > >> > > >>> closeConsumer)
> > > > > > >> > > >>>>>>> {
> > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > > consumer:
> > > > > > >> {}",
> > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > > >> > > >>>>>>>       final long consumerId =
> > > > > closeConsumer.getConsumerId();
> > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > > consumers.get(consumerId);
> > > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > > >> > > >>>>>>>           consumer.connectionClosed(this);
> > > > > > >> > > >>>>>>>       } else {
> > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not found
> > while
> > > > > > >> closing
> > > > > > >> > > >>> consumer
> > > > > > >> > > >>>>>>> ", consumerId);
> > > > > > >> > > >>>>>>>       }
> > > > > > >> > > >>>>>>>   }
> > > > > > >> > > >>>>>>> ```
> > > > > > >> > > >>>>>>> **After**
> > > > > > >> > > >>>>>>> ```java
> > > > > > >> > > >>>>>>>   @Override
> > > > > > >> > > >>>>>>>   protected void
> > > > handleCloseConsumer(CommandCloseConsumer
> > > > > > >> > > >>> closeConsumer)
> > > > > > >> > > >>>>>>> {
> > > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > > consumer:
> > > > > > >> {}",
> > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > > >> > > >>>>>>>       final long consumerId =
> > > > > closeConsumer.getConsumerId();
> > > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > > consumers.get(consumerId);
> > > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > > >> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > > > > > >> > > >>>>>>>               consumer.connectionClosed(this);
> > > > > > >> > > >>>>>>>           } else {
> > > > > > >> > > >>>>>>>               consumer.closeAsync();
> > > > > > >> > > >>>>>>>           }
> > > > > > >> > > >>>>>>>       } else {
> > > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not found
> > while
> > > > > > >> closing
> > > > > > >> > > >>> consumer
> > > > > > >> > > >>>>>>> ", consumerId);
> > > > > > >> > > >>>>>>>       }
> > > > > > >> > > >>>>>>>   }
> > > > > > >> > > >>>>>>> ```
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> ## Reject Alternatives
> > > > > > >> > > >>>>>>>
> > > > > > >> > > >>>>>>> none.
> > > > > > >> > > >>>>>
> > > > > > >> > > >>>>>
> > > > > > >> > > >>>>>
> > > > > > >> > > >>>>> --
> > > > > > >> > > >>>>> Zike Yang
> > > > > > >> > > >>>
> > > > > > >> > > >>>
> > > > > > >> > >
> > > > > > >>
> > > > > > >
> > > > >
> > > >
> >

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by PengHui Li <pe...@apache.org>.
> I agree with you that it'd be nice to provide the same deletion
behavior. However, because geo-replication's configuration is
decentralized, I think namespace or topic deletion is more complicated
than unreplicated deletion. Note that users cannot currently delete
namespaces that are configured with remote replication clusters.

The geo-replication's configuration can be centralized by using
one configuration store. But that doesn't change anything, we should
provide the same behavior for both centralized and decentralized
configuration store.

> Note that while a user cannot explicitly delete a replicated topic,
they can remove the replication configuration in cluster A and cluster
B, and then they are left with unreplicated namespaces and topics,
which can be deleted.

Yes, stopping the replication first is a more elegant approach,
and it is also very easy to understand for users. The current challenge is
users usually set replicated clusters for a namespace, if remove
the replication configuration, the entire namespace will be affected.
We have supported setting the replicated configuration for a topic [1],
only for 2.10.0 or later.

> In this case, the specific problem stems from a remote cluster
targeting the local cluster. If the user only deletes the local topic,
without removing the replication configuration, I don't think it
should be surprising when the topic gets recreated (assuming auto
topic creation is enabled).

From the user's perspective, they don't have any producers/consumers,
If the topic only has the producer of the replicator connect to the remote
topic
each other, in this case, I think we should allow users to delete the
topic.
Users will be surprised in this case, because they are not able to delete
an unused topic, they are not able to change the replication configuration
for
a namespace.

> In my view, I think it's okay to make it harder to delete
geo-replicated namespaces or topics than local namespaces or topics.
Failing requests with a good, informative error message protects
users. Additionally, if users are able to delete a namespace or topic
by force, that gives them a work around--although we should document
the limitations of this workaround.

I agree with it's okay to make it harder to delete a replicated topic, so
for
a replicated topic, users should delete the topic in each cluster, we should
provided steps for how to delete an unused replicated topic and what are
the effects of not removing properly, for example, if a topic has 2
clusters, cluster-a
and cluster-b, but only delete the topic from cluster-a, the topic will be
created
again if cluster-b write some new data into this topic, depends on the topic
auto-creation.

If we only support set replicated cluster for a topic, not a namespace,
I like the approach that stops the replication first and then deletes the
topic.

[1] https://github.com/apache/pulsar/pull/12136

Thanks,
Penghui

On Tue, Mar 1, 2022 at 2:27 PM Michael Marshall <mm...@apache.org>
wrote:

> > Either replicated topic or non-replicated topic, we should provide the
> > same topic deletion behavior, the topic can be deleted if no active
> > user's producers/consumers, should not be affected by the Pulsar
> > internal producers/consumers.
>
> I agree with you that it'd be nice to provide the same deletion
> behavior. However, because geo-replication's configuration is
> decentralized, I think namespace or topic deletion is more complicated
> than unreplicated deletion. Note that users cannot currently delete
> namespaces that are configured with remote replication clusters.
>
> One challenge I see is the configuration in the remote cluster. If
> cluster A is replicated to cluster B for namespace C, deleting a topic
> in cluster B does not affect the configuration in cluster A. If we
> make it easy to delete the topic in cluster B, we need to decide how
> to handle failed replication in cluster A. In the same way, deleting
> namespace C in cluster B doesn't delete it from cluster A. For
> example, when should cluster A stop attempting to create producers to
> cluster B for the deleted namespace/topic and how do we inform
> operators that the replication is no longer working as configured. We
> avoid several problems by failing the initial user request.
>
> Note that while a user cannot explicitly delete a replicated topic,
> they can remove the replication configuration in cluster A and cluster
> B, and then they are left with unreplicated namespaces and topics,
> which can be deleted.
>
> > But for now, for a replicated topic, users are not able to delete it
> because
> > of the internal active producer, adding force delete option can delete
> > the topic, but it will create again if users enabled topic auto-creation.
>
> In this case, the specific problem stems from a remote cluster
> targeting the local cluster. If the user only deletes the local topic,
> without removing the replication configuration, I don't think it
> should be surprising when the topic gets recreated (assuming auto
> topic creation is enabled).
>
> In my view, I think it's okay to make it harder to delete
> geo-replicated namespaces or topics than local namespaces or topics.
> Failing requests with a good, informative error message protects
> users. Additionally, if users are able to delete a namespace or topic
> by force, that gives them a work around--although we should document
> the limitations of this workaround.
>
> I am not experienced with geo-replication, so please let me know if
> any of my analysis doesn't align with the actual design.
>
> Thanks,
> Michael
>
>
> On Sun, Feb 27, 2022 at 8:42 PM guo jiwei <te...@apache.org> wrote:
> >
> > I have added the geo-replicator topic case and updated the PIP.
> >
> >
> > Regards
> > Jiwei Guo (Tboy)
> >
> >
> > On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <pe...@apache.org> wrote:
> >
> > > > To me, the main question is whether we create a custom error or
> expand
> > > the `CommandCloseProducer` command. I lean towards adding an error
> > > because it will automatically be backward compatible based on the way
> > > the client determines which errors are retriable. Although, I don't
> > > have a strong opinion.
> > >
> > > I have the same opinion as you. But it looks like not able to fix the
> > > geo-replication topic deletion.
> > >
> > > Either replicated topic or non-replicated topic, we should provide the
> > > same topic deletion behavior, the topic can be deleted if no active
> > > user's producers/consumers, should not be affected by the Pulsar
> > > internal producers/consumers.
> > >
> > > But for now, for a replicated topic, users are not able to delete it
> > > because
> > > of the internal active producer, adding force delete option can delete
> > > the topic, but it will create again if users enabled topic
> auto-creation.
> > >
> > > I think in this case, it seems we don't have a chance to give the
> producer
> > > an exception, because the reconnection will connect to a new topic only
> > > with
> > > the same topic name as before.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <mmarshall@apache.org
> >
> > > wrote:
> > >
> > > > > We need to determine the overall plan, not the implementation
> > > > > at this moment.
> > > >
> > > > Great point, I agree.
> > > >
> > > > > Looks like we need an option in the
> > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > reconnecting again.
> > > >
> > > > This is an important point. One of the primary requirements for
> > > > geo-replication must also be that a replicated namespace or topic can
> > > > be deleted and recreated without any problems or leaks in the broker.
> > > > That requires that the producer is cleaned up properly when the
> > > > namespace is deleted. I think we could achieve that with a new server
> > > > error or an expansion of the `CommandCloseProducer` command.
> > > >
> > > > To me, the main question is whether we create a custom error or
> expand
> > > > the `CommandCloseProducer` command. I lean towards adding an error
> > > > because it will automatically be backward compatible based on the way
> > > > the client determines which errors are retriable. Although, I don't
> > > > have a strong opinion.
> > > >
> > > > The key code paths that we need to consider are topic lookup and
> topic
> > > > creation. Both the http admin endpoint and the binary protocol
> > > > endpoint interact with these code paths, and we should fail requests
> > > > for namespaces which have the "deleted" field set to true.
> > > >
> > > > In looking at the code, we could modify the logic in
> > > > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That
> method
> > > > is always called for topic lookups via HTTP or via the Pulsar
> > > > binary protocol. If a namespace doesn't exist, that method currently
> > > > returns a NOT_FOUND exception. In the binary protocol case, that
> > > > exception is converted to a `ServerError.MetadataError`. We could
> > > > modify the method like I do here [0], or we could add another case
> and
> > > > return a custom error message.
> > > >
> > > > We'd also need to prevent topic creation via the admin endpoint and
> > > > the auto topic creation via the binary protocol. I believe this will
> > > > be just as easy as the lookup case, but I haven't confirmed.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > > [0]
> > > >
> > >
> https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <pe...@apache.org>
> wrote:
> > > > >
> > > > > The replicated topic deletion also has a similar problem.
> > > > > If the topic auto-creation is enabled, we are not able to delete
> > > > > a replicated topic, because the producer used for replication
> > > > > will try to reconnect. Looks like we need an option in the
> > > > > `CommandCloseProducer` to avoid the replication producer
> > > > > reconnecting again.
> > > > >
> > > > > Penghui
> > > > >
> > > > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <pe...@apache.org>
> wrote:
> > > > >
> > > > > > > Penghui, are you suggesting that we implement the
> namespace/tenant
> > > > > > terminated logic after completing this PIP?
> > > > > >
> > > > > > I'm ok with both being implemented together or separated.
> > > > > > We need to determine the overall plan, not the implementation
> > > > > > at this moment.
> > > > > >
> > > > > > > For the sake of discussion, if we implement the namespace
> > > terminated
> > > > > > logic first, we could fulfill the underlying requirements for
> this
> > > PIP
> > > > > > by returning a new non-retriable error response when a client
> tries
> > > to
> > > > > > connect a producer or a consumer to a topic in a namespace that
> is
> > > > > > "terminated". If we do add the "namespace terminated" feature,
> we'll
> > > > > > need to add a non-retriable exception for this case, anyway. The
> main
> > > > > > advantage here is that we'd only need one expansion of the
> protobuf
> > > > > > instead of two. The downside is that the protocol for connected
> > > > > > clients has a couple more roundtrips. The broker would disconnect
> > > > > > connected clients and then fail their reconnection attempt with a
> > > > > > non-retriable error.
> > > > > >
> > > > > > Regarding the namespace "terminated" concept, I just noticed
> that we
> > > > > > already have a "deleted" field in a namespace's policies [0].
> There
> > > is
> > > > > > even a comment that says:
> > > > > >
> > > > > > Yes, if we can terminate the namespace first, we should return
> > > > “namespace
> > > > > > not found”
> > > > > > if the producers and consumers try to reconnect, the client
> should
> > > stop
> > > > > > reconnecting
> > > > > > after getting this error. Looks like following this way, we don't
> > > need
> > > > to
> > > > > > introduce any protocol changes anymore.
> > > > > >
> > > > > > > As I think about it more, I no longer think "terminated" is the
> > > right
> > > > > > term for what I proposed above. Our goal is to briefly prevent
> any
> > > > > > topic creation to ensure we can delete all sub resources for a
> > > > > > namespace. On the other hand, a terminated topic isn't
> necessarily
> > > > > > short lived. If we want to apply the "terminated" term
> unequivocally
> > > > > > to both topics and namespaces, I think a terminated namespace
> would
> > > > > > need to be a namespace where all topics are in terminated state
> and
> > > no
> > > > > > additional topics could be created. That's not the feature we're
> > > > > > discussing here, though. Deleted seems like the right term to me,
> > > > > > especially since we're already using it to prevent a race
> condition
> > > > > >
> > > > > > Yes, topic termination has different meanings, deletion should
> be the
> > > > right
> > > > > > term here.
> > > > > >
> > > > > > Thanks,
> > > > > > Penghui
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
> > > > mmarshall@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Regarding the namespace "terminated" concept, I just noticed
> that we
> > > > > >> already have a "deleted" field in a namespace's policies [0].
> There
> > > is
> > > > > >> even a comment that says:
> > > > > >>
> > > > > >> > // set the policies to deleted so that somebody else cannot
> > > acquire
> > > > > >> this namespace
> > > > > >>
> > > > > >> I am not familiar with this feature, but it seems like this
> policy
> > > > > >> field could be checked before creating a topic in a namespace.
> That
> > > > > >> would remove certain races described above.
> > > > > >>
> > > > > >> As I think about it more, I no longer think "terminated" is the
> > > right
> > > > > >> term for what I proposed above. Our goal is to briefly prevent
> any
> > > > > >> topic creation to ensure we can delete all sub resources for a
> > > > > >> namespace. On the other hand, a terminated topic isn't
> necessarily
> > > > > >> short lived. If we want to apply the "terminated" term
> unequivocally
> > > > > >> to both topics and namespaces, I think a terminated namespace
> would
> > > > > >> need to be a namespace where all topics are in terminated state
> and
> > > no
> > > > > >> additional topics could be created. That's not the feature we're
> > > > > >> discussing here, though. Deleted seems like the right term to
> me,
> > > > > >> especially since we're already using it to prevent a race
> condition
> > > > > >> [0].
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Michael
> > > > > >>
> > > > > >> [0]
> > > > > >>
> > > >
> > >
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> > > > > >>
> > > > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
> > > > mmarshall@apache.org>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > Hi Dave,
> > > > > >> >
> > > > > >> > > automatically delete tenants and namespaces for not
> containing
> > > > topics
> > > > > >> >
> > > > > >> > I don't think that is what we are discussing. I agree that the
> > > > initial
> > > > > >> > email says just that, though, which is why I asked above:
> > > > > >> >
> > > > > >> > >> When there are no user-created topics under a namespace,
> > > > > >> > >> Namespace should be deleted.
> > > > > >> > > This is supposed to mean that the namespace should be able
> to be
> > > > > >> > > deleted, correct?
> > > > > >> >
> > > > > >> > Perhaps Mattison can clarify, too.
> > > > > >> >
> > > > > >> > My current understanding of the context for the PIP is that a
> user
> > > > > >> > call to delete a namespace without force can fail when a
> producer
> > > > > >> > reconnects to a deleted topic. The goal is to remove the race
> > > > > >> > condition to ensure namespace deletion can succeed.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Michael
> > > > > >> >
> > > > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org>
> > > > wrote:
> > > > > >> > >
> > > > > >> > > Hi -
> > > > > >> > >
> > > > > >> > > I hope I’m understanding what’s being discussed.
> > > > > >> > >
> > > > > >> > > If we are going to automatically delete tenants and
> namespaces
> > > for
> > > > > >> not containing topics then we need to make both of these
> automatic
> > > > actions
> > > > > >> configurable with a default to NOT do so. Otherwise we break
> > > existing
> > > > use
> > > > > >> cases.
> > > > > >> > >
> > > > > >> > > Automatic deletion of namespaces should be configurable at
> both
> > > > the
> > > > > >> cluster and tenant level.
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Dave
> > > > > >> > >
> > > > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
> > > > mmarshall@apache.org>
> > > > > >> wrote:
> > > > > >> > > >
> > > > > >> > > >> The old producer/consumer should be closed after
> applying the
> > > > > >> changes from
> > > > > >> > > >> this proposal.
> > > > > >> > > >
> > > > > >> > > > Penghui, are you suggesting that we implement the
> > > > namespace/tenant
> > > > > >> > > > terminated logic after completing this PIP?
> > > > > >> > > >
> > > > > >> > > > For the sake of discussion, if we implement the namespace
> > > > terminated
> > > > > >> > > > logic first, we could fulfill the underlying requirements
> for
> > > > this
> > > > > >> PIP
> > > > > >> > > > by returning a new non-retriable error response when a
> client
> > > > tries
> > > > > >> to
> > > > > >> > > > connect a producer or a consumer to a topic in a namespace
> > > that
> > > > is
> > > > > >> > > > "terminated". If we do add the "namespace terminated"
> feature,
> > > > we'll
> > > > > >> > > > need to add a non-retriable exception for this case,
> anyway.
> > > The
> > > > > >> main
> > > > > >> > > > advantage here is that we'd only need one expansion of the
> > > > protobuf
> > > > > >> > > > instead of two. The downside is that the protocol for
> > > connected
> > > > > >> > > > clients has a couple more roundtrips. The broker would
> > > > disconnect
> > > > > >> > > > connected clients and then fail their reconnection attempt
> > > with
> > > > a
> > > > > >> > > > non-retriable error.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Michael
> > > > > >> > > >
> > > > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <
> > > penghui@apache.org>
> > > > > >> wrote:
> > > > > >> > > >>
> > > > > >> > > >>> If we want to solve this problem, we need to add some
> sync
> > > > > >> resources like
> > > > > >> > > >> lock/state, I think it’s a harm for us, we don’t need to
> do
> > > > that.
> > > > > >> > > >>
> > > > > >> > > >> I think we can make the namespace/tenants to the inactive
> > > state
> > > > > >> first so
> > > > > >> > > >> that we can avoid any new
> > > > > >> > > >> producer/consumer connect to the topic under the
> > > > namespace/tenant.
> > > > > >> > > >>
> > > > > >> > > >> The old producer/consumer should be closed after
> applying the
> > > > > >> changes from
> > > > > >> > > >> this proposal.
> > > > > >> > > >>
> > > > > >> > > >> Thanks,
> > > > > >> > > >> Penghui
> > > > > >> > > >>
> > > > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> > > > > >> mattisonchao@gmail.com> wrote:
> > > > > >> > > >>
> > > > > >> > > >>>> This is supposed to mean that the namespace should be
> able
> > > > to be
> > > > > >> > > >>>> deleted, correct?
> > > > > >> > > >>>
> > > > > >> > > >>> Yes, the main background is the user doesn’t have an
> active
> > > > > >> topic. so,
> > > > > >> > > >>> they want to delete the namespace.
> > > > > >> > > >>>
> > > > > >> > > >>>> However, I think
> > > > > >> > > >>>> we might still have a race condition that could make
> tenant
> > > > or
> > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> producer or
> > > > > >> consumer
> > > > > >> > > >>>> creates a topic after the namespace deletion has
> started
> > > but
> > > > > >> > > >>>> before it is complete. Do you agree that the underlying
> > > race
> > > > > >> still
> > > > > >> > > >>> exists?
> > > > > >> > > >>>
> > > > > >> > > >>> Yes, this condition exists. I think it’s not a big
> problem
> > > > > >> because the
> > > > > >> > > >>> user doesn’t want to use this namespace anymore.
> > > > > >> > > >>> If this scenario appears, they will get an error and
> need to
> > > > > >> delete it
> > > > > >> > > >>> again.
> > > > > >> > > >>>
> > > > > >> > > >>>> What if we expand our usage of the "terminated"
> feature to
> > > > apply
> > > > > >> to
> > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace
> can
> > > > have
> > > > > >> > > >>>> bundles and topics can be deleted but not created
> (just as
> > > a
> > > > > >> terminated
> > > > > >> > > >>>> topic cannot have any new messages published to it).
> This
> > > > would
> > > > > >> take
> > > > > >> > > >>>> care of all topic creation race conditions. We'd
> probably
> > > > need
> > > > > >> to add
> > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > >> > > >>>
> > > > > >> > > >>>
> > > > > >> > > >>> If we want to solve this problem, we need to add some
> sync
> > > > > >> resources like
> > > > > >> > > >>> lock/state, I think it’s a harm for us, we don’t need
> to do
> > > > that.
> > > > > >> > > >>>
> > > > > >> > > >>> Thanks for your suggestions, let me know what you think.
> > > > > >> > > >>>
> > > > > >> > > >>> Best,
> > > > > >> > > >>> Mattison
> > > > > >> > > >>>
> > > > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> > > > > >> mmarshall@apache.org>
> > > > > >> > > >>> wrote:
> > > > > >> > > >>>>
> > > > > >> > > >>>> This proposal identifies an important issue that we
> should
> > > > > >> definitely
> > > > > >> > > >>>> solve. I have some questions.
> > > > > >> > > >>>>
> > > > > >> > > >>>>> When there are no user-created topics under a
> namespace,
> > > > > >> > > >>>>> Namespace should be deleted.
> > > > > >> > > >>>>
> > > > > >> > > >>>> This is supposed to mean that the namespace should be
> able
> > > > to be
> > > > > >> > > >>>> deleted, correct?
> > > > > >> > > >>>>
> > > > > >> > > >>>>> For this reason, we need to close the system topic
> > > > > >> reader/producer
> > > > > >> > > >>>>> first, then remove the system topic. finally, remove
> the
> > > > > >> namespace.
> > > > > >> > > >>>>
> > > > > >> > > >>>> I agree that expanding the protobuf CloseProducer and
> > > > > >> CloseConsumer
> > > > > >> > > >>>> commands could be valuable here. The expansion would
> ensure
> > > > that
> > > > > >> > > >>>> producers and consumers don't attempt to reconnect.
> > > However,
> > > > I
> > > > > >> think
> > > > > >> > > >>>> we might still have a race condition that could make
> tenant
> > > > or
> > > > > >> > > >>>> namespace deletion fail. Specifically, if a new
> producer or
> > > > > >> consumer
> > > > > >> > > >>>> creates a topic after the namespace deletion has
> started
> > > but
> > > > > >> > > >>>> before it is complete. Do you agree that the underlying
> > > race
> > > > > >> still
> > > > > >> > > >>> exists?
> > > > > >> > > >>>>
> > > > > >> > > >>>> In my view, the fundamental problem here is that
> deleting
> > > > > >> certain Pulsar
> > > > > >> > > >>>> resources takes time and, in a distributed system, that
> > > means
> > > > > >> race
> > > > > >> > > >>>> conditions.
> > > > > >> > > >>>>
> > > > > >> > > >>>> What if we expand our usage of the "terminated"
> feature to
> > > > apply
> > > > > >> to
> > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace
> can
> > > > have
> > > > > >> > > >>>> bundles and topics can be deleted but not created
> (just as
> > > a
> > > > > >> terminated
> > > > > >> > > >>>> topic cannot have any new messages published to it).
> This
> > > > would
> > > > > >> take
> > > > > >> > > >>>> care of all topic creation race conditions. We'd
> probably
> > > > need
> > > > > >> to add
> > > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > > >> > > >>>>
> > > > > >> > > >>>> Thanks,
> > > > > >> > > >>>> Michael
> > > > > >> > > >>>>
> > > > > >> > > >>>>
> > > > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > > > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > > > > >> > > >>>>>
> > > > > >> > > >>>>> +1
> > > > > >> > > >>>>>
> > > > > >> > > >>>>>
> > > > > >> > > >>>>> Thanks,
> > > > > >> > > >>>>> Zike
> > > > > >> > > >>>>>
> > > > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> > > > > >> technoboy@apache.org>
> > > > > >> > > >>> wrote:
> > > > > >> > > >>>>>>
> > > > > >> > > >>>>>> Hi
> > > > > >> > > >>>>>> The PIP link :
> > > > https://github.com/apache/pulsar/issues/13989
> > > > > >> > > >>>>>>
> > > > > >> > > >>>>>> Regards
> > > > > >> > > >>>>>> Jiwei Guo (Tboy)
> > > > > >> > > >>>>>>
> > > > > >> > > >>>>>>
> > > > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> > > > > >> mattisonchao@gmail.com
> > > > > >> > > >>>>
> > > > > >> > > >>>>>> wrote:
> > > > > >> > > >>>>>>
> > > > > >> > > >>>>>>> Hello everyone,
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> I want to start a discussion about PIP-139 : Support
> > > > Broker
> > > > > >> send
> > > > > >> > > >>> command
> > > > > >> > > >>>>>>> to real close producer/consumer.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> This is the PIP document
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> Please check it out and feel free to share your
> > > thoughts.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> Best,
> > > > > >> > > >>>>>>> Mattison
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> Relation pull request:  #13337
> > > > > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ## Motivation
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> Before we discuss this pip, I'd like to supplement
> some
> > > > > >> context to
> > > > > >> > > >>> help
> > > > > >> > > >>>>>>> contributors who don't want to read the original
> pull
> > > > request.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>>> When there are no user-created topics under a
> > > namespace,
> > > > > >> Namespace
> > > > > >> > > >>>>>>> should be deleted. But currently, the system topic
> > > existed
> > > > > >> and the
> > > > > >> > > >>>>>>> reader/producer could auto-create the system which
> may
> > > > cause
> > > > > >> the
> > > > > >> > > >>> namespace
> > > > > >> > > >>>>>>> deletion to fail.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> For this reason, we need to close the system topic
> > > > > >> reader/producer
> > > > > >> > > >>> first,
> > > > > >> > > >>>>>>> then remove the system topic. finally, remove the
> > > > namespace.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> Following this way, we first want to use
> ``terminate``
> > > to
> > > > > >> solve this
> > > > > >> > > >>>>>>> problem. then we found producers can disconnect, but
> > > > > >> consumers are
> > > > > >> > > >>> still
> > > > > >> > > >>>>>>> alive. So, another PR #13960 wants to add consumers'
> > > > closing
> > > > > >> logic.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> After #13960, all things look good, but another
> problem
> > > > > >> appears. that
> > > > > >> > > >>> is
> > > > > >> > > >>>>>>> we need to wait until consumers completely consume
> all
> > > > > >> messages (this
> > > > > >> > > >>> may
> > > > > >> > > >>>>>>> make terminate topic so long and the operation
> > > > timeout)then
> > > > > >> get
> > > > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>
> > > > > >>
> > > >
> > >
> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> In the #13337 case, we need to force close consumers
> > > > > >> immediately. So
> > > > > >> > > >>> we
> > > > > >> > > >>>>>>> write this PIP to discuss another way to resolve
> this
> > > > problem.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ## Goal
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > > > > >> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer``
> to
> > > > close
> > > > > >> > > >>>>>>> producer/consumers immediately.
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ## API Changes
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> ``CommandCloseProducer``;
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ```java
> > > > > >> > > >>>>>>> **Before**
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > >> > > >>>>>>> }
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> **After**
> > > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> true];
> > > > > >> > > >>>>>>> }
> > > > > >> > > >>>>>>> ```
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to
> ``CommandCloseConsumer``
> > > > > >> > > >>>>>>> ```java
> > > > > >> > > >>>>>>> **Before**
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > >> > > >>>>>>> }
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> **After**
> > > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default =
> true];
> > > > > >> > > >>>>>>> }
> > > > > >> > > >>>>>>> ```
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ## Implementation
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ### ClientCnx - Producer:
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> **Before**
> > > > > >> > > >>>>>>> ```java
> > > > > >> > > >>>>>>>   @Override
> > > > > >> > > >>>>>>>   protected void
> > > handleCloseProducer(CommandCloseProducer
> > > > > >> > > >>> closeProducer)
> > > > > >> > > >>>>>>> {
> > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > producer:
> > > > > >> {}",
> > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > >> > > >>>>>>>       final long producerId =
> > > > closeProducer.getProducerId();
> > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > producers.get(producerId);
> > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > >> > > >>>>>>>           producer.connectionClosed(this);
> > > > > >> > > >>>>>>>       } else {
> > > > > >> > > >>>>>>>           log.warn("Producer with id {} not found
> while
> > > > > >> closing
> > > > > >> > > >>> producer
> > > > > >> > > >>>>>>> ", producerId);
> > > > > >> > > >>>>>>>       }
> > > > > >> > > >>>>>>>   }
> > > > > >> > > >>>>>>> ```
> > > > > >> > > >>>>>>> After:
> > > > > >> > > >>>>>>> ```java
> > > > > >> > > >>>>>>>  @Override
> > > > > >> > > >>>>>>>   protected void
> > > handleCloseProducer(CommandCloseProducer
> > > > > >> > > >>> closeProducer)
> > > > > >> > > >>>>>>> {
> > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > producer:
> > > > > >> {}",
> > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > > >> > > >>>>>>>       final long producerId =
> > > > closeProducer.getProducerId();
> > > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > > producers.get(producerId);
> > > > > >> > > >>>>>>>       if (producer != null) {
> > > > > >> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > > > > >> > > >>>>>>>               producer.connectionClosed(this);
> > > > > >> > > >>>>>>>           } else {
> > > > > >> > > >>>>>>>               producer.closeAsync();
> > > > > >> > > >>>>>>>           }
> > > > > >> > > >>>>>>>       } else {
> > > > > >> > > >>>>>>>           log.warn("Producer with id {} not found
> while
> > > > > >> closing
> > > > > >> > > >>> producer
> > > > > >> > > >>>>>>> ", producerId);
> > > > > >> > > >>>>>>>       }
> > > > > >> > > >>>>>>>   }
> > > > > >> > > >>>>>>> ```
> > > > > >> > > >>>>>>> ### ClientCnx - Consumer:
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> **Before**
> > > > > >> > > >>>>>>> ```java
> > > > > >> > > >>>>>>>   @Override
> > > > > >> > > >>>>>>>   protected void
> > > handleCloseConsumer(CommandCloseConsumer
> > > > > >> > > >>> closeConsumer)
> > > > > >> > > >>>>>>> {
> > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > consumer:
> > > > > >> {}",
> > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > >> > > >>>>>>>       final long consumerId =
> > > > closeConsumer.getConsumerId();
> > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > consumers.get(consumerId);
> > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > >> > > >>>>>>>           consumer.connectionClosed(this);
> > > > > >> > > >>>>>>>       } else {
> > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not found
> while
> > > > > >> closing
> > > > > >> > > >>> consumer
> > > > > >> > > >>>>>>> ", consumerId);
> > > > > >> > > >>>>>>>       }
> > > > > >> > > >>>>>>>   }
> > > > > >> > > >>>>>>> ```
> > > > > >> > > >>>>>>> **After**
> > > > > >> > > >>>>>>> ```java
> > > > > >> > > >>>>>>>   @Override
> > > > > >> > > >>>>>>>   protected void
> > > handleCloseConsumer(CommandCloseConsumer
> > > > > >> > > >>> closeConsumer)
> > > > > >> > > >>>>>>> {
> > > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > > consumer:
> > > > > >> {}",
> > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > > >> > > >>>>>>>       final long consumerId =
> > > > closeConsumer.getConsumerId();
> > > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > > consumers.get(consumerId);
> > > > > >> > > >>>>>>>       if (consumer != null) {
> > > > > >> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > > > > >> > > >>>>>>>               consumer.connectionClosed(this);
> > > > > >> > > >>>>>>>           } else {
> > > > > >> > > >>>>>>>               consumer.closeAsync();
> > > > > >> > > >>>>>>>           }
> > > > > >> > > >>>>>>>       } else {
> > > > > >> > > >>>>>>>           log.warn("Consumer with id {} not found
> while
> > > > > >> closing
> > > > > >> > > >>> consumer
> > > > > >> > > >>>>>>> ", consumerId);
> > > > > >> > > >>>>>>>       }
> > > > > >> > > >>>>>>>   }
> > > > > >> > > >>>>>>> ```
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> ## Reject Alternatives
> > > > > >> > > >>>>>>>
> > > > > >> > > >>>>>>> none.
> > > > > >> > > >>>>>
> > > > > >> > > >>>>>
> > > > > >> > > >>>>>
> > > > > >> > > >>>>> --
> > > > > >> > > >>>>> Zike Yang
> > > > > >> > > >>>
> > > > > >> > > >>>
> > > > > >> > >
> > > > > >>
> > > > > >
> > > >
> > >
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by Michael Marshall <mm...@apache.org>.
> Either replicated topic or non-replicated topic, we should provide the
> same topic deletion behavior, the topic can be deleted if no active
> user's producers/consumers, should not be affected by the Pulsar
> internal producers/consumers.

I agree with you that it'd be nice to provide the same deletion
behavior. However, because geo-replication's configuration is
decentralized, I think namespace or topic deletion is more complicated
than unreplicated deletion. Note that users cannot currently delete
namespaces that are configured with remote replication clusters.

One challenge I see is the configuration in the remote cluster. If
cluster A is replicated to cluster B for namespace C, deleting a topic
in cluster B does not affect the configuration in cluster A. If we
make it easy to delete the topic in cluster B, we need to decide how
to handle failed replication in cluster A. In the same way, deleting
namespace C in cluster B doesn't delete it from cluster A. For
example, when should cluster A stop attempting to create producers to
cluster B for the deleted namespace/topic and how do we inform
operators that the replication is no longer working as configured. We
avoid several problems by failing the initial user request.

Note that while a user cannot explicitly delete a replicated topic,
they can remove the replication configuration in cluster A and cluster
B, and then they are left with unreplicated namespaces and topics,
which can be deleted.

> But for now, for a replicated topic, users are not able to delete it because
> of the internal active producer, adding force delete option can delete
> the topic, but it will create again if users enabled topic auto-creation.

In this case, the specific problem stems from a remote cluster
targeting the local cluster. If the user only deletes the local topic,
without removing the replication configuration, I don't think it
should be surprising when the topic gets recreated (assuming auto
topic creation is enabled).

In my view, I think it's okay to make it harder to delete
geo-replicated namespaces or topics than local namespaces or topics.
Failing requests with a good, informative error message protects
users. Additionally, if users are able to delete a namespace or topic
by force, that gives them a work around--although we should document
the limitations of this workaround.

I am not experienced with geo-replication, so please let me know if
any of my analysis doesn't align with the actual design.

Thanks,
Michael


On Sun, Feb 27, 2022 at 8:42 PM guo jiwei <te...@apache.org> wrote:
>
> I have added the geo-replicator topic case and updated the PIP.
>
>
> Regards
> Jiwei Guo (Tboy)
>
>
> On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <pe...@apache.org> wrote:
>
> > > To me, the main question is whether we create a custom error or expand
> > the `CommandCloseProducer` command. I lean towards adding an error
> > because it will automatically be backward compatible based on the way
> > the client determines which errors are retriable. Although, I don't
> > have a strong opinion.
> >
> > I have the same opinion as you. But it looks like not able to fix the
> > geo-replication topic deletion.
> >
> > Either replicated topic or non-replicated topic, we should provide the
> > same topic deletion behavior, the topic can be deleted if no active
> > user's producers/consumers, should not be affected by the Pulsar
> > internal producers/consumers.
> >
> > But for now, for a replicated topic, users are not able to delete it
> > because
> > of the internal active producer, adding force delete option can delete
> > the topic, but it will create again if users enabled topic auto-creation.
> >
> > I think in this case, it seems we don't have a chance to give the producer
> > an exception, because the reconnection will connect to a new topic only
> > with
> > the same topic name as before.
> >
> > Thanks,
> > Penghui
> >
> > On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <mm...@apache.org>
> > wrote:
> >
> > > > We need to determine the overall plan, not the implementation
> > > > at this moment.
> > >
> > > Great point, I agree.
> > >
> > > > Looks like we need an option in the
> > > > `CommandCloseProducer` to avoid the replication producer
> > > > reconnecting again.
> > >
> > > This is an important point. One of the primary requirements for
> > > geo-replication must also be that a replicated namespace or topic can
> > > be deleted and recreated without any problems or leaks in the broker.
> > > That requires that the producer is cleaned up properly when the
> > > namespace is deleted. I think we could achieve that with a new server
> > > error or an expansion of the `CommandCloseProducer` command.
> > >
> > > To me, the main question is whether we create a custom error or expand
> > > the `CommandCloseProducer` command. I lean towards adding an error
> > > because it will automatically be backward compatible based on the way
> > > the client determines which errors are retriable. Although, I don't
> > > have a strong opinion.
> > >
> > > The key code paths that we need to consider are topic lookup and topic
> > > creation. Both the http admin endpoint and the binary protocol
> > > endpoint interact with these code paths, and we should fail requests
> > > for namespaces which have the "deleted" field set to true.
> > >
> > > In looking at the code, we could modify the logic in
> > > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That method
> > > is always called for topic lookups via HTTP or via the Pulsar
> > > binary protocol. If a namespace doesn't exist, that method currently
> > > returns a NOT_FOUND exception. In the binary protocol case, that
> > > exception is converted to a `ServerError.MetadataError`. We could
> > > modify the method like I do here [0], or we could add another case and
> > > return a custom error message.
> > >
> > > We'd also need to prevent topic creation via the admin endpoint and
> > > the auto topic creation via the binary protocol. I believe this will
> > > be just as easy as the lookup case, but I haven't confirmed.
> > >
> > > Thanks,
> > > Michael
> > >
> > > [0]
> > >
> > https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
> > >
> > >
> > >
> > >
> > > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <pe...@apache.org> wrote:
> > > >
> > > > The replicated topic deletion also has a similar problem.
> > > > If the topic auto-creation is enabled, we are not able to delete
> > > > a replicated topic, because the producer used for replication
> > > > will try to reconnect. Looks like we need an option in the
> > > > `CommandCloseProducer` to avoid the replication producer
> > > > reconnecting again.
> > > >
> > > > Penghui
> > > >
> > > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <pe...@apache.org> wrote:
> > > >
> > > > > > Penghui, are you suggesting that we implement the namespace/tenant
> > > > > terminated logic after completing this PIP?
> > > > >
> > > > > I'm ok with both being implemented together or separated.
> > > > > We need to determine the overall plan, not the implementation
> > > > > at this moment.
> > > > >
> > > > > > For the sake of discussion, if we implement the namespace
> > terminated
> > > > > logic first, we could fulfill the underlying requirements for this
> > PIP
> > > > > by returning a new non-retriable error response when a client tries
> > to
> > > > > connect a producer or a consumer to a topic in a namespace that is
> > > > > "terminated". If we do add the "namespace terminated" feature, we'll
> > > > > need to add a non-retriable exception for this case, anyway. The main
> > > > > advantage here is that we'd only need one expansion of the protobuf
> > > > > instead of two. The downside is that the protocol for connected
> > > > > clients has a couple more roundtrips. The broker would disconnect
> > > > > connected clients and then fail their reconnection attempt with a
> > > > > non-retriable error.
> > > > >
> > > > > Regarding the namespace "terminated" concept, I just noticed that we
> > > > > already have a "deleted" field in a namespace's policies [0]. There
> > is
> > > > > even a comment that says:
> > > > >
> > > > > Yes, if we can terminate the namespace first, we should return
> > > “namespace
> > > > > not found”
> > > > > if the producers and consumers try to reconnect, the client should
> > stop
> > > > > reconnecting
> > > > > after getting this error. Looks like following this way, we don't
> > need
> > > to
> > > > > introduce any protocol changes anymore.
> > > > >
> > > > > > As I think about it more, I no longer think "terminated" is the
> > right
> > > > > term for what I proposed above. Our goal is to briefly prevent any
> > > > > topic creation to ensure we can delete all sub resources for a
> > > > > namespace. On the other hand, a terminated topic isn't necessarily
> > > > > short lived. If we want to apply the "terminated" term unequivocally
> > > > > to both topics and namespaces, I think a terminated namespace would
> > > > > need to be a namespace where all topics are in terminated state and
> > no
> > > > > additional topics could be created. That's not the feature we're
> > > > > discussing here, though. Deleted seems like the right term to me,
> > > > > especially since we're already using it to prevent a race condition
> > > > >
> > > > > Yes, topic termination has different meanings, deletion should be the
> > > right
> > > > > term here.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > >
> > > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
> > > mmarshall@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Regarding the namespace "terminated" concept, I just noticed that we
> > > > >> already have a "deleted" field in a namespace's policies [0]. There
> > is
> > > > >> even a comment that says:
> > > > >>
> > > > >> > // set the policies to deleted so that somebody else cannot
> > acquire
> > > > >> this namespace
> > > > >>
> > > > >> I am not familiar with this feature, but it seems like this policy
> > > > >> field could be checked before creating a topic in a namespace. That
> > > > >> would remove certain races described above.
> > > > >>
> > > > >> As I think about it more, I no longer think "terminated" is the
> > right
> > > > >> term for what I proposed above. Our goal is to briefly prevent any
> > > > >> topic creation to ensure we can delete all sub resources for a
> > > > >> namespace. On the other hand, a terminated topic isn't necessarily
> > > > >> short lived. If we want to apply the "terminated" term unequivocally
> > > > >> to both topics and namespaces, I think a terminated namespace would
> > > > >> need to be a namespace where all topics are in terminated state and
> > no
> > > > >> additional topics could be created. That's not the feature we're
> > > > >> discussing here, though. Deleted seems like the right term to me,
> > > > >> especially since we're already using it to prevent a race condition
> > > > >> [0].
> > > > >>
> > > > >> Thanks,
> > > > >> Michael
> > > > >>
> > > > >> [0]
> > > > >>
> > >
> > https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> > > > >>
> > > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
> > > mmarshall@apache.org>
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi Dave,
> > > > >> >
> > > > >> > > automatically delete tenants and namespaces for not containing
> > > topics
> > > > >> >
> > > > >> > I don't think that is what we are discussing. I agree that the
> > > initial
> > > > >> > email says just that, though, which is why I asked above:
> > > > >> >
> > > > >> > >> When there are no user-created topics under a namespace,
> > > > >> > >> Namespace should be deleted.
> > > > >> > > This is supposed to mean that the namespace should be able to be
> > > > >> > > deleted, correct?
> > > > >> >
> > > > >> > Perhaps Mattison can clarify, too.
> > > > >> >
> > > > >> > My current understanding of the context for the PIP is that a user
> > > > >> > call to delete a namespace without force can fail when a producer
> > > > >> > reconnects to a deleted topic. The goal is to remove the race
> > > > >> > condition to ensure namespace deletion can succeed.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Michael
> > > > >> >
> > > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org>
> > > wrote:
> > > > >> > >
> > > > >> > > Hi -
> > > > >> > >
> > > > >> > > I hope I’m understanding what’s being discussed.
> > > > >> > >
> > > > >> > > If we are going to automatically delete tenants and namespaces
> > for
> > > > >> not containing topics then we need to make both of these automatic
> > > actions
> > > > >> configurable with a default to NOT do so. Otherwise we break
> > existing
> > > use
> > > > >> cases.
> > > > >> > >
> > > > >> > > Automatic deletion of namespaces should be configurable at both
> > > the
> > > > >> cluster and tenant level.
> > > > >> > >
> > > > >> > > Regards,
> > > > >> > > Dave
> > > > >> > >
> > > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
> > > mmarshall@apache.org>
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > >> The old producer/consumer should be closed after applying the
> > > > >> changes from
> > > > >> > > >> this proposal.
> > > > >> > > >
> > > > >> > > > Penghui, are you suggesting that we implement the
> > > namespace/tenant
> > > > >> > > > terminated logic after completing this PIP?
> > > > >> > > >
> > > > >> > > > For the sake of discussion, if we implement the namespace
> > > terminated
> > > > >> > > > logic first, we could fulfill the underlying requirements for
> > > this
> > > > >> PIP
> > > > >> > > > by returning a new non-retriable error response when a client
> > > tries
> > > > >> to
> > > > >> > > > connect a producer or a consumer to a topic in a namespace
> > that
> > > is
> > > > >> > > > "terminated". If we do add the "namespace terminated" feature,
> > > we'll
> > > > >> > > > need to add a non-retriable exception for this case, anyway.
> > The
> > > > >> main
> > > > >> > > > advantage here is that we'd only need one expansion of the
> > > protobuf
> > > > >> > > > instead of two. The downside is that the protocol for
> > connected
> > > > >> > > > clients has a couple more roundtrips. The broker would
> > > disconnect
> > > > >> > > > connected clients and then fail their reconnection attempt
> > with
> > > a
> > > > >> > > > non-retriable error.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Michael
> > > > >> > > >
> > > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <
> > penghui@apache.org>
> > > > >> wrote:
> > > > >> > > >>
> > > > >> > > >>> If we want to solve this problem, we need to add some sync
> > > > >> resources like
> > > > >> > > >> lock/state, I think it’s a harm for us, we don’t need to do
> > > that.
> > > > >> > > >>
> > > > >> > > >> I think we can make the namespace/tenants to the inactive
> > state
> > > > >> first so
> > > > >> > > >> that we can avoid any new
> > > > >> > > >> producer/consumer connect to the topic under the
> > > namespace/tenant.
> > > > >> > > >>
> > > > >> > > >> The old producer/consumer should be closed after applying the
> > > > >> changes from
> > > > >> > > >> this proposal.
> > > > >> > > >>
> > > > >> > > >> Thanks,
> > > > >> > > >> Penghui
> > > > >> > > >>
> > > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> > > > >> mattisonchao@gmail.com> wrote:
> > > > >> > > >>
> > > > >> > > >>>> This is supposed to mean that the namespace should be able
> > > to be
> > > > >> > > >>>> deleted, correct?
> > > > >> > > >>>
> > > > >> > > >>> Yes, the main background is the user doesn’t have an active
> > > > >> topic. so,
> > > > >> > > >>> they want to delete the namespace.
> > > > >> > > >>>
> > > > >> > > >>>> However, I think
> > > > >> > > >>>> we might still have a race condition that could make tenant
> > > or
> > > > >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> > > > >> consumer
> > > > >> > > >>>> creates a topic after the namespace deletion has started
> > but
> > > > >> > > >>>> before it is complete. Do you agree that the underlying
> > race
> > > > >> still
> > > > >> > > >>> exists?
> > > > >> > > >>>
> > > > >> > > >>> Yes, this condition exists. I think it’s not a big problem
> > > > >> because the
> > > > >> > > >>> user doesn’t want to use this namespace anymore.
> > > > >> > > >>> If this scenario appears, they will get an error and need to
> > > > >> delete it
> > > > >> > > >>> again.
> > > > >> > > >>>
> > > > >> > > >>>> What if we expand our usage of the "terminated" feature to
> > > apply
> > > > >> to
> > > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can
> > > have
> > > > >> > > >>>> bundles and topics can be deleted but not created (just as
> > a
> > > > >> terminated
> > > > >> > > >>>> topic cannot have any new messages published to it). This
> > > would
> > > > >> take
> > > > >> > > >>>> care of all topic creation race conditions. We'd probably
> > > need
> > > > >> to add
> > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> If we want to solve this problem, we need to add some sync
> > > > >> resources like
> > > > >> > > >>> lock/state, I think it’s a harm for us, we don’t need to do
> > > that.
> > > > >> > > >>>
> > > > >> > > >>> Thanks for your suggestions, let me know what you think.
> > > > >> > > >>>
> > > > >> > > >>> Best,
> > > > >> > > >>> Mattison
> > > > >> > > >>>
> > > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> > > > >> mmarshall@apache.org>
> > > > >> > > >>> wrote:
> > > > >> > > >>>>
> > > > >> > > >>>> This proposal identifies an important issue that we should
> > > > >> definitely
> > > > >> > > >>>> solve. I have some questions.
> > > > >> > > >>>>
> > > > >> > > >>>>> When there are no user-created topics under a namespace,
> > > > >> > > >>>>> Namespace should be deleted.
> > > > >> > > >>>>
> > > > >> > > >>>> This is supposed to mean that the namespace should be able
> > > to be
> > > > >> > > >>>> deleted, correct?
> > > > >> > > >>>>
> > > > >> > > >>>>> For this reason, we need to close the system topic
> > > > >> reader/producer
> > > > >> > > >>>>> first, then remove the system topic. finally, remove the
> > > > >> namespace.
> > > > >> > > >>>>
> > > > >> > > >>>> I agree that expanding the protobuf CloseProducer and
> > > > >> CloseConsumer
> > > > >> > > >>>> commands could be valuable here. The expansion would ensure
> > > that
> > > > >> > > >>>> producers and consumers don't attempt to reconnect.
> > However,
> > > I
> > > > >> think
> > > > >> > > >>>> we might still have a race condition that could make tenant
> > > or
> > > > >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> > > > >> consumer
> > > > >> > > >>>> creates a topic after the namespace deletion has started
> > but
> > > > >> > > >>>> before it is complete. Do you agree that the underlying
> > race
> > > > >> still
> > > > >> > > >>> exists?
> > > > >> > > >>>>
> > > > >> > > >>>> In my view, the fundamental problem here is that deleting
> > > > >> certain Pulsar
> > > > >> > > >>>> resources takes time and, in a distributed system, that
> > means
> > > > >> race
> > > > >> > > >>>> conditions.
> > > > >> > > >>>>
> > > > >> > > >>>> What if we expand our usage of the "terminated" feature to
> > > apply
> > > > >> to
> > > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can
> > > have
> > > > >> > > >>>> bundles and topics can be deleted but not created (just as
> > a
> > > > >> terminated
> > > > >> > > >>>> topic cannot have any new messages published to it). This
> > > would
> > > > >> take
> > > > >> > > >>>> care of all topic creation race conditions. We'd probably
> > > need
> > > > >> to add
> > > > >> > > >>>> new protobuf exceptions for this feature.
> > > > >> > > >>>>
> > > > >> > > >>>> Thanks,
> > > > >> > > >>>> Michael
> > > > >> > > >>>>
> > > > >> > > >>>>
> > > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > > > >> > > >>>>>
> > > > >> > > >>>>> +1
> > > > >> > > >>>>>
> > > > >> > > >>>>>
> > > > >> > > >>>>> Thanks,
> > > > >> > > >>>>> Zike
> > > > >> > > >>>>>
> > > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> > > > >> technoboy@apache.org>
> > > > >> > > >>> wrote:
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> Hi
> > > > >> > > >>>>>> The PIP link :
> > > https://github.com/apache/pulsar/issues/13989
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> Regards
> > > > >> > > >>>>>> Jiwei Guo (Tboy)
> > > > >> > > >>>>>>
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> > > > >> mattisonchao@gmail.com
> > > > >> > > >>>>
> > > > >> > > >>>>>> wrote:
> > > > >> > > >>>>>>
> > > > >> > > >>>>>>> Hello everyone,
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> I want to start a discussion about PIP-139 : Support
> > > Broker
> > > > >> send
> > > > >> > > >>> command
> > > > >> > > >>>>>>> to real close producer/consumer.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> This is the PIP document
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> Please check it out and feel free to share your
> > thoughts.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> Best,
> > > > >> > > >>>>>>> Mattison
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> Relation pull request:  #13337
> > > > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ## Motivation
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> Before we discuss this pip, I'd like to supplement some
> > > > >> context to
> > > > >> > > >>> help
> > > > >> > > >>>>>>> contributors who don't want to read the original pull
> > > request.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>>> When there are no user-created topics under a
> > namespace,
> > > > >> Namespace
> > > > >> > > >>>>>>> should be deleted. But currently, the system topic
> > existed
> > > > >> and the
> > > > >> > > >>>>>>> reader/producer could auto-create the system which may
> > > cause
> > > > >> the
> > > > >> > > >>> namespace
> > > > >> > > >>>>>>> deletion to fail.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> For this reason, we need to close the system topic
> > > > >> reader/producer
> > > > >> > > >>> first,
> > > > >> > > >>>>>>> then remove the system topic. finally, remove the
> > > namespace.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> Following this way, we first want to use ``terminate``
> > to
> > > > >> solve this
> > > > >> > > >>>>>>> problem. then we found producers can disconnect, but
> > > > >> consumers are
> > > > >> > > >>> still
> > > > >> > > >>>>>>> alive. So, another PR #13960 wants to add consumers'
> > > closing
> > > > >> logic.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> After #13960, all things look good, but another problem
> > > > >> appears. that
> > > > >> > > >>> is
> > > > >> > > >>>>>>> we need to wait until consumers completely consume all
> > > > >> messages (this
> > > > >> > > >>> may
> > > > >> > > >>>>>>> make terminate topic so long and the operation
> > > timeout)then
> > > > >> get
> > > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>>
> > > > >> > > >>>
> > > > >>
> > >
> > https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> In the #13337 case, we need to force close consumers
> > > > >> immediately. So
> > > > >> > > >>> we
> > > > >> > > >>>>>>> write this PIP to discuss another way to resolve this
> > > problem.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ## Goal
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > > > >> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to
> > > close
> > > > >> > > >>>>>>> producer/consumers immediately.
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ## API Changes
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ```java
> > > > >> > > >>>>>>> **Before**
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > >> > > >>>>>>> }
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> **After**
> > > > >> > > >>>>>>> message CommandCloseProducer {
> > > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > > > >> > > >>>>>>> }
> > > > >> > > >>>>>>> ```
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > > > >> > > >>>>>>> ```java
> > > > >> > > >>>>>>> **Before**
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > >> > > >>>>>>> }
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> **After**
> > > > >> > > >>>>>>> message CommandCloseConsumer {
> > > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > > > >> > > >>>>>>> }
> > > > >> > > >>>>>>> ```
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ## Implementation
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ### ClientCnx - Producer:
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> **Before**
> > > > >> > > >>>>>>> ```java
> > > > >> > > >>>>>>>   @Override
> > > > >> > > >>>>>>>   protected void
> > handleCloseProducer(CommandCloseProducer
> > > > >> > > >>> closeProducer)
> > > > >> > > >>>>>>> {
> > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > producer:
> > > > >> {}",
> > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > >> > > >>>>>>>       final long producerId =
> > > closeProducer.getProducerId();
> > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > producers.get(producerId);
> > > > >> > > >>>>>>>       if (producer != null) {
> > > > >> > > >>>>>>>           producer.connectionClosed(this);
> > > > >> > > >>>>>>>       } else {
> > > > >> > > >>>>>>>           log.warn("Producer with id {} not found while
> > > > >> closing
> > > > >> > > >>> producer
> > > > >> > > >>>>>>> ", producerId);
> > > > >> > > >>>>>>>       }
> > > > >> > > >>>>>>>   }
> > > > >> > > >>>>>>> ```
> > > > >> > > >>>>>>> After:
> > > > >> > > >>>>>>> ```java
> > > > >> > > >>>>>>>  @Override
> > > > >> > > >>>>>>>   protected void
> > handleCloseProducer(CommandCloseProducer
> > > > >> > > >>> closeProducer)
> > > > >> > > >>>>>>> {
> > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > producer:
> > > > >> {}",
> > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > > >> > > >>>>>>>       final long producerId =
> > > closeProducer.getProducerId();
> > > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > > producers.get(producerId);
> > > > >> > > >>>>>>>       if (producer != null) {
> > > > >> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > > > >> > > >>>>>>>               producer.connectionClosed(this);
> > > > >> > > >>>>>>>           } else {
> > > > >> > > >>>>>>>               producer.closeAsync();
> > > > >> > > >>>>>>>           }
> > > > >> > > >>>>>>>       } else {
> > > > >> > > >>>>>>>           log.warn("Producer with id {} not found while
> > > > >> closing
> > > > >> > > >>> producer
> > > > >> > > >>>>>>> ", producerId);
> > > > >> > > >>>>>>>       }
> > > > >> > > >>>>>>>   }
> > > > >> > > >>>>>>> ```
> > > > >> > > >>>>>>> ### ClientCnx - Consumer:
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> **Before**
> > > > >> > > >>>>>>> ```java
> > > > >> > > >>>>>>>   @Override
> > > > >> > > >>>>>>>   protected void
> > handleCloseConsumer(CommandCloseConsumer
> > > > >> > > >>> closeConsumer)
> > > > >> > > >>>>>>> {
> > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > consumer:
> > > > >> {}",
> > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > >> > > >>>>>>>       final long consumerId =
> > > closeConsumer.getConsumerId();
> > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > consumers.get(consumerId);
> > > > >> > > >>>>>>>       if (consumer != null) {
> > > > >> > > >>>>>>>           consumer.connectionClosed(this);
> > > > >> > > >>>>>>>       } else {
> > > > >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> > > > >> closing
> > > > >> > > >>> consumer
> > > > >> > > >>>>>>> ", consumerId);
> > > > >> > > >>>>>>>       }
> > > > >> > > >>>>>>>   }
> > > > >> > > >>>>>>> ```
> > > > >> > > >>>>>>> **After**
> > > > >> > > >>>>>>> ```java
> > > > >> > > >>>>>>>   @Override
> > > > >> > > >>>>>>>   protected void
> > handleCloseConsumer(CommandCloseConsumer
> > > > >> > > >>> closeConsumer)
> > > > >> > > >>>>>>> {
> > > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > > consumer:
> > > > >> {}",
> > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > > >> > > >>>>>>>       final long consumerId =
> > > closeConsumer.getConsumerId();
> > > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > > consumers.get(consumerId);
> > > > >> > > >>>>>>>       if (consumer != null) {
> > > > >> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > > > >> > > >>>>>>>               consumer.connectionClosed(this);
> > > > >> > > >>>>>>>           } else {
> > > > >> > > >>>>>>>               consumer.closeAsync();
> > > > >> > > >>>>>>>           }
> > > > >> > > >>>>>>>       } else {
> > > > >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> > > > >> closing
> > > > >> > > >>> consumer
> > > > >> > > >>>>>>> ", consumerId);
> > > > >> > > >>>>>>>       }
> > > > >> > > >>>>>>>   }
> > > > >> > > >>>>>>> ```
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> ## Reject Alternatives
> > > > >> > > >>>>>>>
> > > > >> > > >>>>>>> none.
> > > > >> > > >>>>>
> > > > >> > > >>>>>
> > > > >> > > >>>>>
> > > > >> > > >>>>> --
> > > > >> > > >>>>> Zike Yang
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > >
> > > > >>
> > > > >
> > >
> >

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by guo jiwei <te...@apache.org>.
I have added the geo-replicator topic case and updated the PIP.


Regards
Jiwei Guo (Tboy)


On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <pe...@apache.org> wrote:

> > To me, the main question is whether we create a custom error or expand
> the `CommandCloseProducer` command. I lean towards adding an error
> because it will automatically be backward compatible based on the way
> the client determines which errors are retriable. Although, I don't
> have a strong opinion.
>
> I have the same opinion as you. But it looks like not able to fix the
> geo-replication topic deletion.
>
> Either replicated topic or non-replicated topic, we should provide the
> same topic deletion behavior, the topic can be deleted if no active
> user's producers/consumers, should not be affected by the Pulsar
> internal producers/consumers.
>
> But for now, for a replicated topic, users are not able to delete it
> because
> of the internal active producer, adding force delete option can delete
> the topic, but it will create again if users enabled topic auto-creation.
>
> I think in this case, it seems we don't have a chance to give the producer
> an exception, because the reconnection will connect to a new topic only
> with
> the same topic name as before.
>
> Thanks,
> Penghui
>
> On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <mm...@apache.org>
> wrote:
>
> > > We need to determine the overall plan, not the implementation
> > > at this moment.
> >
> > Great point, I agree.
> >
> > > Looks like we need an option in the
> > > `CommandCloseProducer` to avoid the replication producer
> > > reconnecting again.
> >
> > This is an important point. One of the primary requirements for
> > geo-replication must also be that a replicated namespace or topic can
> > be deleted and recreated without any problems or leaks in the broker.
> > That requires that the producer is cleaned up properly when the
> > namespace is deleted. I think we could achieve that with a new server
> > error or an expansion of the `CommandCloseProducer` command.
> >
> > To me, the main question is whether we create a custom error or expand
> > the `CommandCloseProducer` command. I lean towards adding an error
> > because it will automatically be backward compatible based on the way
> > the client determines which errors are retriable. Although, I don't
> > have a strong opinion.
> >
> > The key code paths that we need to consider are topic lookup and topic
> > creation. Both the http admin endpoint and the binary protocol
> > endpoint interact with these code paths, and we should fail requests
> > for namespaces which have the "deleted" field set to true.
> >
> > In looking at the code, we could modify the logic in
> > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That method
> > is always called for topic lookups via HTTP or via the Pulsar
> > binary protocol. If a namespace doesn't exist, that method currently
> > returns a NOT_FOUND exception. In the binary protocol case, that
> > exception is converted to a `ServerError.MetadataError`. We could
> > modify the method like I do here [0], or we could add another case and
> > return a custom error message.
> >
> > We'd also need to prevent topic creation via the admin endpoint and
> > the auto topic creation via the binary protocol. I believe this will
> > be just as easy as the lookup case, but I haven't confirmed.
> >
> > Thanks,
> > Michael
> >
> > [0]
> >
> https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
> >
> >
> >
> >
> > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <pe...@apache.org> wrote:
> > >
> > > The replicated topic deletion also has a similar problem.
> > > If the topic auto-creation is enabled, we are not able to delete
> > > a replicated topic, because the producer used for replication
> > > will try to reconnect. Looks like we need an option in the
> > > `CommandCloseProducer` to avoid the replication producer
> > > reconnecting again.
> > >
> > > Penghui
> > >
> > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <pe...@apache.org> wrote:
> > >
> > > > > Penghui, are you suggesting that we implement the namespace/tenant
> > > > terminated logic after completing this PIP?
> > > >
> > > > I'm ok with both being implemented together or separated.
> > > > We need to determine the overall plan, not the implementation
> > > > at this moment.
> > > >
> > > > > For the sake of discussion, if we implement the namespace
> terminated
> > > > logic first, we could fulfill the underlying requirements for this
> PIP
> > > > by returning a new non-retriable error response when a client tries
> to
> > > > connect a producer or a consumer to a topic in a namespace that is
> > > > "terminated". If we do add the "namespace terminated" feature, we'll
> > > > need to add a non-retriable exception for this case, anyway. The main
> > > > advantage here is that we'd only need one expansion of the protobuf
> > > > instead of two. The downside is that the protocol for connected
> > > > clients has a couple more roundtrips. The broker would disconnect
> > > > connected clients and then fail their reconnection attempt with a
> > > > non-retriable error.
> > > >
> > > > Regarding the namespace "terminated" concept, I just noticed that we
> > > > already have a "deleted" field in a namespace's policies [0]. There
> is
> > > > even a comment that says:
> > > >
> > > > Yes, if we can terminate the namespace first, we should return
> > “namespace
> > > > not found”
> > > > if the producers and consumers try to reconnect, the client should
> stop
> > > > reconnecting
> > > > after getting this error. Looks like following this way, we don't
> need
> > to
> > > > introduce any protocol changes anymore.
> > > >
> > > > > As I think about it more, I no longer think "terminated" is the
> right
> > > > term for what I proposed above. Our goal is to briefly prevent any
> > > > topic creation to ensure we can delete all sub resources for a
> > > > namespace. On the other hand, a terminated topic isn't necessarily
> > > > short lived. If we want to apply the "terminated" term unequivocally
> > > > to both topics and namespaces, I think a terminated namespace would
> > > > need to be a namespace where all topics are in terminated state and
> no
> > > > additional topics could be created. That's not the feature we're
> > > > discussing here, though. Deleted seems like the right term to me,
> > > > especially since we're already using it to prevent a race condition
> > > >
> > > > Yes, topic termination has different meanings, deletion should be the
> > right
> > > > term here.
> > > >
> > > > Thanks,
> > > > Penghui
> > > >
> > > >
> > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
> > mmarshall@apache.org>
> > > > wrote:
> > > >
> > > >> Regarding the namespace "terminated" concept, I just noticed that we
> > > >> already have a "deleted" field in a namespace's policies [0]. There
> is
> > > >> even a comment that says:
> > > >>
> > > >> > // set the policies to deleted so that somebody else cannot
> acquire
> > > >> this namespace
> > > >>
> > > >> I am not familiar with this feature, but it seems like this policy
> > > >> field could be checked before creating a topic in a namespace. That
> > > >> would remove certain races described above.
> > > >>
> > > >> As I think about it more, I no longer think "terminated" is the
> right
> > > >> term for what I proposed above. Our goal is to briefly prevent any
> > > >> topic creation to ensure we can delete all sub resources for a
> > > >> namespace. On the other hand, a terminated topic isn't necessarily
> > > >> short lived. If we want to apply the "terminated" term unequivocally
> > > >> to both topics and namespaces, I think a terminated namespace would
> > > >> need to be a namespace where all topics are in terminated state and
> no
> > > >> additional topics could be created. That's not the feature we're
> > > >> discussing here, though. Deleted seems like the right term to me,
> > > >> especially since we're already using it to prevent a race condition
> > > >> [0].
> > > >>
> > > >> Thanks,
> > > >> Michael
> > > >>
> > > >> [0]
> > > >>
> >
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> > > >>
> > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
> > mmarshall@apache.org>
> > > >> wrote:
> > > >> >
> > > >> > Hi Dave,
> > > >> >
> > > >> > > automatically delete tenants and namespaces for not containing
> > topics
> > > >> >
> > > >> > I don't think that is what we are discussing. I agree that the
> > initial
> > > >> > email says just that, though, which is why I asked above:
> > > >> >
> > > >> > >> When there are no user-created topics under a namespace,
> > > >> > >> Namespace should be deleted.
> > > >> > > This is supposed to mean that the namespace should be able to be
> > > >> > > deleted, correct?
> > > >> >
> > > >> > Perhaps Mattison can clarify, too.
> > > >> >
> > > >> > My current understanding of the context for the PIP is that a user
> > > >> > call to delete a namespace without force can fail when a producer
> > > >> > reconnects to a deleted topic. The goal is to remove the race
> > > >> > condition to ensure namespace deletion can succeed.
> > > >> >
> > > >> > Thanks,
> > > >> > Michael
> > > >> >
> > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org>
> > wrote:
> > > >> > >
> > > >> > > Hi -
> > > >> > >
> > > >> > > I hope I’m understanding what’s being discussed.
> > > >> > >
> > > >> > > If we are going to automatically delete tenants and namespaces
> for
> > > >> not containing topics then we need to make both of these automatic
> > actions
> > > >> configurable with a default to NOT do so. Otherwise we break
> existing
> > use
> > > >> cases.
> > > >> > >
> > > >> > > Automatic deletion of namespaces should be configurable at both
> > the
> > > >> cluster and tenant level.
> > > >> > >
> > > >> > > Regards,
> > > >> > > Dave
> > > >> > >
> > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
> > mmarshall@apache.org>
> > > >> wrote:
> > > >> > > >
> > > >> > > >> The old producer/consumer should be closed after applying the
> > > >> changes from
> > > >> > > >> this proposal.
> > > >> > > >
> > > >> > > > Penghui, are you suggesting that we implement the
> > namespace/tenant
> > > >> > > > terminated logic after completing this PIP?
> > > >> > > >
> > > >> > > > For the sake of discussion, if we implement the namespace
> > terminated
> > > >> > > > logic first, we could fulfill the underlying requirements for
> > this
> > > >> PIP
> > > >> > > > by returning a new non-retriable error response when a client
> > tries
> > > >> to
> > > >> > > > connect a producer or a consumer to a topic in a namespace
> that
> > is
> > > >> > > > "terminated". If we do add the "namespace terminated" feature,
> > we'll
> > > >> > > > need to add a non-retriable exception for this case, anyway.
> The
> > > >> main
> > > >> > > > advantage here is that we'd only need one expansion of the
> > protobuf
> > > >> > > > instead of two. The downside is that the protocol for
> connected
> > > >> > > > clients has a couple more roundtrips. The broker would
> > disconnect
> > > >> > > > connected clients and then fail their reconnection attempt
> with
> > a
> > > >> > > > non-retriable error.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Michael
> > > >> > > >
> > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <
> penghui@apache.org>
> > > >> wrote:
> > > >> > > >>
> > > >> > > >>> If we want to solve this problem, we need to add some sync
> > > >> resources like
> > > >> > > >> lock/state, I think it’s a harm for us, we don’t need to do
> > that.
> > > >> > > >>
> > > >> > > >> I think we can make the namespace/tenants to the inactive
> state
> > > >> first so
> > > >> > > >> that we can avoid any new
> > > >> > > >> producer/consumer connect to the topic under the
> > namespace/tenant.
> > > >> > > >>
> > > >> > > >> The old producer/consumer should be closed after applying the
> > > >> changes from
> > > >> > > >> this proposal.
> > > >> > > >>
> > > >> > > >> Thanks,
> > > >> > > >> Penghui
> > > >> > > >>
> > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> > > >> mattisonchao@gmail.com> wrote:
> > > >> > > >>
> > > >> > > >>>> This is supposed to mean that the namespace should be able
> > to be
> > > >> > > >>>> deleted, correct?
> > > >> > > >>>
> > > >> > > >>> Yes, the main background is the user doesn’t have an active
> > > >> topic. so,
> > > >> > > >>> they want to delete the namespace.
> > > >> > > >>>
> > > >> > > >>>> However, I think
> > > >> > > >>>> we might still have a race condition that could make tenant
> > or
> > > >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> > > >> consumer
> > > >> > > >>>> creates a topic after the namespace deletion has started
> but
> > > >> > > >>>> before it is complete. Do you agree that the underlying
> race
> > > >> still
> > > >> > > >>> exists?
> > > >> > > >>>
> > > >> > > >>> Yes, this condition exists. I think it’s not a big problem
> > > >> because the
> > > >> > > >>> user doesn’t want to use this namespace anymore.
> > > >> > > >>> If this scenario appears, they will get an error and need to
> > > >> delete it
> > > >> > > >>> again.
> > > >> > > >>>
> > > >> > > >>>> What if we expand our usage of the "terminated" feature to
> > apply
> > > >> to
> > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can
> > have
> > > >> > > >>>> bundles and topics can be deleted but not created (just as
> a
> > > >> terminated
> > > >> > > >>>> topic cannot have any new messages published to it). This
> > would
> > > >> take
> > > >> > > >>>> care of all topic creation race conditions. We'd probably
> > need
> > > >> to add
> > > >> > > >>>> new protobuf exceptions for this feature.
> > > >> > > >>>
> > > >> > > >>>
> > > >> > > >>> If we want to solve this problem, we need to add some sync
> > > >> resources like
> > > >> > > >>> lock/state, I think it’s a harm for us, we don’t need to do
> > that.
> > > >> > > >>>
> > > >> > > >>> Thanks for your suggestions, let me know what you think.
> > > >> > > >>>
> > > >> > > >>> Best,
> > > >> > > >>> Mattison
> > > >> > > >>>
> > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> > > >> mmarshall@apache.org>
> > > >> > > >>> wrote:
> > > >> > > >>>>
> > > >> > > >>>> This proposal identifies an important issue that we should
> > > >> definitely
> > > >> > > >>>> solve. I have some questions.
> > > >> > > >>>>
> > > >> > > >>>>> When there are no user-created topics under a namespace,
> > > >> > > >>>>> Namespace should be deleted.
> > > >> > > >>>>
> > > >> > > >>>> This is supposed to mean that the namespace should be able
> > to be
> > > >> > > >>>> deleted, correct?
> > > >> > > >>>>
> > > >> > > >>>>> For this reason, we need to close the system topic
> > > >> reader/producer
> > > >> > > >>>>> first, then remove the system topic. finally, remove the
> > > >> namespace.
> > > >> > > >>>>
> > > >> > > >>>> I agree that expanding the protobuf CloseProducer and
> > > >> CloseConsumer
> > > >> > > >>>> commands could be valuable here. The expansion would ensure
> > that
> > > >> > > >>>> producers and consumers don't attempt to reconnect.
> However,
> > I
> > > >> think
> > > >> > > >>>> we might still have a race condition that could make tenant
> > or
> > > >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> > > >> consumer
> > > >> > > >>>> creates a topic after the namespace deletion has started
> but
> > > >> > > >>>> before it is complete. Do you agree that the underlying
> race
> > > >> still
> > > >> > > >>> exists?
> > > >> > > >>>>
> > > >> > > >>>> In my view, the fundamental problem here is that deleting
> > > >> certain Pulsar
> > > >> > > >>>> resources takes time and, in a distributed system, that
> means
> > > >> race
> > > >> > > >>>> conditions.
> > > >> > > >>>>
> > > >> > > >>>> What if we expand our usage of the "terminated" feature to
> > apply
> > > >> to
> > > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can
> > have
> > > >> > > >>>> bundles and topics can be deleted but not created (just as
> a
> > > >> terminated
> > > >> > > >>>> topic cannot have any new messages published to it). This
> > would
> > > >> take
> > > >> > > >>>> care of all topic creation race conditions. We'd probably
> > need
> > > >> to add
> > > >> > > >>>> new protobuf exceptions for this feature.
> > > >> > > >>>>
> > > >> > > >>>> Thanks,
> > > >> > > >>>> Michael
> > > >> > > >>>>
> > > >> > > >>>>
> > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > > >> > > >>>>>
> > > >> > > >>>>> +1
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>> Thanks,
> > > >> > > >>>>> Zike
> > > >> > > >>>>>
> > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> > > >> technoboy@apache.org>
> > > >> > > >>> wrote:
> > > >> > > >>>>>>
> > > >> > > >>>>>> Hi
> > > >> > > >>>>>> The PIP link :
> > https://github.com/apache/pulsar/issues/13989
> > > >> > > >>>>>>
> > > >> > > >>>>>> Regards
> > > >> > > >>>>>> Jiwei Guo (Tboy)
> > > >> > > >>>>>>
> > > >> > > >>>>>>
> > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> > > >> mattisonchao@gmail.com
> > > >> > > >>>>
> > > >> > > >>>>>> wrote:
> > > >> > > >>>>>>
> > > >> > > >>>>>>> Hello everyone,
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> I want to start a discussion about PIP-139 : Support
> > Broker
> > > >> send
> > > >> > > >>> command
> > > >> > > >>>>>>> to real close producer/consumer.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> This is the PIP document
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> Please check it out and feel free to share your
> thoughts.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> Best,
> > > >> > > >>>>>>> Mattison
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> Relation pull request:  #13337
> > > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ## Motivation
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> Before we discuss this pip, I'd like to supplement some
> > > >> context to
> > > >> > > >>> help
> > > >> > > >>>>>>> contributors who don't want to read the original pull
> > request.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>> When there are no user-created topics under a
> namespace,
> > > >> Namespace
> > > >> > > >>>>>>> should be deleted. But currently, the system topic
> existed
> > > >> and the
> > > >> > > >>>>>>> reader/producer could auto-create the system which may
> > cause
> > > >> the
> > > >> > > >>> namespace
> > > >> > > >>>>>>> deletion to fail.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> For this reason, we need to close the system topic
> > > >> reader/producer
> > > >> > > >>> first,
> > > >> > > >>>>>>> then remove the system topic. finally, remove the
> > namespace.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> Following this way, we first want to use ``terminate``
> to
> > > >> solve this
> > > >> > > >>>>>>> problem. then we found producers can disconnect, but
> > > >> consumers are
> > > >> > > >>> still
> > > >> > > >>>>>>> alive. So, another PR #13960 wants to add consumers'
> > closing
> > > >> logic.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> After #13960, all things look good, but another problem
> > > >> appears. that
> > > >> > > >>> is
> > > >> > > >>>>>>> we need to wait until consumers completely consume all
> > > >> messages (this
> > > >> > > >>> may
> > > >> > > >>>>>>> make terminate topic so long and the operation
> > timeout)then
> > > >> get
> > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>
> > > >> > > >>>
> > > >>
> >
> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> In the #13337 case, we need to force close consumers
> > > >> immediately. So
> > > >> > > >>> we
> > > >> > > >>>>>>> write this PIP to discuss another way to resolve this
> > problem.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ## Goal
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > > >> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to
> > close
> > > >> > > >>>>>>> producer/consumers immediately.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ## API Changes
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ```java
> > > >> > > >>>>>>> **Before**
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> message CommandCloseProducer {
> > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > >> > > >>>>>>> }
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> **After**
> > > >> > > >>>>>>> message CommandCloseProducer {
> > > >> > > >>>>>>>   required uint64 producer_id = 1;
> > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > > >> > > >>>>>>> }
> > > >> > > >>>>>>> ```
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > > >> > > >>>>>>> ```java
> > > >> > > >>>>>>> **Before**
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> message CommandCloseConsumer {
> > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > >> > > >>>>>>> }
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> **After**
> > > >> > > >>>>>>> message CommandCloseConsumer {
> > > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > > >> > > >>>>>>>   required uint64 request_id = 2;
> > > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > > >> > > >>>>>>> }
> > > >> > > >>>>>>> ```
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ## Implementation
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ### ClientCnx - Producer:
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> **Before**
> > > >> > > >>>>>>> ```java
> > > >> > > >>>>>>>   @Override
> > > >> > > >>>>>>>   protected void
> handleCloseProducer(CommandCloseProducer
> > > >> > > >>> closeProducer)
> > > >> > > >>>>>>> {
> > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > producer:
> > > >> {}",
> > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > >> > > >>>>>>>       final long producerId =
> > closeProducer.getProducerId();
> > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > producers.get(producerId);
> > > >> > > >>>>>>>       if (producer != null) {
> > > >> > > >>>>>>>           producer.connectionClosed(this);
> > > >> > > >>>>>>>       } else {
> > > >> > > >>>>>>>           log.warn("Producer with id {} not found while
> > > >> closing
> > > >> > > >>> producer
> > > >> > > >>>>>>> ", producerId);
> > > >> > > >>>>>>>       }
> > > >> > > >>>>>>>   }
> > > >> > > >>>>>>> ```
> > > >> > > >>>>>>> After:
> > > >> > > >>>>>>> ```java
> > > >> > > >>>>>>>  @Override
> > > >> > > >>>>>>>   protected void
> handleCloseProducer(CommandCloseProducer
> > > >> > > >>> closeProducer)
> > > >> > > >>>>>>> {
> > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > producer:
> > > >> {}",
> > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > >> > > >>>>>>>       final long producerId =
> > closeProducer.getProducerId();
> > > >> > > >>>>>>>       ProducerImpl<?> producer =
> > producers.get(producerId);
> > > >> > > >>>>>>>       if (producer != null) {
> > > >> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > > >> > > >>>>>>>               producer.connectionClosed(this);
> > > >> > > >>>>>>>           } else {
> > > >> > > >>>>>>>               producer.closeAsync();
> > > >> > > >>>>>>>           }
> > > >> > > >>>>>>>       } else {
> > > >> > > >>>>>>>           log.warn("Producer with id {} not found while
> > > >> closing
> > > >> > > >>> producer
> > > >> > > >>>>>>> ", producerId);
> > > >> > > >>>>>>>       }
> > > >> > > >>>>>>>   }
> > > >> > > >>>>>>> ```
> > > >> > > >>>>>>> ### ClientCnx - Consumer:
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> **Before**
> > > >> > > >>>>>>> ```java
> > > >> > > >>>>>>>   @Override
> > > >> > > >>>>>>>   protected void
> handleCloseConsumer(CommandCloseConsumer
> > > >> > > >>> closeConsumer)
> > > >> > > >>>>>>> {
> > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > consumer:
> > > >> {}",
> > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > >> > > >>>>>>>       final long consumerId =
> > closeConsumer.getConsumerId();
> > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > consumers.get(consumerId);
> > > >> > > >>>>>>>       if (consumer != null) {
> > > >> > > >>>>>>>           consumer.connectionClosed(this);
> > > >> > > >>>>>>>       } else {
> > > >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> > > >> closing
> > > >> > > >>> consumer
> > > >> > > >>>>>>> ", consumerId);
> > > >> > > >>>>>>>       }
> > > >> > > >>>>>>>   }
> > > >> > > >>>>>>> ```
> > > >> > > >>>>>>> **After**
> > > >> > > >>>>>>> ```java
> > > >> > > >>>>>>>   @Override
> > > >> > > >>>>>>>   protected void
> handleCloseConsumer(CommandCloseConsumer
> > > >> > > >>> closeConsumer)
> > > >> > > >>>>>>> {
> > > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> > consumer:
> > > >> {}",
> > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > >> > > >>>>>>>       final long consumerId =
> > closeConsumer.getConsumerId();
> > > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> > consumers.get(consumerId);
> > > >> > > >>>>>>>       if (consumer != null) {
> > > >> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > > >> > > >>>>>>>               consumer.connectionClosed(this);
> > > >> > > >>>>>>>           } else {
> > > >> > > >>>>>>>               consumer.closeAsync();
> > > >> > > >>>>>>>           }
> > > >> > > >>>>>>>       } else {
> > > >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> > > >> closing
> > > >> > > >>> consumer
> > > >> > > >>>>>>> ", consumerId);
> > > >> > > >>>>>>>       }
> > > >> > > >>>>>>>   }
> > > >> > > >>>>>>> ```
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> ## Reject Alternatives
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> none.
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>> --
> > > >> > > >>>>> Zike Yang
> > > >> > > >>>
> > > >> > > >>>
> > > >> > >
> > > >>
> > > >
> >
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by PengHui Li <pe...@apache.org>.
> To me, the main question is whether we create a custom error or expand
the `CommandCloseProducer` command. I lean towards adding an error
because it will automatically be backward compatible based on the way
the client determines which errors are retriable. Although, I don't
have a strong opinion.

I have the same opinion as you. But it looks like not able to fix the
geo-replication topic deletion.

Either replicated topic or non-replicated topic, we should provide the
same topic deletion behavior, the topic can be deleted if no active
user's producers/consumers, should not be affected by the Pulsar
internal producers/consumers.

But for now, for a replicated topic, users are not able to delete it because
of the internal active producer, adding force delete option can delete
the topic, but it will create again if users enabled topic auto-creation.

I think in this case, it seems we don't have a chance to give the producer
an exception, because the reconnection will connect to a new topic only
with
the same topic name as before.

Thanks,
Penghui

On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall <mm...@apache.org>
wrote:

> > We need to determine the overall plan, not the implementation
> > at this moment.
>
> Great point, I agree.
>
> > Looks like we need an option in the
> > `CommandCloseProducer` to avoid the replication producer
> > reconnecting again.
>
> This is an important point. One of the primary requirements for
> geo-replication must also be that a replicated namespace or topic can
> be deleted and recreated without any problems or leaks in the broker.
> That requires that the producer is cleaned up properly when the
> namespace is deleted. I think we could achieve that with a new server
> error or an expansion of the `CommandCloseProducer` command.
>
> To me, the main question is whether we create a custom error or expand
> the `CommandCloseProducer` command. I lean towards adding an error
> because it will automatically be backward compatible based on the way
> the client determines which errors are retriable. Although, I don't
> have a strong opinion.
>
> The key code paths that we need to consider are topic lookup and topic
> creation. Both the http admin endpoint and the binary protocol
> endpoint interact with these code paths, and we should fail requests
> for namespaces which have the "deleted" field set to true.
>
> In looking at the code, we could modify the logic in
> `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That method
> is always called for topic lookups via HTTP or via the Pulsar
> binary protocol. If a namespace doesn't exist, that method currently
> returns a NOT_FOUND exception. In the binary protocol case, that
> exception is converted to a `ServerError.MetadataError`. We could
> modify the method like I do here [0], or we could add another case and
> return a custom error message.
>
> We'd also need to prevent topic creation via the admin endpoint and
> the auto topic creation via the binary protocol. I believe this will
> be just as easy as the lookup case, but I haven't confirmed.
>
> Thanks,
> Michael
>
> [0]
> https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512
>
>
>
>
> On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <pe...@apache.org> wrote:
> >
> > The replicated topic deletion also has a similar problem.
> > If the topic auto-creation is enabled, we are not able to delete
> > a replicated topic, because the producer used for replication
> > will try to reconnect. Looks like we need an option in the
> > `CommandCloseProducer` to avoid the replication producer
> > reconnecting again.
> >
> > Penghui
> >
> > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <pe...@apache.org> wrote:
> >
> > > > Penghui, are you suggesting that we implement the namespace/tenant
> > > terminated logic after completing this PIP?
> > >
> > > I'm ok with both being implemented together or separated.
> > > We need to determine the overall plan, not the implementation
> > > at this moment.
> > >
> > > > For the sake of discussion, if we implement the namespace terminated
> > > logic first, we could fulfill the underlying requirements for this PIP
> > > by returning a new non-retriable error response when a client tries to
> > > connect a producer or a consumer to a topic in a namespace that is
> > > "terminated". If we do add the "namespace terminated" feature, we'll
> > > need to add a non-retriable exception for this case, anyway. The main
> > > advantage here is that we'd only need one expansion of the protobuf
> > > instead of two. The downside is that the protocol for connected
> > > clients has a couple more roundtrips. The broker would disconnect
> > > connected clients and then fail their reconnection attempt with a
> > > non-retriable error.
> > >
> > > Regarding the namespace "terminated" concept, I just noticed that we
> > > already have a "deleted" field in a namespace's policies [0]. There is
> > > even a comment that says:
> > >
> > > Yes, if we can terminate the namespace first, we should return
> “namespace
> > > not found”
> > > if the producers and consumers try to reconnect, the client should stop
> > > reconnecting
> > > after getting this error. Looks like following this way, we don't need
> to
> > > introduce any protocol changes anymore.
> > >
> > > > As I think about it more, I no longer think "terminated" is the right
> > > term for what I proposed above. Our goal is to briefly prevent any
> > > topic creation to ensure we can delete all sub resources for a
> > > namespace. On the other hand, a terminated topic isn't necessarily
> > > short lived. If we want to apply the "terminated" term unequivocally
> > > to both topics and namespaces, I think a terminated namespace would
> > > need to be a namespace where all topics are in terminated state and no
> > > additional topics could be created. That's not the feature we're
> > > discussing here, though. Deleted seems like the right term to me,
> > > especially since we're already using it to prevent a race condition
> > >
> > > Yes, topic termination has different meanings, deletion should be the
> right
> > > term here.
> > >
> > > Thanks,
> > > Penghui
> > >
> > >
> > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <
> mmarshall@apache.org>
> > > wrote:
> > >
> > >> Regarding the namespace "terminated" concept, I just noticed that we
> > >> already have a "deleted" field in a namespace's policies [0]. There is
> > >> even a comment that says:
> > >>
> > >> > // set the policies to deleted so that somebody else cannot acquire
> > >> this namespace
> > >>
> > >> I am not familiar with this feature, but it seems like this policy
> > >> field could be checked before creating a topic in a namespace. That
> > >> would remove certain races described above.
> > >>
> > >> As I think about it more, I no longer think "terminated" is the right
> > >> term for what I proposed above. Our goal is to briefly prevent any
> > >> topic creation to ensure we can delete all sub resources for a
> > >> namespace. On the other hand, a terminated topic isn't necessarily
> > >> short lived. If we want to apply the "terminated" term unequivocally
> > >> to both topics and namespaces, I think a terminated namespace would
> > >> need to be a namespace where all topics are in terminated state and no
> > >> additional topics could be created. That's not the feature we're
> > >> discussing here, though. Deleted seems like the right term to me,
> > >> especially since we're already using it to prevent a race condition
> > >> [0].
> > >>
> > >> Thanks,
> > >> Michael
> > >>
> > >> [0]
> > >>
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> > >>
> > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <
> mmarshall@apache.org>
> > >> wrote:
> > >> >
> > >> > Hi Dave,
> > >> >
> > >> > > automatically delete tenants and namespaces for not containing
> topics
> > >> >
> > >> > I don't think that is what we are discussing. I agree that the
> initial
> > >> > email says just that, though, which is why I asked above:
> > >> >
> > >> > >> When there are no user-created topics under a namespace,
> > >> > >> Namespace should be deleted.
> > >> > > This is supposed to mean that the namespace should be able to be
> > >> > > deleted, correct?
> > >> >
> > >> > Perhaps Mattison can clarify, too.
> > >> >
> > >> > My current understanding of the context for the PIP is that a user
> > >> > call to delete a namespace without force can fail when a producer
> > >> > reconnects to a deleted topic. The goal is to remove the race
> > >> > condition to ensure namespace deletion can succeed.
> > >> >
> > >> > Thanks,
> > >> > Michael
> > >> >
> > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org>
> wrote:
> > >> > >
> > >> > > Hi -
> > >> > >
> > >> > > I hope I’m understanding what’s being discussed.
> > >> > >
> > >> > > If we are going to automatically delete tenants and namespaces for
> > >> not containing topics then we need to make both of these automatic
> actions
> > >> configurable with a default to NOT do so. Otherwise we break existing
> use
> > >> cases.
> > >> > >
> > >> > > Automatic deletion of namespaces should be configurable at both
> the
> > >> cluster and tenant level.
> > >> > >
> > >> > > Regards,
> > >> > > Dave
> > >> > >
> > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <
> mmarshall@apache.org>
> > >> wrote:
> > >> > > >
> > >> > > >> The old producer/consumer should be closed after applying the
> > >> changes from
> > >> > > >> this proposal.
> > >> > > >
> > >> > > > Penghui, are you suggesting that we implement the
> namespace/tenant
> > >> > > > terminated logic after completing this PIP?
> > >> > > >
> > >> > > > For the sake of discussion, if we implement the namespace
> terminated
> > >> > > > logic first, we could fulfill the underlying requirements for
> this
> > >> PIP
> > >> > > > by returning a new non-retriable error response when a client
> tries
> > >> to
> > >> > > > connect a producer or a consumer to a topic in a namespace that
> is
> > >> > > > "terminated". If we do add the "namespace terminated" feature,
> we'll
> > >> > > > need to add a non-retriable exception for this case, anyway. The
> > >> main
> > >> > > > advantage here is that we'd only need one expansion of the
> protobuf
> > >> > > > instead of two. The downside is that the protocol for connected
> > >> > > > clients has a couple more roundtrips. The broker would
> disconnect
> > >> > > > connected clients and then fail their reconnection attempt with
> a
> > >> > > > non-retriable error.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Michael
> > >> > > >
> > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org>
> > >> wrote:
> > >> > > >>
> > >> > > >>> If we want to solve this problem, we need to add some sync
> > >> resources like
> > >> > > >> lock/state, I think it’s a harm for us, we don’t need to do
> that.
> > >> > > >>
> > >> > > >> I think we can make the namespace/tenants to the inactive state
> > >> first so
> > >> > > >> that we can avoid any new
> > >> > > >> producer/consumer connect to the topic under the
> namespace/tenant.
> > >> > > >>
> > >> > > >> The old producer/consumer should be closed after applying the
> > >> changes from
> > >> > > >> this proposal.
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >> Penghui
> > >> > > >>
> > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> > >> mattisonchao@gmail.com> wrote:
> > >> > > >>
> > >> > > >>>> This is supposed to mean that the namespace should be able
> to be
> > >> > > >>>> deleted, correct?
> > >> > > >>>
> > >> > > >>> Yes, the main background is the user doesn’t have an active
> > >> topic. so,
> > >> > > >>> they want to delete the namespace.
> > >> > > >>>
> > >> > > >>>> However, I think
> > >> > > >>>> we might still have a race condition that could make tenant
> or
> > >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> > >> consumer
> > >> > > >>>> creates a topic after the namespace deletion has started but
> > >> > > >>>> before it is complete. Do you agree that the underlying race
> > >> still
> > >> > > >>> exists?
> > >> > > >>>
> > >> > > >>> Yes, this condition exists. I think it’s not a big problem
> > >> because the
> > >> > > >>> user doesn’t want to use this namespace anymore.
> > >> > > >>> If this scenario appears, they will get an error and need to
> > >> delete it
> > >> > > >>> again.
> > >> > > >>>
> > >> > > >>>> What if we expand our usage of the "terminated" feature to
> apply
> > >> to
> > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can
> have
> > >> > > >>>> bundles and topics can be deleted but not created (just as a
> > >> terminated
> > >> > > >>>> topic cannot have any new messages published to it). This
> would
> > >> take
> > >> > > >>>> care of all topic creation race conditions. We'd probably
> need
> > >> to add
> > >> > > >>>> new protobuf exceptions for this feature.
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> If we want to solve this problem, we need to add some sync
> > >> resources like
> > >> > > >>> lock/state, I think it’s a harm for us, we don’t need to do
> that.
> > >> > > >>>
> > >> > > >>> Thanks for your suggestions, let me know what you think.
> > >> > > >>>
> > >> > > >>> Best,
> > >> > > >>> Mattison
> > >> > > >>>
> > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> > >> mmarshall@apache.org>
> > >> > > >>> wrote:
> > >> > > >>>>
> > >> > > >>>> This proposal identifies an important issue that we should
> > >> definitely
> > >> > > >>>> solve. I have some questions.
> > >> > > >>>>
> > >> > > >>>>> When there are no user-created topics under a namespace,
> > >> > > >>>>> Namespace should be deleted.
> > >> > > >>>>
> > >> > > >>>> This is supposed to mean that the namespace should be able
> to be
> > >> > > >>>> deleted, correct?
> > >> > > >>>>
> > >> > > >>>>> For this reason, we need to close the system topic
> > >> reader/producer
> > >> > > >>>>> first, then remove the system topic. finally, remove the
> > >> namespace.
> > >> > > >>>>
> > >> > > >>>> I agree that expanding the protobuf CloseProducer and
> > >> CloseConsumer
> > >> > > >>>> commands could be valuable here. The expansion would ensure
> that
> > >> > > >>>> producers and consumers don't attempt to reconnect. However,
> I
> > >> think
> > >> > > >>>> we might still have a race condition that could make tenant
> or
> > >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> > >> consumer
> > >> > > >>>> creates a topic after the namespace deletion has started but
> > >> > > >>>> before it is complete. Do you agree that the underlying race
> > >> still
> > >> > > >>> exists?
> > >> > > >>>>
> > >> > > >>>> In my view, the fundamental problem here is that deleting
> > >> certain Pulsar
> > >> > > >>>> resources takes time and, in a distributed system, that means
> > >> race
> > >> > > >>>> conditions.
> > >> > > >>>>
> > >> > > >>>> What if we expand our usage of the "terminated" feature to
> apply
> > >> to
> > >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can
> have
> > >> > > >>>> bundles and topics can be deleted but not created (just as a
> > >> terminated
> > >> > > >>>> topic cannot have any new messages published to it). This
> would
> > >> take
> > >> > > >>>> care of all topic creation race conditions. We'd probably
> need
> > >> to add
> > >> > > >>>> new protobuf exceptions for this feature.
> > >> > > >>>>
> > >> > > >>>> Thanks,
> > >> > > >>>> Michael
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > >> > > >>>>>
> > >> > > >>>>> +1
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>> Thanks,
> > >> > > >>>>> Zike
> > >> > > >>>>>
> > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> > >> technoboy@apache.org>
> > >> > > >>> wrote:
> > >> > > >>>>>>
> > >> > > >>>>>> Hi
> > >> > > >>>>>> The PIP link :
> https://github.com/apache/pulsar/issues/13989
> > >> > > >>>>>>
> > >> > > >>>>>> Regards
> > >> > > >>>>>> Jiwei Guo (Tboy)
> > >> > > >>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> > >> mattisonchao@gmail.com
> > >> > > >>>>
> > >> > > >>>>>> wrote:
> > >> > > >>>>>>
> > >> > > >>>>>>> Hello everyone,
> > >> > > >>>>>>>
> > >> > > >>>>>>> I want to start a discussion about PIP-139 : Support
> Broker
> > >> send
> > >> > > >>> command
> > >> > > >>>>>>> to real close producer/consumer.
> > >> > > >>>>>>>
> > >> > > >>>>>>> This is the PIP document
> > >> > > >>>>>>>
> > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > >> > > >>>>>>>
> > >> > > >>>>>>> Please check it out and feel free to share your thoughts.
> > >> > > >>>>>>>
> > >> > > >>>>>>> Best,
> > >> > > >>>>>>> Mattison
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>> Relation pull request:  #13337
> > >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > >> > > >>>>>>>
> > >> > > >>>>>>> ## Motivation
> > >> > > >>>>>>>
> > >> > > >>>>>>> Before we discuss this pip, I'd like to supplement some
> > >> context to
> > >> > > >>> help
> > >> > > >>>>>>> contributors who don't want to read the original pull
> request.
> > >> > > >>>>>>>
> > >> > > >>>>>>>> When there are no user-created topics under a namespace,
> > >> Namespace
> > >> > > >>>>>>> should be deleted. But currently, the system topic existed
> > >> and the
> > >> > > >>>>>>> reader/producer could auto-create the system which may
> cause
> > >> the
> > >> > > >>> namespace
> > >> > > >>>>>>> deletion to fail.
> > >> > > >>>>>>>
> > >> > > >>>>>>> For this reason, we need to close the system topic
> > >> reader/producer
> > >> > > >>> first,
> > >> > > >>>>>>> then remove the system topic. finally, remove the
> namespace.
> > >> > > >>>>>>>
> > >> > > >>>>>>> Following this way, we first want to use ``terminate`` to
> > >> solve this
> > >> > > >>>>>>> problem. then we found producers can disconnect, but
> > >> consumers are
> > >> > > >>> still
> > >> > > >>>>>>> alive. So, another PR #13960 wants to add consumers'
> closing
> > >> logic.
> > >> > > >>>>>>>
> > >> > > >>>>>>> After #13960, all things look good, but another problem
> > >> appears. that
> > >> > > >>> is
> > >> > > >>>>>>> we need to wait until consumers completely consume all
> > >> messages (this
> > >> > > >>> may
> > >> > > >>>>>>> make terminate topic so long and the operation
> timeout)then
> > >> get
> > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>
> > >>
> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > >> > > >>>>>>>
> > >> > > >>>>>>> In the #13337 case, we need to force close consumers
> > >> immediately. So
> > >> > > >>> we
> > >> > > >>>>>>> write this PIP to discuss another way to resolve this
> problem.
> > >> > > >>>>>>>
> > >> > > >>>>>>> ## Goal
> > >> > > >>>>>>>
> > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > >> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to
> close
> > >> > > >>>>>>> producer/consumers immediately.
> > >> > > >>>>>>>
> > >> > > >>>>>>> ## API Changes
> > >> > > >>>>>>>
> > >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > >> > > >>>>>>>
> > >> > > >>>>>>> ```java
> > >> > > >>>>>>> **Before**
> > >> > > >>>>>>>
> > >> > > >>>>>>> message CommandCloseProducer {
> > >> > > >>>>>>>   required uint64 producer_id = 1;
> > >> > > >>>>>>>   required uint64 request_id = 2;
> > >> > > >>>>>>> }
> > >> > > >>>>>>>
> > >> > > >>>>>>> **After**
> > >> > > >>>>>>> message CommandCloseProducer {
> > >> > > >>>>>>>   required uint64 producer_id = 1;
> > >> > > >>>>>>>   required uint64 request_id = 2;
> > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > >> > > >>>>>>> }
> > >> > > >>>>>>> ```
> > >> > > >>>>>>>
> > >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > >> > > >>>>>>> ```java
> > >> > > >>>>>>> **Before**
> > >> > > >>>>>>>
> > >> > > >>>>>>> message CommandCloseConsumer {
> > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > >> > > >>>>>>>   required uint64 request_id = 2;
> > >> > > >>>>>>> }
> > >> > > >>>>>>>
> > >> > > >>>>>>> **After**
> > >> > > >>>>>>> message CommandCloseConsumer {
> > >> > > >>>>>>>   required uint64 consumer_id = 1;
> > >> > > >>>>>>>   required uint64 request_id = 2;
> > >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > >> > > >>>>>>> }
> > >> > > >>>>>>> ```
> > >> > > >>>>>>>
> > >> > > >>>>>>> ## Implementation
> > >> > > >>>>>>>
> > >> > > >>>>>>> ### ClientCnx - Producer:
> > >> > > >>>>>>>
> > >> > > >>>>>>> **Before**
> > >> > > >>>>>>> ```java
> > >> > > >>>>>>>   @Override
> > >> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > >> > > >>> closeProducer)
> > >> > > >>>>>>> {
> > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> producer:
> > >> {}",
> > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > >> > > >>>>>>>       final long producerId =
> closeProducer.getProducerId();
> > >> > > >>>>>>>       ProducerImpl<?> producer =
> producers.get(producerId);
> > >> > > >>>>>>>       if (producer != null) {
> > >> > > >>>>>>>           producer.connectionClosed(this);
> > >> > > >>>>>>>       } else {
> > >> > > >>>>>>>           log.warn("Producer with id {} not found while
> > >> closing
> > >> > > >>> producer
> > >> > > >>>>>>> ", producerId);
> > >> > > >>>>>>>       }
> > >> > > >>>>>>>   }
> > >> > > >>>>>>> ```
> > >> > > >>>>>>> After:
> > >> > > >>>>>>> ```java
> > >> > > >>>>>>>  @Override
> > >> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > >> > > >>> closeProducer)
> > >> > > >>>>>>> {
> > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> producer:
> > >> {}",
> > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > >> > > >>>>>>>       final long producerId =
> closeProducer.getProducerId();
> > >> > > >>>>>>>       ProducerImpl<?> producer =
> producers.get(producerId);
> > >> > > >>>>>>>       if (producer != null) {
> > >> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > >> > > >>>>>>>               producer.connectionClosed(this);
> > >> > > >>>>>>>           } else {
> > >> > > >>>>>>>               producer.closeAsync();
> > >> > > >>>>>>>           }
> > >> > > >>>>>>>       } else {
> > >> > > >>>>>>>           log.warn("Producer with id {} not found while
> > >> closing
> > >> > > >>> producer
> > >> > > >>>>>>> ", producerId);
> > >> > > >>>>>>>       }
> > >> > > >>>>>>>   }
> > >> > > >>>>>>> ```
> > >> > > >>>>>>> ### ClientCnx - Consumer:
> > >> > > >>>>>>>
> > >> > > >>>>>>> **Before**
> > >> > > >>>>>>> ```java
> > >> > > >>>>>>>   @Override
> > >> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > >> > > >>> closeConsumer)
> > >> > > >>>>>>> {
> > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> consumer:
> > >> {}",
> > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > >> > > >>>>>>>       final long consumerId =
> closeConsumer.getConsumerId();
> > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> consumers.get(consumerId);
> > >> > > >>>>>>>       if (consumer != null) {
> > >> > > >>>>>>>           consumer.connectionClosed(this);
> > >> > > >>>>>>>       } else {
> > >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> > >> closing
> > >> > > >>> consumer
> > >> > > >>>>>>> ", consumerId);
> > >> > > >>>>>>>       }
> > >> > > >>>>>>>   }
> > >> > > >>>>>>> ```
> > >> > > >>>>>>> **After**
> > >> > > >>>>>>> ```java
> > >> > > >>>>>>>   @Override
> > >> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > >> > > >>> closeConsumer)
> > >> > > >>>>>>> {
> > >> > > >>>>>>>       log.info("[{}] Broker notification of Closed
> consumer:
> > >> {}",
> > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > >> > > >>>>>>>       final long consumerId =
> closeConsumer.getConsumerId();
> > >> > > >>>>>>>       ConsumerImpl<?> consumer =
> consumers.get(consumerId);
> > >> > > >>>>>>>       if (consumer != null) {
> > >> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > >> > > >>>>>>>               consumer.connectionClosed(this);
> > >> > > >>>>>>>           } else {
> > >> > > >>>>>>>               consumer.closeAsync();
> > >> > > >>>>>>>           }
> > >> > > >>>>>>>       } else {
> > >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> > >> closing
> > >> > > >>> consumer
> > >> > > >>>>>>> ", consumerId);
> > >> > > >>>>>>>       }
> > >> > > >>>>>>>   }
> > >> > > >>>>>>> ```
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>> ## Reject Alternatives
> > >> > > >>>>>>>
> > >> > > >>>>>>> none.
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>> --
> > >> > > >>>>> Zike Yang
> > >> > > >>>
> > >> > > >>>
> > >> > >
> > >>
> > >
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by Michael Marshall <mm...@apache.org>.
> We need to determine the overall plan, not the implementation
> at this moment.

Great point, I agree.

> Looks like we need an option in the
> `CommandCloseProducer` to avoid the replication producer
> reconnecting again.

This is an important point. One of the primary requirements for
geo-replication must also be that a replicated namespace or topic can
be deleted and recreated without any problems or leaks in the broker.
That requires that the producer is cleaned up properly when the
namespace is deleted. I think we could achieve that with a new server
error or an expansion of the `CommandCloseProducer` command.

To me, the main question is whether we create a custom error or expand
the `CommandCloseProducer` command. I lean towards adding an error
because it will automatically be backward compatible based on the way
the client determines which errors are retriable. Although, I don't
have a strong opinion.

The key code paths that we need to consider are topic lookup and topic
creation. Both the http admin endpoint and the binary protocol
endpoint interact with these code paths, and we should fail requests
for namespaces which have the "deleted" field set to true.

In looking at the code, we could modify the logic in
`PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That method
is always called for topic lookups via HTTP or via the Pulsar
binary protocol. If a namespace doesn't exist, that method currently
returns a NOT_FOUND exception. In the binary protocol case, that
exception is converted to a `ServerError.MetadataError`. We could
modify the method like I do here [0], or we could add another case and
return a custom error message.

We'd also need to prevent topic creation via the admin endpoint and
the auto topic creation via the binary protocol. I believe this will
be just as easy as the lookup case, but I haven't confirmed.

Thanks,
Michael

[0] https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512




On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <pe...@apache.org> wrote:
>
> The replicated topic deletion also has a similar problem.
> If the topic auto-creation is enabled, we are not able to delete
> a replicated topic, because the producer used for replication
> will try to reconnect. Looks like we need an option in the
> `CommandCloseProducer` to avoid the replication producer
> reconnecting again.
>
> Penghui
>
> On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <pe...@apache.org> wrote:
>
> > > Penghui, are you suggesting that we implement the namespace/tenant
> > terminated logic after completing this PIP?
> >
> > I'm ok with both being implemented together or separated.
> > We need to determine the overall plan, not the implementation
> > at this moment.
> >
> > > For the sake of discussion, if we implement the namespace terminated
> > logic first, we could fulfill the underlying requirements for this PIP
> > by returning a new non-retriable error response when a client tries to
> > connect a producer or a consumer to a topic in a namespace that is
> > "terminated". If we do add the "namespace terminated" feature, we'll
> > need to add a non-retriable exception for this case, anyway. The main
> > advantage here is that we'd only need one expansion of the protobuf
> > instead of two. The downside is that the protocol for connected
> > clients has a couple more roundtrips. The broker would disconnect
> > connected clients and then fail their reconnection attempt with a
> > non-retriable error.
> >
> > Regarding the namespace "terminated" concept, I just noticed that we
> > already have a "deleted" field in a namespace's policies [0]. There is
> > even a comment that says:
> >
> > Yes, if we can terminate the namespace first, we should return “namespace
> > not found”
> > if the producers and consumers try to reconnect, the client should stop
> > reconnecting
> > after getting this error. Looks like following this way, we don't need to
> > introduce any protocol changes anymore.
> >
> > > As I think about it more, I no longer think "terminated" is the right
> > term for what I proposed above. Our goal is to briefly prevent any
> > topic creation to ensure we can delete all sub resources for a
> > namespace. On the other hand, a terminated topic isn't necessarily
> > short lived. If we want to apply the "terminated" term unequivocally
> > to both topics and namespaces, I think a terminated namespace would
> > need to be a namespace where all topics are in terminated state and no
> > additional topics could be created. That's not the feature we're
> > discussing here, though. Deleted seems like the right term to me,
> > especially since we're already using it to prevent a race condition
> >
> > Yes, topic termination has different meanings, deletion should be the right
> > term here.
> >
> > Thanks,
> > Penghui
> >
> >
> > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <mm...@apache.org>
> > wrote:
> >
> >> Regarding the namespace "terminated" concept, I just noticed that we
> >> already have a "deleted" field in a namespace's policies [0]. There is
> >> even a comment that says:
> >>
> >> > // set the policies to deleted so that somebody else cannot acquire
> >> this namespace
> >>
> >> I am not familiar with this feature, but it seems like this policy
> >> field could be checked before creating a topic in a namespace. That
> >> would remove certain races described above.
> >>
> >> As I think about it more, I no longer think "terminated" is the right
> >> term for what I proposed above. Our goal is to briefly prevent any
> >> topic creation to ensure we can delete all sub resources for a
> >> namespace. On the other hand, a terminated topic isn't necessarily
> >> short lived. If we want to apply the "terminated" term unequivocally
> >> to both topics and namespaces, I think a terminated namespace would
> >> need to be a namespace where all topics are in terminated state and no
> >> additional topics could be created. That's not the feature we're
> >> discussing here, though. Deleted seems like the right term to me,
> >> especially since we're already using it to prevent a race condition
> >> [0].
> >>
> >> Thanks,
> >> Michael
> >>
> >> [0]
> >> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
> >>
> >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <mm...@apache.org>
> >> wrote:
> >> >
> >> > Hi Dave,
> >> >
> >> > > automatically delete tenants and namespaces for not containing topics
> >> >
> >> > I don't think that is what we are discussing. I agree that the initial
> >> > email says just that, though, which is why I asked above:
> >> >
> >> > >> When there are no user-created topics under a namespace,
> >> > >> Namespace should be deleted.
> >> > > This is supposed to mean that the namespace should be able to be
> >> > > deleted, correct?
> >> >
> >> > Perhaps Mattison can clarify, too.
> >> >
> >> > My current understanding of the context for the PIP is that a user
> >> > call to delete a namespace without force can fail when a producer
> >> > reconnects to a deleted topic. The goal is to remove the race
> >> > condition to ensure namespace deletion can succeed.
> >> >
> >> > Thanks,
> >> > Michael
> >> >
> >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org> wrote:
> >> > >
> >> > > Hi -
> >> > >
> >> > > I hope I’m understanding what’s being discussed.
> >> > >
> >> > > If we are going to automatically delete tenants and namespaces for
> >> not containing topics then we need to make both of these automatic actions
> >> configurable with a default to NOT do so. Otherwise we break existing use
> >> cases.
> >> > >
> >> > > Automatic deletion of namespaces should be configurable at both the
> >> cluster and tenant level.
> >> > >
> >> > > Regards,
> >> > > Dave
> >> > >
> >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <mm...@apache.org>
> >> wrote:
> >> > > >
> >> > > >> The old producer/consumer should be closed after applying the
> >> changes from
> >> > > >> this proposal.
> >> > > >
> >> > > > Penghui, are you suggesting that we implement the namespace/tenant
> >> > > > terminated logic after completing this PIP?
> >> > > >
> >> > > > For the sake of discussion, if we implement the namespace terminated
> >> > > > logic first, we could fulfill the underlying requirements for this
> >> PIP
> >> > > > by returning a new non-retriable error response when a client tries
> >> to
> >> > > > connect a producer or a consumer to a topic in a namespace that is
> >> > > > "terminated". If we do add the "namespace terminated" feature, we'll
> >> > > > need to add a non-retriable exception for this case, anyway. The
> >> main
> >> > > > advantage here is that we'd only need one expansion of the protobuf
> >> > > > instead of two. The downside is that the protocol for connected
> >> > > > clients has a couple more roundtrips. The broker would disconnect
> >> > > > connected clients and then fail their reconnection attempt with a
> >> > > > non-retriable error.
> >> > > >
> >> > > > Thanks,
> >> > > > Michael
> >> > > >
> >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org>
> >> wrote:
> >> > > >>
> >> > > >>> If we want to solve this problem, we need to add some sync
> >> resources like
> >> > > >> lock/state, I think it’s a harm for us, we don’t need to do that.
> >> > > >>
> >> > > >> I think we can make the namespace/tenants to the inactive state
> >> first so
> >> > > >> that we can avoid any new
> >> > > >> producer/consumer connect to the topic under the namespace/tenant.
> >> > > >>
> >> > > >> The old producer/consumer should be closed after applying the
> >> changes from
> >> > > >> this proposal.
> >> > > >>
> >> > > >> Thanks,
> >> > > >> Penghui
> >> > > >>
> >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> >> mattisonchao@gmail.com> wrote:
> >> > > >>
> >> > > >>>> This is supposed to mean that the namespace should be able to be
> >> > > >>>> deleted, correct?
> >> > > >>>
> >> > > >>> Yes, the main background is the user doesn’t have an active
> >> topic. so,
> >> > > >>> they want to delete the namespace.
> >> > > >>>
> >> > > >>>> However, I think
> >> > > >>>> we might still have a race condition that could make tenant or
> >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> >> consumer
> >> > > >>>> creates a topic after the namespace deletion has started but
> >> > > >>>> before it is complete. Do you agree that the underlying race
> >> still
> >> > > >>> exists?
> >> > > >>>
> >> > > >>> Yes, this condition exists. I think it’s not a big problem
> >> because the
> >> > > >>> user doesn’t want to use this namespace anymore.
> >> > > >>> If this scenario appears, they will get an error and need to
> >> delete it
> >> > > >>> again.
> >> > > >>>
> >> > > >>>> What if we expand our usage of the "terminated" feature to apply
> >> to
> >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> >> > > >>>> bundles and topics can be deleted but not created (just as a
> >> terminated
> >> > > >>>> topic cannot have any new messages published to it). This would
> >> take
> >> > > >>>> care of all topic creation race conditions. We'd probably need
> >> to add
> >> > > >>>> new protobuf exceptions for this feature.
> >> > > >>>
> >> > > >>>
> >> > > >>> If we want to solve this problem, we need to add some sync
> >> resources like
> >> > > >>> lock/state, I think it’s a harm for us, we don’t need to do that.
> >> > > >>>
> >> > > >>> Thanks for your suggestions, let me know what you think.
> >> > > >>>
> >> > > >>> Best,
> >> > > >>> Mattison
> >> > > >>>
> >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> >> mmarshall@apache.org>
> >> > > >>> wrote:
> >> > > >>>>
> >> > > >>>> This proposal identifies an important issue that we should
> >> definitely
> >> > > >>>> solve. I have some questions.
> >> > > >>>>
> >> > > >>>>> When there are no user-created topics under a namespace,
> >> > > >>>>> Namespace should be deleted.
> >> > > >>>>
> >> > > >>>> This is supposed to mean that the namespace should be able to be
> >> > > >>>> deleted, correct?
> >> > > >>>>
> >> > > >>>>> For this reason, we need to close the system topic
> >> reader/producer
> >> > > >>>>> first, then remove the system topic. finally, remove the
> >> namespace.
> >> > > >>>>
> >> > > >>>> I agree that expanding the protobuf CloseProducer and
> >> CloseConsumer
> >> > > >>>> commands could be valuable here. The expansion would ensure that
> >> > > >>>> producers and consumers don't attempt to reconnect. However, I
> >> think
> >> > > >>>> we might still have a race condition that could make tenant or
> >> > > >>>> namespace deletion fail. Specifically, if a new producer or
> >> consumer
> >> > > >>>> creates a topic after the namespace deletion has started but
> >> > > >>>> before it is complete. Do you agree that the underlying race
> >> still
> >> > > >>> exists?
> >> > > >>>>
> >> > > >>>> In my view, the fundamental problem here is that deleting
> >> certain Pulsar
> >> > > >>>> resources takes time and, in a distributed system, that means
> >> race
> >> > > >>>> conditions.
> >> > > >>>>
> >> > > >>>> What if we expand our usage of the "terminated" feature to apply
> >> to
> >> > > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> >> > > >>>> bundles and topics can be deleted but not created (just as a
> >> terminated
> >> > > >>>> topic cannot have any new messages published to it). This would
> >> take
> >> > > >>>> care of all topic creation race conditions. We'd probably need
> >> to add
> >> > > >>>> new protobuf exceptions for this feature.
> >> > > >>>>
> >> > > >>>> Thanks,
> >> > > >>>> Michael
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> >> > > >>>> <zk...@streamnative.io.invalid> wrote:
> >> > > >>>>>
> >> > > >>>>> +1
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>> Thanks,
> >> > > >>>>> Zike
> >> > > >>>>>
> >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
> >> technoboy@apache.org>
> >> > > >>> wrote:
> >> > > >>>>>>
> >> > > >>>>>> Hi
> >> > > >>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
> >> > > >>>>>>
> >> > > >>>>>> Regards
> >> > > >>>>>> Jiwei Guo (Tboy)
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> >> mattisonchao@gmail.com
> >> > > >>>>
> >> > > >>>>>> wrote:
> >> > > >>>>>>
> >> > > >>>>>>> Hello everyone,
> >> > > >>>>>>>
> >> > > >>>>>>> I want to start a discussion about PIP-139 : Support Broker
> >> send
> >> > > >>> command
> >> > > >>>>>>> to real close producer/consumer.
> >> > > >>>>>>>
> >> > > >>>>>>> This is the PIP document
> >> > > >>>>>>>
> >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> >> > > >>>>>>>
> >> > > >>>>>>> Please check it out and feel free to share your thoughts.
> >> > > >>>>>>>
> >> > > >>>>>>> Best,
> >> > > >>>>>>> Mattison
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> ———————— Pasted below for quoting convenience.
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> Relation pull request:  #13337
> >> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> >> > > >>>>>>>
> >> > > >>>>>>> ## Motivation
> >> > > >>>>>>>
> >> > > >>>>>>> Before we discuss this pip, I'd like to supplement some
> >> context to
> >> > > >>> help
> >> > > >>>>>>> contributors who don't want to read the original pull request.
> >> > > >>>>>>>
> >> > > >>>>>>>> When there are no user-created topics under a namespace,
> >> Namespace
> >> > > >>>>>>> should be deleted. But currently, the system topic existed
> >> and the
> >> > > >>>>>>> reader/producer could auto-create the system which may cause
> >> the
> >> > > >>> namespace
> >> > > >>>>>>> deletion to fail.
> >> > > >>>>>>>
> >> > > >>>>>>> For this reason, we need to close the system topic
> >> reader/producer
> >> > > >>> first,
> >> > > >>>>>>> then remove the system topic. finally, remove the namespace.
> >> > > >>>>>>>
> >> > > >>>>>>> Following this way, we first want to use ``terminate`` to
> >> solve this
> >> > > >>>>>>> problem. then we found producers can disconnect, but
> >> consumers are
> >> > > >>> still
> >> > > >>>>>>> alive. So, another PR #13960 wants to add consumers' closing
> >> logic.
> >> > > >>>>>>>
> >> > > >>>>>>> After #13960, all things look good, but another problem
> >> appears. that
> >> > > >>> is
> >> > > >>>>>>> we need to wait until consumers completely consume all
> >> messages (this
> >> > > >>> may
> >> > > >>>>>>> make terminate topic so long and the operation timeout)then
> >> get
> >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>
> >> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> >> > > >>>>>>>
> >> > > >>>>>>> In the #13337 case, we need to force close consumers
> >> immediately. So
> >> > > >>> we
> >> > > >>>>>>> write this PIP to discuss another way to resolve this problem.
> >> > > >>>>>>>
> >> > > >>>>>>> ## Goal
> >> > > >>>>>>>
> >> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
> >> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> >> > > >>>>>>> producer/consumers immediately.
> >> > > >>>>>>>
> >> > > >>>>>>> ## API Changes
> >> > > >>>>>>>
> >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> >> > > >>>>>>>
> >> > > >>>>>>> ```java
> >> > > >>>>>>> **Before**
> >> > > >>>>>>>
> >> > > >>>>>>> message CommandCloseProducer {
> >> > > >>>>>>>   required uint64 producer_id = 1;
> >> > > >>>>>>>   required uint64 request_id = 2;
> >> > > >>>>>>> }
> >> > > >>>>>>>
> >> > > >>>>>>> **After**
> >> > > >>>>>>> message CommandCloseProducer {
> >> > > >>>>>>>   required uint64 producer_id = 1;
> >> > > >>>>>>>   required uint64 request_id = 2;
> >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> >> > > >>>>>>> }
> >> > > >>>>>>> ```
> >> > > >>>>>>>
> >> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> >> > > >>>>>>> ```java
> >> > > >>>>>>> **Before**
> >> > > >>>>>>>
> >> > > >>>>>>> message CommandCloseConsumer {
> >> > > >>>>>>>   required uint64 consumer_id = 1;
> >> > > >>>>>>>   required uint64 request_id = 2;
> >> > > >>>>>>> }
> >> > > >>>>>>>
> >> > > >>>>>>> **After**
> >> > > >>>>>>> message CommandCloseConsumer {
> >> > > >>>>>>>   required uint64 consumer_id = 1;
> >> > > >>>>>>>   required uint64 request_id = 2;
> >> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> >> > > >>>>>>> }
> >> > > >>>>>>> ```
> >> > > >>>>>>>
> >> > > >>>>>>> ## Implementation
> >> > > >>>>>>>
> >> > > >>>>>>> ### ClientCnx - Producer:
> >> > > >>>>>>>
> >> > > >>>>>>> **Before**
> >> > > >>>>>>> ```java
> >> > > >>>>>>>   @Override
> >> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> >> > > >>> closeProducer)
> >> > > >>>>>>> {
> >> > > >>>>>>>       log.info("[{}] Broker notification of Closed producer:
> >> {}",
> >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> >> > > >>>>>>>       final long producerId = closeProducer.getProducerId();
> >> > > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> >> > > >>>>>>>       if (producer != null) {
> >> > > >>>>>>>           producer.connectionClosed(this);
> >> > > >>>>>>>       } else {
> >> > > >>>>>>>           log.warn("Producer with id {} not found while
> >> closing
> >> > > >>> producer
> >> > > >>>>>>> ", producerId);
> >> > > >>>>>>>       }
> >> > > >>>>>>>   }
> >> > > >>>>>>> ```
> >> > > >>>>>>> After:
> >> > > >>>>>>> ```java
> >> > > >>>>>>>  @Override
> >> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> >> > > >>> closeProducer)
> >> > > >>>>>>> {
> >> > > >>>>>>>       log.info("[{}] Broker notification of Closed producer:
> >> {}",
> >> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> >> > > >>>>>>>       final long producerId = closeProducer.getProducerId();
> >> > > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> >> > > >>>>>>>       if (producer != null) {
> >> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> >> > > >>>>>>>               producer.connectionClosed(this);
> >> > > >>>>>>>           } else {
> >> > > >>>>>>>               producer.closeAsync();
> >> > > >>>>>>>           }
> >> > > >>>>>>>       } else {
> >> > > >>>>>>>           log.warn("Producer with id {} not found while
> >> closing
> >> > > >>> producer
> >> > > >>>>>>> ", producerId);
> >> > > >>>>>>>       }
> >> > > >>>>>>>   }
> >> > > >>>>>>> ```
> >> > > >>>>>>> ### ClientCnx - Consumer:
> >> > > >>>>>>>
> >> > > >>>>>>> **Before**
> >> > > >>>>>>> ```java
> >> > > >>>>>>>   @Override
> >> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> >> > > >>> closeConsumer)
> >> > > >>>>>>> {
> >> > > >>>>>>>       log.info("[{}] Broker notification of Closed consumer:
> >> {}",
> >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> >> > > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> >> > > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> >> > > >>>>>>>       if (consumer != null) {
> >> > > >>>>>>>           consumer.connectionClosed(this);
> >> > > >>>>>>>       } else {
> >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> >> closing
> >> > > >>> consumer
> >> > > >>>>>>> ", consumerId);
> >> > > >>>>>>>       }
> >> > > >>>>>>>   }
> >> > > >>>>>>> ```
> >> > > >>>>>>> **After**
> >> > > >>>>>>> ```java
> >> > > >>>>>>>   @Override
> >> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> >> > > >>> closeConsumer)
> >> > > >>>>>>> {
> >> > > >>>>>>>       log.info("[{}] Broker notification of Closed consumer:
> >> {}",
> >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> >> > > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> >> > > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> >> > > >>>>>>>       if (consumer != null) {
> >> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> >> > > >>>>>>>               consumer.connectionClosed(this);
> >> > > >>>>>>>           } else {
> >> > > >>>>>>>               consumer.closeAsync();
> >> > > >>>>>>>           }
> >> > > >>>>>>>       } else {
> >> > > >>>>>>>           log.warn("Consumer with id {} not found while
> >> closing
> >> > > >>> consumer
> >> > > >>>>>>> ", consumerId);
> >> > > >>>>>>>       }
> >> > > >>>>>>>   }
> >> > > >>>>>>> ```
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> ## Reject Alternatives
> >> > > >>>>>>>
> >> > > >>>>>>> none.
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>> --
> >> > > >>>>> Zike Yang
> >> > > >>>
> >> > > >>>
> >> > >
> >>
> >

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by PengHui Li <pe...@apache.org>.
The replicated topic deletion also has a similar problem.
If the topic auto-creation is enabled, we are not able to delete
a replicated topic, because the producer used for replication
will try to reconnect. Looks like we need an option in the
`CommandCloseProducer` to avoid the replication producer
reconnecting again.

Penghui

On Fri, Feb 25, 2022 at 5:26 PM PengHui Li <pe...@apache.org> wrote:

> > Penghui, are you suggesting that we implement the namespace/tenant
> terminated logic after completing this PIP?
>
> I'm ok with both being implemented together or separated.
> We need to determine the overall plan, not the implementation
> at this moment.
>
> > For the sake of discussion, if we implement the namespace terminated
> logic first, we could fulfill the underlying requirements for this PIP
> by returning a new non-retriable error response when a client tries to
> connect a producer or a consumer to a topic in a namespace that is
> "terminated". If we do add the "namespace terminated" feature, we'll
> need to add a non-retriable exception for this case, anyway. The main
> advantage here is that we'd only need one expansion of the protobuf
> instead of two. The downside is that the protocol for connected
> clients has a couple more roundtrips. The broker would disconnect
> connected clients and then fail their reconnection attempt with a
> non-retriable error.
>
> Regarding the namespace "terminated" concept, I just noticed that we
> already have a "deleted" field in a namespace's policies [0]. There is
> even a comment that says:
>
> Yes, if we can terminate the namespace first, we should return “namespace
> not found”
> if the producers and consumers try to reconnect, the client should stop
> reconnecting
> after getting this error. Looks like following this way, we don't need to
> introduce any protocol changes anymore.
>
> > As I think about it more, I no longer think "terminated" is the right
> term for what I proposed above. Our goal is to briefly prevent any
> topic creation to ensure we can delete all sub resources for a
> namespace. On the other hand, a terminated topic isn't necessarily
> short lived. If we want to apply the "terminated" term unequivocally
> to both topics and namespaces, I think a terminated namespace would
> need to be a namespace where all topics are in terminated state and no
> additional topics could be created. That's not the feature we're
> discussing here, though. Deleted seems like the right term to me,
> especially since we're already using it to prevent a race condition
>
> Yes, topic termination has different meanings, deletion should be the right
> term here.
>
> Thanks,
> Penghui
>
>
> On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <mm...@apache.org>
> wrote:
>
>> Regarding the namespace "terminated" concept, I just noticed that we
>> already have a "deleted" field in a namespace's policies [0]. There is
>> even a comment that says:
>>
>> > // set the policies to deleted so that somebody else cannot acquire
>> this namespace
>>
>> I am not familiar with this feature, but it seems like this policy
>> field could be checked before creating a topic in a namespace. That
>> would remove certain races described above.
>>
>> As I think about it more, I no longer think "terminated" is the right
>> term for what I proposed above. Our goal is to briefly prevent any
>> topic creation to ensure we can delete all sub resources for a
>> namespace. On the other hand, a terminated topic isn't necessarily
>> short lived. If we want to apply the "terminated" term unequivocally
>> to both topics and namespaces, I think a terminated namespace would
>> need to be a namespace where all topics are in terminated state and no
>> additional topics could be created. That's not the feature we're
>> discussing here, though. Deleted seems like the right term to me,
>> especially since we're already using it to prevent a race condition
>> [0].
>>
>> Thanks,
>> Michael
>>
>> [0]
>> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
>>
>> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <mm...@apache.org>
>> wrote:
>> >
>> > Hi Dave,
>> >
>> > > automatically delete tenants and namespaces for not containing topics
>> >
>> > I don't think that is what we are discussing. I agree that the initial
>> > email says just that, though, which is why I asked above:
>> >
>> > >> When there are no user-created topics under a namespace,
>> > >> Namespace should be deleted.
>> > > This is supposed to mean that the namespace should be able to be
>> > > deleted, correct?
>> >
>> > Perhaps Mattison can clarify, too.
>> >
>> > My current understanding of the context for the PIP is that a user
>> > call to delete a namespace without force can fail when a producer
>> > reconnects to a deleted topic. The goal is to remove the race
>> > condition to ensure namespace deletion can succeed.
>> >
>> > Thanks,
>> > Michael
>> >
>> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org> wrote:
>> > >
>> > > Hi -
>> > >
>> > > I hope I’m understanding what’s being discussed.
>> > >
>> > > If we are going to automatically delete tenants and namespaces for
>> not containing topics then we need to make both of these automatic actions
>> configurable with a default to NOT do so. Otherwise we break existing use
>> cases.
>> > >
>> > > Automatic deletion of namespaces should be configurable at both the
>> cluster and tenant level.
>> > >
>> > > Regards,
>> > > Dave
>> > >
>> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <mm...@apache.org>
>> wrote:
>> > > >
>> > > >> The old producer/consumer should be closed after applying the
>> changes from
>> > > >> this proposal.
>> > > >
>> > > > Penghui, are you suggesting that we implement the namespace/tenant
>> > > > terminated logic after completing this PIP?
>> > > >
>> > > > For the sake of discussion, if we implement the namespace terminated
>> > > > logic first, we could fulfill the underlying requirements for this
>> PIP
>> > > > by returning a new non-retriable error response when a client tries
>> to
>> > > > connect a producer or a consumer to a topic in a namespace that is
>> > > > "terminated". If we do add the "namespace terminated" feature, we'll
>> > > > need to add a non-retriable exception for this case, anyway. The
>> main
>> > > > advantage here is that we'd only need one expansion of the protobuf
>> > > > instead of two. The downside is that the protocol for connected
>> > > > clients has a couple more roundtrips. The broker would disconnect
>> > > > connected clients and then fail their reconnection attempt with a
>> > > > non-retriable error.
>> > > >
>> > > > Thanks,
>> > > > Michael
>> > > >
>> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org>
>> wrote:
>> > > >>
>> > > >>> If we want to solve this problem, we need to add some sync
>> resources like
>> > > >> lock/state, I think it’s a harm for us, we don’t need to do that.
>> > > >>
>> > > >> I think we can make the namespace/tenants to the inactive state
>> first so
>> > > >> that we can avoid any new
>> > > >> producer/consumer connect to the topic under the namespace/tenant.
>> > > >>
>> > > >> The old producer/consumer should be closed after applying the
>> changes from
>> > > >> this proposal.
>> > > >>
>> > > >> Thanks,
>> > > >> Penghui
>> > > >>
>> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
>> mattisonchao@gmail.com> wrote:
>> > > >>
>> > > >>>> This is supposed to mean that the namespace should be able to be
>> > > >>>> deleted, correct?
>> > > >>>
>> > > >>> Yes, the main background is the user doesn’t have an active
>> topic. so,
>> > > >>> they want to delete the namespace.
>> > > >>>
>> > > >>>> However, I think
>> > > >>>> we might still have a race condition that could make tenant or
>> > > >>>> namespace deletion fail. Specifically, if a new producer or
>> consumer
>> > > >>>> creates a topic after the namespace deletion has started but
>> > > >>>> before it is complete. Do you agree that the underlying race
>> still
>> > > >>> exists?
>> > > >>>
>> > > >>> Yes, this condition exists. I think it’s not a big problem
>> because the
>> > > >>> user doesn’t want to use this namespace anymore.
>> > > >>> If this scenario appears, they will get an error and need to
>> delete it
>> > > >>> again.
>> > > >>>
>> > > >>>> What if we expand our usage of the "terminated" feature to apply
>> to
>> > > >>>> namespaces (and tenants)? Then, a terminated namespace can have
>> > > >>>> bundles and topics can be deleted but not created (just as a
>> terminated
>> > > >>>> topic cannot have any new messages published to it). This would
>> take
>> > > >>>> care of all topic creation race conditions. We'd probably need
>> to add
>> > > >>>> new protobuf exceptions for this feature.
>> > > >>>
>> > > >>>
>> > > >>> If we want to solve this problem, we need to add some sync
>> resources like
>> > > >>> lock/state, I think it’s a harm for us, we don’t need to do that.
>> > > >>>
>> > > >>> Thanks for your suggestions, let me know what you think.
>> > > >>>
>> > > >>> Best,
>> > > >>> Mattison
>> > > >>>
>> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
>> mmarshall@apache.org>
>> > > >>> wrote:
>> > > >>>>
>> > > >>>> This proposal identifies an important issue that we should
>> definitely
>> > > >>>> solve. I have some questions.
>> > > >>>>
>> > > >>>>> When there are no user-created topics under a namespace,
>> > > >>>>> Namespace should be deleted.
>> > > >>>>
>> > > >>>> This is supposed to mean that the namespace should be able to be
>> > > >>>> deleted, correct?
>> > > >>>>
>> > > >>>>> For this reason, we need to close the system topic
>> reader/producer
>> > > >>>>> first, then remove the system topic. finally, remove the
>> namespace.
>> > > >>>>
>> > > >>>> I agree that expanding the protobuf CloseProducer and
>> CloseConsumer
>> > > >>>> commands could be valuable here. The expansion would ensure that
>> > > >>>> producers and consumers don't attempt to reconnect. However, I
>> think
>> > > >>>> we might still have a race condition that could make tenant or
>> > > >>>> namespace deletion fail. Specifically, if a new producer or
>> consumer
>> > > >>>> creates a topic after the namespace deletion has started but
>> > > >>>> before it is complete. Do you agree that the underlying race
>> still
>> > > >>> exists?
>> > > >>>>
>> > > >>>> In my view, the fundamental problem here is that deleting
>> certain Pulsar
>> > > >>>> resources takes time and, in a distributed system, that means
>> race
>> > > >>>> conditions.
>> > > >>>>
>> > > >>>> What if we expand our usage of the "terminated" feature to apply
>> to
>> > > >>>> namespaces (and tenants)? Then, a terminated namespace can have
>> > > >>>> bundles and topics can be deleted but not created (just as a
>> terminated
>> > > >>>> topic cannot have any new messages published to it). This would
>> take
>> > > >>>> care of all topic creation race conditions. We'd probably need
>> to add
>> > > >>>> new protobuf exceptions for this feature.
>> > > >>>>
>> > > >>>> Thanks,
>> > > >>>> Michael
>> > > >>>>
>> > > >>>>
>> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
>> > > >>>> <zk...@streamnative.io.invalid> wrote:
>> > > >>>>>
>> > > >>>>> +1
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> Thanks,
>> > > >>>>> Zike
>> > > >>>>>
>> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <
>> technoboy@apache.org>
>> > > >>> wrote:
>> > > >>>>>>
>> > > >>>>>> Hi
>> > > >>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
>> > > >>>>>>
>> > > >>>>>> Regards
>> > > >>>>>> Jiwei Guo (Tboy)
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
>> mattisonchao@gmail.com
>> > > >>>>
>> > > >>>>>> wrote:
>> > > >>>>>>
>> > > >>>>>>> Hello everyone,
>> > > >>>>>>>
>> > > >>>>>>> I want to start a discussion about PIP-139 : Support Broker
>> send
>> > > >>> command
>> > > >>>>>>> to real close producer/consumer.
>> > > >>>>>>>
>> > > >>>>>>> This is the PIP document
>> > > >>>>>>>
>> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
>> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
>> > > >>>>>>>
>> > > >>>>>>> Please check it out and feel free to share your thoughts.
>> > > >>>>>>>
>> > > >>>>>>> Best,
>> > > >>>>>>> Mattison
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> ———————— Pasted below for quoting convenience.
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> Relation pull request:  #13337
>> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
>> > > >>>>>>>
>> > > >>>>>>> ## Motivation
>> > > >>>>>>>
>> > > >>>>>>> Before we discuss this pip, I'd like to supplement some
>> context to
>> > > >>> help
>> > > >>>>>>> contributors who don't want to read the original pull request.
>> > > >>>>>>>
>> > > >>>>>>>> When there are no user-created topics under a namespace,
>> Namespace
>> > > >>>>>>> should be deleted. But currently, the system topic existed
>> and the
>> > > >>>>>>> reader/producer could auto-create the system which may cause
>> the
>> > > >>> namespace
>> > > >>>>>>> deletion to fail.
>> > > >>>>>>>
>> > > >>>>>>> For this reason, we need to close the system topic
>> reader/producer
>> > > >>> first,
>> > > >>>>>>> then remove the system topic. finally, remove the namespace.
>> > > >>>>>>>
>> > > >>>>>>> Following this way, we first want to use ``terminate`` to
>> solve this
>> > > >>>>>>> problem. then we found producers can disconnect, but
>> consumers are
>> > > >>> still
>> > > >>>>>>> alive. So, another PR #13960 wants to add consumers' closing
>> logic.
>> > > >>>>>>>
>> > > >>>>>>> After #13960, all things look good, but another problem
>> appears. that
>> > > >>> is
>> > > >>>>>>> we need to wait until consumers completely consume all
>> messages (this
>> > > >>> may
>> > > >>>>>>> make terminate topic so long and the operation timeout)then
>> get
>> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>
>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
>> > > >>>>>>>
>> > > >>>>>>> In the #13337 case, we need to force close consumers
>> immediately. So
>> > > >>> we
>> > > >>>>>>> write this PIP to discuss another way to resolve this problem.
>> > > >>>>>>>
>> > > >>>>>>> ## Goal
>> > > >>>>>>>
>> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
>> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
>> > > >>>>>>> producer/consumers immediately.
>> > > >>>>>>>
>> > > >>>>>>> ## API Changes
>> > > >>>>>>>
>> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
>> > > >>>>>>>
>> > > >>>>>>> ```java
>> > > >>>>>>> **Before**
>> > > >>>>>>>
>> > > >>>>>>> message CommandCloseProducer {
>> > > >>>>>>>   required uint64 producer_id = 1;
>> > > >>>>>>>   required uint64 request_id = 2;
>> > > >>>>>>> }
>> > > >>>>>>>
>> > > >>>>>>> **After**
>> > > >>>>>>> message CommandCloseProducer {
>> > > >>>>>>>   required uint64 producer_id = 1;
>> > > >>>>>>>   required uint64 request_id = 2;
>> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
>> > > >>>>>>> }
>> > > >>>>>>> ```
>> > > >>>>>>>
>> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
>> > > >>>>>>> ```java
>> > > >>>>>>> **Before**
>> > > >>>>>>>
>> > > >>>>>>> message CommandCloseConsumer {
>> > > >>>>>>>   required uint64 consumer_id = 1;
>> > > >>>>>>>   required uint64 request_id = 2;
>> > > >>>>>>> }
>> > > >>>>>>>
>> > > >>>>>>> **After**
>> > > >>>>>>> message CommandCloseConsumer {
>> > > >>>>>>>   required uint64 consumer_id = 1;
>> > > >>>>>>>   required uint64 request_id = 2;
>> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
>> > > >>>>>>> }
>> > > >>>>>>> ```
>> > > >>>>>>>
>> > > >>>>>>> ## Implementation
>> > > >>>>>>>
>> > > >>>>>>> ### ClientCnx - Producer:
>> > > >>>>>>>
>> > > >>>>>>> **Before**
>> > > >>>>>>> ```java
>> > > >>>>>>>   @Override
>> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
>> > > >>> closeProducer)
>> > > >>>>>>> {
>> > > >>>>>>>       log.info("[{}] Broker notification of Closed producer:
>> {}",
>> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
>> > > >>>>>>>       final long producerId = closeProducer.getProducerId();
>> > > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
>> > > >>>>>>>       if (producer != null) {
>> > > >>>>>>>           producer.connectionClosed(this);
>> > > >>>>>>>       } else {
>> > > >>>>>>>           log.warn("Producer with id {} not found while
>> closing
>> > > >>> producer
>> > > >>>>>>> ", producerId);
>> > > >>>>>>>       }
>> > > >>>>>>>   }
>> > > >>>>>>> ```
>> > > >>>>>>> After:
>> > > >>>>>>> ```java
>> > > >>>>>>>  @Override
>> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
>> > > >>> closeProducer)
>> > > >>>>>>> {
>> > > >>>>>>>       log.info("[{}] Broker notification of Closed producer:
>> {}",
>> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
>> > > >>>>>>>       final long producerId = closeProducer.getProducerId();
>> > > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
>> > > >>>>>>>       if (producer != null) {
>> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
>> > > >>>>>>>               producer.connectionClosed(this);
>> > > >>>>>>>           } else {
>> > > >>>>>>>               producer.closeAsync();
>> > > >>>>>>>           }
>> > > >>>>>>>       } else {
>> > > >>>>>>>           log.warn("Producer with id {} not found while
>> closing
>> > > >>> producer
>> > > >>>>>>> ", producerId);
>> > > >>>>>>>       }
>> > > >>>>>>>   }
>> > > >>>>>>> ```
>> > > >>>>>>> ### ClientCnx - Consumer:
>> > > >>>>>>>
>> > > >>>>>>> **Before**
>> > > >>>>>>> ```java
>> > > >>>>>>>   @Override
>> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
>> > > >>> closeConsumer)
>> > > >>>>>>> {
>> > > >>>>>>>       log.info("[{}] Broker notification of Closed consumer:
>> {}",
>> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
>> > > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
>> > > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
>> > > >>>>>>>       if (consumer != null) {
>> > > >>>>>>>           consumer.connectionClosed(this);
>> > > >>>>>>>       } else {
>> > > >>>>>>>           log.warn("Consumer with id {} not found while
>> closing
>> > > >>> consumer
>> > > >>>>>>> ", consumerId);
>> > > >>>>>>>       }
>> > > >>>>>>>   }
>> > > >>>>>>> ```
>> > > >>>>>>> **After**
>> > > >>>>>>> ```java
>> > > >>>>>>>   @Override
>> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
>> > > >>> closeConsumer)
>> > > >>>>>>> {
>> > > >>>>>>>       log.info("[{}] Broker notification of Closed consumer:
>> {}",
>> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
>> > > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
>> > > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
>> > > >>>>>>>       if (consumer != null) {
>> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
>> > > >>>>>>>               consumer.connectionClosed(this);
>> > > >>>>>>>           } else {
>> > > >>>>>>>               consumer.closeAsync();
>> > > >>>>>>>           }
>> > > >>>>>>>       } else {
>> > > >>>>>>>           log.warn("Consumer with id {} not found while
>> closing
>> > > >>> consumer
>> > > >>>>>>> ", consumerId);
>> > > >>>>>>>       }
>> > > >>>>>>>   }
>> > > >>>>>>> ```
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> ## Reject Alternatives
>> > > >>>>>>>
>> > > >>>>>>> none.
>> > > >>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> --
>> > > >>>>> Zike Yang
>> > > >>>
>> > > >>>
>> > >
>>
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by PengHui Li <pe...@apache.org>.
> Penghui, are you suggesting that we implement the namespace/tenant
terminated logic after completing this PIP?

I'm ok with both being implemented together or separated.
We need to determine the overall plan, not the implementation
at this moment.

> For the sake of discussion, if we implement the namespace terminated
logic first, we could fulfill the underlying requirements for this PIP
by returning a new non-retriable error response when a client tries to
connect a producer or a consumer to a topic in a namespace that is
"terminated". If we do add the "namespace terminated" feature, we'll
need to add a non-retriable exception for this case, anyway. The main
advantage here is that we'd only need one expansion of the protobuf
instead of two. The downside is that the protocol for connected
clients has a couple more roundtrips. The broker would disconnect
connected clients and then fail their reconnection attempt with a
non-retriable error.

Regarding the namespace "terminated" concept, I just noticed that we
already have a "deleted" field in a namespace's policies [0]. There is
even a comment that says:

Yes, if we can terminate the namespace first, we should return “namespace
not found”
if the producers and consumers try to reconnect, the client should stop
reconnecting
after getting this error. Looks like following this way, we don't need to
introduce any protocol changes anymore.

> As I think about it more, I no longer think "terminated" is the right
term for what I proposed above. Our goal is to briefly prevent any
topic creation to ensure we can delete all sub resources for a
namespace. On the other hand, a terminated topic isn't necessarily
short lived. If we want to apply the "terminated" term unequivocally
to both topics and namespaces, I think a terminated namespace would
need to be a namespace where all topics are in terminated state and no
additional topics could be created. That's not the feature we're
discussing here, though. Deleted seems like the right term to me,
especially since we're already using it to prevent a race condition

Yes, topic termination has different meanings, deletion should be the right
term here.

Thanks,
Penghui


On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall <mm...@apache.org>
wrote:

> Regarding the namespace "terminated" concept, I just noticed that we
> already have a "deleted" field in a namespace's policies [0]. There is
> even a comment that says:
>
> > // set the policies to deleted so that somebody else cannot acquire this
> namespace
>
> I am not familiar with this feature, but it seems like this policy
> field could be checked before creating a topic in a namespace. That
> would remove certain races described above.
>
> As I think about it more, I no longer think "terminated" is the right
> term for what I proposed above. Our goal is to briefly prevent any
> topic creation to ensure we can delete all sub resources for a
> namespace. On the other hand, a terminated topic isn't necessarily
> short lived. If we want to apply the "terminated" term unequivocally
> to both topics and namespaces, I think a terminated namespace would
> need to be a namespace where all topics are in terminated state and no
> additional topics could be created. That's not the feature we're
> discussing here, though. Deleted seems like the right term to me,
> especially since we're already using it to prevent a race condition
> [0].
>
> Thanks,
> Michael
>
> [0]
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262
>
> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <mm...@apache.org>
> wrote:
> >
> > Hi Dave,
> >
> > > automatically delete tenants and namespaces for not containing topics
> >
> > I don't think that is what we are discussing. I agree that the initial
> > email says just that, though, which is why I asked above:
> >
> > >> When there are no user-created topics under a namespace,
> > >> Namespace should be deleted.
> > > This is supposed to mean that the namespace should be able to be
> > > deleted, correct?
> >
> > Perhaps Mattison can clarify, too.
> >
> > My current understanding of the context for the PIP is that a user
> > call to delete a namespace without force can fail when a producer
> > reconnects to a deleted topic. The goal is to remove the race
> > condition to ensure namespace deletion can succeed.
> >
> > Thanks,
> > Michael
> >
> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org> wrote:
> > >
> > > Hi -
> > >
> > > I hope I’m understanding what’s being discussed.
> > >
> > > If we are going to automatically delete tenants and namespaces for not
> containing topics then we need to make both of these automatic actions
> configurable with a default to NOT do so. Otherwise we break existing use
> cases.
> > >
> > > Automatic deletion of namespaces should be configurable at both the
> cluster and tenant level.
> > >
> > > Regards,
> > > Dave
> > >
> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <mm...@apache.org>
> wrote:
> > > >
> > > >> The old producer/consumer should be closed after applying the
> changes from
> > > >> this proposal.
> > > >
> > > > Penghui, are you suggesting that we implement the namespace/tenant
> > > > terminated logic after completing this PIP?
> > > >
> > > > For the sake of discussion, if we implement the namespace terminated
> > > > logic first, we could fulfill the underlying requirements for this
> PIP
> > > > by returning a new non-retriable error response when a client tries
> to
> > > > connect a producer or a consumer to a topic in a namespace that is
> > > > "terminated". If we do add the "namespace terminated" feature, we'll
> > > > need to add a non-retriable exception for this case, anyway. The main
> > > > advantage here is that we'd only need one expansion of the protobuf
> > > > instead of two. The downside is that the protocol for connected
> > > > clients has a couple more roundtrips. The broker would disconnect
> > > > connected clients and then fail their reconnection attempt with a
> > > > non-retriable error.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org>
> wrote:
> > > >>
> > > >>> If we want to solve this problem, we need to add some sync
> resources like
> > > >> lock/state, I think it’s a harm for us, we don’t need to do that.
> > > >>
> > > >> I think we can make the namespace/tenants to the inactive state
> first so
> > > >> that we can avoid any new
> > > >> producer/consumer connect to the topic under the namespace/tenant.
> > > >>
> > > >> The old producer/consumer should be closed after applying the
> changes from
> > > >> this proposal.
> > > >>
> > > >> Thanks,
> > > >> Penghui
> > > >>
> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <
> mattisonchao@gmail.com> wrote:
> > > >>
> > > >>>> This is supposed to mean that the namespace should be able to be
> > > >>>> deleted, correct?
> > > >>>
> > > >>> Yes, the main background is the user doesn’t have an active topic.
> so,
> > > >>> they want to delete the namespace.
> > > >>>
> > > >>>> However, I think
> > > >>>> we might still have a race condition that could make tenant or
> > > >>>> namespace deletion fail. Specifically, if a new producer or
> consumer
> > > >>>> creates a topic after the namespace deletion has started but
> > > >>>> before it is complete. Do you agree that the underlying race still
> > > >>> exists?
> > > >>>
> > > >>> Yes, this condition exists. I think it’s not a big problem because
> the
> > > >>> user doesn’t want to use this namespace anymore.
> > > >>> If this scenario appears, they will get an error and need to
> delete it
> > > >>> again.
> > > >>>
> > > >>>> What if we expand our usage of the "terminated" feature to apply
> to
> > > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> > > >>>> bundles and topics can be deleted but not created (just as a
> terminated
> > > >>>> topic cannot have any new messages published to it). This would
> take
> > > >>>> care of all topic creation race conditions. We'd probably need to
> add
> > > >>>> new protobuf exceptions for this feature.
> > > >>>
> > > >>>
> > > >>> If we want to solve this problem, we need to add some sync
> resources like
> > > >>> lock/state, I think it’s a harm for us, we don’t need to do that.
> > > >>>
> > > >>> Thanks for your suggestions, let me know what you think.
> > > >>>
> > > >>> Best,
> > > >>> Mattison
> > > >>>
> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <
> mmarshall@apache.org>
> > > >>> wrote:
> > > >>>>
> > > >>>> This proposal identifies an important issue that we should
> definitely
> > > >>>> solve. I have some questions.
> > > >>>>
> > > >>>>> When there are no user-created topics under a namespace,
> > > >>>>> Namespace should be deleted.
> > > >>>>
> > > >>>> This is supposed to mean that the namespace should be able to be
> > > >>>> deleted, correct?
> > > >>>>
> > > >>>>> For this reason, we need to close the system topic
> reader/producer
> > > >>>>> first, then remove the system topic. finally, remove the
> namespace.
> > > >>>>
> > > >>>> I agree that expanding the protobuf CloseProducer and
> CloseConsumer
> > > >>>> commands could be valuable here. The expansion would ensure that
> > > >>>> producers and consumers don't attempt to reconnect. However, I
> think
> > > >>>> we might still have a race condition that could make tenant or
> > > >>>> namespace deletion fail. Specifically, if a new producer or
> consumer
> > > >>>> creates a topic after the namespace deletion has started but
> > > >>>> before it is complete. Do you agree that the underlying race still
> > > >>> exists?
> > > >>>>
> > > >>>> In my view, the fundamental problem here is that deleting certain
> Pulsar
> > > >>>> resources takes time and, in a distributed system, that means race
> > > >>>> conditions.
> > > >>>>
> > > >>>> What if we expand our usage of the "terminated" feature to apply
> to
> > > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> > > >>>> bundles and topics can be deleted but not created (just as a
> terminated
> > > >>>> topic cannot have any new messages published to it). This would
> take
> > > >>>> care of all topic creation race conditions. We'd probably need to
> add
> > > >>>> new protobuf exceptions for this feature.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Michael
> > > >>>>
> > > >>>>
> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > >>>> <zk...@streamnative.io.invalid> wrote:
> > > >>>>>
> > > >>>>> +1
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Zike
> > > >>>>>
> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <technoboy@apache.org
> >
> > > >>> wrote:
> > > >>>>>>
> > > >>>>>> Hi
> > > >>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
> > > >>>>>>
> > > >>>>>> Regards
> > > >>>>>> Jiwei Guo (Tboy)
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <
> mattisonchao@gmail.com
> > > >>>>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hello everyone,
> > > >>>>>>>
> > > >>>>>>> I want to start a discussion about PIP-139 : Support Broker
> send
> > > >>> command
> > > >>>>>>> to real close producer/consumer.
> > > >>>>>>>
> > > >>>>>>> This is the PIP document
> > > >>>>>>>
> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > > >>>>>>>
> > > >>>>>>> Please check it out and feel free to share your thoughts.
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Mattison
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> ———————— Pasted below for quoting convenience.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Relation pull request:  #13337
> > > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > > >>>>>>>
> > > >>>>>>> ## Motivation
> > > >>>>>>>
> > > >>>>>>> Before we discuss this pip, I'd like to supplement some
> context to
> > > >>> help
> > > >>>>>>> contributors who don't want to read the original pull request.
> > > >>>>>>>
> > > >>>>>>>> When there are no user-created topics under a namespace,
> Namespace
> > > >>>>>>> should be deleted. But currently, the system topic existed and
> the
> > > >>>>>>> reader/producer could auto-create the system which may cause
> the
> > > >>> namespace
> > > >>>>>>> deletion to fail.
> > > >>>>>>>
> > > >>>>>>> For this reason, we need to close the system topic
> reader/producer
> > > >>> first,
> > > >>>>>>> then remove the system topic. finally, remove the namespace.
> > > >>>>>>>
> > > >>>>>>> Following this way, we first want to use ``terminate`` to
> solve this
> > > >>>>>>> problem. then we found producers can disconnect, but consumers
> are
> > > >>> still
> > > >>>>>>> alive. So, another PR #13960 wants to add consumers' closing
> logic.
> > > >>>>>>>
> > > >>>>>>> After #13960, all things look good, but another problem
> appears. that
> > > >>> is
> > > >>>>>>> we need to wait until consumers completely consume all
> messages (this
> > > >>> may
> > > >>>>>>> make terminate topic so long and the operation timeout)then get
> > > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>
> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > > >>>>>>>
> > > >>>>>>> In the #13337 case, we need to force close consumers
> immediately. So
> > > >>> we
> > > >>>>>>> write this PIP to discuss another way to resolve this problem.
> > > >>>>>>>
> > > >>>>>>> ## Goal
> > > >>>>>>>
> > > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> > > >>>>>>> producer/consumers immediately.
> > > >>>>>>>
> > > >>>>>>> ## API Changes
> > > >>>>>>>
> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > > >>>>>>>
> > > >>>>>>> ```java
> > > >>>>>>> **Before**
> > > >>>>>>>
> > > >>>>>>> message CommandCloseProducer {
> > > >>>>>>>   required uint64 producer_id = 1;
> > > >>>>>>>   required uint64 request_id = 2;
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> **After**
> > > >>>>>>> message CommandCloseProducer {
> > > >>>>>>>   required uint64 producer_id = 1;
> > > >>>>>>>   required uint64 request_id = 2;
> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > > >>>>>>> }
> > > >>>>>>> ```
> > > >>>>>>>
> > > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > > >>>>>>> ```java
> > > >>>>>>> **Before**
> > > >>>>>>>
> > > >>>>>>> message CommandCloseConsumer {
> > > >>>>>>>   required uint64 consumer_id = 1;
> > > >>>>>>>   required uint64 request_id = 2;
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> **After**
> > > >>>>>>> message CommandCloseConsumer {
> > > >>>>>>>   required uint64 consumer_id = 1;
> > > >>>>>>>   required uint64 request_id = 2;
> > > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > > >>>>>>> }
> > > >>>>>>> ```
> > > >>>>>>>
> > > >>>>>>> ## Implementation
> > > >>>>>>>
> > > >>>>>>> ### ClientCnx - Producer:
> > > >>>>>>>
> > > >>>>>>> **Before**
> > > >>>>>>> ```java
> > > >>>>>>>   @Override
> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > > >>> closeProducer)
> > > >>>>>>> {
> > > >>>>>>>       log.info("[{}] Broker notification of Closed producer:
> {}",
> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > >>>>>>>       final long producerId = closeProducer.getProducerId();
> > > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> > > >>>>>>>       if (producer != null) {
> > > >>>>>>>           producer.connectionClosed(this);
> > > >>>>>>>       } else {
> > > >>>>>>>           log.warn("Producer with id {} not found while closing
> > > >>> producer
> > > >>>>>>> ", producerId);
> > > >>>>>>>       }
> > > >>>>>>>   }
> > > >>>>>>> ```
> > > >>>>>>> After:
> > > >>>>>>> ```java
> > > >>>>>>>  @Override
> > > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > > >>> closeProducer)
> > > >>>>>>> {
> > > >>>>>>>       log.info("[{}] Broker notification of Closed producer:
> {}",
> > > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > > >>>>>>>       final long producerId = closeProducer.getProducerId();
> > > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> > > >>>>>>>       if (producer != null) {
> > > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > > >>>>>>>               producer.connectionClosed(this);
> > > >>>>>>>           } else {
> > > >>>>>>>               producer.closeAsync();
> > > >>>>>>>           }
> > > >>>>>>>       } else {
> > > >>>>>>>           log.warn("Producer with id {} not found while closing
> > > >>> producer
> > > >>>>>>> ", producerId);
> > > >>>>>>>       }
> > > >>>>>>>   }
> > > >>>>>>> ```
> > > >>>>>>> ### ClientCnx - Consumer:
> > > >>>>>>>
> > > >>>>>>> **Before**
> > > >>>>>>> ```java
> > > >>>>>>>   @Override
> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > > >>> closeConsumer)
> > > >>>>>>> {
> > > >>>>>>>       log.info("[{}] Broker notification of Closed consumer:
> {}",
> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> > > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> > > >>>>>>>       if (consumer != null) {
> > > >>>>>>>           consumer.connectionClosed(this);
> > > >>>>>>>       } else {
> > > >>>>>>>           log.warn("Consumer with id {} not found while closing
> > > >>> consumer
> > > >>>>>>> ", consumerId);
> > > >>>>>>>       }
> > > >>>>>>>   }
> > > >>>>>>> ```
> > > >>>>>>> **After**
> > > >>>>>>> ```java
> > > >>>>>>>   @Override
> > > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > > >>> closeConsumer)
> > > >>>>>>> {
> > > >>>>>>>       log.info("[{}] Broker notification of Closed consumer:
> {}",
> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> > > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> > > >>>>>>>       if (consumer != null) {
> > > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > > >>>>>>>               consumer.connectionClosed(this);
> > > >>>>>>>           } else {
> > > >>>>>>>               consumer.closeAsync();
> > > >>>>>>>           }
> > > >>>>>>>       } else {
> > > >>>>>>>           log.warn("Consumer with id {} not found while closing
> > > >>> consumer
> > > >>>>>>> ", consumerId);
> > > >>>>>>>       }
> > > >>>>>>>   }
> > > >>>>>>> ```
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> ## Reject Alternatives
> > > >>>>>>>
> > > >>>>>>> none.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Zike Yang
> > > >>>
> > > >>>
> > >
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by Michael Marshall <mm...@apache.org>.
Regarding the namespace "terminated" concept, I just noticed that we
already have a "deleted" field in a namespace's policies [0]. There is
even a comment that says:

> // set the policies to deleted so that somebody else cannot acquire this namespace

I am not familiar with this feature, but it seems like this policy
field could be checked before creating a topic in a namespace. That
would remove certain races described above.

As I think about it more, I no longer think "terminated" is the right
term for what I proposed above. Our goal is to briefly prevent any
topic creation to ensure we can delete all sub resources for a
namespace. On the other hand, a terminated topic isn't necessarily
short lived. If we want to apply the "terminated" term unequivocally
to both topics and namespaces, I think a terminated namespace would
need to be a namespace where all topics are in terminated state and no
additional topics could be created. That's not the feature we're
discussing here, though. Deleted seems like the right term to me,
especially since we're already using it to prevent a race condition
[0].

Thanks,
Michael

[0] https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262

On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall <mm...@apache.org> wrote:
>
> Hi Dave,
>
> > automatically delete tenants and namespaces for not containing topics
>
> I don't think that is what we are discussing. I agree that the initial
> email says just that, though, which is why I asked above:
>
> >> When there are no user-created topics under a namespace,
> >> Namespace should be deleted.
> > This is supposed to mean that the namespace should be able to be
> > deleted, correct?
>
> Perhaps Mattison can clarify, too.
>
> My current understanding of the context for the PIP is that a user
> call to delete a namespace without force can fail when a producer
> reconnects to a deleted topic. The goal is to remove the race
> condition to ensure namespace deletion can succeed.
>
> Thanks,
> Michael
>
> On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org> wrote:
> >
> > Hi -
> >
> > I hope I’m understanding what’s being discussed.
> >
> > If we are going to automatically delete tenants and namespaces for not containing topics then we need to make both of these automatic actions configurable with a default to NOT do so. Otherwise we break existing use cases.
> >
> > Automatic deletion of namespaces should be configurable at both the cluster and tenant level.
> >
> > Regards,
> > Dave
> >
> > > On Feb 24, 2022, at 2:25 PM, Michael Marshall <mm...@apache.org> wrote:
> > >
> > >> The old producer/consumer should be closed after applying the changes from
> > >> this proposal.
> > >
> > > Penghui, are you suggesting that we implement the namespace/tenant
> > > terminated logic after completing this PIP?
> > >
> > > For the sake of discussion, if we implement the namespace terminated
> > > logic first, we could fulfill the underlying requirements for this PIP
> > > by returning a new non-retriable error response when a client tries to
> > > connect a producer or a consumer to a topic in a namespace that is
> > > "terminated". If we do add the "namespace terminated" feature, we'll
> > > need to add a non-retriable exception for this case, anyway. The main
> > > advantage here is that we'd only need one expansion of the protobuf
> > > instead of two. The downside is that the protocol for connected
> > > clients has a couple more roundtrips. The broker would disconnect
> > > connected clients and then fail their reconnection attempt with a
> > > non-retriable error.
> > >
> > > Thanks,
> > > Michael
> > >
> > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org> wrote:
> > >>
> > >>> If we want to solve this problem, we need to add some sync resources like
> > >> lock/state, I think it’s a harm for us, we don’t need to do that.
> > >>
> > >> I think we can make the namespace/tenants to the inactive state first so
> > >> that we can avoid any new
> > >> producer/consumer connect to the topic under the namespace/tenant.
> > >>
> > >> The old producer/consumer should be closed after applying the changes from
> > >> this proposal.
> > >>
> > >> Thanks,
> > >> Penghui
> > >>
> > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <ma...@gmail.com> wrote:
> > >>
> > >>>> This is supposed to mean that the namespace should be able to be
> > >>>> deleted, correct?
> > >>>
> > >>> Yes, the main background is the user doesn’t have an active topic. so,
> > >>> they want to delete the namespace.
> > >>>
> > >>>> However, I think
> > >>>> we might still have a race condition that could make tenant or
> > >>>> namespace deletion fail. Specifically, if a new producer or consumer
> > >>>> creates a topic after the namespace deletion has started but
> > >>>> before it is complete. Do you agree that the underlying race still
> > >>> exists?
> > >>>
> > >>> Yes, this condition exists. I think it’s not a big problem because the
> > >>> user doesn’t want to use this namespace anymore.
> > >>> If this scenario appears, they will get an error and need to delete it
> > >>> again.
> > >>>
> > >>>> What if we expand our usage of the "terminated" feature to apply to
> > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> > >>>> bundles and topics can be deleted but not created (just as a terminated
> > >>>> topic cannot have any new messages published to it). This would take
> > >>>> care of all topic creation race conditions. We'd probably need to add
> > >>>> new protobuf exceptions for this feature.
> > >>>
> > >>>
> > >>> If we want to solve this problem, we need to add some sync resources like
> > >>> lock/state, I think it’s a harm for us, we don’t need to do that.
> > >>>
> > >>> Thanks for your suggestions, let me know what you think.
> > >>>
> > >>> Best,
> > >>> Mattison
> > >>>
> > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <mm...@apache.org>
> > >>> wrote:
> > >>>>
> > >>>> This proposal identifies an important issue that we should definitely
> > >>>> solve. I have some questions.
> > >>>>
> > >>>>> When there are no user-created topics under a namespace,
> > >>>>> Namespace should be deleted.
> > >>>>
> > >>>> This is supposed to mean that the namespace should be able to be
> > >>>> deleted, correct?
> > >>>>
> > >>>>> For this reason, we need to close the system topic reader/producer
> > >>>>> first, then remove the system topic. finally, remove the namespace.
> > >>>>
> > >>>> I agree that expanding the protobuf CloseProducer and CloseConsumer
> > >>>> commands could be valuable here. The expansion would ensure that
> > >>>> producers and consumers don't attempt to reconnect. However, I think
> > >>>> we might still have a race condition that could make tenant or
> > >>>> namespace deletion fail. Specifically, if a new producer or consumer
> > >>>> creates a topic after the namespace deletion has started but
> > >>>> before it is complete. Do you agree that the underlying race still
> > >>> exists?
> > >>>>
> > >>>> In my view, the fundamental problem here is that deleting certain Pulsar
> > >>>> resources takes time and, in a distributed system, that means race
> > >>>> conditions.
> > >>>>
> > >>>> What if we expand our usage of the "terminated" feature to apply to
> > >>>> namespaces (and tenants)? Then, a terminated namespace can have
> > >>>> bundles and topics can be deleted but not created (just as a terminated
> > >>>> topic cannot have any new messages published to it). This would take
> > >>>> care of all topic creation race conditions. We'd probably need to add
> > >>>> new protobuf exceptions for this feature.
> > >>>>
> > >>>> Thanks,
> > >>>> Michael
> > >>>>
> > >>>>
> > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > >>>> <zk...@streamnative.io.invalid> wrote:
> > >>>>>
> > >>>>> +1
> > >>>>>
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Zike
> > >>>>>
> > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <te...@apache.org>
> > >>> wrote:
> > >>>>>>
> > >>>>>> Hi
> > >>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
> > >>>>>>
> > >>>>>> Regards
> > >>>>>> Jiwei Guo (Tboy)
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <mattisonchao@gmail.com
> > >>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hello everyone,
> > >>>>>>>
> > >>>>>>> I want to start a discussion about PIP-139 : Support Broker send
> > >>> command
> > >>>>>>> to real close producer/consumer.
> > >>>>>>>
> > >>>>>>> This is the PIP document
> > >>>>>>>
> > >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> > >>>>>>> https://github.com/apache/pulsar/issues/13979>
> > >>>>>>>
> > >>>>>>> Please check it out and feel free to share your thoughts.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Mattison
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ———————— Pasted below for quoting convenience.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Relation pull request:  #13337
> > >>>>>>> Authors: @Technoboy-  @mattisonchao
> > >>>>>>>
> > >>>>>>> ## Motivation
> > >>>>>>>
> > >>>>>>> Before we discuss this pip, I'd like to supplement some context to
> > >>> help
> > >>>>>>> contributors who don't want to read the original pull request.
> > >>>>>>>
> > >>>>>>>> When there are no user-created topics under a namespace, Namespace
> > >>>>>>> should be deleted. But currently, the system topic existed and the
> > >>>>>>> reader/producer could auto-create the system which may cause the
> > >>> namespace
> > >>>>>>> deletion to fail.
> > >>>>>>>
> > >>>>>>> For this reason, we need to close the system topic reader/producer
> > >>> first,
> > >>>>>>> then remove the system topic. finally, remove the namespace.
> > >>>>>>>
> > >>>>>>> Following this way, we first want to use ``terminate`` to solve this
> > >>>>>>> problem. then we found producers can disconnect, but consumers are
> > >>> still
> > >>>>>>> alive. So, another PR #13960 wants to add consumers' closing logic.
> > >>>>>>>
> > >>>>>>> After #13960, all things look good, but another problem appears. that
> > >>> is
> > >>>>>>> we need to wait until consumers completely consume all messages (this
> > >>> may
> > >>>>>>> make terminate topic so long and the operation timeout)then get
> > >>>>>>> ``reachedEndOfTopic``. the relative code here :
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > >>>>>>>
> > >>>>>>> In the #13337 case, we need to force close consumers immediately. So
> > >>> we
> > >>>>>>> write this PIP to discuss another way to resolve this problem.
> > >>>>>>>
> > >>>>>>> ## Goal
> > >>>>>>>
> > >>>>>>> We can add a new field(`allow_reconnect`) in command
> > >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> > >>>>>>> producer/consumers immediately.
> > >>>>>>>
> > >>>>>>> ## API Changes
> > >>>>>>>
> > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > >>>>>>>
> > >>>>>>> ```java
> > >>>>>>> **Before**
> > >>>>>>>
> > >>>>>>> message CommandCloseProducer {
> > >>>>>>>   required uint64 producer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> **After**
> > >>>>>>> message CommandCloseProducer {
> > >>>>>>>   required uint64 producer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > >>>>>>> }
> > >>>>>>> ```
> > >>>>>>>
> > >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > >>>>>>> ```java
> > >>>>>>> **Before**
> > >>>>>>>
> > >>>>>>> message CommandCloseConsumer {
> > >>>>>>>   required uint64 consumer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> **After**
> > >>>>>>> message CommandCloseConsumer {
> > >>>>>>>   required uint64 consumer_id = 1;
> > >>>>>>>   required uint64 request_id = 2;
> > >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> > >>>>>>> }
> > >>>>>>> ```
> > >>>>>>>
> > >>>>>>> ## Implementation
> > >>>>>>>
> > >>>>>>> ### ClientCnx - Producer:
> > >>>>>>>
> > >>>>>>> **Before**
> > >>>>>>> ```java
> > >>>>>>>   @Override
> > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > >>> closeProducer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > >>>>>>>       final long producerId = closeProducer.getProducerId();
> > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> > >>>>>>>       if (producer != null) {
> > >>>>>>>           producer.connectionClosed(this);
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Producer with id {} not found while closing
> > >>> producer
> > >>>>>>> ", producerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>> After:
> > >>>>>>> ```java
> > >>>>>>>  @Override
> > >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> > >>> closeProducer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> > >>>>>>> remoteAddress, closeProducer.getProducerId());
> > >>>>>>>       final long producerId = closeProducer.getProducerId();
> > >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> > >>>>>>>       if (producer != null) {
> > >>>>>>>           if (closeProducer.isAllowReconnect) {
> > >>>>>>>               producer.connectionClosed(this);
> > >>>>>>>           } else {
> > >>>>>>>               producer.closeAsync();
> > >>>>>>>           }
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Producer with id {} not found while closing
> > >>> producer
> > >>>>>>> ", producerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>> ### ClientCnx - Consumer:
> > >>>>>>>
> > >>>>>>> **Before**
> > >>>>>>> ```java
> > >>>>>>>   @Override
> > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > >>> closeConsumer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >>>>>>>       if (consumer != null) {
> > >>>>>>>           consumer.connectionClosed(this);
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Consumer with id {} not found while closing
> > >>> consumer
> > >>>>>>> ", consumerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>> **After**
> > >>>>>>> ```java
> > >>>>>>>   @Override
> > >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> > >>> closeConsumer)
> > >>>>>>> {
> > >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> > >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> > >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> > >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >>>>>>>       if (consumer != null) {
> > >>>>>>>           if (closeConsumer.isAllowReconnect) {
> > >>>>>>>               consumer.connectionClosed(this);
> > >>>>>>>           } else {
> > >>>>>>>               consumer.closeAsync();
> > >>>>>>>           }
> > >>>>>>>       } else {
> > >>>>>>>           log.warn("Consumer with id {} not found while closing
> > >>> consumer
> > >>>>>>> ", consumerId);
> > >>>>>>>       }
> > >>>>>>>   }
> > >>>>>>> ```
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ## Reject Alternatives
> > >>>>>>>
> > >>>>>>> none.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Zike Yang
> > >>>
> > >>>
> >

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by Michael Marshall <mm...@apache.org>.
Hi Dave,

> automatically delete tenants and namespaces for not containing topics

I don't think that is what we are discussing. I agree that the initial
email says just that, though, which is why I asked above:

>> When there are no user-created topics under a namespace,
>> Namespace should be deleted.
> This is supposed to mean that the namespace should be able to be
> deleted, correct?

Perhaps Mattison can clarify, too.

My current understanding of the context for the PIP is that a user
call to delete a namespace without force can fail when a producer
reconnects to a deleted topic. The goal is to remove the race
condition to ensure namespace deletion can succeed.

Thanks,
Michael

On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <wa...@apache.org> wrote:
>
> Hi -
>
> I hope I’m understanding what’s being discussed.
>
> If we are going to automatically delete tenants and namespaces for not containing topics then we need to make both of these automatic actions configurable with a default to NOT do so. Otherwise we break existing use cases.
>
> Automatic deletion of namespaces should be configurable at both the cluster and tenant level.
>
> Regards,
> Dave
>
> > On Feb 24, 2022, at 2:25 PM, Michael Marshall <mm...@apache.org> wrote:
> >
> >> The old producer/consumer should be closed after applying the changes from
> >> this proposal.
> >
> > Penghui, are you suggesting that we implement the namespace/tenant
> > terminated logic after completing this PIP?
> >
> > For the sake of discussion, if we implement the namespace terminated
> > logic first, we could fulfill the underlying requirements for this PIP
> > by returning a new non-retriable error response when a client tries to
> > connect a producer or a consumer to a topic in a namespace that is
> > "terminated". If we do add the "namespace terminated" feature, we'll
> > need to add a non-retriable exception for this case, anyway. The main
> > advantage here is that we'd only need one expansion of the protobuf
> > instead of two. The downside is that the protocol for connected
> > clients has a couple more roundtrips. The broker would disconnect
> > connected clients and then fail their reconnection attempt with a
> > non-retriable error.
> >
> > Thanks,
> > Michael
> >
> > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org> wrote:
> >>
> >>> If we want to solve this problem, we need to add some sync resources like
> >> lock/state, I think it’s a harm for us, we don’t need to do that.
> >>
> >> I think we can make the namespace/tenants to the inactive state first so
> >> that we can avoid any new
> >> producer/consumer connect to the topic under the namespace/tenant.
> >>
> >> The old producer/consumer should be closed after applying the changes from
> >> this proposal.
> >>
> >> Thanks,
> >> Penghui
> >>
> >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <ma...@gmail.com> wrote:
> >>
> >>>> This is supposed to mean that the namespace should be able to be
> >>>> deleted, correct?
> >>>
> >>> Yes, the main background is the user doesn’t have an active topic. so,
> >>> they want to delete the namespace.
> >>>
> >>>> However, I think
> >>>> we might still have a race condition that could make tenant or
> >>>> namespace deletion fail. Specifically, if a new producer or consumer
> >>>> creates a topic after the namespace deletion has started but
> >>>> before it is complete. Do you agree that the underlying race still
> >>> exists?
> >>>
> >>> Yes, this condition exists. I think it’s not a big problem because the
> >>> user doesn’t want to use this namespace anymore.
> >>> If this scenario appears, they will get an error and need to delete it
> >>> again.
> >>>
> >>>> What if we expand our usage of the "terminated" feature to apply to
> >>>> namespaces (and tenants)? Then, a terminated namespace can have
> >>>> bundles and topics can be deleted but not created (just as a terminated
> >>>> topic cannot have any new messages published to it). This would take
> >>>> care of all topic creation race conditions. We'd probably need to add
> >>>> new protobuf exceptions for this feature.
> >>>
> >>>
> >>> If we want to solve this problem, we need to add some sync resources like
> >>> lock/state, I think it’s a harm for us, we don’t need to do that.
> >>>
> >>> Thanks for your suggestions, let me know what you think.
> >>>
> >>> Best,
> >>> Mattison
> >>>
> >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <mm...@apache.org>
> >>> wrote:
> >>>>
> >>>> This proposal identifies an important issue that we should definitely
> >>>> solve. I have some questions.
> >>>>
> >>>>> When there are no user-created topics under a namespace,
> >>>>> Namespace should be deleted.
> >>>>
> >>>> This is supposed to mean that the namespace should be able to be
> >>>> deleted, correct?
> >>>>
> >>>>> For this reason, we need to close the system topic reader/producer
> >>>>> first, then remove the system topic. finally, remove the namespace.
> >>>>
> >>>> I agree that expanding the protobuf CloseProducer and CloseConsumer
> >>>> commands could be valuable here. The expansion would ensure that
> >>>> producers and consumers don't attempt to reconnect. However, I think
> >>>> we might still have a race condition that could make tenant or
> >>>> namespace deletion fail. Specifically, if a new producer or consumer
> >>>> creates a topic after the namespace deletion has started but
> >>>> before it is complete. Do you agree that the underlying race still
> >>> exists?
> >>>>
> >>>> In my view, the fundamental problem here is that deleting certain Pulsar
> >>>> resources takes time and, in a distributed system, that means race
> >>>> conditions.
> >>>>
> >>>> What if we expand our usage of the "terminated" feature to apply to
> >>>> namespaces (and tenants)? Then, a terminated namespace can have
> >>>> bundles and topics can be deleted but not created (just as a terminated
> >>>> topic cannot have any new messages published to it). This would take
> >>>> care of all topic creation race conditions. We'd probably need to add
> >>>> new protobuf exceptions for this feature.
> >>>>
> >>>> Thanks,
> >>>> Michael
> >>>>
> >>>>
> >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> >>>> <zk...@streamnative.io.invalid> wrote:
> >>>>>
> >>>>> +1
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Zike
> >>>>>
> >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <te...@apache.org>
> >>> wrote:
> >>>>>>
> >>>>>> Hi
> >>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
> >>>>>>
> >>>>>> Regards
> >>>>>> Jiwei Guo (Tboy)
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <mattisonchao@gmail.com
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> I want to start a discussion about PIP-139 : Support Broker send
> >>> command
> >>>>>>> to real close producer/consumer.
> >>>>>>>
> >>>>>>> This is the PIP document
> >>>>>>>
> >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> >>>>>>> https://github.com/apache/pulsar/issues/13979>
> >>>>>>>
> >>>>>>> Please check it out and feel free to share your thoughts.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Mattison
> >>>>>>>
> >>>>>>>
> >>>>>>> ———————— Pasted below for quoting convenience.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Relation pull request:  #13337
> >>>>>>> Authors: @Technoboy-  @mattisonchao
> >>>>>>>
> >>>>>>> ## Motivation
> >>>>>>>
> >>>>>>> Before we discuss this pip, I'd like to supplement some context to
> >>> help
> >>>>>>> contributors who don't want to read the original pull request.
> >>>>>>>
> >>>>>>>> When there are no user-created topics under a namespace, Namespace
> >>>>>>> should be deleted. But currently, the system topic existed and the
> >>>>>>> reader/producer could auto-create the system which may cause the
> >>> namespace
> >>>>>>> deletion to fail.
> >>>>>>>
> >>>>>>> For this reason, we need to close the system topic reader/producer
> >>> first,
> >>>>>>> then remove the system topic. finally, remove the namespace.
> >>>>>>>
> >>>>>>> Following this way, we first want to use ``terminate`` to solve this
> >>>>>>> problem. then we found producers can disconnect, but consumers are
> >>> still
> >>>>>>> alive. So, another PR #13960 wants to add consumers' closing logic.
> >>>>>>>
> >>>>>>> After #13960, all things look good, but another problem appears. that
> >>> is
> >>>>>>> we need to wait until consumers completely consume all messages (this
> >>> may
> >>>>>>> make terminate topic so long and the operation timeout)then get
> >>>>>>> ``reachedEndOfTopic``. the relative code here :
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> >>>>>>>
> >>>>>>> In the #13337 case, we need to force close consumers immediately. So
> >>> we
> >>>>>>> write this PIP to discuss another way to resolve this problem.
> >>>>>>>
> >>>>>>> ## Goal
> >>>>>>>
> >>>>>>> We can add a new field(`allow_reconnect`) in command
> >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> >>>>>>> producer/consumers immediately.
> >>>>>>>
> >>>>>>> ## API Changes
> >>>>>>>
> >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> >>>>>>>
> >>>>>>> ```java
> >>>>>>> **Before**
> >>>>>>>
> >>>>>>> message CommandCloseProducer {
> >>>>>>>   required uint64 producer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>> }
> >>>>>>>
> >>>>>>> **After**
> >>>>>>> message CommandCloseProducer {
> >>>>>>>   required uint64 producer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> >>>>>>> }
> >>>>>>> ```
> >>>>>>>
> >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> >>>>>>> ```java
> >>>>>>> **Before**
> >>>>>>>
> >>>>>>> message CommandCloseConsumer {
> >>>>>>>   required uint64 consumer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>> }
> >>>>>>>
> >>>>>>> **After**
> >>>>>>> message CommandCloseConsumer {
> >>>>>>>   required uint64 consumer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> >>>>>>> }
> >>>>>>> ```
> >>>>>>>
> >>>>>>> ## Implementation
> >>>>>>>
> >>>>>>> ### ClientCnx - Producer:
> >>>>>>>
> >>>>>>> **Before**
> >>>>>>> ```java
> >>>>>>>   @Override
> >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> >>> closeProducer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> >>>>>>> remoteAddress, closeProducer.getProducerId());
> >>>>>>>       final long producerId = closeProducer.getProducerId();
> >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> >>>>>>>       if (producer != null) {
> >>>>>>>           producer.connectionClosed(this);
> >>>>>>>       } else {
> >>>>>>>           log.warn("Producer with id {} not found while closing
> >>> producer
> >>>>>>> ", producerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>> After:
> >>>>>>> ```java
> >>>>>>>  @Override
> >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> >>> closeProducer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> >>>>>>> remoteAddress, closeProducer.getProducerId());
> >>>>>>>       final long producerId = closeProducer.getProducerId();
> >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> >>>>>>>       if (producer != null) {
> >>>>>>>           if (closeProducer.isAllowReconnect) {
> >>>>>>>               producer.connectionClosed(this);
> >>>>>>>           } else {
> >>>>>>>               producer.closeAsync();
> >>>>>>>           }
> >>>>>>>       } else {
> >>>>>>>           log.warn("Producer with id {} not found while closing
> >>> producer
> >>>>>>> ", producerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>> ### ClientCnx - Consumer:
> >>>>>>>
> >>>>>>> **Before**
> >>>>>>> ```java
> >>>>>>>   @Override
> >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> >>> closeConsumer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> >>>>>>>       if (consumer != null) {
> >>>>>>>           consumer.connectionClosed(this);
> >>>>>>>       } else {
> >>>>>>>           log.warn("Consumer with id {} not found while closing
> >>> consumer
> >>>>>>> ", consumerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>> **After**
> >>>>>>> ```java
> >>>>>>>   @Override
> >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> >>> closeConsumer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> >>>>>>>       if (consumer != null) {
> >>>>>>>           if (closeConsumer.isAllowReconnect) {
> >>>>>>>               consumer.connectionClosed(this);
> >>>>>>>           } else {
> >>>>>>>               consumer.closeAsync();
> >>>>>>>           }
> >>>>>>>       } else {
> >>>>>>>           log.warn("Consumer with id {} not found while closing
> >>> consumer
> >>>>>>> ", consumerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>>
> >>>>>>>
> >>>>>>> ## Reject Alternatives
> >>>>>>>
> >>>>>>> none.
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Zike Yang
> >>>
> >>>
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by Dave Fisher <wa...@apache.org>.
Hi -

I hope I’m understanding what’s being discussed.

If we are going to automatically delete tenants and namespaces for not containing topics then we need to make both of these automatic actions configurable with a default to NOT do so. Otherwise we break existing use cases.

Automatic deletion of namespaces should be configurable at both the cluster and tenant level.

Regards,
Dave

> On Feb 24, 2022, at 2:25 PM, Michael Marshall <mm...@apache.org> wrote:
> 
>> The old producer/consumer should be closed after applying the changes from
>> this proposal.
> 
> Penghui, are you suggesting that we implement the namespace/tenant
> terminated logic after completing this PIP?
> 
> For the sake of discussion, if we implement the namespace terminated
> logic first, we could fulfill the underlying requirements for this PIP
> by returning a new non-retriable error response when a client tries to
> connect a producer or a consumer to a topic in a namespace that is
> "terminated". If we do add the "namespace terminated" feature, we'll
> need to add a non-retriable exception for this case, anyway. The main
> advantage here is that we'd only need one expansion of the protobuf
> instead of two. The downside is that the protocol for connected
> clients has a couple more roundtrips. The broker would disconnect
> connected clients and then fail their reconnection attempt with a
> non-retriable error.
> 
> Thanks,
> Michael
> 
> On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org> wrote:
>> 
>>> If we want to solve this problem, we need to add some sync resources like
>> lock/state, I think it’s a harm for us, we don’t need to do that.
>> 
>> I think we can make the namespace/tenants to the inactive state first so
>> that we can avoid any new
>> producer/consumer connect to the topic under the namespace/tenant.
>> 
>> The old producer/consumer should be closed after applying the changes from
>> this proposal.
>> 
>> Thanks,
>> Penghui
>> 
>> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <ma...@gmail.com> wrote:
>> 
>>>> This is supposed to mean that the namespace should be able to be
>>>> deleted, correct?
>>> 
>>> Yes, the main background is the user doesn’t have an active topic. so,
>>> they want to delete the namespace.
>>> 
>>>> However, I think
>>>> we might still have a race condition that could make tenant or
>>>> namespace deletion fail. Specifically, if a new producer or consumer
>>>> creates a topic after the namespace deletion has started but
>>>> before it is complete. Do you agree that the underlying race still
>>> exists?
>>> 
>>> Yes, this condition exists. I think it’s not a big problem because the
>>> user doesn’t want to use this namespace anymore.
>>> If this scenario appears, they will get an error and need to delete it
>>> again.
>>> 
>>>> What if we expand our usage of the "terminated" feature to apply to
>>>> namespaces (and tenants)? Then, a terminated namespace can have
>>>> bundles and topics can be deleted but not created (just as a terminated
>>>> topic cannot have any new messages published to it). This would take
>>>> care of all topic creation race conditions. We'd probably need to add
>>>> new protobuf exceptions for this feature.
>>> 
>>> 
>>> If we want to solve this problem, we need to add some sync resources like
>>> lock/state, I think it’s a harm for us, we don’t need to do that.
>>> 
>>> Thanks for your suggestions, let me know what you think.
>>> 
>>> Best,
>>> Mattison
>>> 
>>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <mm...@apache.org>
>>> wrote:
>>>> 
>>>> This proposal identifies an important issue that we should definitely
>>>> solve. I have some questions.
>>>> 
>>>>> When there are no user-created topics under a namespace,
>>>>> Namespace should be deleted.
>>>> 
>>>> This is supposed to mean that the namespace should be able to be
>>>> deleted, correct?
>>>> 
>>>>> For this reason, we need to close the system topic reader/producer
>>>>> first, then remove the system topic. finally, remove the namespace.
>>>> 
>>>> I agree that expanding the protobuf CloseProducer and CloseConsumer
>>>> commands could be valuable here. The expansion would ensure that
>>>> producers and consumers don't attempt to reconnect. However, I think
>>>> we might still have a race condition that could make tenant or
>>>> namespace deletion fail. Specifically, if a new producer or consumer
>>>> creates a topic after the namespace deletion has started but
>>>> before it is complete. Do you agree that the underlying race still
>>> exists?
>>>> 
>>>> In my view, the fundamental problem here is that deleting certain Pulsar
>>>> resources takes time and, in a distributed system, that means race
>>>> conditions.
>>>> 
>>>> What if we expand our usage of the "terminated" feature to apply to
>>>> namespaces (and tenants)? Then, a terminated namespace can have
>>>> bundles and topics can be deleted but not created (just as a terminated
>>>> topic cannot have any new messages published to it). This would take
>>>> care of all topic creation race conditions. We'd probably need to add
>>>> new protobuf exceptions for this feature.
>>>> 
>>>> Thanks,
>>>> Michael
>>>> 
>>>> 
>>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
>>>> <zk...@streamnative.io.invalid> wrote:
>>>>> 
>>>>> +1
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Zike
>>>>> 
>>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <te...@apache.org>
>>> wrote:
>>>>>> 
>>>>>> Hi
>>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
>>>>>> 
>>>>>> Regards
>>>>>> Jiwei Guo (Tboy)
>>>>>> 
>>>>>> 
>>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <mattisonchao@gmail.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hello everyone,
>>>>>>> 
>>>>>>> I want to start a discussion about PIP-139 : Support Broker send
>>> command
>>>>>>> to real close producer/consumer.
>>>>>>> 
>>>>>>> This is the PIP document
>>>>>>> 
>>>>>>> https://github.com/apache/pulsar/issues/13989 <
>>>>>>> https://github.com/apache/pulsar/issues/13979>
>>>>>>> 
>>>>>>> Please check it out and feel free to share your thoughts.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Mattison
>>>>>>> 
>>>>>>> 
>>>>>>> ———————— Pasted below for quoting convenience.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Relation pull request:  #13337
>>>>>>> Authors: @Technoboy-  @mattisonchao
>>>>>>> 
>>>>>>> ## Motivation
>>>>>>> 
>>>>>>> Before we discuss this pip, I'd like to supplement some context to
>>> help
>>>>>>> contributors who don't want to read the original pull request.
>>>>>>> 
>>>>>>>> When there are no user-created topics under a namespace, Namespace
>>>>>>> should be deleted. But currently, the system topic existed and the
>>>>>>> reader/producer could auto-create the system which may cause the
>>> namespace
>>>>>>> deletion to fail.
>>>>>>> 
>>>>>>> For this reason, we need to close the system topic reader/producer
>>> first,
>>>>>>> then remove the system topic. finally, remove the namespace.
>>>>>>> 
>>>>>>> Following this way, we first want to use ``terminate`` to solve this
>>>>>>> problem. then we found producers can disconnect, but consumers are
>>> still
>>>>>>> alive. So, another PR #13960 wants to add consumers' closing logic.
>>>>>>> 
>>>>>>> After #13960, all things look good, but another problem appears. that
>>> is
>>>>>>> we need to wait until consumers completely consume all messages (this
>>> may
>>>>>>> make terminate topic so long and the operation timeout)then get
>>>>>>> ``reachedEndOfTopic``. the relative code here :
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
>>>>>>> 
>>>>>>> In the #13337 case, we need to force close consumers immediately. So
>>> we
>>>>>>> write this PIP to discuss another way to resolve this problem.
>>>>>>> 
>>>>>>> ## Goal
>>>>>>> 
>>>>>>> We can add a new field(`allow_reconnect`) in command
>>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
>>>>>>> producer/consumers immediately.
>>>>>>> 
>>>>>>> ## API Changes
>>>>>>> 
>>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
>>>>>>> 
>>>>>>> ```java
>>>>>>> **Before**
>>>>>>> 
>>>>>>> message CommandCloseProducer {
>>>>>>>   required uint64 producer_id = 1;
>>>>>>>   required uint64 request_id = 2;
>>>>>>> }
>>>>>>> 
>>>>>>> **After**
>>>>>>> message CommandCloseProducer {
>>>>>>>   required uint64 producer_id = 1;
>>>>>>>   required uint64 request_id = 2;
>>>>>>>   optional bool allow_reconnect = 3 [default = true];
>>>>>>> }
>>>>>>> ```
>>>>>>> 
>>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
>>>>>>> ```java
>>>>>>> **Before**
>>>>>>> 
>>>>>>> message CommandCloseConsumer {
>>>>>>>   required uint64 consumer_id = 1;
>>>>>>>   required uint64 request_id = 2;
>>>>>>> }
>>>>>>> 
>>>>>>> **After**
>>>>>>> message CommandCloseConsumer {
>>>>>>>   required uint64 consumer_id = 1;
>>>>>>>   required uint64 request_id = 2;
>>>>>>>   optional bool allow_reconnect = 3 [default = true];
>>>>>>> }
>>>>>>> ```
>>>>>>> 
>>>>>>> ## Implementation
>>>>>>> 
>>>>>>> ### ClientCnx - Producer:
>>>>>>> 
>>>>>>> **Before**
>>>>>>> ```java
>>>>>>>   @Override
>>>>>>>   protected void handleCloseProducer(CommandCloseProducer
>>> closeProducer)
>>>>>>> {
>>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
>>>>>>> remoteAddress, closeProducer.getProducerId());
>>>>>>>       final long producerId = closeProducer.getProducerId();
>>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
>>>>>>>       if (producer != null) {
>>>>>>>           producer.connectionClosed(this);
>>>>>>>       } else {
>>>>>>>           log.warn("Producer with id {} not found while closing
>>> producer
>>>>>>> ", producerId);
>>>>>>>       }
>>>>>>>   }
>>>>>>> ```
>>>>>>> After:
>>>>>>> ```java
>>>>>>>  @Override
>>>>>>>   protected void handleCloseProducer(CommandCloseProducer
>>> closeProducer)
>>>>>>> {
>>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
>>>>>>> remoteAddress, closeProducer.getProducerId());
>>>>>>>       final long producerId = closeProducer.getProducerId();
>>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
>>>>>>>       if (producer != null) {
>>>>>>>           if (closeProducer.isAllowReconnect) {
>>>>>>>               producer.connectionClosed(this);
>>>>>>>           } else {
>>>>>>>               producer.closeAsync();
>>>>>>>           }
>>>>>>>       } else {
>>>>>>>           log.warn("Producer with id {} not found while closing
>>> producer
>>>>>>> ", producerId);
>>>>>>>       }
>>>>>>>   }
>>>>>>> ```
>>>>>>> ### ClientCnx - Consumer:
>>>>>>> 
>>>>>>> **Before**
>>>>>>> ```java
>>>>>>>   @Override
>>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
>>> closeConsumer)
>>>>>>> {
>>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
>>>>>>> remoteAddress, closeConsumer.getConsumerId());
>>>>>>>       final long consumerId = closeConsumer.getConsumerId();
>>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
>>>>>>>       if (consumer != null) {
>>>>>>>           consumer.connectionClosed(this);
>>>>>>>       } else {
>>>>>>>           log.warn("Consumer with id {} not found while closing
>>> consumer
>>>>>>> ", consumerId);
>>>>>>>       }
>>>>>>>   }
>>>>>>> ```
>>>>>>> **After**
>>>>>>> ```java
>>>>>>>   @Override
>>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
>>> closeConsumer)
>>>>>>> {
>>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
>>>>>>> remoteAddress, closeConsumer.getConsumerId());
>>>>>>>       final long consumerId = closeConsumer.getConsumerId();
>>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
>>>>>>>       if (consumer != null) {
>>>>>>>           if (closeConsumer.isAllowReconnect) {
>>>>>>>               consumer.connectionClosed(this);
>>>>>>>           } else {
>>>>>>>               consumer.closeAsync();
>>>>>>>           }
>>>>>>>       } else {
>>>>>>>           log.warn("Consumer with id {} not found while closing
>>> consumer
>>>>>>> ", consumerId);
>>>>>>>       }
>>>>>>>   }
>>>>>>> ```
>>>>>>> 
>>>>>>> 
>>>>>>> ## Reject Alternatives
>>>>>>> 
>>>>>>> none.
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Zike Yang
>>> 
>>> 


Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by Michael Marshall <mm...@apache.org>.
> The old producer/consumer should be closed after applying the changes from
> this proposal.

Penghui, are you suggesting that we implement the namespace/tenant
terminated logic after completing this PIP?

For the sake of discussion, if we implement the namespace terminated
logic first, we could fulfill the underlying requirements for this PIP
by returning a new non-retriable error response when a client tries to
connect a producer or a consumer to a topic in a namespace that is
"terminated". If we do add the "namespace terminated" feature, we'll
need to add a non-retriable exception for this case, anyway. The main
advantage here is that we'd only need one expansion of the protobuf
instead of two. The downside is that the protocol for connected
clients has a couple more roundtrips. The broker would disconnect
connected clients and then fail their reconnection attempt with a
non-retriable error.

Thanks,
Michael

On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <pe...@apache.org> wrote:
>
> > If we want to solve this problem, we need to add some sync resources like
> lock/state, I think it’s a harm for us, we don’t need to do that.
>
> I think we can make the namespace/tenants to the inactive state first so
> that we can avoid any new
> producer/consumer connect to the topic under the namespace/tenant.
>
> The old producer/consumer should be closed after applying the changes from
> this proposal.
>
> Thanks,
> Penghui
>
> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <ma...@gmail.com> wrote:
>
> > > This is supposed to mean that the namespace should be able to be
> > > deleted, correct?
> >
> > Yes, the main background is the user doesn’t have an active topic. so,
> > they want to delete the namespace.
> >
> > > However, I think
> > > we might still have a race condition that could make tenant or
> > > namespace deletion fail. Specifically, if a new producer or consumer
> > > creates a topic after the namespace deletion has started but
> > > before it is complete. Do you agree that the underlying race still
> > exists?
> >
> > Yes, this condition exists. I think it’s not a big problem because the
> > user doesn’t want to use this namespace anymore.
> > If this scenario appears, they will get an error and need to delete it
> > again.
> >
> > > What if we expand our usage of the "terminated" feature to apply to
> > > namespaces (and tenants)? Then, a terminated namespace can have
> > > bundles and topics can be deleted but not created (just as a terminated
> > > topic cannot have any new messages published to it). This would take
> > > care of all topic creation race conditions. We'd probably need to add
> > > new protobuf exceptions for this feature.
> >
> >
> > If we want to solve this problem, we need to add some sync resources like
> > lock/state, I think it’s a harm for us, we don’t need to do that.
> >
> > Thanks for your suggestions, let me know what you think.
> >
> > Best,
> > Mattison
> >
> > > On Feb 1, 2022, at 2:26 PM, Michael Marshall <mm...@apache.org>
> > wrote:
> > >
> > > This proposal identifies an important issue that we should definitely
> > > solve. I have some questions.
> > >
> > >> When there are no user-created topics under a namespace,
> > >> Namespace should be deleted.
> > >
> > > This is supposed to mean that the namespace should be able to be
> > > deleted, correct?
> > >
> > >> For this reason, we need to close the system topic reader/producer
> > >> first, then remove the system topic. finally, remove the namespace.
> > >
> > > I agree that expanding the protobuf CloseProducer and CloseConsumer
> > > commands could be valuable here. The expansion would ensure that
> > > producers and consumers don't attempt to reconnect. However, I think
> > > we might still have a race condition that could make tenant or
> > > namespace deletion fail. Specifically, if a new producer or consumer
> > > creates a topic after the namespace deletion has started but
> > > before it is complete. Do you agree that the underlying race still
> > exists?
> > >
> > > In my view, the fundamental problem here is that deleting certain Pulsar
> > > resources takes time and, in a distributed system, that means race
> > > conditions.
> > >
> > > What if we expand our usage of the "terminated" feature to apply to
> > > namespaces (and tenants)? Then, a terminated namespace can have
> > > bundles and topics can be deleted but not created (just as a terminated
> > > topic cannot have any new messages published to it). This would take
> > > care of all topic creation race conditions. We'd probably need to add
> > > new protobuf exceptions for this feature.
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > > On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > > <zk...@streamnative.io.invalid> wrote:
> > >>
> > >> +1
> > >>
> > >>
> > >> Thanks,
> > >> Zike
> > >>
> > >> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <te...@apache.org>
> > wrote:
> > >>>
> > >>> Hi
> > >>> The PIP link : https://github.com/apache/pulsar/issues/13989
> > >>>
> > >>> Regards
> > >>> Jiwei Guo (Tboy)
> > >>>
> > >>>
> > >>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <mattisonchao@gmail.com
> > >
> > >>> wrote:
> > >>>
> > >>>> Hello everyone,
> > >>>>
> > >>>> I want to start a discussion about PIP-139 : Support Broker send
> > command
> > >>>> to real close producer/consumer.
> > >>>>
> > >>>> This is the PIP document
> > >>>>
> > >>>> https://github.com/apache/pulsar/issues/13989 <
> > >>>> https://github.com/apache/pulsar/issues/13979>
> > >>>>
> > >>>> Please check it out and feel free to share your thoughts.
> > >>>>
> > >>>> Best,
> > >>>> Mattison
> > >>>>
> > >>>>
> > >>>> ———————— Pasted below for quoting convenience.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Relation pull request:  #13337
> > >>>> Authors: @Technoboy-  @mattisonchao
> > >>>>
> > >>>> ## Motivation
> > >>>>
> > >>>> Before we discuss this pip, I'd like to supplement some context to
> > help
> > >>>> contributors who don't want to read the original pull request.
> > >>>>
> > >>>>> When there are no user-created topics under a namespace, Namespace
> > >>>> should be deleted. But currently, the system topic existed and the
> > >>>> reader/producer could auto-create the system which may cause the
> > namespace
> > >>>> deletion to fail.
> > >>>>
> > >>>> For this reason, we need to close the system topic reader/producer
> > first,
> > >>>> then remove the system topic. finally, remove the namespace.
> > >>>>
> > >>>> Following this way, we first want to use ``terminate`` to solve this
> > >>>> problem. then we found producers can disconnect, but consumers are
> > still
> > >>>> alive. So, another PR #13960 wants to add consumers' closing logic.
> > >>>>
> > >>>> After #13960, all things look good, but another problem appears. that
> > is
> > >>>> we need to wait until consumers completely consume all messages (this
> > may
> > >>>> make terminate topic so long and the operation timeout)then get
> > >>>> ``reachedEndOfTopic``. the relative code here :
> > >>>>
> > >>>>
> > >>>>
> > https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> > >>>>
> > >>>> In the #13337 case, we need to force close consumers immediately. So
> > we
> > >>>> write this PIP to discuss another way to resolve this problem.
> > >>>>
> > >>>> ## Goal
> > >>>>
> > >>>> We can add a new field(`allow_reconnect`) in command
> > >>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> > >>>> producer/consumers immediately.
> > >>>>
> > >>>> ## API Changes
> > >>>>
> > >>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> > >>>>
> > >>>> ```java
> > >>>> **Before**
> > >>>>
> > >>>> message CommandCloseProducer {
> > >>>>    required uint64 producer_id = 1;
> > >>>>    required uint64 request_id = 2;
> > >>>> }
> > >>>>
> > >>>> **After**
> > >>>> message CommandCloseProducer {
> > >>>>    required uint64 producer_id = 1;
> > >>>>    required uint64 request_id = 2;
> > >>>>    optional bool allow_reconnect = 3 [default = true];
> > >>>> }
> > >>>> ```
> > >>>>
> > >>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> > >>>> ```java
> > >>>> **Before**
> > >>>>
> > >>>> message CommandCloseConsumer {
> > >>>>    required uint64 consumer_id = 1;
> > >>>>    required uint64 request_id = 2;
> > >>>> }
> > >>>>
> > >>>> **After**
> > >>>> message CommandCloseConsumer {
> > >>>>    required uint64 consumer_id = 1;
> > >>>>    required uint64 request_id = 2;
> > >>>>    optional bool allow_reconnect = 3 [default = true];
> > >>>> }
> > >>>> ```
> > >>>>
> > >>>> ## Implementation
> > >>>>
> > >>>> ### ClientCnx - Producer:
> > >>>>
> > >>>> **Before**
> > >>>> ```java
> > >>>>    @Override
> > >>>>    protected void handleCloseProducer(CommandCloseProducer
> > closeProducer)
> > >>>> {
> > >>>>        log.info("[{}] Broker notification of Closed producer: {}",
> > >>>> remoteAddress, closeProducer.getProducerId());
> > >>>>        final long producerId = closeProducer.getProducerId();
> > >>>>        ProducerImpl<?> producer = producers.get(producerId);
> > >>>>        if (producer != null) {
> > >>>>            producer.connectionClosed(this);
> > >>>>        } else {
> > >>>>            log.warn("Producer with id {} not found while closing
> > producer
> > >>>> ", producerId);
> > >>>>        }
> > >>>>    }
> > >>>> ```
> > >>>> After:
> > >>>> ```java
> > >>>>   @Override
> > >>>>    protected void handleCloseProducer(CommandCloseProducer
> > closeProducer)
> > >>>> {
> > >>>>        log.info("[{}] Broker notification of Closed producer: {}",
> > >>>> remoteAddress, closeProducer.getProducerId());
> > >>>>        final long producerId = closeProducer.getProducerId();
> > >>>>        ProducerImpl<?> producer = producers.get(producerId);
> > >>>>        if (producer != null) {
> > >>>>            if (closeProducer.isAllowReconnect) {
> > >>>>                producer.connectionClosed(this);
> > >>>>            } else {
> > >>>>                producer.closeAsync();
> > >>>>            }
> > >>>>        } else {
> > >>>>            log.warn("Producer with id {} not found while closing
> > producer
> > >>>> ", producerId);
> > >>>>        }
> > >>>>    }
> > >>>> ```
> > >>>> ### ClientCnx - Consumer:
> > >>>>
> > >>>> **Before**
> > >>>> ```java
> > >>>>    @Override
> > >>>>    protected void handleCloseConsumer(CommandCloseConsumer
> > closeConsumer)
> > >>>> {
> > >>>>        log.info("[{}] Broker notification of Closed consumer: {}",
> > >>>> remoteAddress, closeConsumer.getConsumerId());
> > >>>>        final long consumerId = closeConsumer.getConsumerId();
> > >>>>        ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >>>>        if (consumer != null) {
> > >>>>            consumer.connectionClosed(this);
> > >>>>        } else {
> > >>>>            log.warn("Consumer with id {} not found while closing
> > consumer
> > >>>> ", consumerId);
> > >>>>        }
> > >>>>    }
> > >>>> ```
> > >>>> **After**
> > >>>> ```java
> > >>>>    @Override
> > >>>>    protected void handleCloseConsumer(CommandCloseConsumer
> > closeConsumer)
> > >>>> {
> > >>>>        log.info("[{}] Broker notification of Closed consumer: {}",
> > >>>> remoteAddress, closeConsumer.getConsumerId());
> > >>>>        final long consumerId = closeConsumer.getConsumerId();
> > >>>>        ConsumerImpl<?> consumer = consumers.get(consumerId);
> > >>>>        if (consumer != null) {
> > >>>>            if (closeConsumer.isAllowReconnect) {
> > >>>>                consumer.connectionClosed(this);
> > >>>>            } else {
> > >>>>                consumer.closeAsync();
> > >>>>            }
> > >>>>        } else {
> > >>>>            log.warn("Consumer with id {} not found while closing
> > consumer
> > >>>> ", consumerId);
> > >>>>        }
> > >>>>    }
> > >>>> ```
> > >>>>
> > >>>>
> > >>>> ## Reject Alternatives
> > >>>>
> > >>>> none.
> > >>
> > >>
> > >>
> > >> --
> > >> Zike Yang
> >
> >

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by PengHui Li <pe...@apache.org>.
> If we want to solve this problem, we need to add some sync resources like
lock/state, I think it’s a harm for us, we don’t need to do that.

I think we can make the namespace/tenants to the inactive state first so
that we can avoid any new
producer/consumer connect to the topic under the namespace/tenant.

The old producer/consumer should be closed after applying the changes from
this proposal.

Thanks,
Penghui

On Tue, Feb 8, 2022 at 5:47 PM mattison chao <ma...@gmail.com> wrote:

> > This is supposed to mean that the namespace should be able to be
> > deleted, correct?
>
> Yes, the main background is the user doesn’t have an active topic. so,
> they want to delete the namespace.
>
> > However, I think
> > we might still have a race condition that could make tenant or
> > namespace deletion fail. Specifically, if a new producer or consumer
> > creates a topic after the namespace deletion has started but
> > before it is complete. Do you agree that the underlying race still
> exists?
>
> Yes, this condition exists. I think it’s not a big problem because the
> user doesn’t want to use this namespace anymore.
> If this scenario appears, they will get an error and need to delete it
> again.
>
> > What if we expand our usage of the "terminated" feature to apply to
> > namespaces (and tenants)? Then, a terminated namespace can have
> > bundles and topics can be deleted but not created (just as a terminated
> > topic cannot have any new messages published to it). This would take
> > care of all topic creation race conditions. We'd probably need to add
> > new protobuf exceptions for this feature.
>
>
> If we want to solve this problem, we need to add some sync resources like
> lock/state, I think it’s a harm for us, we don’t need to do that.
>
> Thanks for your suggestions, let me know what you think.
>
> Best,
> Mattison
>
> > On Feb 1, 2022, at 2:26 PM, Michael Marshall <mm...@apache.org>
> wrote:
> >
> > This proposal identifies an important issue that we should definitely
> > solve. I have some questions.
> >
> >> When there are no user-created topics under a namespace,
> >> Namespace should be deleted.
> >
> > This is supposed to mean that the namespace should be able to be
> > deleted, correct?
> >
> >> For this reason, we need to close the system topic reader/producer
> >> first, then remove the system topic. finally, remove the namespace.
> >
> > I agree that expanding the protobuf CloseProducer and CloseConsumer
> > commands could be valuable here. The expansion would ensure that
> > producers and consumers don't attempt to reconnect. However, I think
> > we might still have a race condition that could make tenant or
> > namespace deletion fail. Specifically, if a new producer or consumer
> > creates a topic after the namespace deletion has started but
> > before it is complete. Do you agree that the underlying race still
> exists?
> >
> > In my view, the fundamental problem here is that deleting certain Pulsar
> > resources takes time and, in a distributed system, that means race
> > conditions.
> >
> > What if we expand our usage of the "terminated" feature to apply to
> > namespaces (and tenants)? Then, a terminated namespace can have
> > bundles and topics can be deleted but not created (just as a terminated
> > topic cannot have any new messages published to it). This would take
> > care of all topic creation race conditions. We'd probably need to add
> > new protobuf exceptions for this feature.
> >
> > Thanks,
> > Michael
> >
> >
> > On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> > <zk...@streamnative.io.invalid> wrote:
> >>
> >> +1
> >>
> >>
> >> Thanks,
> >> Zike
> >>
> >> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <te...@apache.org>
> wrote:
> >>>
> >>> Hi
> >>> The PIP link : https://github.com/apache/pulsar/issues/13989
> >>>
> >>> Regards
> >>> Jiwei Guo (Tboy)
> >>>
> >>>
> >>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <mattisonchao@gmail.com
> >
> >>> wrote:
> >>>
> >>>> Hello everyone,
> >>>>
> >>>> I want to start a discussion about PIP-139 : Support Broker send
> command
> >>>> to real close producer/consumer.
> >>>>
> >>>> This is the PIP document
> >>>>
> >>>> https://github.com/apache/pulsar/issues/13989 <
> >>>> https://github.com/apache/pulsar/issues/13979>
> >>>>
> >>>> Please check it out and feel free to share your thoughts.
> >>>>
> >>>> Best,
> >>>> Mattison
> >>>>
> >>>>
> >>>> ———————— Pasted below for quoting convenience.
> >>>>
> >>>>
> >>>>
> >>>> Relation pull request:  #13337
> >>>> Authors: @Technoboy-  @mattisonchao
> >>>>
> >>>> ## Motivation
> >>>>
> >>>> Before we discuss this pip, I'd like to supplement some context to
> help
> >>>> contributors who don't want to read the original pull request.
> >>>>
> >>>>> When there are no user-created topics under a namespace, Namespace
> >>>> should be deleted. But currently, the system topic existed and the
> >>>> reader/producer could auto-create the system which may cause the
> namespace
> >>>> deletion to fail.
> >>>>
> >>>> For this reason, we need to close the system topic reader/producer
> first,
> >>>> then remove the system topic. finally, remove the namespace.
> >>>>
> >>>> Following this way, we first want to use ``terminate`` to solve this
> >>>> problem. then we found producers can disconnect, but consumers are
> still
> >>>> alive. So, another PR #13960 wants to add consumers' closing logic.
> >>>>
> >>>> After #13960, all things look good, but another problem appears. that
> is
> >>>> we need to wait until consumers completely consume all messages (this
> may
> >>>> make terminate topic so long and the operation timeout)then get
> >>>> ``reachedEndOfTopic``. the relative code here :
> >>>>
> >>>>
> >>>>
> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> >>>>
> >>>> In the #13337 case, we need to force close consumers immediately. So
> we
> >>>> write this PIP to discuss another way to resolve this problem.
> >>>>
> >>>> ## Goal
> >>>>
> >>>> We can add a new field(`allow_reconnect`) in command
> >>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> >>>> producer/consumers immediately.
> >>>>
> >>>> ## API Changes
> >>>>
> >>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> >>>>
> >>>> ```java
> >>>> **Before**
> >>>>
> >>>> message CommandCloseProducer {
> >>>>    required uint64 producer_id = 1;
> >>>>    required uint64 request_id = 2;
> >>>> }
> >>>>
> >>>> **After**
> >>>> message CommandCloseProducer {
> >>>>    required uint64 producer_id = 1;
> >>>>    required uint64 request_id = 2;
> >>>>    optional bool allow_reconnect = 3 [default = true];
> >>>> }
> >>>> ```
> >>>>
> >>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> >>>> ```java
> >>>> **Before**
> >>>>
> >>>> message CommandCloseConsumer {
> >>>>    required uint64 consumer_id = 1;
> >>>>    required uint64 request_id = 2;
> >>>> }
> >>>>
> >>>> **After**
> >>>> message CommandCloseConsumer {
> >>>>    required uint64 consumer_id = 1;
> >>>>    required uint64 request_id = 2;
> >>>>    optional bool allow_reconnect = 3 [default = true];
> >>>> }
> >>>> ```
> >>>>
> >>>> ## Implementation
> >>>>
> >>>> ### ClientCnx - Producer:
> >>>>
> >>>> **Before**
> >>>> ```java
> >>>>    @Override
> >>>>    protected void handleCloseProducer(CommandCloseProducer
> closeProducer)
> >>>> {
> >>>>        log.info("[{}] Broker notification of Closed producer: {}",
> >>>> remoteAddress, closeProducer.getProducerId());
> >>>>        final long producerId = closeProducer.getProducerId();
> >>>>        ProducerImpl<?> producer = producers.get(producerId);
> >>>>        if (producer != null) {
> >>>>            producer.connectionClosed(this);
> >>>>        } else {
> >>>>            log.warn("Producer with id {} not found while closing
> producer
> >>>> ", producerId);
> >>>>        }
> >>>>    }
> >>>> ```
> >>>> After:
> >>>> ```java
> >>>>   @Override
> >>>>    protected void handleCloseProducer(CommandCloseProducer
> closeProducer)
> >>>> {
> >>>>        log.info("[{}] Broker notification of Closed producer: {}",
> >>>> remoteAddress, closeProducer.getProducerId());
> >>>>        final long producerId = closeProducer.getProducerId();
> >>>>        ProducerImpl<?> producer = producers.get(producerId);
> >>>>        if (producer != null) {
> >>>>            if (closeProducer.isAllowReconnect) {
> >>>>                producer.connectionClosed(this);
> >>>>            } else {
> >>>>                producer.closeAsync();
> >>>>            }
> >>>>        } else {
> >>>>            log.warn("Producer with id {} not found while closing
> producer
> >>>> ", producerId);
> >>>>        }
> >>>>    }
> >>>> ```
> >>>> ### ClientCnx - Consumer:
> >>>>
> >>>> **Before**
> >>>> ```java
> >>>>    @Override
> >>>>    protected void handleCloseConsumer(CommandCloseConsumer
> closeConsumer)
> >>>> {
> >>>>        log.info("[{}] Broker notification of Closed consumer: {}",
> >>>> remoteAddress, closeConsumer.getConsumerId());
> >>>>        final long consumerId = closeConsumer.getConsumerId();
> >>>>        ConsumerImpl<?> consumer = consumers.get(consumerId);
> >>>>        if (consumer != null) {
> >>>>            consumer.connectionClosed(this);
> >>>>        } else {
> >>>>            log.warn("Consumer with id {} not found while closing
> consumer
> >>>> ", consumerId);
> >>>>        }
> >>>>    }
> >>>> ```
> >>>> **After**
> >>>> ```java
> >>>>    @Override
> >>>>    protected void handleCloseConsumer(CommandCloseConsumer
> closeConsumer)
> >>>> {
> >>>>        log.info("[{}] Broker notification of Closed consumer: {}",
> >>>> remoteAddress, closeConsumer.getConsumerId());
> >>>>        final long consumerId = closeConsumer.getConsumerId();
> >>>>        ConsumerImpl<?> consumer = consumers.get(consumerId);
> >>>>        if (consumer != null) {
> >>>>            if (closeConsumer.isAllowReconnect) {
> >>>>                consumer.connectionClosed(this);
> >>>>            } else {
> >>>>                consumer.closeAsync();
> >>>>            }
> >>>>        } else {
> >>>>            log.warn("Consumer with id {} not found while closing
> consumer
> >>>> ", consumerId);
> >>>>        }
> >>>>    }
> >>>> ```
> >>>>
> >>>>
> >>>> ## Reject Alternatives
> >>>>
> >>>> none.
> >>
> >>
> >>
> >> --
> >> Zike Yang
>
>

Re: [DISCUSS] PIP-139 : Support Broker send command to real close producer/consumer.

Posted by mattison chao <ma...@gmail.com>.
> This is supposed to mean that the namespace should be able to be
> deleted, correct?

Yes, the main background is the user doesn’t have an active topic. so, they want to delete the namespace.

> However, I think
> we might still have a race condition that could make tenant or
> namespace deletion fail. Specifically, if a new producer or consumer
> creates a topic after the namespace deletion has started but
> before it is complete. Do you agree that the underlying race still exists?

Yes, this condition exists. I think it’s not a big problem because the user doesn’t want to use this namespace anymore. 
If this scenario appears, they will get an error and need to delete it again.

> What if we expand our usage of the "terminated" feature to apply to
> namespaces (and tenants)? Then, a terminated namespace can have
> bundles and topics can be deleted but not created (just as a terminated
> topic cannot have any new messages published to it). This would take
> care of all topic creation race conditions. We'd probably need to add
> new protobuf exceptions for this feature.


If we want to solve this problem, we need to add some sync resources like lock/state, I think it’s a harm for us, we don’t need to do that.

Thanks for your suggestions, let me know what you think.

Best,
Mattison

> On Feb 1, 2022, at 2:26 PM, Michael Marshall <mm...@apache.org> wrote:
> 
> This proposal identifies an important issue that we should definitely
> solve. I have some questions.
> 
>> When there are no user-created topics under a namespace,
>> Namespace should be deleted.
> 
> This is supposed to mean that the namespace should be able to be
> deleted, correct?
> 
>> For this reason, we need to close the system topic reader/producer
>> first, then remove the system topic. finally, remove the namespace.
> 
> I agree that expanding the protobuf CloseProducer and CloseConsumer
> commands could be valuable here. The expansion would ensure that
> producers and consumers don't attempt to reconnect. However, I think
> we might still have a race condition that could make tenant or
> namespace deletion fail. Specifically, if a new producer or consumer
> creates a topic after the namespace deletion has started but
> before it is complete. Do you agree that the underlying race still exists?
> 
> In my view, the fundamental problem here is that deleting certain Pulsar
> resources takes time and, in a distributed system, that means race
> conditions.
> 
> What if we expand our usage of the "terminated" feature to apply to
> namespaces (and tenants)? Then, a terminated namespace can have
> bundles and topics can be deleted but not created (just as a terminated
> topic cannot have any new messages published to it). This would take
> care of all topic creation race conditions. We'd probably need to add
> new protobuf exceptions for this feature.
> 
> Thanks,
> Michael
> 
> 
> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> <zk...@streamnative.io.invalid> wrote:
>> 
>> +1
>> 
>> 
>> Thanks,
>> Zike
>> 
>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <te...@apache.org> wrote:
>>> 
>>> Hi
>>> The PIP link : https://github.com/apache/pulsar/issues/13989
>>> 
>>> Regards
>>> Jiwei Guo (Tboy)
>>> 
>>> 
>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <ma...@gmail.com>
>>> wrote:
>>> 
>>>> Hello everyone,
>>>> 
>>>> I want to start a discussion about PIP-139 : Support Broker send command
>>>> to real close producer/consumer.
>>>> 
>>>> This is the PIP document
>>>> 
>>>> https://github.com/apache/pulsar/issues/13989 <
>>>> https://github.com/apache/pulsar/issues/13979>
>>>> 
>>>> Please check it out and feel free to share your thoughts.
>>>> 
>>>> Best,
>>>> Mattison
>>>> 
>>>> 
>>>> ———————— Pasted below for quoting convenience.
>>>> 
>>>> 
>>>> 
>>>> Relation pull request:  #13337
>>>> Authors: @Technoboy-  @mattisonchao
>>>> 
>>>> ## Motivation
>>>> 
>>>> Before we discuss this pip, I'd like to supplement some context to help
>>>> contributors who don't want to read the original pull request.
>>>> 
>>>>> When there are no user-created topics under a namespace, Namespace
>>>> should be deleted. But currently, the system topic existed and the
>>>> reader/producer could auto-create the system which may cause the namespace
>>>> deletion to fail.
>>>> 
>>>> For this reason, we need to close the system topic reader/producer first,
>>>> then remove the system topic. finally, remove the namespace.
>>>> 
>>>> Following this way, we first want to use ``terminate`` to solve this
>>>> problem. then we found producers can disconnect, but consumers are still
>>>> alive. So, another PR #13960 wants to add consumers' closing logic.
>>>> 
>>>> After #13960, all things look good, but another problem appears. that is
>>>> we need to wait until consumers completely consume all messages (this may
>>>> make terminate topic so long and the operation timeout)then get
>>>> ``reachedEndOfTopic``. the relative code here :
>>>> 
>>>> 
>>>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
>>>> 
>>>> In the #13337 case, we need to force close consumers immediately. So we
>>>> write this PIP to discuss another way to resolve this problem.
>>>> 
>>>> ## Goal
>>>> 
>>>> We can add a new field(`allow_reconnect`) in command
>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
>>>> producer/consumers immediately.
>>>> 
>>>> ## API Changes
>>>> 
>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
>>>> 
>>>> ```java
>>>> **Before**
>>>> 
>>>> message CommandCloseProducer {
>>>>    required uint64 producer_id = 1;
>>>>    required uint64 request_id = 2;
>>>> }
>>>> 
>>>> **After**
>>>> message CommandCloseProducer {
>>>>    required uint64 producer_id = 1;
>>>>    required uint64 request_id = 2;
>>>>    optional bool allow_reconnect = 3 [default = true];
>>>> }
>>>> ```
>>>> 
>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
>>>> ```java
>>>> **Before**
>>>> 
>>>> message CommandCloseConsumer {
>>>>    required uint64 consumer_id = 1;
>>>>    required uint64 request_id = 2;
>>>> }
>>>> 
>>>> **After**
>>>> message CommandCloseConsumer {
>>>>    required uint64 consumer_id = 1;
>>>>    required uint64 request_id = 2;
>>>>    optional bool allow_reconnect = 3 [default = true];
>>>> }
>>>> ```
>>>> 
>>>> ## Implementation
>>>> 
>>>> ### ClientCnx - Producer:
>>>> 
>>>> **Before**
>>>> ```java
>>>>    @Override
>>>>    protected void handleCloseProducer(CommandCloseProducer closeProducer)
>>>> {
>>>>        log.info("[{}] Broker notification of Closed producer: {}",
>>>> remoteAddress, closeProducer.getProducerId());
>>>>        final long producerId = closeProducer.getProducerId();
>>>>        ProducerImpl<?> producer = producers.get(producerId);
>>>>        if (producer != null) {
>>>>            producer.connectionClosed(this);
>>>>        } else {
>>>>            log.warn("Producer with id {} not found while closing producer
>>>> ", producerId);
>>>>        }
>>>>    }
>>>> ```
>>>> After:
>>>> ```java
>>>>   @Override
>>>>    protected void handleCloseProducer(CommandCloseProducer closeProducer)
>>>> {
>>>>        log.info("[{}] Broker notification of Closed producer: {}",
>>>> remoteAddress, closeProducer.getProducerId());
>>>>        final long producerId = closeProducer.getProducerId();
>>>>        ProducerImpl<?> producer = producers.get(producerId);
>>>>        if (producer != null) {
>>>>            if (closeProducer.isAllowReconnect) {
>>>>                producer.connectionClosed(this);
>>>>            } else {
>>>>                producer.closeAsync();
>>>>            }
>>>>        } else {
>>>>            log.warn("Producer with id {} not found while closing producer
>>>> ", producerId);
>>>>        }
>>>>    }
>>>> ```
>>>> ### ClientCnx - Consumer:
>>>> 
>>>> **Before**
>>>> ```java
>>>>    @Override
>>>>    protected void handleCloseConsumer(CommandCloseConsumer closeConsumer)
>>>> {
>>>>        log.info("[{}] Broker notification of Closed consumer: {}",
>>>> remoteAddress, closeConsumer.getConsumerId());
>>>>        final long consumerId = closeConsumer.getConsumerId();
>>>>        ConsumerImpl<?> consumer = consumers.get(consumerId);
>>>>        if (consumer != null) {
>>>>            consumer.connectionClosed(this);
>>>>        } else {
>>>>            log.warn("Consumer with id {} not found while closing consumer
>>>> ", consumerId);
>>>>        }
>>>>    }
>>>> ```
>>>> **After**
>>>> ```java
>>>>    @Override
>>>>    protected void handleCloseConsumer(CommandCloseConsumer closeConsumer)
>>>> {
>>>>        log.info("[{}] Broker notification of Closed consumer: {}",
>>>> remoteAddress, closeConsumer.getConsumerId());
>>>>        final long consumerId = closeConsumer.getConsumerId();
>>>>        ConsumerImpl<?> consumer = consumers.get(consumerId);
>>>>        if (consumer != null) {
>>>>            if (closeConsumer.isAllowReconnect) {
>>>>                consumer.connectionClosed(this);
>>>>            } else {
>>>>                consumer.closeAsync();
>>>>            }
>>>>        } else {
>>>>            log.warn("Consumer with id {} not found while closing consumer
>>>> ", consumerId);
>>>>        }
>>>>    }
>>>> ```
>>>> 
>>>> 
>>>> ## Reject Alternatives
>>>> 
>>>> none.
>> 
>> 
>> 
>> --
>> Zike Yang