Posted to dev@pulsar.apache.org by Yubiao Feng <yu...@streamnative.io.INVALID> on 2023/01/11 09:00:02 UTC

[DISCUSS] PIP-240 A new API to unload subscriptions

Hi community

I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.

PIP issue: https://github.com/apache/pulsar/issues/19187

### Motivation

We sometimes unload a topic to resolve stalled-consumption issues, but
unloading the topic also impacts the producer side.

### Goal

Provide a new API that unloads at the subscription level. Unloading a
subscription triggers reconnection of all consumers on that subscription;
reconnection is guaranteed by the client. The API will be used in these ways:
- unload a specific subscription of one topic (or partitioned topic)
- unload all subscriptions of one topic (or partitioned topic)
- unload subscriptions of one topic (or partitioned topic) by regular
expression
  - If a reader's subscription name is not set, a random subscription name
prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users can
unload these subscriptions using regular expressions.
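
As a rough illustration of the regular-expression case, the sketch below
filters subscription names that carry the auto-generated reader prefixes.
The class name, method name, and sample subscription names are hypothetical
and not part of the proposal; only the two prefixes come from the text above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ReaderSubscriptionFilter {
    // Matches names that start with one of the auto-generated reader prefixes.
    static final Pattern READER_SUBS = Pattern.compile("^(multiTopicsReader|reader)-.*");

    // Returns the names that match the pattern in full (hypothetical helper).
    static List<String> readerSubscriptions(List<String> names) {
        return names.stream()
                .filter(n -> READER_SUBS.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample names are made up for the example.
        List<String> names = List.of("reader-a1b2c3", "multiTopicsReader-9f8e", "orders-sub");
        System.out.println(readerSubscriptions(names));
    }
}
```

A named subscription such as `orders-sub` is left out, so an operator could
target only anonymous readers with one expression.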

In addition to triggering consumer disconnection, unloading a subscription
will restart the Dispatcher, which resets the redelivery message queue and
delayed message queue in the Broker's memory. This can help resolve issues
caused by an abnormal dispatcher state. However, the execution flow of
unloading a subscription does not include a restart of the Managed Cursor
related to this dispatcher; if there is a problem with the cursor, we can
only rely on unloading the topic to solve it.

Note: From the client's perspective, the connection may be shared by
consumers, producers, and transactions, so unloading a subscription may
also impact producers and transactions.
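
To make the scope of the reset concrete, here is a toy model of the behaviour
described above. It is only a sketch: the classes `Cursor`, `Dispatcher`, and
`Subscription` are hypothetical stand-ins, not Pulsar's broker classes. It
shows that an unload drops consumers and replaces the in-memory dispatcher
state while leaving the cursor untouched.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SubscriptionUnloadSketch {
    // Stand-in for the durable read position; it survives a subscription unload.
    static class Cursor {
        long readPosition;
    }

    // Stand-in for the in-memory dispatcher state that an unload discards.
    static class Dispatcher {
        final Deque<Long> redeliveryQueue = new ArrayDeque<>();
        final Deque<Long> delayedQueue = new ArrayDeque<>();
    }

    static class Subscription {
        final Cursor cursor = new Cursor();
        Dispatcher dispatcher = new Dispatcher();
        int connectedConsumers;

        // Disconnect consumers and replace the dispatcher; the cursor is not restarted.
        synchronized void unload() {
            connectedConsumers = 0;        // clients are expected to reconnect on their own
            dispatcher = new Dispatcher(); // fresh, empty redelivery and delayed queues
        }
    }

    public static void main(String[] args) {
        Subscription sub = new Subscription();
        sub.cursor.readPosition = 42;
        sub.connectedConsumers = 3;
        sub.dispatcher.redeliveryQueue.add(40L);
        sub.dispatcher.delayedQueue.add(41L);

        sub.unload();

        System.out.println("redelivery queue empty: " + sub.dispatcher.redeliveryQueue.isEmpty());
        System.out.println("cursor position kept: " + sub.cursor.readPosition);
    }
}
```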

#### These scenarios are not supported
- The features `message-dedup`, `geo-replication`, and `shadow-topic` also
read messages from the topic, but unloading a subscription will not trigger
restarts of these three features (because they read the data directly
through the cursor, not through a consumer or reader).
- The compaction task (subscription name `__compaction`) also uses a
reader to read data, but unloading a subscription does not support it because
this task creates a new reader each time it starts.
- Topics related to Transaction features are not supported.
  - `__transaction_buffer_snapshot` works with the TB (Transaction Buffer)
recovery task, and this task creates a new reader each time it starts.
  - `__transaction_pending_ack` works with the Transaction Pending Ack
Store replay task, and this task uses the managed cursor directly to read data.
  - `__transaction_log_xxx` works with the Transaction Log task, which
uses the managed cursor directly to read data.
  - `transaction_coordinator_assign`: no data will be written to this topic.

#### Special system topic supports
The system topic `__change_events` is used to support topic-level policies.
Message delivery issues may also occur in this scenario, so unloading a
subscription will support this topic.

### API Changes

#### For persistent topic
```
pulsar-admin persistent unload {topic_name} -s {sub_name}
```

#### For non-persistent topic
```
pulsar-admin non-persistent unload {topic_name} -s {sub_name}
```

#### Explanation of the param `-s`
- set `-s` to a specific subscription name to unload that subscription
- set `-s` to `**` to unload all subscriptions under this topic
- set `-s` to a `regexp` to unload a batch of subscriptions under this
topic
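
The three modes of `-s` could be resolved against a topic's subscription names
roughly as follows. This is a sketch under stated assumptions, not the
proposal's implementation: the class and method names are hypothetical, and
the rule for telling a literal name from a regular expression (try an exact
match first, then fall back to a full regex match) is an assumption, since the
proposal does not spell it out. Note that `**` has to be special-cased because
it is not a valid Java regular expression.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SubscriptionSelector {
    static final String ALL = "**";

    // Resolve the -s value to the subscription names it selects (hypothetical helper).
    static List<String> select(List<String> existing, String param) {
        if (ALL.equals(param)) {
            return List.copyOf(existing);       // "**": every subscription
        }
        if (existing.contains(param)) {
            return List.of(param);              // exact name: that subscription only
        }
        Pattern p = Pattern.compile(param);     // otherwise: treat the value as a regex
        return existing.stream()
                .filter(name -> p.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample subscription names are made up for the example.
        List<String> subs = List.of("billing", "audit-1", "audit-2");
        System.out.println(select(subs, "billing"));
        System.out.println(select(subs, "**"));
        System.out.println(select(subs, "audit-.*"));
    }
}
```

Trying the exact match first keeps a plain subscription name from being
misread as a pattern; a real implementation might instead use a separate flag
for regex input.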


Thanks
Yubiao Feng

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by "rxl@apache.org" <ra...@gmail.com>.
Thanks YuBiao Feng:

> We sometimes try to unload the topic to resolve some consumption-stop
> issues. But the unloading topic will also impact the producer side.

This is indeed an interesting point. In current production operation and
maintenance, unload is a very frequently used operation. As you describe,
Pulsar may still have unknown blocking issues in the main processing path
for producing and consuming. That is unfortunate, but unload can indeed
resolve many problems quickly. Unloading a subscription is a good idea: it
reduces the large-scale impact of unloading a topic and gives us a
finer-grained impact.

+1

--
Thanks
Xiaolong Ran

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by 丛搏 <bo...@apache.org>.
> I would invest more time in:
> - monitoring tools (tools to quickly detect stuck consumers)
> - circuit breakers (fast fail/shut the door to consumers/producers
> that don't behave correctly)
> - guard rails (limits on clients to prevent them from exhausting the
> resources on the brokers)

I agree with this point of view. We should not increase the complexity
of the code by adding non-essential APIs that extend unload; unload is not
fundamentally meant to solve stuck-consumer or blocked-producer problems.

Thanks,
Bo

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by Enrico Olivelli <eo...@gmail.com>.
Yubiao,
thanks for sharing your problem and a proposal, this is very helpful
for the community to get in touch with the pain of Pulsar
users/administrators.

In my experience if a "subscription is stuck", the problems are:
* the client has some problems (bug in the client/misconfiguration
somewhere) - 99.9%
* there is a bug in Pulsar - 0.1%

Unloading a topic is an operation that triggers some reset of the
state on both the broker and the clients, and this usually TEMPORARILY
unblocks the subscription.

I have never seen a problem that is temporarily solved by topic
unload/broker restart to be permanently solved with that operation.
If there is a problem we should spend time on investigating the
problem and not in adding this kind of tool.

I believe that we should not continue to add these kinds of hacks into Pulsar:
- easy reset...
- ignore errors... (catch Throwable...)

The overall result is a system that "seems to work" but actually
doesn't work properly.

I would invest more time in:
- monitoring tools (tools to quickly detect stuck consumers)
- circuit breakers (fast fail/shut the door to consumers/producers
that don't behave correctly)
- guard rails (limits on clients to prevent them from exhausting the
resources on the brokers)



Enrico

On Thu, Jan 12, 2023 at 08:22 <ma...@gmail.com> wrote:
>
> Hi, Yubiao
>
> I agree with this idea because some users care about the production rate. They don't want to unload the whole topic to fix the subscription problem.
>
> I've got some questions:
>
> 1. How do you handle the race condition when you are trying to unload the subscription and a new consumer wants to subscribe to this subscription at the same time? I'm not sure whether a race condition exists here. I just want to remind you about that. :)
> 2. Would you like to add some RESTful API design to clarify the implementation?
>     a. Request method
>     b. Request path
>     c. Response code
>     d. etc.
>
>
> Thanks for your work.
> Mattison

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by Yubiao Feng <yu...@streamnative.io.INVALID>.
I started the voting process for this PIP

Thanks
Yubiao

On Thu, Jan 19, 2023 at 5:55 PM Haiting Jiang <ji...@gmail.com>
wrote:

> I agree with Penghui & Xiaolong,
>
> 1. Restarting a service is usually the most common and effective
> option for service maintainers to recover a service and minimize
> business loss.
> With subscription unloading, we can reduce the impact significantly,
> because unloading a topic affects message writing, which has a much
> greater impact on online business.
>
> 2. Having subscription unloading doesn't conflict with solving the real
> issue. Like restarting a broker, it just buys us more time to locate
> the real problem.
>
> BR,
> Haiting
>
> On Thu, Jan 19, 2023 at 11:42 AM rxl@apache.org
> <ra...@gmail.com> wrote:
> >
> > Hello Joe and Enrico:
> >
> > I agree with what you've been emphasizing: we need to fix these issues
> > at the root cause. While maintaining the Go SDK, we have encountered
> > many stuck-consumer problems since version 0.4.0. Some were logic
> > errors in the Go SDK itself, and some were caused by users misusing
> > the Go SDK. Up to the previous 0.8.0 version, the Go SDK has been used
> > on a large scale in our environment. Throughout these versions we have
> > been trying to fix these bugs completely. This is what our maintainers
> > have been working hard on, and it is also the final form we expect of
> > Pulsar: everything looks OK.
> >
> > However, during the iteration of the Go SDK from 0.4.0 to 0.8.0, users
> > of our production environment encountered similar problems many times.
> > Again, take a user in a production environment whose consumption is
> > blocked. The user comes to us. Do we use some means to quickly let
> > consumers continue consuming messages? Or do we keep production users
> > stuck until we find the root cause of the problem, fix it, and push
> > users to upgrade? I think everyone's answer tends to be the former. We
> > will not directly expose the hack operations of unload topic and
> > unload sub to users, but to Pulsar's operation and maintenance
> > personnel, so it is more like an operation and maintenance tool than
> > an interface called by users. So I think this impact is controllable
> > for Pulsar as a whole, which is why I support it.
> >
> > Again, this PIP is more about buying more time for us to locate the
> > problem while minimizing the impact on production users. It's not that
> > with this interface we stop locating the real causes of the stuck
> > consumption. On the contrary, we are making a trade-off between users
> > and troubleshooting, buying ourselves more time for troubleshooting.
> >
> > --
> > Thanks
> > xiaolong ran
> >
> > PengHui Li <pe...@apache.org> wrote on Wed, Jan 18, 2023 at 11:48:
> >
> > > > What kind of problems is this trying to fix?
> > > > And why cannot that be solved by client-side fixes?
> > >
> > > Yes, most of the issues are from the client side, rarely from the
> > > broker. But the application also needs time to fix the issue, release
> > > the fix, and deploy it to the production environment. Unloading the
> > > subscription is just a temporary way to mitigate the issue and reduce
> > > the impact. It will not fix the issue completely.
> > >
> > > What I learned is to capture the heap dump, topic stats, internal
> > > stats, and logs from the broker and client, and then try to unload
> > > the topic to see if the problem is mitigated. If not, then try to
> > > restart the broker or client. Most of the time, the problem can be
> > > mitigated in this way. Then we can continue to reproduce the issue
> > > and investigate it from the captured heap dump and logs.
> > >
> > > > In shared sub issues, it's hard to pinpoint which consumer/where
> > > > the problem lies, and to reset that one at the client. The totality
> > > > of state spread between the brokers and all the consumers of the
> > > > shared sub needs to be put together. Is that why we are doing this?
> > >
> > > From my experience, most are from Shared and key_shared subscriptions.
> > > Most of the issues come from misuse, rarely from bugs in brokers or
> > > clients.
> > >
> > > Regards,
> > > Penghui
> > >
> > >
> > > On Wed, Jan 18, 2023 at 11:31 AM Joe F <jo...@gmail.com> wrote:
> > >
> > > > Inclined to agree with Enrico. If it's a hard problem, it will
> > > > repeat, and this is not helping. If it's some race on the client,
> > > > it will occur randomly and rarely, and this unload sub will get
> > > > programmed in as a way of life.
> > > >
> > > > > If you think unloading the subscription can't help anything,
> > > > > unloading the topic should be the same. From my experience, most
> > > > > of the unloading topic operations are to mitigate problems
> > > > > related to message consumption.
> > > >
> > > > Comparisons with unloading a topic are not the bar here, as that is
> > > > a first-class broker utility that is needed for operational reasons
> > > > outside of "fixing" consumer-side issues. The side effect of using
> > > > "unload topic" is a loss of transient topic state. I will fully
> > > > agree that this side effect has been pervasively abused for fixing
> > > > problems (à la Ctrl-Alt-Del), but that's not the rationale for
> > > > having an unload topic utility.
> > > >
> > > > What kind of problems is this trying to fix?
> > > > And why cannot that be solved by client-side fixes?
> > > >
> > > > In shared sub issues, it's hard to pinpoint which consumer/where
> > > > the problem lies, and to reset that one at the client. The totality
> > > > of state spread between the brokers and all the consumers of the
> > > > shared sub needs to be put together. Is that why we are doing this?
> > > >
> > > >
> > > > On Tue, Jan 17, 2023 at 5:30 PM PengHui Li <pe...@apache.org> wrote:
> > > >
> > > > > I agree that if we encounter a stuck consumption issue, we should
> > > > > continue to find the root cause of the problem.
> > > > >
> > > > > Subscription unloading is just an option to mitigate the impact
> > > > > first. Sometimes it can mitigate the issue for an hour,
> > > > > especially with key_shared subscriptions. Sometimes it's not a
> > > > > bug in Pulsar, and users need time to fix the issue. It doesn't
> > > > > make sense to let the impact continue until the fix is applied.
> > > > >
> > > > > I also helped many people troubleshoot stuck consumption issues
> > > > > related to key_shared subscriptions, transactions, etc.
> > > > > In most cases, unloading the topic can mitigate the impact.
> > > > > For example, due to an uncaught exception, the dispatch thread
> > > > > stopped reading messages from the managed ledger. The exception
> > > > > is a very infrequent occurrence. Unloading the topic is the best
> > > > > choice for now, right?
> > > > >
> > > > > If you think unloading the subscription can't help anything,
> > > > > unloading the topic should be the same. From my experience, most
> > > > > of the unloading topic operations are to mitigate problems
> > > > > related to message consumption.
> > > > >
> > > > > Best,
> > > > > Penghui
> > > > >
> > > > > On Tue, Jan 17, 2023 at 11:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > >
> > > > > > On Mon, Jan 16, 2023 at 11:58 rxl@apache.org <ra...@gmail.com> wrote:
> > > > > > >
> > > > > > > I agree with @Enrico and @Bo: if we encounter a stuck
> > > > > > > subscription, we must keep spending time to locate and fix
> > > > > > > the problem, which is what we have been doing.
> > > > > > >
> > > > > > > But let's think about this problem from another angle.
> > > > > > > Suppose a user in the production environment encounters a
> > > > > > > stuck consumer. What should we do? For a user in a production
> > > > > > > environment, our first reaction when encountering a problem
> > > > > > > is how to recover quickly and reduce user losses quickly. At
> > > > > > > that point, we don't even think about whether this is a bug
> > > > > > > on the Broker side, a bug on the SDK side, or a bug in the
> > > > > > > user's own usage. In the process of fast recovery, our most
> > > > > > > common method is to quickly re-establish the connection
> > > > > > > between the broker and the client by unloading the specified
> > > > > > > topic. In this process, we try to retain as much context as
> > > > > > > possible to help us keep locating and fixing the problem
> > > > > > > afterwards.
> > > > > > >
> > > > > > > So I don't think these two things conflict. The reason we
> > > > > > > expose the unload topic admin CLI is the same reason we
> > > > > > > expect to expose unload subscription. From a developer's
> > > > > > > perspective, we definitely want to completely fix the problem
> > > > > > > that caused the stall. From the user's perspective, when
> > > > > > > something like a stuck consumer occurs, the user does not
> > > > > > > care about the specific cause of the problem, but expects the
> > > > > > > business to recover in the shortest possible time to avoid
> > > > > > > further loss.
> > > > > > >
> > > > > > > I admit that this is a relatively hacky way, but it can
> > > > > > > indeed solve the problems we are currently encountering, and
> > > > > > > at the same time it will not cause a major conflict with
> > > > > > > Pulsar's existing logic. So I still agree with Yubiao's point
> > > > > > > of view.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Usually when a subscription is "stuck", even if you unload the
> > > > > > topic it returns to the "stuck" state again if you don't solve
> > > > > > the problem.
> > > > > >
> > > > > > This is a very common issue with Pulsar users. I am spending
> > > > > > much time helping users troubleshoot their production problems,
> > > > > > and unloading the topic is never a solution: it can give you
> > > > > > seconds, minutes or hours of "working state", then the problem
> > > > > > will happen again.
> > > > > >
> > > > > > You say that it can solve the problems you are encountering.
> > > > > > Could you please give more context? (in Slack if this is not
> > > > > > something that can be discussed in public)
> > > > > > I apologise if I seem too much of a skeptic this time. I am
> > > > > > sure that you have a real problem and you want to fix it, but I
> > > > > > would like to help you find the best way.
> > > > > >
> > > > > > Pulsar is used by many people and we shouldn't add hacky tools
> > > > > > for temporary workarounds.
> > > > > > Once we deliver an API we should maintain it for an unlimited
> > > > > > time.
> > > > > >
> > > > > > You could patch your system and use the patched version
> > > > > > temporarily until you find the root cause.
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Thanks
> > > > > > > Xiaolong Ran
> > > > > > >
> > > > > > >
> > > > > > > > On Sun, Jan 15, 2023 at 20:59, Yubiao Feng
> > > > > > > > <yu...@streamnative.io.invalid> wrote:
> > > > > > >
> > > > > > > > > Hi Qiang
> > > > > > > > >
> > > > > > > > > > 1. How do you handle the race condition when you are trying
> > > > > > > > > > to unload the subscription, and the new consumer wants to
> > > > > > > > > > subscribe to this subscription at the same time? I'm unsure
> > > > > > > > > > if it has the race condition. I just want to remind you
> > > > > > > > > > about that. :)
> > > > > > > > >
> > > > > > > > > The methods `addConsumer` and `removeConsumer` already hold
> > > > > > > > > synchronized locks; adding a synchronized lock when executing
> > > > > > > > > `reset subscription` as well solves the problem.
> > > > > > > > >
> > > > > > > > > > 2. Would you like to add some restful API design to clarify
> > > > > > > > > > the implementation?
> > > > > > > > >
> > > > > > > > > The REST API design has already been added to the proposal:
> > > > > > > > > https://github.com/apache/pulsar/issues/19187
> > > > > > > >
> > > > > > > > > On Thu, Jan 12, 2023 at 3:22 PM <ma...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Yubiao
> > > > > > > > > >
> > > > > > > > > > I agree with this idea because some users care about the
> > > > > > > > > > production rate. They don't want to unload the whole topic
> > > > > > > > > > to fix the subscription problem.
> > > > > > > > > >
> > > > > > > > > > I've got some questions:
> > > > > > > > > >
> > > > > > > > > > 1. How do you handle the race condition when you are trying
> > > > > > > > > > to unload the subscription, and the new consumer wants to
> > > > > > > > > > subscribe to this subscription at the same time? I'm unsure
> > > > > > > > > > if it has the race condition. I just want to remind you
> > > > > > > > > > about that. :)
> > > > > > > > > > 2. Would you like to add some restful API design to clarify
> > > > > > > > > > the implementation?
> > > > > > > > > >     a. Request method
> > > > > > > > > >     b. Request path
> > > > > > > > > >     c. Response code
> > > > > > > > > >     d. etc.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks for your work.
> > > > > > > > > > Mattison

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by Haiting Jiang <ji...@gmail.com>.
I agree with Penghui & Xiaolong,

1. Restarting a service is usually the most common and effective
option for service maintainers to recover a service and minimize
business loss.
With subscription unloading, we can reduce the impact significantly,
because unloading a topic also affects message writing, which has a
much greater impact on online business.

2. Having subscription unloading doesn't conflict with solving the
real issue. Like restarting a broker, it just buys us more time to
locate the real problem.

BR,
Haiting

On Thu, Jan 19, 2023 at 11:42 AM rxl@apache.org
<ra...@gmail.com> wrote:
>
> Hello Joe and Enrico:
>
> I agree with what you've been emphasizing: we need to fix these issues
> at the root cause. While maintaining the Go SDK, we encountered many
> stuck problems from version 0.4.0 up to the previous 0.8.0 version. Some
> were logic errors in the Go SDK itself, and some were caused by users
> misusing the Go SDK. The Go SDK is used on a large scale in our
> environment. Across these versions, we have been trying to fix these bugs
> completely. This is what our maintainers have been working hard on, and it
> is also the end state we expect for Pulsar: everything works correctly.
>
> However, during the iteration of the Go SDK from 0.4.0 to 0.8.0, users in
> our production environment encountered similar problems many times.
> Consider a user in a production environment whose consumption is blocked.
> The user comes to us. Do they expect us to use some means to quickly let
> consumers continue consuming messages? Or do we keep users in the
> production environment stuck until we find the root cause, fix it, and
> push users to upgrade? I think everyone's answer tends to be the former.
> We will not expose the hacky operations of unload topic and unload sub
> directly to users, but to Pulsar's operation and maintenance personnel, so
> it is more of an operations tool than an interface called by users. So I
> think the impact is controllable for Pulsar as a whole, which is why I
> support it.
>
> Again, this PIP is about buying us more time to locate the problem while
> minimizing the impact on production users. Having this interface does not
> mean we stop looking for the real cause of the stuck state. On the
> contrary, it is a trade-off between user impact and diagnosis that buys us
> more time to diagnose the issue.
>
> --
> Thanks
> xiaolong ran

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by "rxl@apache.org" <ra...@gmail.com>.
Hello Joe and Enrico:

I agree with what you've been emphasizing: we need to fix these issues
at the root cause. While maintaining the Go SDK, we encountered many
stuck problems from version 0.4.0 up to the previous 0.8.0 version. Some
were logic errors in the Go SDK itself, and some were caused by users
misusing the Go SDK. The Go SDK is used on a large scale in our
environment. Across these versions, we have been trying to fix these bugs
completely. This is what our maintainers have been working hard on, and it
is also the end state we expect for Pulsar: everything works correctly.

However, during the iteration of the Go SDK from 0.4.0 to 0.8.0, users in
our production environment encountered similar problems many times.
Consider a user in a production environment whose consumption is blocked.
The user comes to us. Do they expect us to use some means to quickly let
consumers continue consuming messages? Or do we keep users in the
production environment stuck until we find the root cause, fix it, and
push users to upgrade? I think everyone's answer tends to be the former.
We will not expose the hacky operations of unload topic and unload sub
directly to users, but to Pulsar's operation and maintenance personnel, so
it is more of an operations tool than an interface called by users. So I
think the impact is controllable for Pulsar as a whole, which is why I
support it.

Again, this PIP is about buying us more time to locate the problem while
minimizing the impact on production users. Having this interface does not
mean we stop looking for the real cause of the stuck state. On the
contrary, it is a trade-off between user impact and diagnosis that buys us
more time to diagnose the issue.

--
Thanks
xiaolong ran

On Wed, Jan 18, 2023 at 11:48, PengHui Li <pe...@apache.org> wrote:

> > What kind of problems is this trying to fix?
> > And why cannot that be solved by client-side fixes?
>
> Yes, most of the issues come from the client side, rarely from the broker.
> But the application also needs time to fix the issue, then release and
> deploy the fix to the production environment. Unloading the subscription
> is just a temporary way to mitigate the issue and reduce the impact. It
> will not fix the issue completely.
>
> What I learned is to capture the heap dump, topic stats, internal stats,
> and logs from the broker and client, and then try to unload the topic to
> see if the problem is mitigated. If not, then try to restart the broker or
> client. Most of the time, the problem can be mitigated this way.
> Then we can continue to reproduce the issue and investigate it
> from the captured heap dump and logs.
>
> > In shared sub issues, it's hard to pinpoint which consumer/where
> > the problem lies, and to reset that one at the client. The totality of
> > state spread between the brokers and all the consumers of the shared sub
> > needs to be put together. Is that why we are doing this?
>
> From my experience, most are from Shared and Key_Shared subscriptions.
> Most of the issues come from misuse, rarely from bugs in brokers or
> clients.
>
> Regards,
> Penghui
>
>
> On Wed, Jan 18, 2023 at 11:31 AM Joe F <jo...@gmail.com> wrote:
>
> > Inclined to agree with Enrico. If it's a hard problem, it will repeat,
> > and this is not helping. If it's some race on the client, it will occur
> > randomly and rarely, and this unload sub will get programmed in as a way
> > of life.
> >
> > > If you think unloading the subscription can't help with anything,
> > > unloading the topic should be the same. From my experience, most
> > > unload-topic operations are done to mitigate problems related to
> > > message consumption.
> >
> > Comparisons with unloading a topic are not the bar here, as that is a
> > first-class broker utility that is needed for operational reasons outside
> > of "fixing" consumer-side issues. The side effect of using "unload topic"
> > is a loss of transient topic state. I will fully agree that this
> > side effect has been pervasively abused for fixing problems (à la
> > Ctrl-Alt-Del), but that's not the rationale for having an unload topic
> > utility.
> >
> > What kind of problems is this trying to fix?
> > And why cannot that be solved by client-side fixes?
> >
> > In shared sub issues, it's hard to pinpoint which consumer/where
> > the problem lies, and to reset that one at the client. The totality of
> > state spread between the brokers and all the consumers of the shared sub
> > needs to be put together. Is that why we are doing this?
> >
> >
> > On Tue, Jan 17, 2023 at 5:30 PM PengHui Li <pe...@apache.org> wrote:
> >
> > > I agree that if we encounter a stuck consumption issue, we should
> > > continue to find the root cause of the problem.
> > >
> > > Subscription unloading is just an option to mitigate the impact first.
> > > Sometimes it can mitigate the issue for an hour, especially with
> > > key_shared subscriptions. Sometimes it's not a bug in Pulsar, but
> > > users need time to fix the issue, and it doesn't make sense to let
> > > the impact continue until the fix is applied.
> > >
> > > I have also helped many people troubleshoot stuck consumption
> > > issues related to key_shared subscriptions, transactions, etc.
> > > In most cases, unloading the topic can mitigate the impact.
> > > For example, due to an uncaught exception, the dispatch thread
> > > stopped reading messages from the managed ledger. The exception
> > > is a very infrequent occurrence. Unloading the topic is the best choice
> > > for now, right?
> > >
> > > If you think unloading the subscription can't help with anything,
> > > unloading the topic should be the same. From my experience, most
> > > unload-topic operations are done to mitigate problems related to
> > > message consumption.
> > >
> > > Best,
> > > Penghui
> > >
> > > On Tue, Jan 17, 2023 at 11:09 PM Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > Il giorno lun 16 gen 2023 alle ore 11:58 rxl@apache.org
> > > > <ra...@gmail.com> ha scritto:
> > > > >
> > > > > I agree with @Enrico @Bo, if we encounter a subscribe stuck
> > situation,
> > > we
> > > > > must continue to spend more time to locate and fix this problem,
> > which
> > > is
> > > > > what we have been doing.
> > > > >
> > > > > But let's think about this problem from another angle. At this
> time,
> > a
> > > > user
> > > > > in the production environment encounters a consumer stuck
> situation,
> > > what
> > > > > should we do? For a user in a production environment, our first
> > > reaction
> > > > > when encountering a problem is how to quickly recover and how to
> > > quickly
> > > > > reduce user losses. Even at this point in time, we don't think
> about
> > > > > whether this is a bug on the Broker side, a bug on the SDK side,
> or a
> > > bug
> > > > > used by the user himself? In the process of fast recovery, our most
> > > > common
> > > > > method is to quickly re-establish the connection between the broker
> > and
> > > > the
> > > > > client through the topic specified by unload. In this process, we
> try
> > > to
> > > > > retain as much context as possible to assist us in the subsequent
> > > > > continuous positioning and repair of this problem.
> > > > >
> > > > > So I don't think these two things conflict. Why we expose the admin
> > CLI
> > > > of
> > > > > the unload topic is why we expect to expose the unload subscribe.
> If
> > we
> > > > > stand from the perspective of a developer, we definitely want to
> > > > completely
> > > > > fix the problem that caused the stuck. If we think about this issue
> > > from
> > > > > the perspective of the user, when a scenario such as consumer stuck
> > > > occurs
> > > > > to the user, the user does not care about the specific cause of the
> > > > > problem, but expects the business to recover quickly in the
> shortest
> > > > > possible time to avoid further loss.
> > > > >
> > > > > I admit that this is a relatively hacky way, but it can indeed
> solve
> > > the
> > > > > problems we are currently encountering, and at the same time, it
> will
> > > not
> > > > > cause a major conflict with Pulsar's existing logic. So I still
> > insist
> > > on
> > > > > agreeing with yubiao's point of view.
> > > >
> > > >
> > > >
> > > > Usually when a subscription is "stuck" even if you unload the topic
> > > > it returns to the "stuck" state again if you don't solve the problem.
> > > >
> > > > This is a very common issue with Pulsar users, I am spending much
> time
> > > > helping users to troubleshoot their production problems and unloading
> > the
> > > > topic
> > > > is never a solution, it can give you seconds, minutes or hours of
> > > > "working state",
> > > > then the problem will happen again.
> > > >
> > > > You say that it can solve the problems you are encountering.
> > > > Could you please give more context ? (in Slack if this is not
> > > > something that can be discussed in public)
> > > > I apologise if I seem  too much of a skeptic this time, I am sure
> that
> > > > you have a real problem
> > > > and you want to fix it, but I would like to help you find the best
> way.
> > > >
> > > > Pulsar is used by many people and we shouldn't add hacky tools for
> > > > temporary workarounds.
> > > > Once we deliver an API we should maintain it for an unlimited time.
> > > >
> > > > You could patch your system and use the patched version temporarily
> > > > until you find the root case.
> > > >
> > > > Enrico
> > > >
> > > > >
> > > > > --
> > > > > Thanks
> > > > > Xiaolong Ran
> > > > >
> > > > >
> > > > > Yubiao Feng <yu...@streamnative.io.invalid> 于2023年1月15日周日
> > > 20:59写道:
> > > > >
> > > > > > Hi Qiang
> > > > > >
> > > > > > > 1. How do you handle the race condition when you are trying to
> > > > unload the
> > > > > > subscription, and the new consumer wants to subscribe to this
> > > > subscription
> > > > > > at the same time? I'm unsure if it has the race condition. I just
> > > want
> > > > to
> > > > > > remind you about that.:)
> > > > > >
> > > > > > These methods `addConsumer`, `removeConsumer` all have
> synchronized
> > > > locks,
> > > > > > we also add synchronized lock when executing `reset subscription`
> > can
> > > > solve
> > > > > > the problem.
> > > > > >
> > > > > > > 2. Would you like to add some restful API design to clarify the
> > > > > > implementation?
> > > > > >
> > > > > > Already added the rest API design in the proposal
> > > > > > https://github.com/apache/pulsar/issues/19187
> > > > > >
> > > > > > On Thu, Jan 12, 2023 at 3:22 PM <ma...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi, Yubiao
> > > > > > >
> > > > > > > I agree with this idea because some users care about the
> > production
> > > > rate.
> > > > > > > They don't want to unload the whole topic to fix the
> subscription
> > > > > > problem.
> > > > > > >
> > > > > > > I've got some questions:
> > > > > > >
> > > > > > > 1. How do you handle the race condition when you are trying to
> > > > unload the
> > > > > > > subscription, and the new consumer wants to subscribe to this
> > > > > > subscription
> > > > > > > at the same time? I'm unsure if it has the race condition. I
> just
> > > > want to
> > > > > > > remind you about that. :)
> > > > > > > 2. Would you like to add some restful API design to clarify the
> > > > > > > implementation?
> > > > > > >     a. Request method
> > > > > > >     b. Request path
> > > > > > >     c. Response code
> > > > > > >     d. etc.
> > > > > > >
> > > > > > >
> > > > > > > Thanks for your work.
> > > > > > > Mattison
> > > > > > > On Jan 11, 2023, 17:01 +0800, Yubiao Feng <
> > > > yubiao.feng@streamnative.io
> > > > > > .invalid>,
> > > > > > > wrote:
> > > > > > > > Hi community
> > > > > > > >
> > > > > > > > I am starting a DISCUSS for PIP-240: A new API to unload
> > > > subscriptions.
> > > > > > > >
> > > > > > > > PIP issue: https://github.com/apache/pulsar/issues/19187
> > > > > > > >
> > > > > > > > ### Motivation
> > > > > > > >
> > > > > > > > We sometimes try to unload the topic to resolve some
> > > > consumption-stop
> > > > > > > > issues. But the unloading topic will also impact the producer
> > > side.
> > > > > > > >
> > > > > > > > ### Goal
> > > > > > > >
> > > > > > > > Providing a new API to unload the subscription dimension
> > triggers
> > > > > > > > reconnection of all consumers on that subscription and
> > > > reconnection is
> > > > > > > > guaranteed by the client. The API will be used in these ways:
> > > > > > > > - unload special subscription of one topic(or partitioned
> > topic)
> > > > > > > > - unload all subscriptions of one topic(or partitioned topic)
> > > > > > > > - unload subscriptions of one topic(or partitioned topic) by
> > > > regular
> > > > > > > > expression
> > > > > > > > - If a reader's subscription name is not set, a random
> > > subscription
> > > > > > name
> > > > > > > > prefixed with 'multiTopicsReader-' or 'reader-' will be used,
> > and
> > > > users
> > > > > > > can
> > > > > > > > uninstall these subscriptions using regular expressions.
> > > > > > > >
> > > > > > > > In addition to triggering consumer disconnection, Unloading
> > > > Subscribers
> > > > > > > > will restart the Dispatcher, which resets the redeliver
> message
> > > > queue
> > > > > > and
> > > > > > > > delayed message queue in the Broker's memory, which can help
> > > > resolve
> > > > > > > issues
> > > > > > > > caused by an abnormal dispatcher state. However, the
> execution
> > > > flow of
> > > > > > > > Unloading Subscribers does not include a restart of the
> Managed
> > > > Cursor
> > > > > > > > related to this dispatcher; if there is a problem with the
> > > cursor,
> > > > we
> > > > > > can
> > > > > > > > only rely on the unload topic to solve it.
> > > > > > > >
> > > > > > > > Note: From the client's perspective, this connection may be
> > > shared
> > > > by
> > > > > > > > consumers, producers, and transactions, so Unloading
> > Subscribers
> > > > maybe
> > > > > > > > impact the producer and transaction.
> > > > > > > >
> > > > > > > > #### These scenarios are not supported
> > > > > > > > - Functions `message-dedup`, `geo-replication,` and
> > > `shadow-topic`
> > > > also
> > > > > > > > read messages from the topic, but Unloading subscribers will
> > not
> > > > > > support
> > > > > > > > triggering restarts of these three functions( because the
> > cursor
> > > is
> > > > > > used
> > > > > > > > directly to read the data in these scenarios, not the
> consumer
> > or
> > > > > > reader
> > > > > > > ).
> > > > > > > > - The Compression task(subscription name is `__compaction`)
> > also
> > > > use a
> > > > > > > > reader to read data, but Unloading Subscribers does not
> support
> > > it
> > > > > > > because
> > > > > > > > this task creates a new reader each time it starts.
> > > > > > > > - Do not support all topics related to Transaction features.
> > > > > > > > - `__transaction_buffer_snapshot` works with the task TB
> > recover,
> > > > and
> > > > > > > > this task will create a new reader each time they start.
> > > > > > > > - `__transaction_pending_ack` works with the task Transaction
> > > > Pending
> > > > > > Ack
> > > > > > > > Store replay, and this task will use managed cursor directly
> to
> > > > read
> > > > > > > data.
> > > > > > > > - `__transaction_log_xxx` works with the task Transaction
> Log,
> > > > which
> > > > > > will
> > > > > > > > use managed cursor directly to read data.
> > > > > > > > - `transaction_coordinator_assign` No data will be written on
> > > this
> > > > > > topic.
> > > > > > > >
> > > > > > > > #### Special system topic supports
> > > > > > > > The system topic `__change_events` is used to support
> > topic-level
> > > > > > > policies,
> > > > > > > > there may also be some message delivery issues in this
> > scenario,
> > > so
> > > > > > > > Unloading Subscribers will support this topic.
> > > > > > > >
> > > > > > > > ### API Changes
> > > > > > > >
> > > > > > > > #### For persistent topic
> > > > > > > > ```
> > > > > > > > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > > > > > > > ```
> > > > > > > >
> > > > > > > > #### For non-persistent topic
> > > > > > > > ```
> > > > > > > > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > > > > > > > ```
> > > > > > > >
> > > > > > > > #### Explain the param `-s`
> > > > > > > > - set param `-s` to special sub name to unload special
> > > subscription
> > > > > > > > - set param `-s` to `**` to unload all subscriptions under
> this
> > > > topic
> > > > > > > > - set param `-s` to `regexp` to unload a batch subscriptions
> > > under
> > > > this
> > > > > > > > topic
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Yubiao Feng
> > > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by PengHui Li <pe...@apache.org>.
> What kind of problems is this trying to fix?
> And why cannot that be solved by client-side fixes?

Yes, most of the issues come from the client side, and rarely from the broker.
But the application team also needs time to release and deploy the fix to the
production environment. Unloading the subscription is just a temporary way to
mitigate the issue and reduce the impact. It will not fix the issue
completely.

What I have learned is to first capture the heap dump, topic stats, internal
stats, and logs from the broker and the client, and then try to unload the
topic to see whether the problem is mitigated. If not, try restarting the
broker or the client. Most of the time, the problem can be mitigated this way.
Then we can continue to reproduce and investigate the issue using the
captured heap dump and logs.
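
As a rough illustration of the capture-then-unload sequence described above,
a minimal sketch could look like the following. The topic name, the output
directory, the `BROKER_PID` variable and the `capture`/`mitigate` helpers are
assumptions made for this example, not something taken from the proposal.
It only prints the commands unless `DRY_RUN=0` is set, and log collection is
left out because log locations depend on the deployment.

```shell
#!/usr/bin/env bash
# Sketch: keep the evidence first, then unload the topic as a mitigation.
# All values below are placeholders (assumptions) and must be adapted.
set -euo pipefail

TOPIC="${TOPIC:-persistent://public/default/my-topic}"   # hypothetical topic
OUT_DIR="${OUT_DIR:-./stuck-consumer-diag}"              # hypothetical output dir
BROKER_PID="${BROKER_PID:-}"    # broker JVM pid, only if running on this host
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print what would be executed

# Run a command and save its stdout to a file; in dry-run mode just print it.
capture() {
  local out="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $* > $out"
  else
    "$@" > "$out"
  fi
}

mitigate() {
  [ "$DRY_RUN" = "1" ] || mkdir -p "$OUT_DIR"

  # 1. Capture state before touching anything.
  if [ -n "$BROKER_PID" ]; then
    capture "$OUT_DIR/jmap.log" \
      jmap "-dump:format=b,file=$OUT_DIR/broker-heap.hprof" "$BROKER_PID"
  fi
  capture "$OUT_DIR/stats.json"          pulsar-admin topics stats "$TOPIC"
  capture "$OUT_DIR/stats-internal.json" pulsar-admin topics stats-internal "$TOPIC"

  # 2. Only then unload the topic and check whether consumption resumes.
  capture "$OUT_DIR/unload.log" pulsar-admin topics unload "$TOPIC"
}

mitigate
```

The ordering is the point here: the stats and heap dump are taken before the
unload, because the unload discards the transient state that the later
investigation needs.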

> In shared sub issues, it's hard to pinpoint which consumer/where
> the problem lies, and to reset that one at the client. The totality of
> state spread between the brokers and all the consumers of the shared sub
> needs to be put together. Is that why we are doing this?

In my experience, most of these issues occur with Shared and Key_Shared
subscriptions. Most of them come from misuse, and rarely from bugs in the
brokers or clients.

Regards,
Penghui



Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by Joe F <jo...@gmail.com>.
Inclined to agree with Enrico.  If it's a hard problem, it will repeat, and
this is not helping.  If it's some race on the client, it will occur
randomly and rarely, and this unload sub will get programmed in as a way of
life.

> If you don't think unloading the subscription can help with anything, then
> unloading the topic should be the same. From my experience, most
> topic-unloading operations are done to mitigate problems related to message
> consumption.

Comparisons with unloading a topic are not the bar here, as that is a
first-class broker utility that is needed for operational reasons outside of
"fixing" consumer-side issues. The side effect of using "unload topic" is
a loss of transient topic state. I fully agree that this side effect
has been pervasively abused for fixing problems (à la Ctrl-Alt-Del), but
that's not the rationale for having an unload topic utility.

What kind of problems is this trying to fix?
And why cannot that be solved by client-side fixes?

In shared sub issues, it's hard to pinpoint which consumer/where
the problem lies, and to reset that one at the client. The totality of
state spread between the brokers and all the consumers of the shared sub
needs to be put together. Is that why we are doing this?


On Tue, Jan 17, 2023 at 5:30 PM PengHui Li <pe...@apache.org> wrote:

> I agree that if we encounter a stuck consumption issue, we should continue
> to find the root cause of the problem.
>
> Subscription unloading is just an option to mitigate the impact first.
> Sometimes it may only mitigate the issue for an hour, especially with
> key_shared subscriptions. Sometimes it's not a bug in Pulsar at all,
> but users still need time to fix the issue, and it doesn't make sense to
> let the impact continue until the fix is applied.
>
> I have also helped many people troubleshoot stuck consumption issues
> related to key_shared subscriptions, transactions, etc.
> In most cases, unloading the topic can mitigate the impact.
> For example, due to an uncaught exception, the dispatch thread
> stopped reading messages from the managed ledger. The exception
> is a very infrequent occurrence. Unloading the topic is the best choice for
> now, right?
>
> If you don't think unloading the subscription can help with anything, then
> unloading the topic should be the same. From my experience, most
> topic-unloading operations are done to mitigate problems related to message
> consumption.
>
> Best,
> Penghui
>
> On Tue, Jan 17, 2023 at 11:09 PM Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > On Mon, Jan 16, 2023 at 11:58, rxl@apache.org
> > <ra...@gmail.com> wrote:
> > >
> > > I agree with @Enrico and @Bo: if we encounter a stuck subscription, we
> > > must keep spending time to locate and fix the problem, which is what we
> > > have been doing.
> > >
> > > But let's think about this problem from another angle. Suppose a user
> > > in a production environment encounters a stuck consumer right now; what
> > > should we do? In a production environment, our first reaction to a
> > > problem is how to recover quickly and how to reduce the user's losses
> > > quickly. At that point we are not yet thinking about whether this is a
> > > bug on the broker side, a bug on the SDK side, or a bug in the user's
> > > own usage. For fast recovery, our most common method is to quickly
> > > re-establish the connection between the broker and the client by
> > > unloading the specified topic. In this process, we try to retain as much
> > > context as possible to help us locate and repair the problem afterwards.
> > >
> > > So I don't think these two things conflict. The reason we expose the
> > > admin CLI for unloading a topic is the same reason we expect to expose
> > > unloading a subscription. From a developer's perspective, we definitely
> > > want to completely fix the problem that caused the stuck state. From the
> > > user's perspective, when something like a stuck consumer occurs, the
> > > user does not care about the specific cause of the problem, but expects
> > > the business to recover in the shortest possible time to avoid further
> > > loss.
> > >
> > > I admit that this is a relatively hacky approach, but it can indeed solve
> > > the problems we are currently encountering, and at the same time it will
> > > not cause a major conflict with Pulsar's existing logic. So I still
> > > agree with Yubiao's point of view.
> >
> >
> >
> > Usually when a subscription is "stuck", even if you unload the topic
> > it returns to the "stuck" state again if you don't solve the problem.
> >
> > This is a very common issue with Pulsar users. I spend much time
> > helping users troubleshoot their production problems, and unloading the
> > topic is never a solution: it can give you seconds, minutes or hours of
> > "working state", then the problem will happen again.
> >
> > You say that it can solve the problems you are encountering.
> > Could you please give more context? (In Slack if this is not
> > something that can be discussed in public.)
> > I apologise if I seem too much of a skeptic this time. I am sure that
> > you have a real problem and you want to fix it, but I would like to
> > help you find the best way.
> >
> > Pulsar is used by many people and we shouldn't add hacky tools for
> > temporary workarounds.
> > Once we deliver an API we should maintain it for an unlimited time.
> >
> > You could patch your system and use the patched version temporarily
> > until you find the root cause.
> >
> > Enrico
> >

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by PengHui Li <pe...@apache.org>.
I agree that if we encounter a stuck consumption issue, we should keep
looking for the root cause of the problem.

Subscription unloading is just an option to mitigate the impact first.
Sometimes it may only mitigate the issue for an hour, especially with
key_shared subscriptions. Sometimes the cause is not a bug in Pulsar, and
users need time to fix the issue on their side. It doesn't make sense to let
the impact continue until the fix is applied.

I have also helped many people troubleshoot stuck consumption
issues related to key_shared subscriptions, transactions, etc.
In most cases, unloading the topic can mitigate the impact.
For example, an uncaught exception can cause the dispatch thread
to stop reading messages from the managed ledger. Such an exception
is very infrequent, so unloading the topic is the best choice for
now, right?

If you think unloading the subscription can't help with anything, then the
same should apply to unloading the topic. In my experience, most topic
unload operations are done to mitigate problems related to message
consumption.

Best,
Penghui

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by Enrico Olivelli <eo...@gmail.com>.
On Mon, Jan 16, 2023 at 11:58, rxl@apache.org
<ra...@gmail.com> wrote:
>
> I agree with @Enrico @Bo, if we encounter a subscribe stuck situation, we
> must continue to spend more time to locate and fix this problem, which is
> what we have been doing.
>
> But let's think about this problem from another angle. At this time, a user
> in the production environment encounters a consumer stuck situation, what
> should we do? For a user in a production environment, our first reaction
> when encountering a problem is how to quickly recover and how to quickly
> reduce user losses. Even at this point in time, we don't think about
> whether this is a bug on the Broker side, a bug on the SDK side, or a bug
> used by the user himself? In the process of fast recovery, our most common
> method is to quickly re-establish the connection between the broker and the
> client through the topic specified by unload. In this process, we try to
> retain as much context as possible to assist us in the subsequent
> continuous positioning and repair of this problem.
>
> So I don't think these two things conflict. Why we expose the admin CLI of
> the unload topic is why we expect to expose the unload subscribe. If we
> stand from the perspective of a developer, we definitely want to completely
> fix the problem that caused the stuck. If we think about this issue from
> the perspective of the user, when a scenario such as consumer stuck occurs
> to the user, the user does not care about the specific cause of the
> problem, but expects the business to recover quickly in the shortest
> possible time to avoid further loss.
>
> I admit that this is a relatively hacky way, but it can indeed solve the
> problems we are currently encountering, and at the same time, it will not
> cause a major conflict with Pulsar's existing logic. So I still insist on
> agreeing with yubiao's point of view.



Usually, when a subscription is "stuck", it returns to the "stuck" state again
even after you unload the topic, unless you solve the underlying problem.

This is a very common issue for Pulsar users. I spend a lot of time
helping users troubleshoot their production problems, and unloading the topic
is never a solution: it can give you seconds, minutes, or hours of
"working state", and then the problem happens again.

You say that it can solve the problems you are encountering.
Could you please give more context? (On Slack, if this is not
something that can be discussed in public.)
I apologise if I seem too much of a skeptic this time. I am sure that
you have a real problem and you want to fix it, and I would like to help
you find the best way.

Pulsar is used by many people, and we shouldn't add hacky tools for
temporary workarounds.
Once we deliver an API, we have to maintain it indefinitely.

You could patch your system and use the patched version temporarily
until you find the root cause.

Enrico

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by "rxl@apache.org" <ra...@gmail.com>.
I agree with @Enrico and @Bo: if we encounter a stuck subscription, we
must keep spending time to locate and fix the problem, which is
what we have been doing.

But let's think about this problem from another angle. Suppose a user
in a production environment encounters a stuck consumer. What
should we do? For a user in production, our first reaction
when a problem occurs is to work out how to recover quickly and how to
reduce the user's losses. At that point, we don't even think about
whether this is a bug on the broker side, a bug on the SDK side, or a
mistake in how the user uses Pulsar. During fast recovery, our most common
method is to unload the affected topic, which quickly re-establishes the
connection between the broker and the client. In this process, we try to
retain as much context as possible to help us with the subsequent
investigation and repair of the problem.

So I don't think these two things conflict. The reason we expose an admin
CLI for unloading a topic is the same reason we want to expose one for
unloading a subscription. From a developer's perspective, we definitely want
to completely fix the problem that caused the consumer to get stuck. From
the user's perspective, when something like a stuck consumer occurs,
the user does not care about the specific cause of the
problem, but expects the business to recover in the shortest
possible time to avoid further loss.

I admit that this is a relatively hacky approach, but it can indeed solve the
problems we are currently encountering, and at the same time it will not
conflict in any major way with Pulsar's existing logic. So I still
agree with Yubiao's point of view.

--
Thanks
Xiaolong Ran


Yubiao Feng <yu...@streamnative.io.invalid> 于2023年1月15日周日 20:59写道:

> Hi Qiang
>
> > 1. How do you handle the race condition when you are trying to unload the
> subscription, and the new consumer wants to subscribe to this subscription
> at the same time? I'm unsure if it has the race condition. I just want to
> remind you about that.:)
>
> These methods `addConsumer`, `removeConsumer` all have synchronized locks,
> we also add synchronized lock when executing `reset subscription` can solve
> the problem.
>
> > 2. Would you like to add some restful API design to clarify the
> implementation?
>
> Already added the rest API design in the proposal
> https://github.com/apache/pulsar/issues/19187
>
> On Thu, Jan 12, 2023 at 3:22 PM <ma...@gmail.com> wrote:
>
> > Hi, Yubiao
> >
> > I agree with this idea because some users care about the production rate.
> > They don't want to unload the whole topic to fix the subscription
> problem.
> >
> > I've got some questions:
> >
> > 1. How do you handle the race condition when you are trying to unload the
> > subscription, and the new consumer wants to subscribe to this
> subscription
> > at the same time? I'm unsure if it has the race condition. I just want to
> > remind you about that. :)
> > 2. Would you like to add some restful API design to clarify the
> > implementation?
> >     a. Request method
> >     b. Request path
> >     c. Response code
> >     d. etc.
> >
> >
> > Thanks for your work.
> > Mattison
> > On Jan 11, 2023, 17:01 +0800, Yubiao Feng <yubiao.feng@streamnative.io
> .invalid>,
> > wrote:
> > > Hi community
> > >
> > > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
> > >
> > > PIP issue: https://github.com/apache/pulsar/issues/19187
> > >
> > > ### Motivation
> > >
> > > We sometimes try to unload the topic to resolve some consumption-stop
> > > issues. But the unloading topic will also impact the producer side.
> > >
> > > ### Goal
> > >
> > > Providing a new API to unload the subscription dimension triggers
> > > reconnection of all consumers on that subscription and reconnection is
> > > guaranteed by the client. The API will be used in these ways:
> > > - unload special subscription of one topic(or partitioned topic)
> > > - unload all subscriptions of one topic(or partitioned topic)
> > > - unload subscriptions of one topic(or partitioned topic) by regular
> > > expression
> > > - If a reader's subscription name is not set, a random subscription
> name
> > > prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users
> > can
> > > uninstall these subscriptions using regular expressions.
> > >
> > > In addition to triggering consumer disconnection, Unloading Subscribers
> > > will restart the Dispatcher, which resets the redeliver message queue
> and
> > > delayed message queue in the Broker's memory, which can help resolve
> > issues
> > > caused by an abnormal dispatcher state. However, the execution flow of
> > > Unloading Subscribers does not include a restart of the Managed Cursor
> > > related to this dispatcher; if there is a problem with the cursor, we
> can
> > > only rely on the unload topic to solve it.
> > >
> > > Note: From the client's perspective, this connection may be shared by
> > > consumers, producers, and transactions, so Unloading Subscribers maybe
> > > impact the producer and transaction.
> > >
> > > #### These scenarios are not supported
> > > - Functions `message-dedup`, `geo-replication,` and `shadow-topic` also
> > > read messages from the topic, but Unloading subscribers will not support
> > > triggering restarts of these three functions( because the cursor is used
> > > directly to read the data in these scenarios, not the consumer or reader ).
> > > - The Compression task(subscription name is `__compaction`) also use a
> > > reader to read data, but Unloading Subscribers does not support it because
> > > this task creates a new reader each time it starts.
> > > - Do not support all topics related to Transaction features.
> > > - `__transaction_buffer_snapshot` works with the task TB recover, and
> > > this task will create a new reader each time they start.
> > > - `__transaction_pending_ack` works with the task Transaction Pending Ack
> > > Store replay, and this task will use managed cursor directly to read data.
> > > - `__transaction_log_xxx` works with the task Transaction Log, which will
> > > use managed cursor directly to read data.
> > > - `transaction_coordinator_assign` No data will be written on this topic.
> > >
> > > #### Special system topic supports
> > > The system topic `__change_events` is used to support topic-level policies,
> > > there may also be some message delivery issues in this scenario, so
> > > Unloading Subscribers will support this topic.
> > >
> > > ### API Changes
> > >
> > > #### For persistent topic
> > > ```
> > > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > > ```
> > >
> > > #### For non-persistent topic
> > > ```
> > > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > > ```
> > >
> > > #### Explain the param `-s`
> > > - set param `-s` to special sub name to unload special subscription
> > > - set param `-s` to `**` to unload all subscriptions under this topic
> > > - set param `-s` to `regexp` to unload a batch subscriptions under this
> > > topic
> > >
> > >
> > > Thanks
> > > Yubiao Feng
> >
>

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by Yubiao Feng <yu...@streamnative.io.INVALID>.
Hi Qiang

> 1. How do you handle the race condition when you are trying to unload the
> subscription, and the new consumer wants to subscribe to this subscription
> at the same time? I'm unsure if it has the race condition. I just want to
> remind you about that. :)

The methods `addConsumer` and `removeConsumer` are both synchronized. If we
also take the same synchronized lock when executing `reset subscription`,
that solves the problem.
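
A minimal sketch of that locking idea, assuming a hypothetical
`SubscriptionSketch` class (not the broker's real class) whose `reset` method
stands in for `reset subscription`. The field and method names other than
`addConsumer` and `removeConsumer` are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a broker-side subscription; not Pulsar's real class.
final class SubscriptionSketch {
    private final List<String> consumers = new ArrayList<>();
    private int dispatcherGeneration = 0;

    // Every method below locks the same monitor (this), so calls never interleave.
    synchronized void addConsumer(String name) {
        consumers.add(name);
    }

    synchronized void removeConsumer(String name) {
        consumers.remove(name);
    }

    // Assumed shape of "reset subscription": drop all consumers, restart the dispatcher.
    synchronized List<String> reset() {
        List<String> disconnected = new ArrayList<>(consumers);
        consumers.clear();
        dispatcherGeneration++; // models the dispatcher restart
        return disconnected;
    }

    synchronized List<String> consumers() {
        return new ArrayList<>(consumers);
    }

    synchronized int dispatcherGeneration() {
        return dispatcherGeneration;
    }
}
```

Because all methods share one lock, a concurrent `addConsumer` either finishes
before `reset` starts (and that consumer is disconnected with the others) or
starts after `reset` returns (and it attaches to the restarted dispatcher).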

> 2. Would you like to add some restful API design to clarify the
> implementation?

I have already added the REST API design to the proposal:
https://github.com/apache/pulsar/issues/19187

On Thu, Jan 12, 2023 at 3:22 PM <ma...@gmail.com> wrote:

> Hi, Yubiao
>
> I agree with this idea because some users care about the production rate.
> They don't want to unload the whole topic to fix the subscription problem.
>
> I've got some questions:
>
> 1. How do you handle the race condition when you are trying to unload the
> subscription, and the new consumer wants to subscribe to this subscription
> at the same time? I'm unsure if it has the race condition. I just want to
> remind you about that. :)
> 2. Would you like to add some restful API design to clarify the
> implementation?
>     a. Request method
>     b. Request path
>     c. Response code
>     d. etc.
>
>
> Thanks for your work.
> Mattison

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by ma...@gmail.com.
Hi, Yubiao

I agree with this idea because some users care about the production rate. They don't want to unload the whole topic to fix the subscription problem.

I've got some questions:

1. How do you handle the race condition when you are trying to unload the subscription, and the new consumer wants to subscribe to this subscription at the same time? I'm unsure if it has the race condition. I just want to remind you about that. :)
2. Would you like to add some restful API design to clarify the implementation?
    a. Request method
    b. Request path
    c. Response code
    d. etc.


Thanks for your work.
Mattison

Re: [DISCUSS] PIP-240 A new API to unload subscriptions

Posted by Yubiao Feng <yu...@streamnative.io.INVALID>.
Hi @Enrico @Bo

> If there is a problem we should spend time on investigating the problem
> and not in adding this kind of tools.

You are right. But when a user hits a problem, finding and fixing the root
cause often takes a while. In the meantime it is important to have a tool
that restores service quickly, and the `unload topic` command is often used
as that temporary fix.

A topic can now be shared by multiple teams: for example, business team A
produces messages, business teams B and C consume them (each with a
different subscription name), and data team D reads them through the Reader
API (with a random subscription name). If subscription B has a consumption
problem and we unload the topic, teams A, B, C, and D are all affected. With
a new API that resets only the subscription, the impact is limited to team B.
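
A minimal sketch of how the `-s` value could pick out the affected
subscriptions in this scenario. The `SubscriptionSelector` class, the
subscription names, and the full-match semantics are assumptions for
illustration; only the `**` wildcard and the `reader-` prefix come from the
proposal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; the proposal does not define how matching is implemented.
final class SubscriptionSelector {
    // Returns the subscriptions that a given `-s` value would select.
    static List<String> select(List<String> subscriptions, String subParam) {
        // `**` is the proposal's "all subscriptions" marker. It is not a valid
        // Java regular expression, so it has to be handled before compiling.
        if ("**".equals(subParam)) {
            return new ArrayList<>(subscriptions);
        }
        // Assumption: the pattern must match the whole subscription name.
        Pattern pattern = Pattern.compile(subParam);
        return subscriptions.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

With the hypothetical names `team-b-sub`, `team-c-sub`, and `reader-1a2b3c`,
the value `team-b-sub` selects only team B's subscription, `reader-.*` selects
only the reader's random subscription, and `**` selects all three.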

The new API also covers another scenario: triggering a consumer rebalance.
So I feel we can add this API with relatively small changes.

Thanks
Yubiao Feng
