
Posted to dev@pulsar.apache.org by Prashant Kumar <pr...@gmail.com> on 2022/11/17 06:19:46 UTC

[DISCUSS] PIP-210

PIP: https://github.com/apache/pulsar/issues/18510

Problem Statement

When a topic is partitioned and one of its partitions is unavailable for
producing messages, the Pulsar client currently keeps trying to produce
messages to that partition, even though in certain cases it does not need
to. In those cases, the client could simply pick another partition and
produce to it instead.
Partition Unavailable

A partition can become unavailable for many reasons, but the most
prominent one is that the partition is moving from one broker to another.
Until every actor (the producers, the old broker, and the new broker)
agrees on which broker owns the partition, the partition is unavailable
for producing.
Client Behavior

This is typical producer code:

producer.sendAsync(payLoad.getBytes(StandardCharsets.UTF_8));

When sendAsync is called, the message is enqueued in a queue (called the
pending message queue) and a future is returned.

The future is completed only after the message has been taken from the
queue, sent to the broker asynchronously, and acknowledged by the broker,
again asynchronously. The maximum size of the pending message queue is
controlled by the producer config maxPendingMessages.

When the pending message queue is full, the application starts getting
publish failures. The pending message queue thus provides a cushion
against unavailable partitions, but only up to its size limit.
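A minimal, self-contained Java model of the pending-queue behavior described above may help. It is a sketch under stated assumptions, not Pulsar client code: `PendingQueueSketch`, `onAck` and `pendingCount` are hypothetical names, broker acknowledgements are simulated by calling `onAck`, and a full queue is assumed to fail the send immediately (the real producer can instead be configured to block via `blockIfQueueFull`).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of a producer's pending message queue; not Pulsar's code.
final class PendingQueueSketch {
    private final int maxPendingMessages;
    // Futures of messages that were enqueued but not yet acknowledged, oldest first.
    private final Deque<CompletableFuture<Void>> pending = new ArrayDeque<>();

    PendingQueueSketch(int maxPendingMessages) {
        this.maxPendingMessages = maxPendingMessages;
    }

    // Enqueue the message and return a future that completes only on broker ack.
    // The payload is ignored here; a real client would serialize and send it.
    synchronized CompletableFuture<Void> sendAsync(byte[] payload) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        if (pending.size() >= maxPendingMessages) {
            // Queue full: fail the send right away (a publish failure for the app).
            future.completeExceptionally(new IllegalStateException("pending queue is full"));
            return future;
        }
        pending.addLast(future);
        return future;
    }

    // Simulate the broker acknowledging the oldest pending message.
    void onAck() {
        CompletableFuture<Void> head;
        synchronized (this) {
            head = pending.pollFirst();
        }
        if (head != null) {
            head.complete(null); // completed outside the lock so callbacks don't run under it
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingQueueSketch producer = new PendingQueueSketch(2);
        CompletableFuture<Void> a = producer.sendAsync("a".getBytes(StandardCharsets.UTF_8));
        CompletableFuture<Void> b = producer.sendAsync("b".getBytes(StandardCharsets.UTF_8));
        CompletableFuture<Void> c = producer.sendAsync("c".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.isDone());                   // false: no ack yet
        System.out.println(c.isCompletedExceptionally()); // true: queue was full
        producer.onAck();
        System.out.println(a.isDone());                   // true: acked
        System.out.println(b.isDone());                   // false: still pending
    }
}
```

The first future stays pending until its ack arrives, and the third send fails because only two messages fit in the queue; a partition that never acks would keep the queue full and turn every further send into a failure.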

When another partition can be picked

   1. When the message is not keyed, that is, the message is not ordered
      based on a key.
   2. When the routing mode is round-robin, meaning a message can be
      produced to any of the partitions. If a partition is unavailable, the
      client can pick another partition for producing, using the same
      round-robin algorithm.
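The two conditions above could be combined in a router along these lines. This is a hypothetical sketch, not Pulsar's `MessageRouter` implementation: `SkippingRoundRobinRouter` and the `isAvailable` predicate are illustrative names, and keyed messages are assumed to map to a fixed partition via `String.hashCode`, so they are never rerouted.

```java
import java.util.function.IntPredicate;

// Hypothetical router sketch for the PIP-210 idea; not Pulsar's MessageRouter API.
final class SkippingRoundRobinRouter {
    private final int numPartitions;
    private int next = 0; // partition the round-robin cursor points at

    SkippingRoundRobinRouter(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0");
        }
        this.numPartitions = numPartitions;
    }

    // Returns the partition for a message, or -1 if no partition is available.
    synchronized int choosePartition(String key, IntPredicate isAvailable) {
        if (key != null) {
            // Keyed: fixed hash-based partition (String.hashCode is an assumption),
            // returned even if unavailable so per-key ordering is preserved.
            return Math.floorMod(key.hashCode(), numPartitions);
        }
        // Unkeyed: walk the ring once from the cursor, skipping unavailable partitions.
        for (int i = 0; i < numPartitions; i++) {
            int candidate = (next + i) % numPartitions;
            if (isAvailable.test(candidate)) {
                next = (candidate + 1) % numPartitions;
                return candidate;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SkippingRoundRobinRouter router = new SkippingRoundRobinRouter(3);
        IntPredicate available = p -> p != 1; // pretend partition 1 is being moved
        StringBuilder picks = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            picks.append(router.choosePartition(null, available)).append(' ');
        }
        System.out.println(picks.toString().trim()); // 0 2 0 2
    }
}
```

With partition 1 marked unavailable, four unkeyed sends go to partitions 0, 2, 0, 2. Returning -1 when every partition is unavailable leaves the decision (fail or wait) to the caller.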

Re: [DISCUSS] PIP-210

Posted by Enrico Olivelli <eo...@gmail.com>.

Prashant,
I support this PIP, especially because it mitigates many of the micro
latency spikes during the transfer of a partition from one broker to
another.

Enrico

On Thu, 17 Nov 2022, 07:20 Prashant Kumar <pr...@gmail.com>
wrote:


Re: [DISCUSS] PIP-210

Posted by houxiaoyu <an...@gmail.com>.

Hi Prashant,

I generally support this PIP.

Is it possible to also add a flag that prevents messages from being routed
to this partition for a period of time if its pendingMessage queue is full?

Assume that the partition is broken. Then the following will happen:
1. The pendingMessage queue becomes full.
2. The producer picks another partition to send the message to.
3. The pendingMessage queue is cleared by the timeout mechanism.
4. Messages are routed to the broken partition again.
5. The pendingMessage queue becomes full again.
...

So we could add a flag that prevents messages from being routed to this
partition for a period of time, e.g., 2 * timeout, which would reduce send
failures.
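The flag could look roughly like the following time-based exclusion list. It is a hypothetical sketch, not an existing Pulsar feature: `PartitionQuarantine`, `onQueueFull` and `isAvailable` are invented names, "timeout" is assumed to mean the producer send timeout, the window is 2 * timeout as suggested above, and the clock is injected so the behavior can be checked deterministically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of the proposed flag; none of these names exist in Pulsar.
final class PartitionQuarantine {
    private final long exclusionMillis;  // assumed 2 * send timeout
    private final LongSupplier clock;    // injectable clock (millis) for testability
    private final Map<Integer, Long> excludedUntil = new HashMap<>();

    PartitionQuarantine(long sendTimeoutMillis, LongSupplier clock) {
        this.exclusionMillis = 2 * sendTimeoutMillis;
        this.clock = clock;
    }

    // Call when a partition's pending queue fills up: exclude it for 2 * timeout.
    synchronized void onQueueFull(int partition) {
        excludedUntil.put(partition, clock.getAsLong() + exclusionMillis);
    }

    // A partition is routable unless it is still inside its exclusion window.
    synchronized boolean isAvailable(int partition) {
        Long until = excludedUntil.get(partition);
        if (until == null) {
            return true;
        }
        if (clock.getAsLong() >= until) {
            excludedUntil.remove(partition); // window expired: allow the partition again
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] now = {0};
        PartitionQuarantine q = new PartitionQuarantine(30_000, () -> now[0]);
        q.onQueueFull(1);
        System.out.println(q.isAvailable(1)); // false: just excluded
        now[0] = 59_999;
        System.out.println(q.isAvailable(1)); // false: still within 2 * 30 s
        now[0] = 60_000;
        System.out.println(q.isAvailable(1)); // true: window elapsed
    }
}
```

A router could consult `isAvailable` before selecting a partition, so a partition whose queue filled up is skipped until its window expires instead of being picked again right after its queue is cleared (step 4 in the list above).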


Thanks,
Xiaoyu Hou



On Thu, 17 Nov 2022 at 14:20, Prashant Kumar <pr...@gmail.com> wrote:
