Posted to dev@pulsar.apache.org by linlin <li...@apache.org> on 2023/03/14 12:58:02 UTC

[DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

Hi all,
I created a proposal to
assign topic partitions to bundles by round robin:
https://github.com/apache/pulsar/issues/19806

It is already running in our production environment,
and it performs well.

Thanks!

Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

Posted by Lin Lin <li...@apache.org>.
Thanks for your review.


> Could you clarify the limitation of the current logic?

The current logic cannot guarantee that each bundle carries the same traffic, so bundles must be balanced through splitting.
However, a topic's load is not constant: business traffic changes over time,
so the load of each bundle keeps changing as well.

If we rely on split+unload to balance, the number of bundles will eventually reach the upper limit.

To avoid frequent split and unload operations, the current logic uses many thresholds that allow a Broker to tolerate load imbalance, which is one of the reasons why the load gap between different nodes of a Pulsar cluster is large.


> For this issue, the community introduced a new assignment strategy, LeastResourceUsageWithWeight, which better randomizes assignments.

Yes, but LeastResourceUsageWithWeight still cannot completely solve the current problem; it can only alleviate it.
We have also optimized on top of that implementation, but that optimization will be discussed in a follow-up PIP;
it is not covered by the current one.



> If each partition has the same load, then having the same number of topics
> per bundle should lead to load balance.
> Then, I wonder how the current way, "hashing", does not achieve the goal here.

We assume that the loads of different partitions under the same topic are the same, while the loads of partitions of different topics differ.
Bundles are shared by all topics in the entire namespace.
Even if we guarantee that each bundle has the same number of partitions, those partitions may come from different topics, resulting in different loads for each bundle. For example, if each partition of topic A carries 10 MB/s while each partition of topic B carries 1 MB/s, a bundle that happens to hold mostly A partitions is far hotter than one holding mostly B partitions, even though the partition counts match.
If we split bundles according to load, the load of each topic may differ across time periods, so it is impossible to keep the load of each Bundle the same.
Using the round-robin strategy, we can ensure that the number of partitions from the same Topic on each Bundle is roughly the same, so the load of each Bundle is also consistent. A toy simulation of this effect is sketched below.
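
The following is a toy simulation (illustrative only; the topic names, loads, and Java String hashing are stand-ins, not Pulsar's actual hashing) comparing hash-based and round-robin placement of two topics with different per-partition loads:

import java.util.Arrays;

public class BundleLoadSketch {
    public static void main(String[] args) {
        int numBundles = 4;
        double[] hashLoad = new double[numBundles];
        double[] rrLoad = new double[numBundles];
        // topicA: 8 partitions at 10 MB/s each; topicB: 8 partitions at 1 MB/s each
        String[] topics = {"topicA", "topicB"};
        double[] perPartitionMBps = {10.0, 1.0};
        for (int t = 0; t < topics.length; t++) {
            // round robin anchors each topic at a hash-derived starting bundle
            int start = Math.floorMod(topics[t].hashCode(), numBundles);
            for (int p = 0; p < 8; p++) {
                String partitionName = topics[t] + "-partition-" + p;
                hashLoad[Math.floorMod(partitionName.hashCode(), numBundles)] += perPartitionMBps[t];
                rrLoad[(start + p) % numBundles] += perPartitionMBps[t];
            }
        }
        // Round robin puts exactly 2 partitions of each topic in every bundle,
        // so rrLoad is uniform (22 MB/s per bundle); hashLoad usually is not.
        System.out.println("hash:        " + Arrays.toString(hashLoad));
        System.out.println("round robin: " + Arrays.toString(rrLoad));
    }
}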


> What happens if the leader restarts? How do we guarantee this mapping persistence?

1) First, we need to find the starting bundle. Partition-0 finds a bundle through consistent hashing, so as long as the number of bundles remains the same, the starting bundle is the same every time, and the other partitions 1, 2, 3, 4 ... are then assigned to the same bundles every time. The mapping can therefore be recomputed at any time and does not need to be persisted. See the sketch below.
2) If the number of bundles changes, i.e. a split is triggered, the bundles of the entire namespace will be forcibly unloaded and all partitions reassigned.
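
As a minimal sketch (the names here are assumed for illustration; this is not the PIP's code), any broker can recompute the whole mapping from just the topic name and the bundle count:

public class DeterministicMappingSketch {
    // Stand-in for the consistent-hash lookup that places partition-0.
    static int startingBundle(String topic, int numBundles) {
        return Math.floorMod((topic + "-partition-0").hashCode(), numBundles);
    }

    // Partition i is always i steps after the starting bundle, so the result
    // is identical on every broker and after every leader restart, as long
    // as numBundles is unchanged.
    static int bundleFor(String topic, int partitionIndex, int numBundles) {
        return (startingBundle(topic, numBundles) + partitionIndex) % numBundles;
    }

    public static void main(String[] args) {
        for (int p = 0; p < 5; p++) {
            System.out.println("partition-" + p + " -> bundle-"
                    + bundleFor("persistent://tenant/ns/topic", p, 4));
        }
    }
}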


> It is unclear how RoundRobinPartitionAssigner will work with the existing code.

The specific implementation has been refined; please check the latest PIP issue.



On 2023/03/16 18:20:35 Heesung Sohn wrote:
> Hi,
> 
> Thank you for sharing this.
> In general, I think this can be another good option for Pulsar load
> assignment logic.
> However, I have some comments below.

Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

Posted by Heesung Sohn <he...@streamnative.io.INVALID>.
Hi,

Thank you for sharing this.
In general, I think this can be another good option for Pulsar load
assignment logic.
However, I have some comments below.


> The load managed by each Bundle is not even.
> Even if the number of partitions managed by each bundle is the same,
> there is no guarantee that the sum of the loads of these partitions will be
> the same.



Each bundle can be split and unloaded to other brokers. Also, the current
hashing logic should distribute approximately the same number of
partitioned topics to each bundle.

Could you clarify the limitation of the current logic?


> Doesn't shed loads very well. The existing default policy ThresholdShedder
> has a relatively high usage threshold,
> and various traffic thresholds need to be set. Many clusters with high TPS
> and small message bodies may have high CPU but low traffic;
> and for many small-scale clusters, the threshold needs to be modified
> according to the actual business.


Yes, fine-tuning is expected for ThresholdShedder. From what I have
observed, loadBalancerBundleUnloadMinThroughputThreshold must be adjusted
based on the cluster's avg throughput.

Also, there is a config, lowerBoundarySheddingEnabled, recently introduced
to unload more aggressively to lower-loaded brokers.
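
For example, a minimal broker.conf adjustment using those two settings might look like this (the values are illustrative, not recommendations):

# Bundles below this throughput (MB) are not considered for unloading;
# tune it to the cluster's average per-bundle throughput.
loadBalancerBundleUnloadMinThroughputThreshold=10

# Opt in to shedding load toward lower-loaded brokers more aggressively.
lowerBoundarySheddingEnabled=true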


> The removed Bundle cannot be well distributed to other Brokers.
> The load information of each Broker will be reported at regular intervals,
> so the judgment of the Leader Broker when allocating Bundles cannot be
> guaranteed to be completely correct.
> Secondly, if there are a large number of Bundles to be redistributed,
> the Leader may make the low-load Broker a new high-load node when the load
> information is not up-to-date.


For this issue, the community introduced a new assignment strategy,
LeastResourceUsageWithWeight, which better randomizes assignments.


> Implementation
> The client sends a message to a multi-partition Topic, which uses polling
> by default.
> Therefore, we believe that the load of partitions of the same topic is
> balanced.
> We assign partitions of the same topic to bundle by round-robin.
> In this way, the difference in the number of partitions carried by the
> bundle will not exceed 1.
> Since we consider the load of each partition of the same topic to be
> balanced, the load carried by each bundle is also balanced.



If each partition has the same load, then having the same number of topics
per bundle should lead to load balance.

Then, I wonder how the current way, "hashing", does not achieve the goal
here.



> Operation steps:
>
>    1. Partition 0 finds a starting bundle through the consistent hash
>    algorithm, assuming it is bundle0, we start from this bundle
>    2. By round-robin, assign partition 1 to the next bundle1, assign
>    partition 2 to the next bundle2, and so on
>

Do we store this partition-to-bundle mapping information? (If we do, what
happens if the leader restarts? How do we guarantee this mapping
persistence?)

How do we find the assigned bundle from a partitioned topic?

Currently, each (partitioned) topic is statically assigned to bundles by "
findBundle" in the following code, so that any broker can know what bundle
a (partitioned) topic is assigned to. Can you clarify the behavior change
here?

public NamespaceBundle findBundle(TopicName topicName) {
    checkArgument(this.nsname.equals(topicName.getNamespaceObject()));
    // Hash the full topic name and look up the bundle whose hash range
    // contains it; the mapping is static as long as the bundle ranges
    // are unchanged.
    long hashCode = factory.getLongHashCode(topicName.toString());
    NamespaceBundle bundle = getBundle(hashCode);
    if (topicName.getDomain().equals(TopicDomain.non_persistent)) {
        bundle.setHasNonPersistentTopic(true);
    }
    return bundle;
}

protected NamespaceBundle getBundle(long hash) {
    // Binary-search the sorted bundle boundaries; for a miss, idx is
    // -(insertionPoint) - 1, so -(idx + 2) is the lower boundary of the
    // range containing the hash.
    int idx = Arrays.binarySearch(partitions, hash);
    int lowerIdx = idx < 0 ? -(idx + 2) : idx;
    return bundles.get(lowerIdx);
}



> API Changes
>
>    1. Add a configuration item partitionAssignerClassName, so that
>    different partition assignment algorithms can be dynamically configured.
>    2. The existing algorithm will be used as the default
>    partitionAssignerClassName=ConsistentHashingPartitionAssigner
>    3. Implement a new partition assignment class
>    RoundRobinPartitionAssigner
>

Can't we add this assignment logic to a class that
implements ModularLoadManagerStrategy and BrokerSelectionStrategy (for
PIP-192 Load Balancer Extension)?

It is unclear how RoundRobinPartitionAssigner will work with the existing
code.

Also, note that BrokerSelectionStrategy can run on each broker (not only
the leader broker)
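
For reference, I would expect a class-name-driven assigner to be wired up roughly like the sketch below (hypothetical: the interface and method signature here are my assumptions, not something defined in the PIP):

public final class AssignerLoaderSketch {

    // Assumed shape of the assigner contract; the PIP does not spell this out.
    public interface PartitionAssigner {
        int assign(String topicName, int partitionIndex, int numBundles);
    }

    // Reflection-based loading, the usual pattern for pluggable strategies;
    // the configured partitionAssignerClassName needs a no-arg constructor.
    public static PartitionAssigner load(String partitionAssignerClassName)
            throws ReflectiveOperationException {
        return (PartitionAssigner) Class.forName(partitionAssignerClassName)
                .getDeclaredConstructor()
                .newInstance();
    }
}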




Thanks,

Heesung

On Tue, Mar 14, 2023 at 5:58 AM linlin <li...@apache.org> wrote:

> Hi all,
> I created a proposal to
> assign topic partitions to bundles by round robin:
> https://github.com/apache/pulsar/issues/19806
>
> It is already running in our production environment,
> and it performs well.
>
> Thanks!
>