Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/14 09:28:21 UTC
[GitHub] [pulsar] hozumi opened a new issue #12036: Broker high cpu usage due to producer's batching misconfiguration.
hozumi opened a new issue #12036:
URL: https://github.com/apache/pulsar/issues/12036
Hi,
I asked about the high CPU usage of my brokers on the Pulsar Slack channel several months ago, but I can no longer find that post.
I just want to share that I solved the problem by fixing the producer's batching configuration.
I thought that I had already enabled batching, but I had actually set the following wrong configuration.
1. 3000 microseconds batch duration instead of 3000 ms.
```
.batchingMaxPublishDelay(3000, TimeUnit.MICROSECONDS)
```
Yeah, this is a silly mistake.
Also, it should be noted that the default value of batchingMaxPublishDelay is `1ms`, which will have no batching effect, I think.
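To make the size of the unit mix-up concrete, here is a minimal sketch that uses only `java.util.concurrent.TimeUnit`. The class and method names are made up for illustration and are not part of the Pulsar client.

```java
import java.util.concurrent.TimeUnit;

class PublishDelayUnits {
    // Hypothetical helper: normalizes a (value, unit) delay to microseconds
    // so that two settings can be compared without losing precision.
    static long toMicros(long delay, TimeUnit unit) {
        return unit.toMicros(delay);
    }

    public static void main(String[] args) {
        long configured = toMicros(3000, TimeUnit.MICROSECONDS); // what was set
        long intended = toMicros(3000, TimeUnit.MILLISECONDS);   // what was meant

        System.out.println("configured delay: " + configured + " us");
        System.out.println("intended delay:   " + intended + " us");
        // The intended batching window is 1000 times longer than the configured one.
        System.out.println("ratio: " + (intended / configured));
    }
}
```

With the intended unit, the builder call would read `.batchingMaxPublishDelay(3000, TimeUnit.MILLISECONDS)`.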
2. Unnecessary KEY_BASED BatcherBuilder
```
.batcherBuilder(BatcherBuilder.KEY_BASED)
```
I somehow thought that `BatcherBuilder.KEY_BASED` was necessary in order to send messages with the same key to a particular partition.
A batch made with KEY_BASED only contains messages with the same key, which resulted in a massive number of 1-message batches in my use case.
```
Key based batch message container
incoming single messages:
(k1, v1), (k2, v1), (k3, v1), (k1, v2), (k2, v2), (k3, v2), (k1, v3), (k2, v3), (k3, v3)
batched into multiple batch messages:
[(k1, v1), (k1, v2), (k1, v3)], [(k2, v1), (k2, v2), (k2, v3)], [(k3, v1), (k3, v2), (k3, v3)]
```
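The effect on batch sizes can be simulated with plain Java collections. This is only a sketch of the grouping rule described above (one batch per distinct key within a flush window), not Pulsar's actual container implementation. The class name, method name and the `u1`..`u9` keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyBasedBatchSketch {
    // Simulates one flush window under key-based batching:
    // returns key -> number of messages that end up in that key's batch.
    static Map<String, Integer> keyBasedBatchSizes(List<String> keysInWindow) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (String key : keysInWindow) {
            sizes.merge(key, 1, Integer::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // The nine messages from the example above: three keys, three messages each.
        List<String> fewKeys = List.of("k1", "k2", "k3", "k1", "k2", "k3", "k1", "k2", "k3");
        System.out.println(keyBasedBatchSizes(fewKeys)); // {k1=3, k2=3, k3=3}

        // Assumed high-cardinality case: every message in the window has its own key,
        // so every batch holds exactly one message.
        List<String> manyKeys = List.of("u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8", "u9");
        System.out.println(keyBasedBatchSizes(manyKeys).size() + " batches for "
                + manyKeys.size() + " messages");
    }
}
```

When keys rarely repeat within one publish-delay window, the number of batches approaches the number of messages, which matches the 1-message batches described above.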
As the partitioned producer in the default routing mode already assigns each keyed message to a particular partition, I don't need `BatcherBuilder.KEY_BASED` for my use case.
https://pulsar.apache.org/docs/en/admin-api-topics/#routing-mode
> RoundRobinPartition
> If a key is specified on the message, the partitioned producer hashes the key and assigns message to a particular partition. This is the default mode.
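
Why the batcher builder does not matter for routing can be sketched as follows: the partition is a function of the key alone. The hash used here (`String.hashCode` with `Math.floorMod`) is an assumption chosen for illustration and may differ from what the Pulsar client actually computes; the class and method names are hypothetical.

```java
class KeyRoutingSketch {
    // Hypothetical helper: maps a message key to a partition index by hashing.
    // Math.floorMod keeps the result in [0, numPartitions) even for negative hashes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // assumed partition count
        for (String key : new String[] {"k1", "k2", "k3", "k1"}) {
            // The same key always yields the same partition, however messages are batched.
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```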
For those who encounter a similar performance problem, I recommend checking the actual number of batched messages with CLI commands such as `examine-messages`, `peek-messages` and `get-message-by-id`.
You can see the number of batched messages as `X-Pulsar-num-batch-message`.
```
$ docker exec -it pulsar_broker bin/pulsar-admin topics examine-messages --initialPosition latest "persistent://mytenant/mynamespace/mytopic-partition-0" | head
Message ID: 4572594:27489
Tenants:
"X-Pulsar-batch-size 23678"
"X-Pulsar-num-batch-message 48"
...
$ docker exec -it pulsar_broker bin/pulsar-admin topics get-message-by-id --ledgerId 4572594 --entryId 27489 "persistent://mytenant/mynamespace/mytopic-partition-0"
Batch Message ID: 4572594:27489:0
Properties:
"X-Pulsar-batch-size 23678"
"X-Pulsar-num-batch-message 48"
...
$ docker exec -it pulsar_broker bin/pulsar-admin topics peek-messages --subscription mysub1 --count 1 "persistent://mytenant/mynamespace/mytopic-partition-0" | head
Batch Message ID: 4572594:33046:0
Publish time: 1631608014336
Event time: 0
Properties:
"X-Pulsar-batch-size 20086"
"X-Pulsar-num-batch-message 43"
```
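If you want to check this value in a script rather than by eye, a small parser over the captured CLI output is enough. The sketch below assumes the property lines look like the ones shown above; the class and method names are hypothetical, and the exact output format may differ between Pulsar versions.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BatchCountParser {
    // Matches the property name followed by any non-digit separators and then the count.
    private static final Pattern NUM_BATCH =
            Pattern.compile("X-Pulsar-num-batch-message\\D*(\\d+)");

    // Returns the first X-Pulsar-num-batch-message value found, if any.
    static OptionalInt numBatchMessages(List<String> outputLines) {
        for (String line : outputLines) {
            Matcher m = NUM_BATCH.matcher(line);
            if (m.find()) {
                return OptionalInt.of(Integer.parseInt(m.group(1)));
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Lines copied from the peek-messages output above.
        List<String> output = List.of(
                "Batch Message ID: 4572594:33046:0",
                "Properties:",
                "\"X-Pulsar-batch-size 20086\"",
                "\"X-Pulsar-num-batch-message 43\"");
        System.out.println(numBatchMessages(output)); // OptionalInt[43]
    }
}
```

A value that stays close to 1 under load suggests that batching is not taking effect.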
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] hozumi closed issue #12036: Broker high cpu usage due to producer's batching misconfiguration.
Posted by GitBox <gi...@apache.org>.
hozumi closed issue #12036:
URL: https://github.com/apache/pulsar/issues/12036