Posted to users@pulsar.apache.org by ch...@cmartinit.co.uk on 2022/01/17 18:23:56 UTC

Pulsar topics with a very large number of partitions

Hi,

We’re evaluating Pulsar for something of an unusual use case: we want to create a number of topics, each with a very large number of partitions (tens, or ideally even hundreds, of thousands). The reason is that we want consumers to be able to seek efficiently to a given message key. By hashing a given key to a given topic partition, we can let consumers subscribe only to that partition and thus ignore the vast majority of other messages.
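To make the idea concrete, here is a minimal sketch of the key-to-partition mapping. It assumes Pulsar’s default JavaStringHash routing (Java’s String.hashCode masked to a non-negative value) and ASCII keys; the topic name is made up for illustration:

```go
package main

import "fmt"

// javaStringHash mirrors Java's String.hashCode for ASCII keys, which is
// what Pulsar's default JavaStringHash message router is based on. For
// non-ASCII keys Java hashes UTF-16 code units, so this sketch only
// matches for ASCII input.
func javaStringHash(key string) int32 {
	var h int32
	for i := 0; i < len(key); i++ {
		h = 31*h + int32(key[i])
	}
	return h
}

// partitionFor maps a message key to a partition index the same way the
// default router does: mask off the sign bit, then take the modulo.
func partitionFor(key string, numPartitions int) int {
	return int(javaStringHash(key)&0x7fffffff) % numPartitions
}

func main() {
	const topic = "persistent://public/default/keys" // hypothetical topic
	const partitions = 50000

	p := partitionFor("customer-42", partitions)
	// Pulsar exposes each partition as an ordinary topic named
	// "<topic>-partition-<N>", so a consumer that knows the key can
	// subscribe to just that one partition topic.
	fmt.Printf("%s-partition-%d\n", topic, p)
}
```

A consumer would then pass the computed "<topic>-partition-<N>" name as an ordinary topic to the client library and see only messages whose keys hash to that partition.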

I’ve had a go at building a proof of concept with Pulsar, without much success. What happens is something like the following:

Environment:
Pulsar 2.7.3 configured with 10 brokers, 10 bookies and 5 ZooKeeper nodes. Previously tested as handling 100k messages/sec on a topic with 100 partitions.

* Create a partitioned topic with 50k partitions
* Create a publisher using the Go library that publishes to the topic.
* The publisher tries to create 50k producers (this is done by the Go library; in my code I am creating a single producer). I can see log lines showing that producers are being created, but after a minute or so they seem to disconnect. The publisher then gets itself into a cycle: it tries to create 50k producers, but before it can finish they all disconnect, and the cycle repeats.
* During the above I can see that both the brokers and the ZooKeeper nodes are using high CPU.

Does anyone have any hints as to how I can achieve what I want here[1], or, alternatively, confirm that Pulsar is the wrong tool for the job? I do realise that I could remodel the situation as 50k topics, each with a single partition, but I’m assuming that as far as Pulsar is concerned these two situations are largely equivalent, since an n-partition topic is modelled as n individual topics under the hood.

Thanks,

Chris

[1] Where “doing what I want” could either be setting up Pulsar to have topics with a large number of partitions or, more generally, finding some pattern that would allow consumers to efficiently consume a given message key when the number of message keys is measured in the hundreds of thousands or even millions.