Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/01/23 18:16:07 UTC

Slack digest for #general - 2018-01-23

2018-01-22 18:59:47 UTC - Allen Wang: @Matteo Merli It looks like the Kafka wrapper always subscribes to topics using “failover” mode, and it limits the consumption rate when there is a large number of partitions.
----
2018-01-22 19:02:05 UTC - Matteo Merli: Using the “failover” mode matches Kafka’s mode of consumption. The problem with using the “Shared” subscription type would be that the Kafka API does not express individual acks, just offset updates
----
2018-01-22 19:05:27 UTC - Allen Wang: We created 20 consumer instances but they can only consume at 90 messages/second (aggregated) when the producers produce at 1000 messages/second. The topic has 4800 partitions.
----
2018-01-22 19:08:16 UTC - Allen Wang: It looks like in “failover” mode, only one consumer instance will be consuming, correct?
----
2018-01-22 19:30:49 UTC - Matteo Merli: one per partition
----
2018-01-22 19:33:02 UTC - Matteo Merli: I haven’t seen this issue. Let me run the test with the same number of partitions to see if I can reproduce it
----
2018-01-22 22:41:00 UTC - Nicolas Ha: What is the common way to `.receive` continuously from multiple consumers? Put all these `.receive` operations in a thread pool?
And if you want to cancel it, check before calling the next `.receive` for a given consumer?
----
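A minimal sketch of the thread-pool receive-loop pattern asked about above, using the Java client’s builder API (the service URL, topic names, and subscription name are illustrative, not from this thread):
```
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class ReceiveLoopExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .build();

        List<String> topics = Arrays.asList("topic-a", "topic-b"); // illustrative
        AtomicBoolean running = new AtomicBoolean(true);
        ExecutorService pool = Executors.newFixedThreadPool(topics.size());

        for (String topic : topics) {
            Consumer<byte[]> consumer = client.newConsumer()
                    .topic(topic)
                    .subscriptionName("my-subscription")
                    .subscribe();
            pool.submit(() -> {
                // Check the cancellation flag before each receive; the timeout
                // bounds how long a cancellation can take to be observed.
                while (running.get()) {
                    Message<byte[]> msg = consumer.receive(1, TimeUnit.SECONDS);
                    if (msg != null) {
                        // ... process msg ...
                        consumer.acknowledge(msg);
                    }
                }
                consumer.close();
                return null;
            });
        }
        // To stop: running.set(false); pool.shutdown(); then client.close();
    }
}
```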
2018-01-22 22:41:47 UTC - Matteo Merli: I think the easiest approach is to set a listener on the consumer
----
2018-01-22 22:42:20 UTC - Matteo Merli: the listeners for all topics are invoked on a dedicated thread pool (default size: 1 thread)
----
2018-01-22 22:42:55 UTC - Matteo Merli: <https://github.com/apache/incubator-pulsar/blob/master/pulsar-client/src/test/java/org/apache/pulsar/client/tutorial/SampleConsumerListener.java#L37>
eyes : Nicolas Ha
+1 : Nicolas Ha
----
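A minimal sketch of the listener approach, in the spirit of the linked tutorial (service URL, topic, and subscription names are illustrative):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

public class ListenerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .listenerThreads(1) // the dedicated listener pool; 1 is the default
                .build();

        // The callback is invoked from the client's listener thread pool,
        // so no explicit receive loop is needed.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("my-subscription")
                .messageListener((c, msg) -> {
                    System.out.printf("Received: %s%n", new String(msg.getData()));
                    c.acknowledgeAsync(msg);
                })
                .subscribe();
    }
}
```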
2018-01-22 22:50:14 UTC - Allen Wang: @Matteo Merli We changed the consumer to be the native consumer using “shared” mode. It works well and has no problem consuming at a high rate with a large number of partitions.
----
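For comparison, a sketch of a native consumer subscribing in Shared mode (URL and names are illustrative):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class SharedConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .build();

        // Shared mode round-robins messages across all consumers on the
        // subscription and supports individual acks, unlike Failover where
        // only one consumer per partition is active at a time.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-partitioned-topic")
                .subscriptionName("my-shared-subscription")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // ... process msg ...
            consumer.acknowledge(msg);
        }
    }
}
```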
2018-01-22 22:50:56 UTC - Matteo Merli: Ok, but that shouldn’t have issues even in the Failover configuration :slightly_smiling_face:
----
2018-01-22 22:51:54 UTC - Matteo Merli: I haven’t gotten to reproducing it yet. Will do it shortly
----
2018-01-22 23:29:08 UTC - Jaebin Yoon: We currently have 10 brokers and 10 bookies in the cluster, and 20 producers producing to a topic with 4800 partitions. I noticed that only 4 brokers are being used currently. I’m not sure how the current load balancer works, but will this be rebalanced when the traffic increases? Where should I look to tweak this load balancing?
----
2018-01-22 23:33:54 UTC - Matteo Merli: There are a few parameters to look at:
 1. Topic assignments to brokers are done in terms of “bundles”, that is, in groups of topics
 2. Topics are matched to bundles by hashing on the topic name
 3. Effectively, a bundle is a hash range that topics fall into
 4. Initially, the default is to have 4 “bundles” per namespace
 5. When the traffic increases on a given bundle, it is split in 2 and reassigned to a different broker
 6. There are adjustable thresholds that control when the split happens, based on the number of topics/partitions, messages in/out, bytes in/out, etc.
 7. It’s also possible to specify a higher number of bundles when creating a namespace (see the sketch after this list)
----
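A sketch of point 7 using the Java admin client, assuming the admin API’s createNamespace overload that takes a bundle count (the admin URL and namespace name are illustrative; the equivalent CLI is `pulsar-admin namespaces create <namespace> --bundles N`):
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class BundledNamespaceExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // illustrative admin URL
                .build();

        // Pre-split the namespace into 16 bundles (hash ranges) instead of
        // the default 4, so topics spread across more brokers up front
        // instead of waiting for load-triggered splits.
        admin.namespaces().createNamespace("my-tenant/my-namespace", 16);

        admin.close();
    }
}
```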
2018-01-22 23:34:46 UTC - Matteo Merli: And in addition, there are the load-manager thresholds that control when a broker should offload some of its bundles to other brokers
----
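For reference, these knobs live in broker.conf; a few of the bundle-split and overload settings are sketched below (the values shown are illustrative, not a recommendation):
```
# broker.conf (illustrative values)

# Split a bundle automatically once it crosses the thresholds below
loadBalancerAutoBundleSplitEnabled=true
loadBalancerNamespaceBundleMaxTopics=1000
loadBalancerNamespaceBundleMaxMsgRate=30000
loadBalancerNamespaceBundleMaxBandwidthMbytes=100

# Upper bound on the number of bundles a namespace can be split into
loadBalancerNamespaceMaximumBundles=128

# A broker above this resource-usage percentage is considered overloaded
# and will offload bundles to other brokers
loadBalancerBrokerOverloadedThresholdPercentage=85
```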
2018-01-22 23:38:23 UTC - Jaebin Yoon: @Matteo Merli thanks a lot for the detailed explanation. This gives me some ideas. I’ll look into it.
----
2018-01-23 03:01:23 UTC - Matteo Merli: @Allen Wang I’m running the producers and consumers with 4800 partitions, using `pulsar-perf` with the consumers in Failover mode (and running multiple of them). I’m not seeing any strange behavior; the traffic is evenly spread across all the available consumers. I haven’t tested with the Kafka wrapper yet; that will be my next test.
----
2018-01-23 08:35:10 UTC - Julien Laurenceau: @Julien Laurenceau has joined the channel
----
2018-01-23 12:01:12 UTC - Benjamin Lupton: What are the minimum requirements for Apache Pulsar? Looking at <https://pulsar.incubator.apache.org/docs/latest/deployment/cluster/>, that is quite an expensive set of requirements for an early startup.
----
2018-01-23 13:57:35 UTC - jia zhai: @Benjamin Lupton You need 3 kinds of clusters: bookies, brokers, and ZooKeeper. But if you don’t have enough resources, it is OK to run the bookie, ZooKeeper, and broker on the same machine.
There is already a command and config to run the broker and bookie together: `PulsarBrokerStarter --broker-conf broker.conf --run-bookie --bookie-conf bookie.conf`. PR 1023 contains more info — <https://github.com/apache/incubator-pulsar/pull/1023>
----
2018-01-23 16:19:28 UTC - Matteo Merli: @Benjamin Lupton As @jia zhai said, there are several components, but in a small deployment they can be collapsed into a handful of nodes. If you’re in AWS, there’s a Terraform+Ansible combination to get a cluster up with 3 nodes: <http://pulsar.apache.org/docs/latest/deployment/aws-cluster/> (+ 3 small VMs for ZooKeeper)
----
2018-01-23 16:20:14 UTC - Matteo Merli: and even the 3 ZK processes could be co-hosted on the same 3 VMs
----
2018-01-23 16:20:52 UTC - Matteo Merli: In general, if you want the data to be replicated on at least 3 machines, having 3 nodes is the minimum cluster size.
----
2018-01-23 16:21:23 UTC - Matteo Merli: (If you don’t need replication, then the minimal option is to use the Standalone service)
----