Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/10/19 09:11:02 UTC

Slack digest for #general - 2018-10-19

2018-10-18 09:27:50 UTC - Wenfeng Wang: @Wenfeng Wang has joined the channel
----
2018-10-18 10:31:55 UTC - Nicolas Ha: I am trying to understand the recommendation you made last time to use a Daemonset instead of a StatefulSet (previous msg <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1538779207000100> )

My understanding is that in the pulsar helm chart:
- currently BK and ZK use a StatefulSet (and a volumeClaim) - they do not care where the storage is provided, but the kubernetes cluster has to have a way to provide storage (not sure where this bit is specified?)
- DaemonSet would make ZK and BK pods “stick” to one physical node, and use the local storage there (so would work on bare metal clusters too, the operator would have to keep number of machines >= number of ZK/BK DaemonSet replicas)

Also I see Volume claims, but I was expecting `PersistentVolumeClaim` in these:
- <https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/bookkeeper-statefulset.yaml>
- <https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/zookeeper-statefulset.yaml>

Or even something like what is described there: <https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/> and <https://kubernetes.io/docs/concepts/storage/volumes/#local> - it looks like this should allow not changing the statefulset and still use local volumes?

Am I missing something? Was there a specific reason to use a DaemonSet in "kubernetes/generic" and Statefulset in "helm"?
----
2018-10-18 12:54:31 UTC - Sijie Guo: &gt; it looks like this should allow not changing the statefulset and still use local volumes?

yes localvolumes should be a better solution than a daemonset.

&gt; Was there a specific reason to use a DaemonSet in “kubernetes/generic” and Statefulset in “helm”?

kubernetes/generic was added before k8s introduces local volumes. so we use daemonset there. but it probably can be changed to be using stateful set and local volumes.

“helm” was added for deploying in cloud environment. hence statefulset with persistent volumes is more resonable there.
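[Editor: for readers following this thread, a minimal sketch of the local-volume approach discussed above. All names, paths, and sizes are illustrative placeholders, not values from the Pulsar chart.]

```yaml
# A StorageClass that defers binding until the pod is scheduled, plus one
# local PersistentVolume pinned to a specific node. A StatefulSet's
# volumeClaimTemplates can then request storageClassName: local-storage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bookie-journal-pv-0          # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd0            # hypothetical mount point on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1             # hypothetical node name
```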
----
2018-10-18 13:24:20 UTC - Martin Svensson: yeah, that’s right
----
2018-10-18 13:48:47 UTC - Nicolas Ha: ok that makes sense.
So if I understand correctly, there is nothing wrong with having a "generic" that would use local volumes, and even a "helm" version of the "generic" deployment?
----
2018-10-18 13:54:40 UTC - Sijie Guo: YES correct
----
2018-10-18 13:56:05 UTC - Nicolas Ha: thanks a lot - very helpful answers as usual :smile:
----
2018-10-18 16:44:10 UTC - Jerry Moore: @Jerry Moore has joined the channel
----
2018-10-18 18:33:57 UTC - Zuyu Zhang: Hi guys, I have a simple q regarding the multi-topic consumer in the C++ client. When the consumer receives a message, how do I know which topic it belongs to?
----
2018-10-18 18:40:25 UTC - Jerry Peng: MessageId::getTopicName()
----
2018-10-18 18:40:34 UTC - Jerry Peng: ```
 /**
     * Get the topic Name
     */
    const std::string&amp; getTopicName() const;
```
clap : Rodrigo Malacarne
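[Editor: a minimal usage sketch of the answer above, assuming the multi-topics subscribe overload available on master at the time. The service URL, topic names, and subscription name are placeholders.]

```cpp
#include <pulsar/Client.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    pulsar::Client client("pulsar://localhost:6650");  // placeholder URL

    // Subscribe one consumer to several topics (placeholder names).
    std::vector<std::string> topics = {"topic-a", "topic-b"};
    pulsar::Consumer consumer;
    client.subscribe(topics, "my-subscription", consumer);

    pulsar::Message msg;
    if (consumer.receive(msg, 5000) == pulsar::ResultOk) {
        // The originating topic is carried on the message's MessageId.
        std::cout << msg.getMessageId().getTopicName() << std::endl;
        consumer.acknowledge(msg);
    }

    client.close();
    return 0;
}
```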
----
2018-10-18 18:40:52 UTC - Zuyu Zhang: ok. Thanks!
----
2018-10-18 18:42:00 UTC - Jerry Peng: FYI This is a feature not in an official release yet.  You will have to compile from master
clap : Rodrigo Malacarne
----
2018-10-18 18:53:10 UTC - Zuyu Zhang: Yes, I did.
----
2018-10-18 22:03:51 UTC - Ryan Samo: Hey guys, I’m trying to load test Pulsar but keep receiving “Connection reset by peer” messages. It only happens when I start multiple producers and consumers on a topic. If the topic is not partitioned then it does not seem to happen as often. I have gone through broker.conf and set many of the limits to 0 so that I can push the boundaries of the hardware. Any ideas on what may be throwing out all of my connections? All of them drop at once, consumers and producers, and then immediately reconnect.

Thanks!
----
2018-10-18 22:05:34 UTC - Ali Ahmed: @Ryan Samo what’s your setup and how much traffic are you pushing through
----
2018-10-18 22:09:14 UTC - Ryan Samo: 3 brokers, 3 zookeepers each on 40 core servers with plenty of RAM. If I do a topic that’s not partitioned, I can spin up 25 producers with Pulsar-perf and generate around 1 million msg/s. The consumer side lags in that case so I decided to partition and that’s when the connection issues popped up. I want to sustain 1 million msg/s without lag if I can get there.
----
2018-10-18 22:09:55 UTC - Ali Ahmed: how many bookies ?
----
2018-10-18 22:13:02 UTC - Ryan Samo: 3, they are shared on the same servers as the brokers. 3 brokers, 3 bookies, 3 zookeepers
----
2018-10-18 22:13:26 UTC - Ryan Samo: Will reply back soon, need to drive :)
----
2018-10-18 23:23:34 UTC - Rodrigo Malacarne: @Ryan Samo, were you able to start a cluster with bookies? Are you using v2.1.1-incubating?
----
2018-10-18 23:31:53 UTC - Ryan Samo: Yes I was able to get it all running. V2.1.1-incubating yup. Yeah so the cluster works fine until I try to have the multiple connections and then it just keeps dropping them with timeouts. The servers themselves are hardly being taxed with the load so I thought maybe a setting needs adjustment or the JVM is running into issues maybe.
----
2018-10-18 23:32:30 UTC - Matteo Merli: @Ryan Samo Are you reaching any bandwidth limit ?
----
2018-10-18 23:33:32 UTC - Matteo Merli: There are certainly some settings that can be used when testing for high throughput. The defaults are conservative, meant to work out of the box in all scenarios.
----
2018-10-18 23:33:44 UTC - Ryan Samo: Well, I was able to do 25 producers and a couple of consumers with a non-partitioned topic, so I don’t think so.
----
2018-10-18 23:34:29 UTC - Ryan Samo: If you have any recommended settings for beating the crap out of the cluster I’m all for it. Need to prove out how far we can go
----
2018-10-18 23:34:58 UTC - Matteo Merli: One of the reasons for “connection reset by peer” is that one end appears to be unresponsive, so the broker cuts the TCP connection off after 30-60 sec
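[Editor: the cut-off described above is tied to the broker's keep-alive check; a hedged broker.conf fragment (the default may differ by version):]

```
# broker.conf (illustrative)
# How often the broker pings idle connections; a peer that stops
# responding to keep-alive probes is eventually disconnected.
keepAliveIntervalSeconds=30
```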
----
2018-10-18 23:36:16 UTC - Ryan Samo: Hmm yeah I saw a setting for that in the broker config. I might try the Prometheus and grafana monitoring too to see if I see anything network wise
----
2018-10-18 23:36:31 UTC - Ryan Samo: CPU, RAM, etc all good
----
2018-10-18 23:36:58 UTC - Matteo Merli: For settings, take a look at <https://github.com/openmessaging/openmessaging-benchmark/blob/master/driver-pulsar/deploy/templates>
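[Editor: a hedged sample of the kind of broker.conf overrides the linked benchmark templates apply for high-throughput testing. Settings are real broker.conf keys, but treat the choice of values as an assumption, not a recommendation; for these limits 0 generally means unlimited.]

```
# broker.conf — illustrative throughput-oriented overrides
dispatchThrottlingRatePerTopicInMsg=0
dispatchThrottlingRatePerTopicInByte=0
maxUnackedMessagesPerConsumer=0
maxUnackedMessagesPerSubscription=0
```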
----
2018-10-18 23:37:21 UTC - Ryan Samo: I am using Pulsar-perf produce and consume to perform the tests. Cool thanks, I’ll check that out!
----
2018-10-18 23:37:56 UTC - Matteo Merli: Do the machines have 10Gbps NICs?
----
2018-10-18 23:40:41 UTC - Ryan Samo: Yes they sure are. In the same rack too actually 
----
2018-10-18 23:41:39 UTC - Matteo Merli: Ok, then, apart from metrics, take a look at broker logs for any additional clue
----
2018-10-18 23:43:04 UTC - Ryan Samo: Ok, I’ll keep searching. It feels like a broker.config issue because when I started the initial testing it was conservative like you said. I tweaked a few values to unlimited and that helped a ton. Thanks for all the support!
+1 : Matteo Merli
----
2018-10-19 07:15:43 UTC - Matti-Pekka Laaksonen: Hi! I tried running my cluster for the first time with production-level workloads, and I ended up running out of memory on the brokers. I have a few questions regarding this:
1. Is the sum of maxDirectMemory and maxHeapMemory the maximum memory the JVM can use? Or is the heap memory a subset of the direct memory?
2. What is the recommended total memory (I assume the sum of the two memory sizes) for brokers?
3. What is the recommended share of the instance's total memory used by Pulsar? The sample deployment scripts in the Git repo show that c5.2xlarge instances are used for the brokers. The instance has 16 GB of memory, but the direct and heap memory are both set to 12 GB, which sums to 24 GB, more than the available memory
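[Editor: for context on question 1, heap (-Xmx) and direct memory (-XX:MaxDirectMemorySize) are separate JVM regions that add together; direct memory is off-heap, not a subset of heap. In the broker they are typically set via PULSAR_MEM in conf/pulsar_env.sh. The values below are an illustrative assumption for a 16 GB machine, not a recommendation:]

```
# conf/pulsar_env.sh (illustrative)
# Heap and direct memory are additive; together they should leave
# headroom for the OS and page cache.
PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
```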
----
2018-10-19 07:27:38 UTC - Matti-Pekka Laaksonen: Oh, and I could swear that I've seen a guide to upgrading a running Pulsar cluster, but I can't seem to find it anymore. Anyone know about this?
----