Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/06/24 09:11:05 UTC

Slack digest for #general - 2020-06-24

2020-06-23 11:28:24 UTC - Oleg Brovko: @Oleg Brovko has joined the channel
----
2020-06-23 11:38:08 UTC - Pierre-Yves Lebecq: Hey :wave: I’m using the C++ client in a build script, using the following link to download the .deb package: <https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=pulsar/pulsar-2.5.2/DEB/apache-pulsar-client.deb>
This is the only link I found in the documentation when I wrote it.
It seems that since the 2.6 release, this link returns a 404 error. Does anyone know a permanent link for previous versions, so I can keep my script running on version 2.5.2?
----
2020-06-23 13:06:09 UTC - Yifan: Hi all, I have been having a problem with pulsar-client since 2.5.2 on OSX Catalina. The problem is:
```E ImportError: dlopen(*/.tox/py37/lib/python3.7/site-packages/_pulsar.cpython-37m-darwin.so, 2): Symbol not found: __Py_tracemalloc_config
E Referenced from: */.tox/py37/lib/python3.7/site-packages/_pulsar.cpython-37m-darwin.so
E Expected in: flat namespace
E in */.tox/py37/lib/python3.7/site-packages/_pulsar.cpython-37m-darwin.so```
Does anyone else see this problem?
----
2020-06-23 13:11:15 UTC - Yifan: I was able to fix it by locking pulsar-client to 2.5.1. It stopped working when Homebrew upgraded libpulsar to 0.6.0.
----
2020-06-23 13:39:24 UTC - Gilles Barbier: Hi, has anyone successfully used the latest standalone 2.6.0 (-all) Docker image? I can't get it to run.
----
2020-06-23 16:05:55 UTC - Matteo Merli: @Pierre-Yves Lebecq All releases are available at <http://archive.apache.org/dist/pulsar/>

We link the latest release to the mirrors, based on ASF policies.
----
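Aside (not from the thread): a minimal sketch of how a build script could pin the 2.5.2 client to the archive rather than the mirrors. The path below the archive root is an assumption: it reuses the `pulsar/pulsar-2.5.2/DEB/apache-pulsar-client.deb` layout from the mirror link earlier in this digest, and the helper name `archive_deb_url` is hypothetical. Check the directory listing before relying on it.

```python
# Archive root named in the thread (the https form is assumed here).
ARCHIVE_ROOT = "https://archive.apache.org/dist/pulsar"


def archive_deb_url(version: str) -> str:
    """Build the archive URL for the C++ client .deb of a given release.

    Assumes the archive uses the same layout as the old mirror link:
    pulsar-<version>/DEB/apache-pulsar-client.deb
    """
    return f"{ARCHIVE_ROOT}/pulsar-{version}/DEB/apache-pulsar-client.deb"


if __name__ == "__main__":
    # Print the URL so a shell step can hand it to curl or wget.
    print(archive_deb_url("2.5.2"))
```

Keeping the version as a parameter means the script only changes in one place when the pinned release moves.
----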
2020-06-23 16:18:57 UTC - rani: *[Pulsar 2.5.1] Bookie Decommissioning (`autoRecoveryDaemonEnabled` = `true`) @Sijie Guo*
I am running a bookie cluster on AWS that is scaled by an ASG. I operate a 3-node bookkeeper cluster at all times. If I need to release a new AMI, my current strategy is to scale the bookkeeper nodes 3 -> 6 and then scale them back down 6 -> 3 (in order to remove the old bookkeeper nodes).

The bookkeeper decommissioning steps I’m taking on the nodes that need to be scaled down are:
1. Shut down the bookkeeper service
2. `/opt/pulsar/bin/bookkeeper shell decommissionbookie`
The `decommissionbookie` command hangs most of the time/takes a long time to complete, considering that I have produced almost no custom events into my Pulsar cluster. Any clues as to what’s happening/if there’s any way in which I can optimise my workflow?
----
2020-06-23 16:23:40 UTC - Sijie Guo: What was the issue you encountered?
----
2020-06-23 16:24:21 UTC - Adriaan de Haan: Hi, I have just installed the 2.6.0 based kubernetes helm chart
----
2020-06-23 16:24:44 UTC - Adriaan de Haan: and everything seems to have started up ok, except for the "recovery" POD
----
2020-06-23 16:25:46 UTC - Adriaan de Haan: Checking the logs, it is stuck in the pulsar-bookkeeper-verify-clusterid container with the following error:
----
2020-06-23 16:25:49 UTC - Adriaan de Haan: ```JMX enabled by default
Error: Could not find or load main class "
JMX enabled by default
Error: Could not find or load main class "
JMX enabled by default
Error: Could not find or load main class "
JMX enabled by default
Error: Could not find or load main class "```

----
2020-06-23 16:26:22 UTC - Adriaan de Haan: any ideas? anybody else with a similar issue?
----
2020-06-23 16:27:09 UTC - Sijie Guo: 1. Any logs from decommissionbookie? I would recommend first running this manually to see whether there is any issue with this approach.
2. Any reason why you go 3 -> 6? Why can’t you do 3 -> 4, since you need to take down bookies one by one?
----
2020-06-23 16:27:56 UTC - Sijie Guo: Are you using the latest master from <https://github.com/apache/pulsar-helm-chart> ?
----
2020-06-23 16:28:09 UTC - Sijie Guo: I think I fixed the issue along with the 2.6.0 release.
----
2020-06-23 16:28:28 UTC - Sijie Guo: Because there is a change in 2.6.0 in how we apply environment variables.
----
2020-06-23 16:29:20 UTC - Adriaan de Haan: I see there is already an issue
----
2020-06-23 16:29:31 UTC - Adriaan de Haan: yes
----
2020-06-23 16:29:43 UTC - Adriaan de Haan: I just did a git clone from that repo
----
2020-06-23 16:30:13 UTC - Adriaan de Haan: <https://github.com/apache/pulsar/issues/7243>
----
2020-06-23 16:30:37 UTC - Adriaan de Haan: Somebody else confirmed the same problem 2 days ago
----
2020-06-23 16:30:50 UTC - Sijie Guo: Yes.
----
2020-06-23 16:31:06 UTC - Sijie Guo: I think I missed one change in the bookie recovery pod
----
2020-06-23 16:31:12 UTC - Adriaan de Haan: probably :slightly_smiling_face:
----
2020-06-23 16:31:30 UTC - Adriaan de Haan: quick fix? :slightly_smiling_face:
----
2020-06-23 16:31:40 UTC - rani: 1. I do not receive any error logs. Simply the following for example:
```16:29:03.009 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 8```
2. Going from 3 -> 6 simplifies the process of AMI rotation, as it only involves 2 steps. If I were to do 3 -> 4 -> 3 -> 4 -> 3 -> 4, it’ll add a few more steps, considering we’ll need to repeat the same for all Pulsar components (proxy, bookkeeper, zookeeper, etc). However, if you’re suggesting that this approach is cleaner, then we can definitely script this AMI rotation process to simplify things.
----
2020-06-23 16:33:50 UTC - Sijie Guo: sending out a pr now
----
2020-06-23 16:35:04 UTC - Sijie Guo: <https://github.com/apache/pulsar-helm-chart/pull/24>
----
2020-06-23 16:36:40 UTC - Adriaan de Haan: great, I can fix that! a pity I can't approve
----
2020-06-23 16:38:39 UTC - Adriaan de Haan: Quick off-topic question
----
2020-06-23 16:39:16 UTC - Adriaan de Haan: The java memory settings, are they pretty battle-tested? Or could some of them require some tweaking?
----
2020-06-23 16:40:30 UTC - Adriaan de Haan: I noticed my previous cluster had some PODs where the garbage collection was going crazy (2.5.2 release). I also noticed however that the G1 garbage collector wasn't active, so perhaps those settings were not applied before?
----
2020-06-23 16:42:08 UTC - Pierre-Yves Lebecq: @Matteo Merli Thank you very much!
----
2020-06-23 18:19:25 UTC - sundar: Hello all, I am trying to write a test for the pulsarfunction-localrun module, more specifically localrunner.java. There is an import error there for org.apache.pulsar.functions.proto.Function,
which says "cannot resolve symbol Function". This error is present in a few more modules under pulsar-functions. It is preventing me from using the debugger. Can anyone help me out with this?
----
2020-06-23 18:33:34 UTC - Sijie Guo: The default values are used for minimal setup.
----
2020-06-23 18:33:55 UTC - Sijie Guo: GC settings are pretty good. I don’t think you need to tune them.
----
2020-06-23 18:34:07 UTC - Sijie Guo: But you need to adjust the memory settings based on your requirements.
----
2020-06-23 18:44:16 UTC - Adriaan de Haan: I am a bit perplexed about storage usage of pulsar topics
----
2020-06-23 18:45:42 UTC - Adriaan de Haan: When is storage actually released? I did some testing and there's a lot of storage being used by my topics, but all subscriptions have ack'ed all messages and I don't change any persistence settings of the topics.
----
2020-06-23 18:45:46 UTC - Adriaan de Haan:
----
2020-06-23 18:46:26 UTC - Adriaan de Haan: as you can see, the backlog is zero, but the storage size keeps going up
----
2020-06-23 18:50:53 UTC - Chris Bartholomew: This blog post might: <https://kesque.com/understanding-pulsar-message-ttl-backlog-and-retention/>
----
2020-06-23 19:07:20 UTC - Sijie Guo: The storage size is the amount of data that is still “live” (not deleted). A Pulsar partition is a segment-based implementation, and data is deleted segment by segment. So even if the backlog is zero, the storage size can still be non-zero, because one or more segments have not been deleted. They are kept either because of the retention policy or because a segment is the last one in the partition and is not sealed yet.

You can watch this video. I walk through the lifecycle of a Pulsar message: <https://www.youtube.com/watch?v=R197TYYFaiI>

<https://www.slideshare.net/streamnative/tgipulsar-ep-006-lifecycle-of-a-pulsar-message>
----
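Aside (not from the thread): a toy model of the rule described above, where storage is released only for whole segments. The `Segment` fields and both function names are hypothetical, chosen for illustration; this is not how the broker represents ledgers internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    size_bytes: int
    sealed: bool       # False for the last, still-open segment
    fully_acked: bool  # every subscription has acknowledged all of it
    retained: bool     # still covered by a retention policy


def is_deletable(seg: Segment) -> bool:
    """A segment can go only if it is sealed, fully acked and not retained."""
    return seg.sealed and seg.fully_acked and not seg.retained


def storage_size(segments: List[Segment]) -> int:
    """Bytes still counted as storage: everything that is not deletable."""
    return sum(s.size_bytes for s in segments if not is_deletable(s))


if __name__ == "__main__":
    segments = [
        Segment(100, sealed=True, fully_acked=True, retained=False),   # deletable
        Segment(100, sealed=True, fully_acked=True, retained=True),    # kept by retention
        Segment(40, sealed=False, fully_acked=True, retained=False),   # open segment
    ]
    print(storage_size(segments))
```

In this model a zero backlog (everything acked) still leaves the retained segment and the open segment on disk, which matches the behaviour described in the message.
----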
2020-06-23 19:43:01 UTC - rani: Just tried doing it manually. It took ~12 minutes to decommission 2/3 bookies. Is it expected to take this long?
```19:27:05.318 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 1
19:27:15.320 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 1
.
.
.
19:39:55.663 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookies.DecommissionCommand - Cookie of the decommissioned bookie: 1.2.3.4:3181 is deleted successfully```

----
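Aside (not from the thread): a minimal sketch for measuring how long a decommission run spends waiting on re-replication, by parsing log lines shaped like the ones above. The function name `summarize` is hypothetical, and the regular expression assumes the exact `HH:MM:SS.mmm ... rereplicated: N` shape shown in this digest.

```python
import re
from datetime import datetime

# Matches lines such as:
# 19:27:05.318 [main] INFO ... - Count of Ledgers which need to be rereplicated: 1
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) .*rereplicated: (\d+)")


def summarize(lines):
    """Return (seconds between first and last matching line, last count).

    Returns None when no line matches. The log has no date, so a run
    that crosses midnight is not handled.
    """
    samples = []
    for line in lines:
        match = LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            samples.append((stamp, int(match.group(2))))
    if not samples:
        return None
    elapsed = (samples[-1][0] - samples[0][0]).total_seconds()
    return elapsed, samples[-1][1]


if __name__ == "__main__":
    log = [
        "19:27:05.318 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 1",
        "19:27:15.320 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 1",
    ]
    print(summarize(log))
```

A count that stays flat across many samples, as in the excerpt, suggests a single ledger being copied slowly rather than many ledgers queued.
----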
2020-06-23 20:13:15 UTC - rani: Repeated the experiment, this time with `pulsar-perf` producing data into a topic (~0.7 GB so far). It's now been ~25 minutes and the decommissioning command is still running on 3 bookkeepers simultaneously!
----
2020-06-23 20:15:30 UTC - rani: any hints @Sijie Guo? Could there be a parameter that I need to re-configure?
----
2020-06-23 20:39:50 UTC - Sijie Guo: Decommissioning a bookie requires copying the data from that bookie to others. How long it takes depends on 1) how much data there is and 2) how the re-replication settings are tuned. If you have one ledger, it will read the entries of that ledger in sequence to replicate them. It might be limited by the re-replication batch size, because you don’t want re-replication to take up a huge amount of your bandwidth.

The question I have here is: why do you need to decommission? You are just updating the AMI. Can’t you update the AMI on the existing bookies?
----
2020-06-23 21:19:06 UTC - rwaweber: Without hijacking Adriaan’s thread:

On the topic of “un-sealed segments” is there a way to identify if segments are still open and which ones they are?

Also, to that same idea — does a segment stay open until it is filled? And is that size dictated by the `logSizeLimit` on bookkeeper?
----
2020-06-23 21:35:54 UTC - Sijie Guo: `pulsar-admin topics stats-internal`
----
2020-06-23 21:36:27 UTC - Sijie Guo: `topics stats` and `topics stats-internal` are the two commands you can rely on for your daily operations on Pulsar
----
2020-06-23 21:37:00 UTC - Sijie Guo: logSizeLimit is a bookkeeper-side setting. It controls the size of the files on the bookie side.
----
2020-06-23 21:37:48 UTC - Sijie Guo: The size of the segment is controlled on the broker side. You can check the settings `*LedgerRollover*`. Those control when to roll over to a new ledger.
----
2020-06-23 22:21:26 UTC - Adriaan de Haan: Great, thanks for asking these questions because those would have been my own follow-up doubts :slightly_smiling_face:
----
2020-06-23 22:43:42 UTC - Adriaan de Haan: Now I have another issue... during my attempts to see if I can trigger cleanup of the storage usage, I tried "compaction" of the one topic. The compaction finished quite quickly, but didn't have any impact on the storage space used. But it had an unfortunate side-effect... I now have a __compaction subscription on this topic without a consumer... no idea why it's hanging around.
----
2020-06-24 00:17:03 UTC - Adriaan de Haan: I have checked the video and the settings you mentioned, but the amount of space being used still doesn't make sense
----
2020-06-24 00:17:06 UTC - Adriaan de Haan:
----
2020-06-24 00:19:13 UTC - Adriaan de Haan: It is using 2.5 GB already and not releasing, and I have all the default settings, which should roll over after 50k entries. This was 6 million messages, so it should have rolled over a lot of times already...
----
2020-06-24 00:20:17 UTC - Adriaan de Haan: and the 10-minute minimum also passed a long time ago
----
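Aside (not from the thread): a toy version of the rollover reasoning above. The 50k-entry and 10-minute figures come from the messages; the 240-minute maximum and the parameter names are assumptions made for illustration, so check them against the `*LedgerRollover*` settings in your own broker configuration. The function name `should_roll_over` is hypothetical.

```python
def should_roll_over(entries: int, age_minutes: float,
                     max_entries: int = 50_000,    # figure quoted in the thread
                     min_minutes: float = 10,      # figure quoted in the thread
                     max_minutes: float = 240) -> bool:  # assumed value
    """Toy rule: roll over when the ledger is full or too old,
    but never before the minimum rollover time has passed."""
    full = entries >= max_entries
    too_old = age_minutes >= max_minutes
    return (full or too_old) and age_minutes >= min_minutes


if __name__ == "__main__":
    # Upper bound on rollovers if every message were stored as its own entry.
    print(6_000_000 // 50_000)
    # Full but younger than the minimum: stays open in this model.
    print(should_roll_over(entries=50_000, age_minutes=5))
```

The division gives 120 only under the one-message-per-entry assumption. With producer batching, one entry can hold several messages, so the number of ledgers actually created may be lower.
----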
2020-06-24 01:22:39 UTC - Sijie Guo: You can run `bin/pulsar-admin topics stats-internal` to get the internal stats of a given topic.
----
2020-06-24 01:22:54 UTC - Sijie Guo: That will tell you how the storage is being used.
----
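Aside (not from the thread): a minimal sketch for summarizing the JSON printed by `pulsar-admin topics stats-internal`. The field names used here (`ledgers`, `size`, `currentLedgerSize`, `totalSize`) are assumptions about the output shape, the sample values are made up, and `summarize_internal_stats` is a hypothetical helper. Compare with the real output of your version first.

```python
import json


def summarize_internal_stats(raw: str) -> dict:
    """Summarize ledger usage from stats-internal JSON text.

    Missing fields are treated as zero or empty so the function
    does not fail on a differently shaped payload.
    """
    stats = json.loads(raw)
    ledgers = stats.get("ledgers", [])
    return {
        "ledger_count": len(ledgers),
        "listed_bytes": sum(entry.get("size", 0) for entry in ledgers),
        "open_ledger_bytes": stats.get("currentLedgerSize", 0),
        "total_bytes": stats.get("totalSize", 0),
    }


if __name__ == "__main__":
    # Made-up sample, not real broker output.
    sample = json.dumps({
        "totalSize": 1250,
        "currentLedgerSize": 250,
        "ledgers": [
            {"ledgerId": 1, "entries": 50000, "size": 1000},
            {"ledgerId": 2, "entries": 0, "size": 0},
        ],
    })
    print(summarize_internal_stats(sample))
```

A long ledger list with a zero backlog would point at retention or at cursors holding data, while a single large open ledger would point at rollover not having happened yet.
----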
2020-06-24 05:45:37 UTC - Karthik Ramasamy: Slides of my keynote 'Why Splunk Chose Pulsar' at Pulsar Summit - <https://www.slideshare.net/KarthikRamasamy3/pulsar-summitkeynotefinal>
+1 : Sijie Guo, Julius S, Ali Ahmed, Toktok Rambo
100 : Sijie Guo, Julius S
----
2020-06-24 06:22:39 UTC - Dan Melman: @Dan Melman has joined the channel
----
2020-06-24 08:21:17 UTC - Gilles Barbier: The Docker image fails right away with: `Error: Could not find or load main class "`
----
2020-06-24 08:21:42 UTC - Gilles Barbier: it does not go further
----
2020-06-24 08:36:42 UTC - Sijie Guo: The docker compose file has an issue.
----
2020-06-24 08:36:49 UTC - Sijie Guo: There is an issue tracking that.
----
2020-06-24 08:39:18 UTC - Gilles Barbier: Thx - <https://github.com/apache/pulsar/issues/7315> - I'm going to try that
----
2020-06-24 08:42:20 UTC - Gilles Barbier: It worked after replacing PULSAR_MEM with BOOKIE_MEM, thx
----