Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/06/18 09:11:05 UTC

Slack digest for #general - 2020-06-18

2020-06-17 09:47:42 UTC - Mathias Karlsson: @Mathias Karlsson has joined the channel
----
2020-06-17 11:42:54 UTC - Marcio Martins: How would I disable it? I didn't find anything in the docs.
----
2020-06-17 11:43:01 UTC - Marcio Martins: @Sijie Guo
----
2020-06-17 14:10:08 UTC - jujugrrr: Hi, I'm testing losing all of BookKeeper and recovering with only tiered-storage offloaded messages. It works well; however, the last ledger was not offloaded, so my reader blocks trying to read it. I've added some other messages now, so I have something like:
```ledger 1 offloaded
ledger 2 offloaded
ledger 3 not offloaded (no data, lost when I deleted local storage)
ledger 4 offloaded
ledger 5 offloaded
ledger 6 opened```
Is there a way to remove ledger 3 so my reader can keep going and jump from 2 to 4? I've tried ./bin/bookkeeper delete ledger, ./bin/pulsar-managed-ledger-admin delete-managed-ledger-ids, and deleting /ledgers/00/0000/L003 in ZooKeeper, but it looks like there are still references to it: pulsar-admin topics stats-internal still shows the ledger. Is it not possible to remove it?
----
2020-06-17 14:28:44 UTC - Fred George: > You can use KeyShared subscription type to be able to scale consumers within a single partition, while retaining ordering (per key)
This isn't strictly true unless it's been fixed recently. Messages can be delivered to multiple consumers during rehashing, causing ordering anomalies.
----
2020-06-17 15:09:20 UTC - Matteo Merli: That's correct. I've been working on fixing several issues with Key_Shared delivery lately. The 2.6 release will have a more solid story there.
----
2020-06-17 15:22:32 UTC - Addison Higham: Does it work with a normal consumer (not a reader) if you manually set the position before ledger 3?
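A minimal sketch of that idea with the Java client, assuming a subscription can be seeked past the gap (the topic name, subscription name, and ledger/entry ids here are made up for illustration):
```// Seek a consumer directly to the first entry of ledger 4,
// skipping the lost ledger 3. Ids below are placeholders.
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.impl.MessageIdImpl;

public class SkipLostLedger {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("recovery-sub")
                .subscribe();
        // MessageIdImpl is from the client impl package; arguments are
        // (ledgerId, entryId, partitionIndex), -1 = non-partitioned topic.
        consumer.seek(new MessageIdImpl(4L, 0L, -1));
        client.close();
    }
}```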
----
2020-06-17 15:24:09 UTC - jujugrrr: I haven't tried, as I focused on my use case. Let me give it a go a bit later.
----
2020-06-17 15:39:52 UTC - Marshall Brandt: @Marshall Brandt has joined the channel
----
2020-06-17 16:02:23 UTC - Caito Scherr: @Caito Scherr has joined the channel
----
2020-06-17 16:07:04 UTC - Manpreet Babbra: @Manpreet Babbra has joined the channel
----
2020-06-17 16:07:28 UTC - Mike: @Mike has joined the channel
----
2020-06-17 16:09:14 UTC - Craig Haywood: @Craig Haywood has joined the channel
----
2020-06-17 16:10:30 UTC - Leonard Ge: @Leonard Ge has joined the channel
----
2020-06-17 16:53:08 UTC - Joao Oliveirinha: @Joao Oliveirinha has joined the channel
----
2020-06-17 16:53:56 UTC - Jesse Anderson: To hit the millions-of-topics potential in Pulsar, would the cluster have to be scaled to handle the metadata? E.g., could a 5-node cluster have 5 million topics, provided the message load could be handled?
----
2020-06-17 16:57:00 UTC - Matteo Merli: It depends on various aspects of the workload, but yes, there's a recommendable limit on topics per broker.

That is for memory reasons as well as for how fast you need your failovers to be (e.g., the worst-case acceptable publish latency).

For low-latency requirements, we were typically putting the recommended ballpark for max topics/broker at ~100K.
+1 : Jesse Anderson, Julius S, Shivji Kumar Jha, Tamer
----
2020-06-17 16:58:22 UTC - Andrew: @Andrew has joined the channel
----
2020-06-17 17:46:24 UTC - PLarboulette: @PLarboulette has joined the channel
----
2020-06-17 18:33:45 UTC - Kate Kinnear: @Kate Kinnear has joined the channel
----
2020-06-17 18:42:14 UTC - Patrik Kleindl: @Matteo Merli Just for my understanding: for partitioned topics, would this mean 100k partitions per broker, or is this independent of partitioning?
----
2020-06-17 18:43:09 UTC - Matteo Merli: Correct, that would be 100k partitions. From the broker's perspective, a partition is exactly the same as a non-partitioned topic.
+1 : Patrik Kleindl
----
2020-06-17 18:49:27 UTC - Muljadi: @Muljadi has joined the channel
----
2020-06-17 18:52:14 UTC - Abhishek: @Abhishek has joined the channel
----
2020-06-17 18:54:49 UTC - Vijay Bhore: @Vijay Bhore has joined the channel
----
2020-06-17 18:57:05 UTC - Yezen: @Yezen has joined the channel
----
2020-06-17 18:57:24 UTC - Simon Crosby: @Simon Crosby has joined the channel
----
2020-06-17 19:20:09 UTC - Ankit Jain: @Ankit Jain has joined the channel
----
2020-06-17 19:23:16 UTC - maurice barnum: @maurice barnum has joined the channel
----
2020-06-17 19:29:06 UTC - Joe Francis: @Jesse Anderson We have gone beyond a million with PIP-8, and we will mention it in our talk tomorrow.
----
2020-06-17 19:33:12 UTC - Jesse Anderson: @Joe Francis are you sticking to ~100k topics per broker?
----
2020-06-17 19:36:31 UTC - Joe Francis: We have stringent start-up limits; ours is 60K or so. It is all about how quickly you want to do a cold start. We have actually done DC power-loss recovery scenarios, and this number is based on our recovery-time guarantees for the whole cluster.
----
2020-06-17 19:53:38 UTC - Mihir Rane: @Mihir Rane has joined the channel
----
2020-06-17 19:57:53 UTC - Philip Ittmann: @Philip Ittmann has joined the channel
----
2020-06-17 21:05:02 UTC - Olivier Brazet: @Olivier Brazet has joined the channel
----
2020-06-17 21:33:28 UTC - Ankur Jain: Was attending the summit and hearing success stories of moving from Kafka to Pulsar. I had one question around equivalence with Kafka when it comes to consumer groups.
If the processing capacity of one consumer is a bottleneck, how can we scale to N consumers in Pulsar for a partitioned topic (with M partitions, where M >= N) while maintaining strict ordering guarantees for unacked messages (similar to a consumer group in Kafka)? This would mean that if I were to add or remove consumers, partitions should be auto-balanced among the available consumers.
----
2020-06-17 22:19:24 UTC - Kalyn Coose: @Kalyn Coose has joined the channel
----
2020-06-17 22:20:07 UTC - Jesse Anderson: Pulsar works differently than Kafka in that scenario. For this, you'd use Key_Shared subscriptions <http://pulsar.apache.org/docs/en/concepts-messaging/#key_shared>. I'll talk about it in my talk tomorrow. IMHO, this is a huge feature difference between Pulsar and Kafka.
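For reference, a minimal Key_Shared consumer in the Java client looks roughly like this (topic and subscription names are illustrative); running N copies of this process scales consumption within a single partition while preserving per-key ordering:
```// Every process subscribing with the same subscription name joins the
// Key_Shared group; Pulsar hashes message keys across the live consumers.
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class KeySharedConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/orders")
                .subscriptionName("order-processors")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();
        while (true) {
            Message<byte[]> msg = consumer.receive();
            // ... process msg; messages with the same key arrive in order ...
            consumer.acknowledge(msg);
        }
    }
}```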
----
2020-06-18 01:09:10 UTC - Renault: @Renault has joined the channel
----
2020-06-18 01:14:16 UTC - Oleg Kozlov: Hello everyone. I have a question for Pulsar developers: is there any way to update the scheduled delivery time of a previously produced delayed message? Basically, if I send a message with a 1-hour delivery delay and then 20 minutes later want to change that delay to 2 hours, what are my options?
----
2020-06-18 01:18:26 UTC - Matteo Merli: It's tough because messages are immutable
----
2020-06-18 01:22:59 UTC - Renault: Hi. Has anyone used the <https://pulsar.apache.org/docs/en/io-kinesis-source/|Kinesis source connector> functionality? I have a Pulsar cluster running via Helm, but I'm seeing two different errors when creating a Kinesis source, depending on how I'm running the `pulsar-admin source create` command. Thanks in advance!

Error 1 - likely due to a misconfiguration of the `source create` command; it seems like the broker doesn't know that kinesis-logging-source is a Kinesis stream
```root@pulsar-toolset-0:/pulsar# bin/pulsar-admin source status --name kinesis-logging-source
{
  "numInstances" : 1,
  "numRunning" : 0,
  "instances" : [ {
    "instanceId" : 0,
    "status" : {
      "running" : false,
      "error" : "UNAVAILABLE: Unable to resolve host pf-public-default-kinesis-logging-source-0.pf-public-default-kinesis-logging-source.pulsar-system.svc.cluster.local",
      "numRestarts" : 0,
      "numReceivedFromSource" : 0,
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "numSourceExceptions" : 0,
      "latestSourceExceptions" : [ ],
      "numWritten" : 0,
      "lastReceivedTime" : 0,
      "workerId" : "c-pulsar-fw-pulsar-broker-2.pulsar-broker.pulsar-system.svc.cluster.local-8080"
    }
  } ]
}```
Error 2 - broker throws a 500 error, possibly due to running out of direct memory
```00:31:08.165 [pulsar-web-44-3] WARN  org.eclipse.jetty.server.HttpChannel - /admin/v3/source/public/default/kinesis-logging-source
javax.servlet.ServletException: javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 251658247, max: 268435456)
...
00:31:08.167 [pulsar-web-44-3] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [18/Jun/2020:00:30:56 +0000] "POST /admin/v3/source/public/default/kinesis-logging-source HTTP/1.1" 500 382 "-" "Pulsar-Java-v2.5.2" 11427```
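For what it's worth, the log in error 2 shows the broker's direct memory capped at 256 MB (268435456 bytes). If that is the cause, one possible fix is raising -XX:MaxDirectMemorySize for the broker JVM, e.g. via PULSAR_MEM in conf/pulsar_env.sh (the values below are illustrative, not a recommendation; the Helm chart exposes an equivalent setting):
```# conf/pulsar_env.sh -- illustrative sizes, tune for your hardware
PULSAR_MEM="-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g"```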
----
2020-06-18 01:32:06 UTC - Oleg Kozlov: Right. We don't really need to change the message body, just its schedule. Either that, or cancel/delete that message altogether, so that we can produce a new one with a new delay value.
----
2020-06-18 01:33:35 UTC - Oleg Kozlov: Either update the delivery time in the DelayedDeliveryTracker described here <http://pulsar.apache.org/docs/en/concepts-messaging/#delayed-message-delivery>, or, if we can just cancel a message before its scheduled delivery, that would work too.
----
2020-06-18 02:35:57 UTC - Matteo Merli: If you know the message id, you should be able to ack the message and therefore it would be "cancelled"
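A rough sketch of that suggestion, assuming the Java client (topic and subscription names are made up; whether an early ack actually cancels delivery is exactly the question asked below):
```// Produce a delayed message, keep its MessageId, and later ack it on the
// subscription before it is delivered. deliverAfter applies to Shared subs.
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class CancelDelayed {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/delayed")
                .subscriptionName("sub")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/delayed")
                .create();
        MessageId id = producer.newMessage()
                .value("payload".getBytes())
                .deliverAfter(1, TimeUnit.HOURS)
                .send();
        // "Cancel": ack the known message id before its scheduled delivery.
        consumer.acknowledge(id);
        client.close();
    }
}```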
----
2020-06-18 03:13:59 UTC - Oleg Kozlov: I can ack it even before it's delivered to a consumer?
----
2020-06-18 03:52:49 UTC - Madhu A: @Madhu A has joined the channel
----
2020-06-18 04:12:58 UTC - Sankararao Routhu: @Luke Stephenson @Matteo Merli any thoughts?
----
2020-06-18 04:35:14 UTC - Luke Stephenson: Not from me, I just shared what I had done in case it could help
----
2020-06-18 07:17:34 UTC - Patrik Kleindl: Yet another question: since Pulsar uses ZooKeeper too, are the limits in Pulsar related to ZK usage as they are in Kafka, or is there a difference?
----
2020-06-18 07:47:58 UTC - Pavels Sisojevs: @Pavels Sisojevs has joined the channel
----
2020-06-18 08:18:21 UTC - Pavels Sisojevs: Hello, I've noticed interesting behaviour in topic clean-up, which looks like a bug to me:

Scenario A:
System publishes messages to topic A. When there are no consumers and the publisher stops emitting messages, topic A is garbage collected.

Scenario B:
System publishes messages to topic B, but a Pulsar Function also publishes messages to topic B. When there are no consumers or publishers in the system, and the function does not emit any messages to topic B either, I would expect topic B to be garbage collected, but it is not. Topic B stays there forever. E.g., I have a topic which hasn't had any consumers or publishers for 3 days, but I can still see it when listing topics.
Also, it might be important that I'm using an `org.apache.pulsar.functions.api.Function` (Java API) and sending the message using the `newOutputMessage` function.
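For context, inactive-topic garbage collection is driven by these broker.conf settings (a sketch of the relevant knobs; the open question here is whether the function's cached producer keeps topic B looking active):
```# broker.conf -- inactive topic GC
brokerDeleteInactiveTopicsEnabled=true
brokerDeleteInactiveTopicsFrequencySeconds=60```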
----
2020-06-18 08:39:30 UTC - Pushkar Sawant: @Sijie Guo Any guidance here? Right now I have one node at about 99% disk utilization. I had to force the node into read-only mode; it did not transition to read-only mode on its own at 95% usage.
The other servers are showing variable usage between 60% and 80%.
----
2020-06-18 09:00:11 UTC - Pushkar Sawant: Now the bookie cannot start, failing with: java.io.IOException: Error open RocksDB database
Caused by: org.rocksdb.RocksDBException: While open a file for appending: /mnt/bookie-hdd/current/ledgers/MANIFEST-000014: No space left on device
----
2020-06-18 09:04:59 UTC - Pushkar Sawant: There are about 1872 under-replicated ledgers. The number is slowly going down.
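If useful, the bookie's read-only transition is governed by these bookkeeper.conf settings (a sketch; values shown are illustrative rather than guaranteed defaults, and a bookie only flips to read-only once the disk checker actually observes the threshold being crossed):
```# bookkeeper.conf -- disk usage controls (illustrative values)
diskUsageThreshold=0.95      # fraction above which the bookie goes read-only
diskUsageWarnThreshold=0.90  # warning threshold below the hard limit
diskCheckInterval=10000      # ms between disk usage checks```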
----