Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/09/13 09:11:02 UTC

Slack digest for #general - 2019-09-13

2019-09-12 09:25:49 UTC - Shishir Pandey: Anyone? I'd really appreciate it if I could get some pointers.
----
2019-09-12 09:50:47 UTC - vikash: @Vladimir Shchur The solution above works for .NET Framework after changing app.config and adding a dependentAssembly entry. But in our case I am using Service Fabric, where I don't get an app.config, so how do I solve the issue there? Here is the Service Fabric issue: <https://github.com/Azure/service-fabric-issues/issues/779>
----
2019-09-12 09:53:49 UTC - Vladimir Shchur: Hi! Look at the response: "No, the SF app shouldn't have an app.config file. Individual services may each have one, however, or in the case of web services would have web.config." I.e. each service can have its own app.config or web.config.
----
2019-09-12 12:17:13 UTC - vikash: ok, thank you
----
2019-09-12 13:05:33 UTC - Fredrick P Eisele: I would like some verification on expiry and retention.
If the retention policy is set to -1 then the message is not deleted once it has been acknowledged.
If the expiry is set to some positive value and a message is allowed to expire without being consumed, what happens to the message?
If the expiration is treated like an implied ack (what I want) then the message will be implicitly consumed and trivially retained.
<https://pulsar.apache.org/docs/en/concepts-messaging/#message-retention-and-expiry>
"With message expiry, shown at the bottom, some messages are *deleted*...".
Are they actually deleted or implicitly acknowledged?
----
2019-09-12 13:59:41 UTC - Shishir Pandey: folks does anyone have any suggestion for my question earlier?
----
2019-09-12 14:53:02 UTC - David Kjerrumgaard: Have you looked into the Key_Shared subscription?  This would ensure that the same consumer receives messages with the same key. It would NOT guarantee overall message ordering. However, if you were to send the messages from the same producer, then message ordering would be guaranteed.
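For reference, a minimal Java sketch of attaching a consumer with the Key_Shared subscription type (available since Pulsar 2.4); the service URL, topic, and subscription names are placeholders:
```
import org.apache.pulsar.client.api.*;

public class KeySharedConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Key_Shared delivers all messages carrying the same key to the same
        // consumer, so per-key grouping survives adding more consumers.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/events")
                .subscriptionName("grouping-sub")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);
        client.close();
    }
}
```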
----
2019-09-12 15:00:20 UTC - Tarek Shaar: I was testing BookKeeper shutdown while messages are being produced and consumed. I have a topic with these settings: bookkeeperEnsemble: 2, bookkeeperWriteQuorum: 2, bookkeeperAckQuorum: 2. While producing and consuming messages, I purposely shut down one of the BookKeeper nodes, but message production and consumption continued as normal. Is that right? Isn't the message supposed to be saved in two BookKeeper nodes before being delivered? I only have 2 BookKeeper nodes and I shut down one of them, so it means I only had one node available.
----
2019-09-12 15:00:25 UTC - Shishir Pandey: Thank you @David Kjerrumgaard. A key_shared subscription with multiple consumers would not guarantee ordering, and since I need to process all messages between [t_0, t_0 + defined limit] together, I am guessing you're suggesting that, since messages with the same key or ordering key are delivered to the same consumer, I could keep track of this in the consumer... is my understanding correct? Unfortunately my producers are different, and my messages' defined limit for processing could be as large as 60 days (due to the nature of the domain).
----
2019-09-12 15:02:33 UTC - David Kjerrumgaard: That was my suggestion. Or you could use a single consumer depending on the message volume.
----
2019-09-12 15:03:52 UTC - Shishir Pandey: Got it! I'd have about 600-800 million messages per 7-day period; message sizes are relatively small, so I am guessing this should be OK for a single consumer.
----
2019-09-12 15:04:23 UTC - Shishir Pandey: Thank you @David Kjerrumgaard! Much appreciated.
----
2019-09-12 15:04:49 UTC - David Kjerrumgaard: Perhaps a stateful function would be able to retain the data, and when a new item arrives the function would first check the state to see if the item already exists or not. If not, store it. If it does exist, you now have both the "pieces" and can perform the logic on them both.
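For illustration, a hedged Java sketch of the stateful-function idea described above. Class and helper names are hypothetical, the key is assumed to be set on the incoming record, state requires running the function with state storage enabled, and deleteState is only available in newer releases:
```
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Hypothetical pairing function: keep the first item seen for a key in state,
// and when its partner arrives, process both and clear the key.
public class PairingFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) {
        String key = context.getCurrentRecord().getKey().orElse("no-key");
        ByteBuffer existing = context.getState(key);
        if (existing == null) {
            // First piece: remember it and wait for the matching item.
            context.putState(key, ByteBuffer.wrap(input.getBytes(StandardCharsets.UTF_8)));
        } else {
            // Second piece: both halves are available, handle them together.
            String first = StandardCharsets.UTF_8.decode(existing).toString();
            handlePair(first, input);
            context.deleteState(key);  // drop the key once the pair is processed
        }
        return null;
    }

    private void handlePair(String a, String b) {
        // domain-specific processing of the pair goes here
    }
}
```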
----
2019-09-12 15:05:39 UTC - Shishir Pandey: Yes, that actually is a better idea I think
----
2019-09-12 15:05:43 UTC - David Kjerrumgaard: the tradeoff of the above approach is added latency on each message (to check the state) and increased storage in BK.
----
2019-09-12 15:06:16 UTC - Shishir Pandey: The publish latency is not as much of an issue for me since as I said the messages arrive relatively slowly anyway.
----
2019-09-12 15:07:42 UTC - Shishir Pandey: As for the increase of storage in BK: we would be purging every 60 days, and the message ingestion rate is nearly fixed over that range, so we should be able to stabilise the storage after some time and I can plan for that ahead of time.
----
2019-09-12 15:08:03 UTC - Shishir Pandey: The function proposal does appear to be considerably better, I will do some more research on that and test it out.
----
2019-09-12 15:09:03 UTC - Shishir Pandey: Once again, thank you!
----
2019-09-12 15:10:12 UTC - David Kjerrumgaard: Sure.  One last step on the stateful approach. When you are finished processing the "pair" you can delete the key from state.
----
2019-09-12 15:11:10 UTC - David Kjerrumgaard: @Tarek Shaar It depends on the ack and write quorum configs you have in place.
----
2019-09-12 15:11:30 UTC - Tarek Shaar: bookkeeperEnsemble: 2 bookkeeperWriteQuorum: 2 bookkeeperAckQuorum: 2
----
2019-09-12 15:11:59 UTC - Tarek Shaar: Actually, production and consumption continued even though I shut down both BookKeeper nodes.
----
2019-09-12 15:22:27 UTC - David Kjerrumgaard: @Tarek Shaar In the above scenario you are most likely consuming messages that were immediately published (tailing reads), and as such they were able to be served out of the message cache in memory. Are you publishing the messages asynchronously?
----
2019-09-12 15:23:03 UTC - Tarek Shaar: yes, I am doing async publish
----
2019-09-12 15:24:31 UTC - David Kjerrumgaard: "If the retention policy is set to -1 then the message is not deleted once it has acknowledged."  Yes, AFAIK
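For reference, a minimal Java sketch of setting an infinite retention policy on a namespace via the admin client; the admin URL and namespace are placeholders:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class InfiniteRetentionExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // -1 for both the time and size limits means acknowledged messages
        // are retained indefinitely.
        admin.namespaces().setRetention("public/default", new RetentionPolicies(-1, -1));
        admin.close();
    }
}
```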
----
2019-09-12 15:25:04 UTC - Tarek Shaar: That makes sense that they are served from memory, but how is the producer getting the ack? I thought the ack comes back only if the messages are saved into two BookKeeper nodes (two saves to the journal).
----
2019-09-12 15:26:03 UTC - David Kjerrumgaard: "If the expiry is set to some positive value, which is allowed to expire without the message being consumed, what happens to the message?"  It is just deleted from the topic.
----
2019-09-12 15:26:48 UTC - David Kjerrumgaard: "If the expiration is treated like an implied ack (what I want) then the message will be implicitly consumed and trivially retained."  No, they are treated as if they never existed. The messages are essentially "skipped"
----
2019-09-12 15:29:49 UTC - David Kjerrumgaard: The ack (or lack thereof) is communicated in the CompletableFuture that is returned from the async call.  Published messages are first written to cache and then synced to disk before an ack is returned. However in your case the flow is message to cache which succeeds, message to disk (fails) and no ack is returned.  However the message is still in the cache, so it can be served.
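A minimal Java sketch of observing that per-message ack through the future returned by an async publish; the service URL and topic are placeholders:
```
import java.util.concurrent.CompletableFuture;
import org.apache.pulsar.client.api.*;

public class AsyncPublishExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/events")
                .create();

        // The future completes with a MessageId only once the broker has
        // persisted the entry; if the write to BookKeeper fails, the future
        // completes exceptionally (or times out) instead.
        CompletableFuture<MessageId> ack = producer.sendAsync("hello".getBytes());
        ack.whenComplete((msgId, error) -> {
            if (error != null) {
                System.err.println("publish failed: " + error);
            } else {
                System.out.println("acked as " + msgId);
            }
        });

        producer.flush();
        client.close();
    }
}
```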
----
2019-09-12 15:30:34 UTC - David Kjerrumgaard: This allows Pulsar to continue serving messages even in the event of a BK failure.
----
2019-09-12 15:53:35 UTC - Tarek Shaar: Thanks David, understood. Another observation I have is that while producing and consuming messages, if I shut down the broker that's serving my topic, production and consumption continue smoothly if I am sending one message every 10 milliseconds. But if I am sending messages continuously without waiting and I stop the broker that's serving my topic, production and consumption stop until my broker is back up, which is when production and consumption resume.
----
2019-09-12 16:07:20 UTC - Nick Marchessault: Is there a default ackTimeout set in pulsar 2.3.1 if that configuration is not explicitly set?
----
2019-09-12 16:08:00 UTC - Matteo Merli: No, the ack timeout is not set by default because it’s impossible to establish a safe value
----
2019-09-12 16:08:35 UTC - Matteo Merli: e.g. if you set 1min by default.. any application for which the processing takes >1min will see a storm of redeliveries
----
2019-09-12 16:09:03 UTC - Matteo Merli: a better option is to instead rely on “negative acks” (since 2.4)
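For reference, a minimal Java sketch of relying on negative acks instead of an ack timeout (available since 2.4); the topic, subscription name, and processing logic are placeholders:
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class NegativeAckExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/events")
                .subscriptionName("work-sub")
                // how long the broker waits before redelivering a nacked message
                .negativeAckRedeliveryDelay(1, TimeUnit.MINUTES)
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        try {
            // placeholder for application processing of msg
            consumer.acknowledge(msg);
        } catch (Exception e) {
            // Ask for redelivery of just this message; no guessing at a global ack timeout.
            consumer.negativeAcknowledge(msg);
        }
        client.close();
    }
}
```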
----
2019-09-12 16:23:00 UTC - Matteo Merli: Of course, the order is guaranteed for a single producer, and Pulsar offers fully linearizable ordering semantics.

Above, I was referring to 2 independent producers publishing to a topic. In that case, the messages from the 2 producers are interleaved in the topic, though, still, messages from the same producer will be ordered.
----
2019-09-12 16:23:11 UTC - Fredrick P Eisele: Hmm, not what I was hoping. I suppose I could make a consumer whose only purpose is to ack the message thus preventing it from expiring but that seems wrong. Is there a nicer approach?
----
2019-09-12 17:10:48 UTC - David Kjerrumgaard: @Tarek Shaar Yes, by default a single broker serves a topic (for both producers and consumers). Therefore, on failure the cached messages aren't replicated across brokers quickly enough.
----
2019-09-12 17:15:56 UTC - David Kjerrumgaard: Yes, you can just set the TTL to -1, which is effectively forever. Then the messages will not be deleted simply because they weren't acknowledged.
----
2019-09-12 17:25:43 UTC - Addison Higham: :thinking_face: can a consumer be configured to only read messages that have been synced to disk? or is this always the case?
----
2019-09-12 17:29:21 UTC - David Kjerrumgaard: @Addison Higham It would be best to halt the producer if you are notified that the data is not synched to disk. The consumers might not be online at that time and won't be able to react to the BK outage.
----
2019-09-12 17:35:13 UTC - Addison Higham: but that would still mean that some given batch of messages is delivered to the consumer, and then, assuming the broker also died, either the producer re-connects and re-sends "duplicates" (the consumer won't know) or it doesn't re-send the messages and a future consumer replaying the stream would get a different result. Seems like a pretty edge case... but it does seem like for certain cases a consumer should be able to have a subscription that won't pull messages until the message is fsynced by BK
----
2019-09-12 18:32:35 UTC - Jerry Peng: @Tarek Shaar are you using persistent topics or non-persistent topics?
----
2019-09-12 18:33:14 UTC - Tarek Shaar: I am using persistent topics
----
2019-09-12 18:42:52 UTC - Tarek Shaar: @David Kjerrumgaard I have narrowed this down to 6000 messages per minute. So if I am producing 6000 per minute and the broker shuts down, my producer just stops (and so does my consumer) until the broker is back up, at which point traffic resumes again. Are you saying that is the expected behavior? If I produce less than 6000 then production and consumption just carry on (barring a very small pause).
----
2019-09-12 18:45:17 UTC - Jerry Peng: @Tarek Shaar if all your bookies are down, messages should not be able to be produced successfully, i.e. received ack successfully.
----
2019-09-12 18:46:03 UTC - Matteo Merli: with `bookkeeperEnsemble: 2` you’ll need >=2 bookies to operate
----
2019-09-12 18:47:06 UTC - Jerry Peng: You also shouldn’t be able to consume the messages you produced when all your bookies are down
----
2019-09-12 18:51:17 UTC - Tarek Shaar: @Jerry Peng When my consumer is down, it will miss all the messages, regardless of whether I shut down one BookKeeper node or both of them during production. Perhaps this may be due to what @Matteo Merli pointed out about the bookkeeperEnsemble value, or maybe this is expected behavior.
----
2019-09-12 18:51:53 UTC - Matteo Merli: &gt; when my consumer is down, it will miss all the messages

Messages are not lost, the subscription keeps track of the consumer position
----
2019-09-12 18:56:10 UTC - Nicolas Ha: Hello :slightly_smiling_face: is there a way to backup / restore all messages? Ideally to a file or S3 bucket
----
2019-09-12 18:57:32 UTC - Ali Ahmed: @Nicolas Ha Not really. You can back up, say, the data folder for a standalone cluster; otherwise, for a production cluster you want to set up replication.
----
2019-09-12 18:57:52 UTC - Fredrick P Eisele: That sounds right, thanks. Can TTL be set as the default with:
ttlDurationDefaultInSeconds=-1
----
2019-09-12 18:58:45 UTC - Ryan Samo: Is there a way to load the client certs via memory instead of the file system? Like via a call to HashiVault for example?
----
2019-09-12 18:59:11 UTC - Matteo Merli: currently, not for TLS certs… just for tokens
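For illustration, a minimal Java sketch of the token case, where the credential can be passed as an in-memory string (e.g. one fetched from Vault at runtime); the URL and token value are placeholders:
```
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.PulsarClient;

public class TokenAuthExample {
    public static void main(String[] args) throws Exception {
        String tokenFromVault = "eyJhbGciOi...";  // hypothetical token retrieved at runtime

        // Tokens can be supplied directly from memory; TLS cert/key auth
        // currently expects file paths instead.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar+ssl://broker.example.com:6651")
                .authentication(AuthenticationFactory.token(tokenFromVault))
                .build();
        client.close();
    }
}
```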
----
2019-09-12 18:59:36 UTC - Nicolas Ha: ah, that’s bad news. I see now that BookKeeper apparently does not support it <https://github.com/apache/bookkeeper/issues/1193>, so maybe that comes from there.
----
2019-09-12 19:00:02 UTC - Ryan Samo: Ok thanks, do you see an enhancement coming for TLS?
----
2019-09-12 19:05:57 UTC - David Kjerrumgaard: I would have to check the docs to confirm, but yes, there is a value that means infinity.
----
2019-09-12 19:06:51 UTC - Matteo Merli: it would certainly be possible to extend it to accept them from strings
----
2019-09-12 19:08:52 UTC - Ryan Samo: Cool thanks
----
2019-09-12 19:35:03 UTC - Tarek Shaar: Sorry @Matteo Merli, I meant to say: my consumer was down, and while producing messages I shut down one or all of my BookKeeper nodes. Then I brought up my consumer and noticed that it missed the messages that were produced (while one or all BookKeeper nodes were down). Am I missing something?
----
2019-09-12 19:35:39 UTC - Karthik Ramasamy: @Nicolas Ha --- use the tiered storage capability in Pulsar
----
2019-09-12 19:39:23 UTC - Jon Bock: What use case for that did you have in mind?  As Karthik says, tiered storage in Pulsar addresses a number of the scenarios where people otherwise might manually export and restore messages.
----
2019-09-12 19:43:26 UTC - Matteo Merli: While the BookKeeper nodes were down, you would have not received a positive ack when publishing a message
----
2019-09-12 19:54:22 UTC - Ryan Samo: Is there a way for the consumer client to detect backlog? Or know that there is a backlog?
----
2019-09-12 20:30:14 UTC - Nicolas Ha: I'll have a look, thanks!
----
2019-09-12 20:31:32 UTC - Nicolas Ha: Backup / restore really, which allows moving data from one cloud provider to another, across environments, and also recovering from a disaster.
----
2019-09-12 20:44:23 UTC - Jon Bock: OK. Replication could possibly cover most of those scenarios, so it may be worth a look.
+1 : Nicolas Ha
----
2019-09-12 20:59:00 UTC - Tarek Shaar: While my consumer was down I shut down one of the two BookKeeper nodes (while still publishing). I then started my consumer, but I missed all those messages that were published during the time that one of the BookKeeper nodes was down.
----
2019-09-12 21:00:52 UTC - Tarek Shaar: @Ryan Samo you can probably use the Java Admin API within the consumer process or make a REST call.
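For reference, a minimal Java sketch of reading the subscription backlog through the admin client; the admin URL, topic, and subscription names are placeholders, and the field-style access follows the 2.x admin API:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class BacklogCheckExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        TopicStats stats = admin.topics().getStats("persistent://public/default/events");
        // msgBacklog is the number of messages not yet acknowledged by this subscription.
        long backlog = stats.subscriptions.get("my-sub").msgBacklog;
        System.out.println("backlog for my-sub: " + backlog);
        admin.close();
    }
}
```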
----
2019-09-12 21:03:02 UTC - Ryan Samo: Ok thanks @Tarek Shaar. I was hoping to allow a consumer to see the backlog and receiverQueueSize stats locally so that it can dynamically adjust itself, without involving the admin API, which would allow major access to the cluster.
----
2019-09-12 21:07:15 UTC - Ryan Samo: Thinking of that and auto scaling shared consumers based on that
----
2019-09-12 21:09:00 UTC - Tarek Shaar: I am not sure the consumer API or the consumer builder API allows one to dynamically adjust its parameters. But that's an interesting one; please do let me know if you find an alternative to the Admin or REST calls.
----
2019-09-12 21:10:07 UTC - Ryan Samo: Ok thanks!
----
2019-09-12 21:43:53 UTC - Ali Ahmed: I am adding an option to specify the splittable character for messages for pulsar-client; this should allow one to produce JSON messages without issues: <https://github.com/apache/pulsar/pull/5187>
----
2019-09-12 22:49:55 UTC - Fredrick P Eisele: We are talking about `messageTTL`? Because when I try to set it to `-1` I get "Invalid value for message TTL", "", "Reason: Invalid value for message TTL"
----
2019-09-12 22:52:32 UTC - Fredrick P Eisele: ["./bin/pulsar-admin", "namespaces", "set-message-ttl", "foo/bar", "--messageTTL", "-1"]
----
2019-09-12 22:56:23 UTC - Fredrick P Eisele: Using a really big integer, 2147483647, works fine. Should I report a bug?
----
2019-09-12 23:00:01 UTC - Fredrick P Eisele: I think setting the ttl to 0 means forever. Is that true?
----
2019-09-12 23:12:55 UTC - Ted Hwang: @Ted Hwang has joined the channel
----
2019-09-12 23:28:16 UTC - David Kjerrumgaard: the default cluster setting is -1 which means forever. So you won't need to set it explicitly
----
2019-09-13 00:06:13 UTC - tmcothran: @tmcothran has joined the channel
----
2019-09-13 03:49:20 UTC - Luke Lu: It appears that the reader (both Java and C++ clients) leaks subscriptions (reader-xxx subscriptions that don’t go away upon close) that have to be cleaned up manually with pulsar-admin?
----
2019-09-13 04:25:22 UTC - Matteo Merli: Yes, it’s a bug — there’s a fix in <https://github.com/apache/pulsar/pull/5022> though some work on unit tests is still pending
----
2019-09-13 04:31:22 UTC - Matteo Merli: Actually, the setting is already there in the storage abstraction.. but it’s not configurable in broker.conf… we just set it to 2GB
----
2019-09-13 04:31:32 UTC - Matteo Merli: that should be easy to add
----
2019-09-13 04:35:42 UTC - vikash: @Vladimir Shchur I have used the F# client for the producer and am sending the payload through NiFi (a data ingestion tool). On sending messages I am getting the error below:
----
2019-09-13 04:35:43 UTC - vikash: System.AggregateException: One or more errors occurred. ---> Pulsar.Client.Common.ProducerBusyException: Exception of type 'Pulsar.Client.Common.ProducerBusyException' was thrown
----
2019-09-13 04:37:29 UTC - vikash: I have sent 7478 messages
----
2019-09-13 04:37:44 UTC - vikash: The exception occurs while sending messages.
----
2019-09-13 06:42:00 UTC - Vladimir Shchur: @Karthik Ramasamy If I use tiered storage and then something bad happens and I lose the whole cluster with its storage, only the S3 buckets are left. Will it be possible to start a new cluster again with the data saved in the S3 buckets?
----
2019-09-13 06:52:32 UTC - Vladimir Shchur: Hi, I'm not sure I've understood it well. How does NiFi relate to the .NET client producer? One more thing: could you please configure logging like this <https://github.com/fsharplang-ru/pulsar-client-dotnet/blob/develop/tests/IntegrationTests/Common.fs#L24-L30> and provide the full exception message?
----
2019-09-13 06:54:45 UTC - Vladimir Shchur: Regarding the ProducerBusyException: I don't have a full understanding of what it means and how the client should handle it (it is sent from the broker). Can someone clear things up?
----
2019-09-13 06:56:41 UTC - Karthik Ramasamy: @Matteo Merli 
----
2019-09-13 07:42:29 UTC - vikash: I think I have a code issue.
----
2019-09-13 07:42:30 UTC - vikash: <https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ProducerBuilder.html>
----
2019-09-13 07:43:00 UTC - vikash: PulsarClientException.ProducerBusyException - if a producer with the same "producer name" is already connected to the topic
+1 : Vladimir Shchur
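For illustration, the Java equivalent of the situation that javadoc describes (topic and producer name here are hypothetical); leaving producerName unset lets the client generate a unique name per producer and avoids the conflict:
```
import org.apache.pulsar.client.api.*;

public class NamedProducerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A second producer connecting to the same topic with the same explicit
        // producerName is rejected by the broker with ProducerBusyException.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/events")
                .producerName("ingest-1")   // must be unique among connected producers
                .create();

        producer.close();
        client.close();
    }
}
```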
----