Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/08/13 09:11:08 UTC

Slack digest for #general - 2019-08-13

2019-08-12 13:46:48 UTC - jah: @jah has joined the channel
----
2019-08-12 13:49:49 UTC - Kim Christian Gaarder: I get an unexpected BookKeeper error when broker attempts to remove a consumer. Is this related to a known issue or should I report this as a bug?
----
2019-08-12 13:50:52 UTC - Alexandre DUVAL: @Sijie Guo I can't find what the "already recycle" error means, do you know? Maybe @Matteo Merli?
----
2019-08-12 14:44:21 UTC - Sijie Guo: can you file a bug for it?
----
2019-08-12 14:45:52 UTC - jah: For multi-topic subscriptions, the docs mention that there are no ordering guarantees. Is it really the case that the messages received are not ordered relative to their topic? My expectation was that I might receive messages from topics in a variety of orders, but that, within a given topic, the messages would still be in order.
----
2019-08-12 14:50:10 UTC - Sijie Guo: in failover or exclusive subscription, it is in partition-based order.
+1 : jah
----
2019-08-12 14:50:18 UTC - Sijie Guo: in key-shared subscription, it is key-based order
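For illustration only (not from the thread), a rough Java sketch of a multi-topic subscription; the topic and subscription names are placeholders and `client` is an existing PulsarClient:
```
// Failover subscription across several topics: ordering holds within each
// topic/partition, but not across the whole multi-topic subscription.
Consumer<byte[]> consumer = client.newConsumer()
        .topics(Arrays.asList(
                "persistent://public/default/topic-a",
                "persistent://public/default/topic-b"))
        .subscriptionName("multi-topic-sub")
        .subscriptionType(SubscriptionType.Failover)
        .subscribe();
```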
----
2019-08-12 14:52:34 UTC - jah: Another question: I've read the docs on tiered storage and the offloading mechanism. Awesome. 

Is there any support for re-onloading data from the cloud? For example, if I know I need to reread some very old data repeatedly for a period of time, there are scenarios where I would prefer to download from the cloud using my own tooling and then make those segments known to the system so that access is local. Eventually, I may offload again.
----
2019-08-12 14:54:36 UTC - jah: The idea is that the cloud is used for cold storage, but there are situations when we need repeated access to that cold storage and doing that cold each time is too slow. So we would want to download it in bulk, process it as many times as needed, and then at some point in the future, restore the offloaded state.
----
2019-08-12 15:04:56 UTC - Kim Christian Gaarder: sure, I’m working on reproducing this consistently. I’m able to reproduce it, but currently it’s hard to know what is causing it. I’ll submit a bug with code to reproduce as soon as I’ve got something.
----
2019-08-12 16:15:26 UTC - David Kjerrumgaard: @jah There isn't any such capability now AFAIK, but that would be an interesting use case. Perhaps you can open a PIP request for this feature?
----
2019-08-12 17:02:22 UTC - Jacob Fugal: at my employer, we prefer declarative config as much as possible. e.g. terraform. I'm writing a terraform provider for pulsar resources (e.g. namespaces and topics) within a tenant. intent would be that it's eventually an official terraform provider (there doesn't appear to be one already that I could determine). would there be interest in having this be part of the pulsar project that I'm contributing to, rather than it being a separate project that I (or my employer, but still open source) fronts?
----
2019-08-12 17:02:43 UTC - Kim Christian Gaarder: @Sijie Guo <https://github.com/apache/pulsar/issues/4941>
----
2019-08-12 17:02:55 UTC - Jacob Fugal: I'm trying to decide where to put my initial commit :smile:
----
2019-08-12 17:14:47 UTC - Luke Lu: Hey guys, trying to figure out offloading vs retention policies. It appears (according to <https://github.com/apache/pulsar/blob/be7b24f9f8aa67b2235e523485249aef8d2a611a/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2132>) that retention policies also apply to offloaded ledgers. I’d like to have some confirmation from maintainers.
----
2019-08-12 17:25:25 UTC - Luke Lu: On the topic of offloading. Is it possible to bootstrap a new pulsar cluster with an existing offloaded s3 bucket?
----
2019-08-12 17:30:26 UTC - Jon Bock: I believe you would need the ZooKeeper metadata as well for the cluster to know what the segments are.
----
2019-08-12 17:31:59 UTC - Luke Lu: Sure, I wonder if people have done something like this. Hence the “possible” question. I’d like to have this feature officially supported, as this offers a much cheaper DR solution.
----
2019-08-12 17:34:49 UTC - Luke Lu: Basically an offload “snapshot” like feature to offload all necessary metadata (including those from zk).
----
2019-08-12 17:38:40 UTC - Sam Leung: Found <https://pulsar.apache.org/docs/en/cookbooks-deduplication/#message-deduplication-and-pulsar-clients-clients> stating sendTimeout should be 0 so it’s “infinity”. Doesn’t 0 just mean fail immediately, so it shouldn’t retry (which would get dropped by deduplication)?
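For reference, a sketch of what the linked cookbook section recommends (not from the thread; the topic and producer names are placeholders, and `client` is an existing PulsarClient):
```
// Per the deduplication cookbook: give the producer an explicit name and set
// sendTimeout to 0, which the client treats as "no timeout", so a slow persist
// leads to a retry with the same sequence id rather than a client-side failure.
Producer<byte[]> producer = client.newProducer()
        .topic("persistent://public/default/dedup-topic")
        .producerName("dedup-producer-1")
        .sendTimeout(0, TimeUnit.SECONDS)
        .create();
```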
----
2019-08-12 17:50:44 UTC - Addison Higham: @Luke Lu just so I understand... in the event of a total failure of bookkeeper, you would be okay with messages lost from BK and instead be able to rebuild from what is retained in s3 and just would need the ability to snapshot the relevant state out of ZK?
----
2019-08-12 17:52:00 UTC - Addison Higham: I plan on using geo replication to do DR, but I still want to figure out some plans to have resiliency against an "oops I deleted a topic" to recover to some snapshot, so that *might* be a workable solution for me as well...
----
2019-08-12 17:55:31 UTC - Jon Bock: I’m not aware that anyone using Pulsar has implemented something like that yet.  There is one company that has requested a snapshot recovery feature, and the devs at Streamlio have been thinking about how that could be provided.  You may want to file a feature request to the project.
----
2019-08-12 18:47:35 UTC - Luke Lu: <https://github.com/apache/pulsar/issues/4942>
----
2019-08-12 18:51:57 UTC - Sam Leung: I want to rephrase a question I had before regarding message deduplication. I’m finding that because deduplication drops messages based on the largest sequence id recorded pre-persist, if there’s an error persisting in BK, a retry attempt will just be “deduplicated” with no message ever getting persisted.
Is there some configuration or some concept I’m missing?
----
2019-08-12 18:53:15 UTC - Addison Higham: sounds like a bug to me...
----
2019-08-12 19:27:50 UTC - Poule: ```
pulsar-admin functions trigger --fqfn test/app/func1 --trigger-value yoshi

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException:
 Retrieve schema instance from schema info for type 'NONE' is not supported yet

Reason: HTTP 500 Internal Server Error
```
----
2019-08-12 19:28:23 UTC - Poule: what did I do wrong?
----
2019-08-12 19:59:57 UTC - David Kjerrumgaard: What command did you use to create the function? It looks like it is missing a schema
----
2019-08-12 20:01:41 UTC - Poule: I did not specify a schema in the Yaml
----
2019-08-12 20:02:58 UTC - Poule: In the Yaml I have only: `name, className, py, tenant, namespace, inputs, output`
----
2019-08-12 21:15:56 UTC - Addison Higham: hrm, wondering what the best way would be to build a per message exponential back-off consumer with pulsar. Example:
Let's say I am using Pulsar as a queue (multiple consumers in a shared subscription) and those consumers publish those messages to a webhook. If the webhook fails, I want to try up to 5 times with an exponential backoff that gets pretty long for the last retry (let's say 4 hours).

Options:
- I could sort of use nacks, but the re-delivery time is all static and can't be set per message. If I did my own tracking and used the `redeliverUnacknowledgedMessages` API (which is what nacks appear to do), I could control that with some granularity, but that forces me to have ackTimeout be longer than my max backoff, which leads to some weird behavior in the case of a consumer failing
- I could ack the message and then re-publish it to the topic after the timeout in the client, but then I lose pulsar message tracking and would have to implement my own metadata and retry tracking, likely not able to make use of pulsar dead-letter functionality

New functionality that could help with this:
if pulsar had the option to do a "visibility timeout" like AWS SQS, that is pretty ideal for these cases. I immediately respond per message with a timeout that is respected before it will be redelivered and all the state tracking is offloaded to the server. However, this may not fit well with the pulsar model, especially a failover/exclusive subscription.
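As a rough illustration of the second option above (ack and re-publish with your own retry tracking) — the property name, backoff schedule, and the `msg`/`producer`/`consumer`/`scheduler` variables are all assumptions, not from the thread:
```
// Track the attempt count in a message property; on webhook failure, re-publish
// the payload after a client-side exponential delay and ack the original.
int attempt = msg.getProperty("retry-attempt") == null
        ? 0 : Integer.parseInt(msg.getProperty("retry-attempt"));
if (attempt < 5) {
    long backoffMs = (long) (Math.pow(2, attempt) * 60_000);  // 1m, 2m, 4m, 8m, 16m
    scheduler.schedule(() -> {
        producer.newMessage()
                .value(msg.getValue())
                .property("retry-attempt", String.valueOf(attempt + 1))
                .sendAsync();
    }, backoffMs, TimeUnit.MILLISECONDS);
}
consumer.acknowledgeAsync(msg);
```
As noted above, this trades the broker's redelivery tracking (and dead-letter support) for full control over the per-message delay.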
----
2019-08-12 21:45:29 UTC - Ali Ahmed: @Addison Higham there is no simple answer but pulsar dead letter queue functionality will probably be useful here.
----
2019-08-12 21:49:51 UTC - Addison Higham: yeah, I am sort of thinking of doing an ackTimeout on my main queue of like 5-10 minutes so I can get 5 retries to happen over about 20 minutes, then the deadletter would have another consumer with a much longer ackTimeout and therefore much longer time I could wait to send a nack
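Sketching that two-stage setup in Java (topic names, subscription names, and the exact timeouts are placeholders; `client` is an existing PulsarClient):
```
// Main queue: shared subscription, shorter ack timeout, and a dead-letter policy
// that moves a message to a retry topic after 5 redeliveries.
Consumer<byte[]> main = client.newConsumer()
        .topic("persistent://public/default/webhooks")
        .subscriptionName("webhook-workers")
        .subscriptionType(SubscriptionType.Shared)
        .ackTimeout(10, TimeUnit.MINUTES)
        .deadLetterPolicy(DeadLetterPolicy.builder()
                .maxRedeliverCount(5)
                .deadLetterTopic("persistent://public/default/webhooks-retry")
                .build())
        .subscribe();

// Retry consumer on the dead-letter topic with a much longer ack timeout,
// so the final attempts can wait hours before being redelivered.
Consumer<byte[]> retry = client.newConsumer()
        .topic("persistent://public/default/webhooks-retry")
        .subscriptionName("webhook-retry")
        .subscriptionType(SubscriptionType.Shared)
        .ackTimeout(4, TimeUnit.HOURS)
        .subscribe();
```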
----
2019-08-12 21:50:03 UTC - Ali Ahmed: makes sense
----
2019-08-12 23:33:32 UTC - Ali Ahmed: @Jacob Fugal You can create a PR for Pulsar in the open source repo; just put it in a folder, say terraform-scripts
----
2019-08-12 23:34:10 UTC - Ali Ahmed: @krishna you can take a look at this tutorial
<https://debezium.io/blog/2019/05/23/tutorial-using-debezium-connectors-with-apache-pulsar/>
----
2019-08-13 01:43:05 UTC - VDDCSS: @VDDCSS has joined the channel
----
2019-08-13 01:52:42 UTC - g891052195: @g891052195 has joined the channel
----
2019-08-13 03:15:47 UTC - Chitra Babu: @Chitra Babu has joined the channel
----
2019-08-13 08:00:48 UTC - Kim Christian Gaarder: Question about Pulsar SQL:
Given that a query like (SELECT __message_id__ FROM …) returns the string (167,2,0), what are the different parts of that string?
I know that 167 is the ledger-id, and I’m guessing that 2 is entry-id, but what is 0? is it batch-id? or is it partition-index? … and how can I construct a MessageId instance in java from these values?
----
2019-08-13 08:03:53 UTC - Sijie Guo: batch-slot-id
----
2019-08-13 08:05:25 UTC - Kim Christian Gaarder: Is the best way to get a MessageId from that then: new BatchMessageIdImpl(ledgerId, entryId, -1, batchIndex) ?
----
2019-08-13 08:05:49 UTC - Kim Christian Gaarder: is it correct to do partitionIndex = -1 when it’s a non-partitioned topic, or is that unrelated to this?
----
2019-08-13 08:08:29 UTC - Kim Christian Gaarder: Ok, so next question. When I do Consumer.seek(messageId) and that messageId was the one from the pulsar-sql, the behavior I see is that the next receive() call gets that message and not the message after, is this the intended behavior?
----
2019-08-13 08:10:03 UTC - Sijie Guo: &gt; is it correct to do partitionIndex = -1 when it’s a non-partitioned topic, or is that unrelated to this?
-1 is the non-partitioned topic.

&gt;  is this the intended behavior?

yes. it is inclusive.
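Putting the answers in this thread together, a minimal Java sketch using the (167,2,0) example from above (`consumer` is an existing consumer on that topic):
```
// Pulsar SQL's __message_id__ value (167,2,0) maps to ledgerId=167, entryId=2,
// batchIndex=0; partitionIndex is -1 for a non-partitioned topic.
// BatchMessageIdImpl lives in org.apache.pulsar.client.impl.
MessageId id = new BatchMessageIdImpl(167L, 2L, -1, 0);
consumer.seek(id);                        // seek is inclusive
Message<byte[]> msg = consumer.receive(); // returns the (167,2,0) message itself
```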
----
2019-08-13 08:10:41 UTC - Kim Christian Gaarder: ok, so all is good then, and Pulsar behaves as it should, great thanks :slightly_smiling_face:
+1 : Sijie Guo
----
2019-08-13 08:51:38 UTC - Kim Christian Gaarder: I have a bug related to Pulsar SQL:
It appears that all messages except the very last published message are available for query in Pulsar SQL. Is this a known bug?
----
2019-08-13 08:54:21 UTC - Yuvaraj Loganathan: Yes, this is a known one
----