Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/03/08 09:11:08 UTC

Slack digest for #general - 2018-03-08

2018-03-07 10:07:51 UTC - Ivan Kelly: @david-jin yes
----
2018-03-07 10:08:39 UTC - Ivan Kelly: well, depends actually, you shouldn't use the same ensemble for local and global
----
2018-03-07 10:08:49 UTC - Ivan Kelly: but they can run on the same machine
----
2018-03-07 16:16:31 UTC - Matteo Merli: @david-jin You don’t even strictly need to have the global ZK. Another possibility is to use the local ZK in each region and “manually” synchronize the configuration across the different regions (e.g., by creating the namespace with the same config in each region).
----
2018-03-07 16:17:47 UTC - Matteo Merli: Of course, that becomes error-prone if you have many tenants/namespaces, but for a single-tenant cluster it might be a viable option
----
2018-03-07 17:11:59 UTC - Daniel Ferreira Jorge: Hello! Does pulsar have ANY mechanism to prevent producing the exact same message twice into a topic? (deduplication mechanism, idempotent producer feature, whatever...)
----
2018-03-07 17:13:54 UTC - Ivan Kelly: yes, there's a blog post about it: <https://streaml.io/blog/pulsar-effectively-once/>
----
2018-03-07 17:46:41 UTC - Daniel Ferreira Jorge: Thanks @Ivan Kelly! I think I completely misrepresented what I want. Let me see if I can explain. We are tailing the Couchbase transaction log (DCP) and producing messages to Pulsar. The problem is that with Couchbase DCP we can achieve "at least once" delivery. Upon DCP client restart we will have a small subset of messages resent to Pulsar. This is what I want to avoid. What I'm looking for is something like "if the contents of the message that Pulsar is receiving are exactly the same, Pulsar will simply discard the message". From what I read in this article, specifically the parts "a producer application might have sent some messages but then didn’t receive a successful response from the broker" ---AND--- "Each broker keeps track of the last “successfully” published message ID for each individual message producer", this use case is not covered, because if the Couchbase reader client sends a second, repeated message to Pulsar, there is no way Pulsar will know. Am I right?
----
2018-03-07 18:03:53 UTC - Matteo Merli: @Daniel Ferreira Jorge I think the important part is to make sure to have some kind of `txnId` property from Couchbase. I don’t know about Couchbase, but for example this is definitely possible in Postgres. There you can tail the Postgres commit log and get all records with the associated `txnId`. When you publish on Pulsar, you have to use that `txnId` as the `sequenceId` for the message.

The nice part is that you don’t need to store the “last txnId published on Pulsar” anywhere, because you can recover it from the producer after a restart (`producer.getLastSequenceId()`)
----
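A minimal sketch of the idea above, written as a toy in-memory model rather than against the real Pulsar client. `BrokerSketch`, `publish`, `get_last_sequence_id`, and the producer name `dcp-tailer` are all hypothetical names for illustration. It assumes the source hands out increasing transaction IDs, and that the broker drops anything at or below the last ID it stored for that producer.

```python
class BrokerSketch:
    """Toy stand-in for broker-side deduplication (NOT the Pulsar API)."""

    def __init__(self):
        self.last_sequence_id = {}  # producer name -> highest sequence ID stored
        self.topic = []             # messages that were actually stored

    def publish(self, producer_name, sequence_id, payload):
        last = self.last_sequence_id.get(producer_name, -1)
        if sequence_id <= last:
            return False  # already stored for this producer: drop as duplicate
        self.topic.append((sequence_id, payload))
        self.last_sequence_id[producer_name] = sequence_id
        return True

    def get_last_sequence_id(self, producer_name):
        # Plays the role of producer.getLastSequenceId() after a restart.
        return self.last_sequence_id.get(producer_name, -1)


broker = BrokerSketch()

# Source transactions tagged with increasing (not necessarily contiguous) IDs.
txn_log = [(1, "a"), (5, "b"), (12, "c"), (45, "d")]

# First run: publish the first three transactions, then "crash".
for txn_id, payload in txn_log[:3]:
    broker.publish("dcp-tailer", txn_id, payload)

# Restart: the source replays from an earlier point (at-least-once delivery).
accepted = [broker.publish("dcp-tailer", t, p) for t, p in txn_log[1:]]
```

In this model the replayed IDs 5 and 12 are rejected and only 45 is stored, so nothing has to be persisted on the tailing side beyond what the broker already tracks.
----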
2018-03-07 18:15:32 UTC - Vladimir Shchur: @Vladimir Shchur has joined the channel
----
2018-03-07 18:17:59 UTC - Vladimir Shchur: Hi, guys! Joined to ask if there are plans for a dotnet client?
----
2018-03-07 18:21:01 UTC - Ali Ahmed: Currently no official plans; PRs are welcome. There is WebSocket support for simple clients and a base C++ layer for languages like Python to wrap around, but a pure binary-compatible .NET client is not being considered currently.
----
2018-03-07 18:22:26 UTC - Vladimir Shchur: I see, thank you for clearing things up
----
2018-03-07 18:44:54 UTC - Daniel Ferreira Jorge: Hi @Matteo Merli thank you for the answer! I can have a unique, but non-sequential `txnId` (something like a GUID)... I can't see how Pulsar will know when the duplicated messages have stopped and new messages are arriving if it only knows the last published "non-sequential" sequenceId... maybe I'm missing something?
----
2018-03-07 18:46:14 UTC - Matteo Merli: by non-sequential, you mean completely random or does it increase?

For example: having 1 - 5 - 12 - 45 .. would work
----
2018-03-07 18:46:27 UTC - Daniel Ferreira Jorge: completely random
----
2018-03-07 18:47:34 UTC - Matteo Merli: ok.. then it’s tricky :slightly_smiling_face:
----
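To see why random IDs are tricky, here is a small sketch under the assumption that deduplication only compares each incoming ID with the highest one seen so far. The helper `accepted_by_last_id` and the sample ID lists are made up for illustration.

```python
def accepted_by_last_id(sequence_ids):
    """Return the IDs that a 'highest ID seen so far' check would let through."""
    last = -1
    accepted = []
    for seq in sequence_ids:
        if seq > last:
            accepted.append(seq)
            last = seq
    return accepted


# Increasing IDs with gaps; 5 and 12 are replayed after a restart.
increasing = [1, 5, 12, 5, 12, 45]

# Unique but unordered IDs; 7 and 33 are replayed after a restart, 19 is new.
random_like = [40, 7, 33, 7, 33, 19]

increasing_result = accepted_by_last_id(increasing)
random_result = accepted_by_last_id(random_like)
```

With increasing IDs the replayed entries are dropped and 45 still gets through. With unordered IDs, everything after 40 is dropped, including messages that were never published, so a plain comparison against the last ID cannot tell a duplicate from a new message.
----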
2018-03-07 18:49:53 UTC - Daniel Ferreira Jorge: I could assume that the DCP client will always send some duplicated messages upon restart and wait for the `producer.getLastSequenceId()` but this will not work if the "crash" was "clean"
----
2018-03-07 18:50:46 UTC - Daniel Ferreira Jorge: by clean I mean the DCP state was persisted exactly where the last `producer.getLastSequenceId()` is
----
2018-03-07 18:52:56 UTC - Daniel Ferreira Jorge: I think I need to look at some other strategy
----
2018-03-07 18:54:49 UTC - Daniel Ferreira Jorge: thanks @Matteo Merli!
----
2018-03-07 18:55:17 UTC - Matteo Merli: you’re welcome
----
2018-03-07 20:33:38 UTC - Daniel Ferreira Jorge: Hi, I have another question. If in a consumer I do not ack some messages (maybe the destination is down), when will they be redelivered? Is it configurable (try to redeliver every X minutes) or can I only explicitly ask for it?
----
2018-03-07 21:50:41 UTC - Adam Williams: @Daniel Ferreira Jorge This only answers part of your question - but you can ask for unacknowledged messages: <https://pulsar.incubator.apache.org/docs/latest/project/BinaryProtocol/#command-redeliverunacknowledgedmessages>
----
2018-03-07 22:48:07 UTC - Matteo Merli: @Daniel Ferreira Jorge the default is to not re-deliver the messages (unless the consumer disconnects).

On the consumer configuration you can configure the acknowledge-timeout. If you don’t acknowledge a message within that time, it will be automatically redelivered either to this or another consumer (if on a shared subscription).

In Java: `ConsumerConfiguration.setAckTimeout()`
In Python: `unacked_messages_timeout_ms` option in the `subscribe()` calls
----
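A toy model of the acknowledge-timeout behaviour described above, to make the timing concrete. This is not the client's implementation: `AckTimeoutTracker` and its methods are hypothetical, time is passed in explicitly as milliseconds, and it assumes that redelivering a message restarts its timeout.

```python
class AckTimeoutTracker:
    """Toy model: messages not acked within timeout_ms become due for redelivery."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.pending = {}  # message id -> time (ms) it was last delivered

    def delivered(self, msg_id, now_ms):
        self.pending[msg_id] = now_ms

    def ack(self, msg_id):
        # Acked messages are no longer tracked and will not be redelivered.
        self.pending.pop(msg_id, None)

    def due_for_redelivery(self, now_ms):
        due = sorted(
            m for m, t in self.pending.items() if now_ms - t >= self.timeout_ms
        )
        for m in due:
            self.pending[m] = now_ms  # assumption: redelivery restarts the clock
        return due


tracker = AckTimeoutTracker(timeout_ms=10_000)
tracker.delivered("m1", now_ms=0)
tracker.delivered("m2", now_ms=0)
tracker.ack("m1")  # m1 was processed; m2 is left unacknowledged

before_timeout = tracker.due_for_redelivery(now_ms=5_000)
at_timeout = tracker.due_for_redelivery(now_ms=10_000)
shortly_after = tracker.due_for_redelivery(now_ms=15_000)
second_round = tracker.due_for_redelivery(now_ms=20_000)
```

Here only `m2` comes back, first at 10 seconds and again at 20 seconds if it is still not acknowledged. In the real clients this is driven by the settings named above (`setAckTimeout()` in Java, `unacked_messages_timeout_ms` in Python) rather than by application code.
----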