Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/10/22 09:11:03 UTC

Slack digest for #general - 2019-10-22

2019-10-21 09:20:41 UTC - Nicolas Ha: So it seems I have a case where I have to call `redeliverUnacknowledgedMessages` after every `negativeAcknowledge` although I am not sure why this is the case
----
2019-10-21 09:35:01 UTC - Retardust: Why not?:) Im expect to write only business/parsing logic.
ack if there is no exception - that's what I expect
----
2019-10-21 09:36:37 UTC - Retardust: I think 99.9% of listeners ack the message on the last line of the implementation. So I was thinking it's the responsibility of the 'framework'
----
2019-10-21 09:44:48 UTC - Vladimir Shchur: :slightly_smiling_face: In Pulsar, acking means storing a processed-message marker on the server side. What if your callback requires async logic?
----
2019-10-21 09:47:31 UTC - Retardust: yep, but still not obvious for a noobie like me) I'd add javadoc
----
2019-10-21 10:05:26 UTC - Yong Zhang: Oh. Sorry. I misunderstood the question.
----
2019-10-21 10:23:39 UTC - Markos SF: @Markos SF has joined the channel
----
2019-10-21 11:04:50 UTC - Vladimir Shchur: By the way, Kafka doesn't commit the offset on the last line of the implementation :smiley:
----
2019-10-21 11:17:41 UTC - Vladimir Shchur: And it looks like RabbitMQ is the same - there is a parameter that enables autoack, but by default it's off
----
2019-10-21 11:50:52 UTC - Mahesh: Hi,
I ran into this issue while starting the broker:
Caused by: org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: Namespace missing local cluster name in clusters list: local_cluster=bpulsar ns=public/functions clusters=[apulsar]
I had to delete all the data in the pulsar and zookeeper clusters. This worked without any issues when I installed Pulsar for the first time; now that I've purged the data for a clean start, it's not working
----
2019-10-21 12:29:03 UTC - xiaolong.ran: In Pulsar, when a consumer handles a message we can split it into two operations: `receive` and `ack`. For messages that are received but not acked, Pulsar stores them in the `unackmessage` collection; when `acktimeout` is reached,
messages not acknowledged within the given time will be replayed by the broker to the same or a different consumer
----
2019-10-21 12:29:39 UTC - xiaolong.ran: This operation is the `redeliverUnacknowledgedMessages` you see.
----
2019-10-21 12:31:02 UTC - xiaolong.ran: > Also I sometimes see `Reconnecting the client to redeliver the messages` in my logs even though I don't call `redeliverUnacknowledgedMessages` myself


Once you have configured `acktimeout`, this operation is triggered periodically, which is why you see them.
----
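The unacked-message bookkeeping described above can be sketched as a toy model in plain Java. This is only an illustration of the mechanism (receive tracks, ack clears, expired entries are replayed) - it is not Pulsar's actual implementation, and the class and method names here are made up:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;

// Toy model of ack-timeout redelivery: messages received but not acked
// within the timeout are moved back onto the delivery queue.
public class AckTimeoutSketch {
    private final long ackTimeoutMillis;
    private final Queue<String> toDeliver = new ArrayDeque<>();
    private final Map<String, Long> unacked = new HashMap<>(); // msgId -> receive time

    AckTimeoutSketch(long ackTimeoutMillis) { this.ackTimeoutMillis = ackTimeoutMillis; }

    void publish(String msgId) { toDeliver.add(msgId); }

    String receive(long now) {
        String msg = toDeliver.poll();
        if (msg != null) unacked.put(msg, now); // tracked until acked
        return msg;
    }

    void ack(String msgId) { unacked.remove(msgId); }

    // Triggered periodically, like redeliverUnacknowledgedMessages:
    int redeliverExpired(long now) {
        int redelivered = 0;
        for (Iterator<Map.Entry<String, Long>> it = unacked.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() >= ackTimeoutMillis) {
                toDeliver.add(e.getKey()); // replay to the (same) consumer
                it.remove();
                redelivered++;
            }
        }
        return redelivered;
    }

    public static void main(String[] args) {
        AckTimeoutSketch c = new AckTimeoutSketch(10_000); // 10s "acktimeout"
        c.publish("m1");
        c.publish("m2");
        String first = c.receive(0);
        c.receive(0);
        c.ack(first);                        // m1 acked, m2 left unacked
        int n = c.redeliverExpired(10_000);  // timeout reached -> m2 replayed
        System.out.println(n + " " + c.receive(10_000)); // prints: 1 m2
    }
}
```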
2019-10-21 12:38:09 UTC - xiaolong.ran: Both nack and acktimeout are ways to handle failed messages.

In many cases, it is hard for an application to handle the failure of processing a particular message.

So we support the `negative acks` feature.
----
2019-10-21 12:42:17 UTC - tuteng: You can try to change the cluster name. It is possible that the data has not been cleaned up.
----
2019-10-21 12:55:24 UTC - Mahesh: The cluster names have not changed; we have 2 clusters, apulsar and bpulsar. Data is cleaned from both. Any namespaces created are replicated to the other cluster.
----
2019-10-21 13:22:50 UTC - Nicolas Ha: aaah yes, much clearer now, thanks! :smile:
----
2019-10-21 13:25:59 UTC - Shishir Pandey: @Matteo Merli /@Sijie Guo - any chance of getting a C++ interface to functions?
----
2019-10-21 13:27:30 UTC - Matteo Merli: It would not be difficult to add the API for that 
----
2019-10-21 13:28:00 UTC - Shishir Pandey: That would be great.
----
2019-10-21 13:30:37 UTC - Nicolas Ha: I see this exception in my logs `ProducerQueueIsFullError`
```
org.apache.pulsar.client.api.PulsarClientException$ProducerQueueIsFullError: Producer send queue is full
```
Is it related to back-pressure? What are the common ways to handle this?
It seems the Producer has a bunch of options that are all deprecated.
----
2019-10-21 13:31:11 UTC - Shishir Pandey: I don't think the acknowledgement should be automatic. If in future we were to try and add - possibly XA - transactions, you'd get into cases where message acknowledgement must only be done as a single transaction. It'd also make it difficult to differentiate between "browsing" messages and consuming them.
----
2019-10-21 13:35:45 UTC - Nicolas Ha: Ah I was looking at the old one. Now looking at `ProducerBuilder`

However it is still not clear to me what the course of action should be:
- Ask the user to retry?
- Create a new publisher?
- Use a bigger `maxPendingMessages` ahead of time (what is the default?)
----
2019-10-21 13:36:01 UTC - Matteo Merli: Yes, that’s from back pressure. This happens when using sendAsync. You can set `blockIfQueueFull(true)` to get blocked instead of getting the error. 
----
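The fail-fast vs. blocking behaviour Matteo describes is analogous to the two send modes of a bounded queue in plain Java. A sketch of the semantics only - this is `java.util.concurrent`, not Pulsar client code:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Semantics sketch: the producer's pending-message queue behaves like a
// bounded queue. By default a full queue fails fast (the
// ProducerQueueIsFullError case); with blockIfQueueFull(true) the send
// call instead blocks until there is room.
public class QueueFullSketch {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<String> pending = new ArrayBlockingQueue<>(2);
        pending.put("msg-1");
        pending.put("msg-2"); // queue is now full

        // Default behaviour: fail fast instead of waiting.
        boolean accepted = pending.offer("msg-3");
        System.out.println("accepted=" + accepted); // prints: accepted=false

        // blockIfQueueFull(true) behaviour: wait for capacity.
        pending.take();       // a pending send completes, freeing a slot
        pending.put("msg-3"); // now succeeds
        System.out.println("size=" + pending.size()); // prints: size=2
    }
}
```

With multiple threads sharing one producer, the pending queue fills faster, which matches the symptom discussed below.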
2019-10-21 13:36:48 UTC - Nicolas Ha: It happens when using `send` too apparently :slightly_smiling_face: I will experiment with async + `blockIfQueueFull`
----
2019-10-21 13:37:15 UTC - Matteo Merli: Multiple threads?
----
2019-10-21 13:38:07 UTC - Nicolas Ha: Probably yes - is the producer not meant to be used with multiple threads?
----
2019-10-21 13:38:42 UTC - Nicolas Ha: ah here are the defaults :slightly_smiling_face: <https://github.com/apache/pulsar/blob/3c8a44e017c792de164547c536c14e277a5a0141/pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ProducerConfigurationData.java#L58-L59>
----
2019-10-21 13:41:29 UTC - Nicolas Ha: confirmed - multiple threads
----
2019-10-21 13:42:18 UTC - Matteo Merli: Yes, all Pulsar components are meant to be used by multiple threads. Just checking to confirm, since that would be the only reason to get that error.
+1 : Nicolas Ha
----
2019-10-21 13:57:48 UTC - Naby: Is it possible to suppress the log output of the pulsar-client (in Python)?
----
2019-10-21 13:58:28 UTC - Chris DiGiovanni: I was wondering if anyone would be able to share the 50pct and 99pct read entry latencies of their cluster(s) that are using directly attached SSDs? We're currently running our bookkeepers on a VMware ESXi cluster backed by an all-SSD vSAN configured with a RAID1 policy. For a peak read entry rate of around 700, I see a peak 50pct read latency of around 242ms and my 99pct at around 471ms. Trying to gauge how poor this performance is compared to directly attached SSDs.
----
2019-10-21 14:02:41 UTC - Matteo Merli: There’s a PR open: <https://github.com/apache/pulsar/pull/5279>

It would be included in 2.4.2
----
2019-10-21 14:03:10 UTC - Naby: Thanks
----
2019-10-21 14:09:40 UTC - Gilberto Muñoz Hernández: hey guys, I'm having a problem with persistent partitioned topics
----
2019-10-21 14:10:12 UTC - Gilberto Muñoz Hernández: I am using a docker entrypoint with the following lines
----
2019-10-21 14:10:13 UTC - Gilberto Muñoz Hernández: $PULSAR_HOME/bin/pulsar-admin persistent create-partitioned-topic <persistent://public/default/geoLocation> --partitions 3
    $PULSAR_HOME/bin/pulsar-admin persistent create-partitioned-topic <persistent://public/default/zipkin> --partitions 3
----
2019-10-21 14:10:31 UTC - Gilberto Muñoz Hernández: they both report ok, no problem
----
2019-10-21 14:11:01 UTC - Gilberto Muñoz Hernández: but in the Pulsar dashboard I can only see geoLocation's 3 partitions, none for zipkin
----
2019-10-21 14:11:11 UTC - Gilberto Muñoz Hernández: I am using a 3 nodes cluster
----
2019-10-21 14:11:51 UTC - Gilberto Muñoz Hernández: is it possible that I can only have one partition per node, so I'm trying to have 6 partitions in 3 nodes?
----
2019-10-21 14:12:55 UTC - Gilberto Muñoz Hernández: and the big problem here is that I am also losing the first messages sent to the zipkin topic
----
2019-10-21 14:13:42 UTC - Gilberto Muñoz Hernández: then everything starts working fine, as if the zipkin partitions were only created later on, when messages actually started to appear
----
2019-10-21 14:16:32 UTC - Gilberto Muñoz Hernández: any ideas guys? @Matteo Merli or @David Kjerrumgaard can you help me?
----
2019-10-21 14:20:58 UTC - Gautam Singh: @Gautam Singh has joined the channel
----
2019-10-21 14:49:18 UTC - Paul Makkar: @Paul Makkar has joined the channel
----
2019-10-21 15:02:25 UTC - Gautam Singh: Hi Folks, We are just about to start building a system to take over from a monolithic NodeJS application. Till now I have done a lot of research on Kafka and have done small PoCs as well. Recently I came across Pulsar and it makes sense choosing Pulsar over Kafka, but the lack of documentation/references, as mentioned above, is a pain point. My question is: Kafka Streams allows joining data from multiple topics, but I couldn't find any reference or example for joining data from topics within a Pulsar function (no documentation for using Pulsar SQL in a function). There are two possible solutions that I can think of:
1) Read data from multiple topics using schemas and construct the joined data manually. This option seems to suffer from the problem that the Pulsar function will trigger for all topics' data, when actually I want to trigger the join/calculation from only one topic; the other topics would be used as join data.
2) Store data from all the topics in a state store. Then I can write a function which is triggered from my topic of interest, and the join data can be read from the state store.
Are these choices practical or would you recommend another approach? Would appreciate any inputs .. Thanks in advance
----
2019-10-21 15:52:26 UTC - Shishir Pandey: @Gautam Singh to the best of my understanding a function can consume data from multiple topics so doesn't that work for your use case?
----
2019-10-21 16:26:16 UTC - Gautam Singh: @Shishir Pandey Yes you are right,  we can consume multiple topics in single function, but the function gets triggered on new message in any of the topic. The use case that I am trying to achieve is I get a message on Order topic and I want to join the customerId in Order to the Customer topic to get complete Order data. If I subscribe to both Order and Customer in the function, the function will get triggered on addition/update of new customer, but I want the trigger only for new/updated Orders.    Could the Reader (<https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Reader.html>) interface be what I am looking for ?
----
2019-10-21 16:30:06 UTC - Jerry Peng: @Gautam Singh there are also windowed functions you can use
----
2019-10-21 16:32:58 UTC - Jerry Peng: <https://www.jesse-anderson.com/2019/10/why-i-recommend-my-clients-not-use-ksql-and-kafka-streams/>
+1::skin-tone-4 : Shishir Pandey
----
2019-10-21 16:33:03 UTC - Shishir Pandey: @Gautam Singh - couldn't you write a function to filter on customer-id on both the order and customer topics and move it to a separate topic, from where you process in the standard manner?
----
2019-10-21 16:34:01 UTC - Shishir Pandey: but the question is how you are going to ensure that you'd always get the customer topic's message before the order topic's
----
2019-10-21 16:34:59 UTC - Jerry Peng: Joining on streams is always windowed
----
2019-10-21 16:39:37 UTC - Gautam Singh: Thanks Jerry for the link to the article. Brought up issues I hadn't considered :+1:
----
2019-10-21 16:42:40 UTC - Gautam Singh: @Shishir Pandey Yes, I think I have to go with an intermediate-topic approach. I was trying to avoid that since I was thinking along the lines of Kafka Streams. What do you think of approach 2 mentioned in my post above?
----
2019-10-21 17:05:15 UTC - Shishir Pandey: I guess option 2 is simpler
----
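Option 2 from Gautam's post (customer events populate a state store; only order events trigger the join) can be sketched in plain Java. This is a toy illustration of the approach, not a Pulsar Functions API - the names (customerId, order/customer topics, fields) are made up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy sketch of the state-store join: customer messages only update local
// state; order messages trigger the join against that state.
public class StateStoreJoinSketch {
    private final Map<String, String> customerStore = new HashMap<>(); // customerId -> name

    // Triggered by messages on the customer topic: just record state.
    void onCustomer(String customerId, String name) {
        customerStore.put(customerId, name);
    }

    // Triggered only by messages on the order topic: enrich with state.
    Optional<String> onOrder(String orderId, String customerId) {
        String name = customerStore.get(customerId);
        if (name == null) return Optional.empty(); // customer not seen yet
        return Optional.of("order " + orderId + " for " + name);
    }

    public static void main(String[] args) {
        StateStoreJoinSketch f = new StateStoreJoinSketch();
        System.out.println(f.onOrder("o1", "c1")); // prints: Optional.empty
        f.onCustomer("c1", "Alice");
        System.out.println(f.onOrder("o1", "c1").orElse("?")); // prints: order o1 for Alice
    }
}
```

The empty branch is exactly the ordering problem Shishir raises above: an order can arrive before its customer record, so the function needs a policy for that case (buffer, retry, or emit incomplete).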
2019-10-21 17:12:52 UTC - Fawad Halim: @Fawad Halim has joined the channel
----
2019-10-21 18:00:04 UTC - Jerry Peng: @Gautam Singh you can also use Pulsar SQL to create join queries between topics
----
2019-10-21 19:35:03 UTC - Retardust: yep, I'm talking only about documenting that
<https://github.com/apache/pulsar/pull/5429>
----
2019-10-22 05:36:16 UTC - Retardust: I have an ordered stream of events at the source, and I need to send it to a Pulsar topic with the same ordering. Will I lose ordering if I implement MessageListener with async acks?
----
2019-10-22 05:55:04 UTC - Gautam Singh: Thanks Shishir
----
2019-10-22 05:55:34 UTC - Gautam Singh: Is it possible to use SQL inside a Pulsar function?
----
2019-10-22 07:53:57 UTC - dba: Yesterday we released version 0.7.0 of DotPulsar, which supports TLS authentication: <https://www.nuget.org/packages/DotPulsar>
Since version 0.4.0 we have had support for both .NET Standard 2.0 and 2.1, and with the current feature list: <https://github.com/danske-commodities/dotpulsar#supported-features>
we are feature complete for version 1.0.0.
If you are using .NET and want to help test before 1.0.0, now is a good time and your help is very much appreciated :slightly_smiling_face:
+1 : Ali Ahmed, xiaolong.ran, jia zhai, Matteo Merli, Shishir Pandey
----