Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/12/18 09:11:04 UTC

Slack digest for #general - 2019-12-18

2019-12-17 09:17:37 UTC - Martin Kunev: Hi,
I am running Pulsar 2.3.2 in standalone mode. I have the following issue:
I subscribe to a topic with pulsar-client on region2 (shared subscription). Then I use the pulsar-client to publish on region1. Both regions have the replication clusters correctly set, but I never receive the message on region2.
What can cause this problem?
----
2019-12-17 10:23:23 UTC - rmb: @rmb has joined the channel
----
2019-12-17 10:39:28 UTC - Fernando: Hi guys, I’d appreciate any feedback on a problem we’re having with the current implementation of the redelivery count. Here’s the issue <https://github.com/apache/pulsar/issues/5881>
----
2019-12-17 10:44:37 UTC - rmb: Hi, I'm trying to get pulsar standalone running, and I'm having trouble with some basic python scripts.  I downloaded pulsar-2.4.1 and unpacked the tarball; `bin/pulsar standalone` seems to start fine, but when I try to use the python libraries to produce or consume messages, I get errors like
```ERROR ClientImpl:182 | Error Checking/Getting Partition Metadata while creating producer on <persistent://public/default/my-topic> -- 5```
(and sometimes pulsar then shuts down).  the bin/pulsar-client script isn't any better; I get a long stream of java exceptions and then
```INFO  org.apache.pulsar.client.cli.PulsarClientTool - 0 messages successfully produced```
(or consumed, respectively).  any suggestions?

fwiw, I can produce and consume messages using the docker image (but I'm having other issues there, so I thought I'd give `bin/pulsar standalone` a try)
----
2019-12-17 10:47:41 UTC - tihomir: @rmb check if public/default namespace is created
----
2019-12-17 10:48:08 UTC - rmb: I thought that was automatically created on startup?
----
2019-12-17 10:50:04 UTC - tihomir: yes but it takes time for that. My guess is that you are executing your call too early and the namespace does not exist yet
----
2019-12-17 10:51:01 UTC - rmb: I don't think so, I've been waiting substantial amounts of time
----
2019-12-17 10:52:12 UTC - rmb: anyway, pulsar just exited before I had a chance to run any admin commands
----
2019-12-17 10:55:48 UTC - rmb: ```$ bin/pulsar-admin tenants list
null

Reason: javax.ws.rs.ProcessingException: Connection refused: localhost/127.0.0.1:8080```
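
A `Connection refused` error on `localhost:8080` means nothing is accepting connections on the admin port, either because the service has not finished starting or because it has exited. A minimal sketch for waiting until a port accepts TCP connections before running admin commands follows; the helper name `wait_for_port` is hypothetical, and the host and port are simply the ones from the error above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (assumes the standalone admin port from the error message):
# wait_for_port("localhost", 8080)
```

If this returns False after a generous timeout, the process has most likely stopped, and the standalone logs are the next place to look.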
----
2019-12-17 11:05:18 UTC - rmb: anyway, since the docker image is running and letting me produce and consume, I'll post the questions I have about that
----
2019-12-17 11:26:26 UTC - rmb: the command I'm running is `docker run -it -p 6650:6650 -p 8080:8080  --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.4.1 bin/pulsar standalone`
...ok, I'm having trouble replicating the weird deduplication behavior I was seeing, so here's a different question, about ordering.  I had thought that for an un-partitioned topic, message delivery ordering was guaranteed per-producer (for messages sent without a key).  so I'm using the node client to send some messages:
```const producer = await client.createProducer({
    topic: `${topic}`,
    producerName: 'my-producer',
    messageRoutingMode: 'UseSinglePartition',
    sendTimeoutMs: -1,
});

// Send messages
for (let i = 0; i < 10; i += 1) {
    const msg = `my-message-${i}`;
    const message = {
        data: Buffer.from(msg),
    };
    // send() returns a promise, which is not awaited here
    producer.send(message);
}```
but I'm receiving them in a different order:
```Received message: 'b'my-message-0''
Received message: 'b'my-message-1''
Received message: 'b'my-message-3''
Received message: 'b'my-message-2''
Received message: 'b'my-message-4''
Received message: 'b'my-message-5''
Received message: 'b'my-message-6''
Received message: 'b'my-message-7''
Received message: 'b'my-message-8''
Received message: 'b'my-message-9''```
----
2019-12-17 13:37:00 UTC - rmb: some further questions about deduplication (based on <https://github.com/apache/pulsar/wiki/PIP-6:-Guaranteed-Message-Deduplication>):
• if a producer is setting sequenceId on a per-message basis, are there any constraints imposed by the broker? for example, does the sequence have to be monotonic?  the doc says that the broker keeps track of the highest sequenceId received from a particular producer; does that mean that if a producer sent messages 1, 2, 4, 3, the last one would be rejected because its sequenceId is less than 4? how would the broker distinguish that situation from the messages simply arriving out-of-order?
• if a producer uses a custom sequence of sequenceIds, with 'holes', what does that mean for acknowledgements? is the consumer's view of the sequence of messages entirely separate from the producer's sequenceIds?
----
2019-12-17 15:00:34 UTC - Greg Hoover: Is this a known issue or is this unique to me?
----
2019-12-17 15:05:26 UTC - Greg Hoover: I should have searched the slack channel for grafana. There are a couple of things mentioned I have not tried yet. Will report back. 
----
2019-12-17 18:43:50 UTC - Daniel Ferreira Jorge: @rmb Let me try to go point by point:

• The sequenceId does not need to be contiguous (gaps are allowed); it only needs to be increasing.
• If you send messages 1,2,4,3 the message 3 will be discarded.
• The deduplication is a simple mechanism that maps a *producer name to a number.* You should not have 2 producers with the same name if you are using deduplication.
• The consumer has absolutely nothing to do with sequence numbers of producers. When a broker receives a message from producer named "X", it will check in the map what is the last sequenceId of the producer X. If the message being produced now by producer X has an equal or lower sequence number, the message will be discarded. The deduplication mechanism has absolutely no other implications.
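
The producer-name-to-sequenceId map can be sketched as a toy simulation. This is not broker code: the class `DedupBroker` and its `publish` method are hypothetical, and the sketch assumes the PIP-6 behavior that a message is discarded when its sequenceId is lower than or equal to the highest one already recorded for that producer name.

```python
class DedupBroker:
    """Toy model of per-producer deduplication (hypothetical, not Pulsar code)."""

    def __init__(self):
        # Highest sequenceId accepted so far, keyed by producer name.
        self.highest_seq = {}
        self.log = []

    def publish(self, producer_name, sequence_id, payload):
        """Return True if the message is stored, False if it is discarded."""
        last = self.highest_seq.get(producer_name, -1)
        if sequence_id <= last:
            return False  # treated as a duplicate
        self.highest_seq[producer_name] = sequence_id
        self.log.append((producer_name, sequence_id, payload))
        return True


broker = DedupBroker()
# Producer "X" sends sequenceIds 1, 2, 4, 3, 10 in that order.
results = [broker.publish("X", seq, f"msg-{seq}") for seq in (1, 2, 4, 3, 10)]
```

In this model the gap from 4 to 10 is accepted, 3 is discarded because 4 was already recorded, and a producer with a different name starts from its own entry in the map.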
----
2019-12-17 18:55:39 UTC - rmb: thanks, @Daniel Ferreira Jorge.  what if a producer sent messages 1, 2, 3, 4, but due to network issues they arrived at the broker in the order 1, 2, 4, 3? would the broker assume message 3 was a duplicate and discard it?
----
2019-12-17 18:58:39 UTC - Daniel Ferreira Jorge: I do not believe that this can happen in this context because for the producer to send message 4 the broker must have sent an ack of message 3... maybe I'm wrong and someone more knowledgeable, like @Sijie Guo or @Matteo Merli can give you a certainty about that...
----
2019-12-17 18:58:59 UTC - Joe Francis: ^^,  you are right
slightly_smiling_face : Daniel Ferreira Jorge
----
2019-12-17 18:59:36 UTC - rmb: really? the diagrams at <https://pulsar.apache.org/docs/en/develop-binary-protocol/> seem to suggest otherwise
----
2019-12-17 18:59:46 UTC - rmb: and what about batch mode?
----
2019-12-17 19:00:32 UTC - Joe Francis: Messages are published in the order they are queued by the producer..
----
2019-12-17 19:02:39 UTC - rmb: and is that queue order determined by the sequenceIds?
----
2019-12-17 19:06:42 UTC - Joe Francis: No. It's the order in which send() is invoked by the producer.
----
2019-12-17 19:14:39 UTC - rmb: ok, so the diagrams are misleading and every send() has to be ack'ed before the next one is called.  what about deduplication in batch mode?  does the broker update the highest sequenceId for a given producer after it reads the entire batch of messages, or as it reads it?
----
2019-12-17 19:15:21 UTC - Joe Francis: The diagrams are correct
----
2019-12-17 19:16:44 UTC - rmb: the diagram under "producers" shows send() being called twice before sendReceipt is called
----
2019-12-17 19:19:26 UTC - Joe Francis: ??
----
2019-12-17 19:19:44 UTC - rmb: <https://pulsar.apache.org/docs/en/develop-binary-protocol/#producer>
----
2019-12-17 19:21:16 UTC - Joe Francis: That is a sequence diagram. It shows messages are acked in the order they are published. Nothing else is implied
----
2019-12-17 19:27:26 UTC - rmb: it would be considerably clearer if the second send() and the first sendReceipt() were interchanged, but ok
----
2019-12-17 19:27:49 UTC - rmb: any idea why I was receiving messages out-of-order above?
----
2019-12-17 19:28:42 UTC - Jason Fisher: it might be an async logging artifact
----
2019-12-17 19:30:08 UTC - Joe Francis: Actually, it's meant to imply the opposite (contrary to what you understood). There are no send1-ack1-send2-ack2 waits. It's send1-send2, and ack1, ack2 are independent. The diagram actually shows pipelining, which is exactly what it's meant to show.
----
2019-12-17 19:30:18 UTC - Nick Nezis: @Nick Nezis has joined the channel
----
2019-12-17 19:31:29 UTC - rmb: ok.  so then my question remains: how is the broker determining order in the producer queue?  your initial answer was that the producer was waiting for each message to be ack-ed
----
2019-12-17 19:32:18 UTC - Joe Francis: No, my answer was that it was the order in which you invoke send() (or sendAsync()).
----
2019-12-17 19:34:02 UTC - Joe Francis: If you use send(), of course it will block for the ack. That's an artifact of using a blocking API. But that's not required for dedup. You can use sendAsync().
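
The difference between the blocking and the pipelined style can be illustrated with a toy model. `FakeConnection` and its methods are hypothetical stand-ins, not the Pulsar client API; the model only assumes a single ordered connection on which receipts come back in the order the messages were queued.

```python
from collections import deque


class FakeConnection:
    """Hypothetical ordered connection: receipts return in queue order."""

    def __init__(self):
        self.in_flight = deque()
        self.events = []

    def send_async(self, seq):
        # Queue the message and return immediately, without waiting for a receipt.
        self.in_flight.append(seq)
        self.events.append(f"send{seq}")

    def deliver_receipt(self):
        # The oldest in-flight message is always acknowledged first.
        seq = self.in_flight.popleft()
        self.events.append(f"ack{seq}")

    def send(self, seq):
        # Blocking style: queue the message, then wait for its receipt.
        self.send_async(seq)
        self.deliver_receipt()


blocking = FakeConnection()
blocking.send(1)
blocking.send(2)

pipelined = FakeConnection()
pipelined.send_async(1)
pipelined.send_async(2)
pipelined.deliver_receipt()
pipelined.deliver_receipt()
```

Both runs queue the messages in the order 1, 2; only the interleaving of sends and receipts differs.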
----
2019-12-17 19:35:55 UTC - Joshua Dunham: Hi Everyone, I'm getting `"python pulsar-producer.py" terminated by signal SIGILL (Illegal instruction)"` with the 2.4.2 client. Anyone else see this?
----
2019-12-17 19:36:18 UTC - Joshua Dunham: I have a MWE that worked with 2.4.1p1 and no longer in 2.4.2
----
2019-12-17 19:36:50 UTC - Joe Francis: As for what you observed, I don't have enough details. But messages will be published in the same order they are sent. Be aware that if you send seq-ids 1,2,4,3 in that order, Pulsar will attempt to publish in that order, and the broker will reject 3 if dedup is on. In other words, Pulsar will not sort disordered seq-ids.
----
2019-12-17 19:37:23 UTC - Joshua Dunham: Once I upgraded to 2.4.2 the pulsar python module complained about also wanting protobuf@2.6 which I had to install from the brew retired repository.
----
2019-12-17 19:38:38 UTC - rmb: thanks.  but how does the broker distinguish 1, 2, 4, 3 from 1, 2, 3, 4 arriving out of order?
----
2019-12-17 19:40:34 UTC - Joe Francis: Whatever order you invoke send in on the client is the order the broker will see.
----
2019-12-17 19:41:46 UTC - rmb: yes, you've said, but I've been asking you how the broker determines that order.  is it looking at a timestamp?
----
2019-12-17 19:43:14 UTC - Joe Francis: Perhaps you are leaving out something that you know but I don't. This is a very simple concept: if you call send(1), send(2), send(3) in that order (or sendAsync()), the broker will see 1, 2, 3. There are no timestamps; it's the order in which you send.
----
2019-12-17 19:44:53 UTC - rmb: producers are talking to brokers over a network, which means that if there's a network issue, messages can arrive in a different order than they were sent.  I assume there must be some way for brokers to deal with that
----
2019-12-17 19:45:03 UTC - Joshua Dunham: Hi @rmb: I can possibly help. For each client connection, Pulsar only acks once it has reached a write quorum. If it acks, the record has been noted. If you are connecting async then one thread could write ahead of another. If you write in bulk, the ack is on the bulk payload and the client needs to understand whether there is order in the individual records.
----
2019-12-17 19:47:21 UTC - Joe Francis: Batching and quorum are entirely orthogonal to message ordering
----
2019-12-17 19:47:29 UTC - jmogden: Hello, I'm trying to understand multi-topic subscription using regex patterns. From what I've found, when a consumer subscribes to topics in a namespace using a regex, it will subscribe to everything that matches the regex. I noticed that if a new topic is created that the consumer wasn't initially subscribed to, then it won't actually subscribe to and consume from the new topic, even if it would match the regex. Is there a way to have the consumer subscribe to the new topic without being closed and re-created?
----
2019-12-17 19:49:31 UTC - Joe Francis: Ordering is determined at the client, based on the order in which you invoke send()/sendAsync(). Batching is a transport and I/O optimization that is entirely immaterial to ordering. Quorum is not visible to the client.
----
2019-12-17 19:50:01 UTC - Joe Francis: For example: <https://github.com/streamlio/pulsar-java-tutorial/blob/master/src/main/java/tutorial/async/AsyncProducerTutorial.java>
----
2019-12-17 19:51:22 UTC - Joe Francis: Messages will be published in the order that sends are invoked in the loop, so they will be published in loop-counter order.
----
2019-12-17 19:53:59 UTC - rmb: thanks, @Joshua Dunham, I'm trying to understand how that interacts with the deduplication feature.  if a producer is connecting asynchronously and one thread writes ahead of another (so that the producer calls sendAsync for 1, 2, 3, 4 but the broker sees messages in the order 1, 2, 4, 3), will the broker delete messages?
----
2019-12-17 19:54:37 UTC - Joe Francis: Ha - so you are using multi-threading in the producer?
----
2019-12-17 19:54:53 UTC - rmb: no, I'm just worried about an unreliable network
----
2019-12-17 19:54:57 UTC - rmb: packets can arrive out of order
----
2019-12-17 19:55:05 UTC - Joe Francis: That's not something you have to worry about
----
2019-12-17 19:55:10 UTC - rmb: I assume there must be some way to deal with this
----
2019-12-17 19:55:12 UTC - rmb: why not?
----
2019-12-17 19:55:52 UTC - Joe Francis: Because Pulsar guarantees that the order in which you send is the order in which it gets published.
----
2019-12-17 19:56:01 UTC - Jason Fisher: Switch to sync and not async on the consumer
----
2019-12-17 19:56:18 UTC - Joshua Dunham: @rmb Definitely create an app-scoped timestamp or id in this case. You cannot use any queue as ordered if you are writing fifo.
----
2019-12-17 19:56:22 UTC - Jason Fisher: You can’t judge the logging output to be the actual order things arrive 
----
2019-12-17 19:56:34 UTC - Jason Fisher: Add a received timestamp to the log 
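
One way to follow this suggestion is to stamp each message at the moment it is received and only format the log lines afterwards, so that printing cannot reorder what was observed. A minimal sketch, assuming a hypothetical `ReceiptRecorder` helper whose `on_message` method would be called from whatever receive loop or listener the consumer uses:

```python
import itertools
import time


class ReceiptRecorder:
    """Hypothetical helper: stamp each message when it is received."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._counter = itertools.count()
        self.records = []

    def on_message(self, payload):
        # Capture arrival order and time before any slow work or printing.
        self.records.append((next(self._counter), self._clock(), payload))

    def lines(self):
        # Format afterwards; sorting by the counter gives arrival order.
        return [f"#{n} t={t:.6f} {p!r}" for n, t, p in sorted(self.records)]


recorder = ReceiptRecorder()
for body in (b"my-message-0", b"my-message-1", b"my-message-2"):
    recorder.on_message(body)
print("\n".join(recorder.lines()))
```

If the arrival counter and the payload numbers disagree, the reordering happened before the consumer saw the messages; if they agree, the console output was the likely culprit.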
----
2019-12-17 19:56:59 UTC - rmb: how does pulsar guarantee that the order in which you send is the order in which it gets published?
----
2019-12-17 19:57:02 UTC - Jason Fisher: Your console output is not async safe 
----
2019-12-17 19:57:11 UTC - Jason Fisher: In terms of keeping things in order 
----
2019-12-17 19:57:46 UTC - Joshua Dunham: @rmb, not the order you send, the order that pulsar acks.
----
2019-12-17 19:58:23 UTC - Roman Popenov: The only order you can guarantee is within Pulsar cluster
----
2019-12-17 19:58:49 UTC - Joshua Dunham: If you send synchronously you wait for each ack. If async then some other threads can get ackd depending on how slow the backend is to achieve write quorum etc.
----
2019-12-17 19:59:22 UTC - rmb: ok, great.  so if the broker has deduplication enabled and sees 1, 2, 4, 3, it will assume that 3 is a duplicate and delete it, even if the producers sent it async before 4?
----
2019-12-17 19:59:43 UTC - Roman Popenov: `Java client components are thread-safe: a consumer can acknowledge messages from different threads.`
----
2019-12-17 19:59:50 UTC - Roman Popenov: But there are no such guarantees with producers
----
2019-12-17 20:00:20 UTC - Joshua Dunham: It doesn't see 1,2,4,3, it sees 1,2,3,4 and your app sees the contents out of order.
----
2019-12-17 20:00:50 UTC - rmb: how do you guarantee that it sees 1,2,3,4?
----
2019-12-17 20:01:22 UTC - Joshua Dunham: I mean, pulsar orders linearly what it sees.
----
2019-12-17 20:01:53 UTC - Roman Popenov: Well, I would assume if you have one producer that produces to one topic, it will be in order
----
2019-12-17 20:01:54 UTC - Joshua Dunham: 1,2,3,4 being pulsar derived IDs.
----
2019-12-17 20:02:51 UTC - rmb: this thread of conversation started with me trying to understand custom sequenceIds and deduplication
----
2019-12-17 20:02:52 UTC - Joshua Dunham: Like if you have two small python loops filling in a spreadsheet with contents. The index of the spreadsheet is always 1->N in order, but the contents have no guarantee to make sense to the app.
----
2019-12-17 20:05:05 UTC - Joe Francis: @rmb There is a queue (Q) on the client, which is where ordering is imposed. The only way to ensure this order is to invoke send()/sendAsync() in the order you desire (which you cannot ensure if you use a multi-threaded producer and invoke send from multiple threads). This Q is what gets transported to the broker. Batching, compression etc. are transport mechanisms, which only affect how the Q is moved, not the order. Underneath, TCP is used, which guarantees network order on a given connection. Ordering is enforced in that if the transfer fails on a message in the middle of the Q, everything after that message will also be failed.
----
2019-12-17 20:08:12 UTC - Joe Francis: Receipts/acks will also come in similar order. Everything from a producer will be acked in the order in the Q. All these acks will be delivered into a consumer Q in the client.  If you read it out one by one, you will get the publish order. (If you read it with multi-threaded consumer, you will lose ack ordering)
----
2019-12-17 20:08:53 UTC - Roman Popenov: Although consumers ARE thread safe
----
2019-12-17 20:10:17 UTC - Joe Francis: Thread safe =/= ordering. It means that they can dequeue without stepping on each other. It does not guarantee that they execute in the same order they dequeued.
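
The distinction can be shown with a small stdlib sketch that has nothing Pulsar-specific in it: several worker threads drain one thread-safe queue. Every item is handled exactly once, but the order in which the workers finish is not guaranteed to match the order in which items were queued. All names here are illustrative.

```python
import queue
import threading

work = queue.Queue()
for i in range(20):
    work.put(i)

finished = []            # completion order, appended under a lock
lock = threading.Lock()


def worker():
    while True:
        try:
            item = work.get_nowait()   # thread-safe dequeue
        except queue.Empty:
            return
        # ... processing of `item` would happen here ...
        with lock:
            finished.append(item)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the threads join, `finished` contains each of the 20 items once; whether it is sorted depends on thread scheduling, which is why a single reader is needed when order matters.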
----
2019-12-17 20:10:21 UTC - Roman Popenov: It isn’t
----
2019-12-17 20:10:46 UTC - Roman Popenov: The other question I have, how to handle a chunked message
----
2019-12-17 20:11:08 UTC - rmb: ok, thanks.  why does one message getting lost mean that all subsequent messages will get lost?
----
2019-12-17 20:11:17 UTC - Roman Popenov: Is it possible to know that chunks are part of the same message and skip them until next message with multiple consumers?
----
2019-12-17 20:15:32 UTC - Joe Francis: That's the guarantee given by Pulsar. If you publish 1,2,3,4,5,6,7,8,9,10 and, because of some error on the server side, Pulsar could not store 5, then the integrity of the Q order is lost. Pulsar will not ack the rest. It will ack 1..4, and then fail 5-10.
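
The example above can be written as a short sketch. The function name `publish_in_order` and the `store` callback are hypothetical; the sketch only models the stated guarantee that once one message in the queue cannot be stored, every later message is failed as well.

```python
def publish_in_order(queue, store):
    """Return (acked, failed) for an ordered queue of messages.

    `store` is a hypothetical callback that returns True when a message
    is persisted. After the first failure nothing else is acknowledged,
    so the acknowledged messages always form a prefix of the queue.
    """
    acked, failed = [], []
    broken = False
    for msg in queue:
        if not broken and store(msg):
            acked.append(msg)
        else:
            broken = True
            failed.append(msg)
    return acked, failed


# Messages 1..10 in publish order; storing message 5 fails.
acked, failed = publish_in_order(range(1, 11), store=lambda m: m != 5)
```

Here the numbers stand for positions in the publish order, not for sequenceIds.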
----
2019-12-17 20:18:30 UTC - rmb: ok, but you're specifically allowed to send messages with a sparse sequence of sequenceIds.  if a broker receives 1,2,3,4,6,7,8,9,10, how does it distinguish the producer sending that sequence from the producer sending 1,2,3,4,5,6,7,8,9,10 and 5 getting lost?
----
2019-12-17 20:27:44 UTC - rmb: Thanks for the answers to my questions!  I'm afraid it's dinner time for me and I have to drop offline
----
2019-12-17 20:37:01 UTC - Joe Francis: This has nothing to do with seq-id. The numbers I used indicate message order, not seq-id.
----
2019-12-17 20:46:53 UTC - Joshua Dunham: Anyone see issues with the pulsar python client?
----
2019-12-17 20:47:25 UTC - Joshua Dunham: I'm getting a hard error (think it's in the C components)
----
2019-12-17 20:47:27 UTC - Joshua Dunham: "python pulsar-producer.py" terminated by signal SIGILL (Illegal instruction)" with the 2.4.2 client. Anyone else see this?
----
2019-12-17 21:00:54 UTC - ec: Is it safe to connect and use the Bookkeeper that Pulsar uses, for you know possibilities?
----
2019-12-17 21:04:21 UTC - tihomir: guys we are using pulsar 2.3.2 and we have the following strange problem
I subscribe to a topic with pulsar-client on region2 (shared subscription). Then I use the pulsar-client to publish on region1. Both regions have the replication clusters correctly set, but I never receive the message on region2.
----
2019-12-17 21:16:59 UTC - Addison Higham: @tihomir struggling to remember the call at the moment, but there is a status call for replication that will let you know what is happening, also, the broker logs are pretty useful for debugging replication
----
2019-12-17 22:15:57 UTC - Roman Popenov: So I ran `kubectl apply -f zookeeper.yaml` and my kubectl config is pointing to an EKS cluster in AWS
----
2019-12-17 22:16:16 UTC - Roman Popenov: It doesn’t seem to be in the default namespace
----
2019-12-17 22:16:22 UTC - Roman Popenov: Is that to be expected?
----
2019-12-17 22:53:41 UTC - Greg Hoover: Got it working. I had not setup Prometheus. It was included with some other containers I was using previously so overlooked that part. Downloaded a Prometheus container and configured it with the other containers and now it is working fine. Interestingly, the grafana dashboards in the streamnative and apache containers are different. I like different aspects of both, so I’ll prob use them both for a while. 
----
2019-12-17 23:33:23 UTC - Greg Hoover: Looks like the streamnative one may be a superset of the Apache. So will use the streamnative one for now.
----
2019-12-18 02:41:26 UTC - LaxChan: Must Pulsar geo-replication use the same ZooKeeper cluster?
----
2019-12-18 03:29:18 UTC - jia zhai: It is not necessary
----
2019-12-18 03:29:33 UTC - jia zhai: <https://gist.github.com/jiazhai>
----
2019-12-18 03:30:06 UTC - jia zhai: It contains 2 examples: one uses a global ZooKeeper, the other does not.
----
2019-12-18 06:11:49 UTC - LaxChan: :+1:
----
2019-12-18 09:06:14 UTC - Jasper Li: Hello all, I want to ask a question about Pulsar SQL. Does Pulsar SQL actually consume data through a subscription, or does it just scan the data from storage directly?
----