Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/02/06 09:11:03 UTC

Slack digest for #general - 2020-02-06

2020-02-05 10:22:59 UTC - Konstantinos Papalias: Thanks for the update @Sijie Guo. Is there an open conversation currently, or any ongoing PIP?
----
2020-02-05 12:20:02 UTC - Roman Popenov: I cannot speak to that, I would follow the issue at hand:
PIP 37: <https://github.com/apache/pulsar/wiki/PIP-37:-Large-message-size-handling-in-Pulsar>
and on github: <https://github.com/apache/pulsar/pull/4400#issue-283063448>
+1 : Ryan
----
2020-02-05 12:30:54 UTC - Anonymitaet: Dear Pulsar enthusiast, 

The first Pulsar live video stream, TGIP-CN, will be available soon! :smiley_cat:

Date: Feb 9, 2020 (this Sunday)
Time: 11:00 CST
Topic: Pulsar architecture
Instructor: Sijie Guo from StreamNative
Duration: 40 min
Language: Chinese
Live streaming link: <https://live.bilibili.com/21468418>

Want to chat with a Pulsar core engineer directly and get first-hand experience? Do not miss this great opportunity! :raising_hand:
----
2020-02-05 12:33:57 UTC - Fernando: Are the brew binaries for `libpulsar` up to date with `2.5.0`? It seems that the latest version in brew is `2.4.2`
----
2020-02-05 15:34:08 UTC - Mikhail Veygman: @Mikhail Veygman has joined the channel
----
2020-02-05 15:53:41 UTC - Mikhail Veygman: Hi, new to the forum. I was wondering if there is a way to subscribe to the same topic with multiple clients and receive all messages that have ever been published to that topic. If it helps, I'm running Pulsar v2.4.1+ with the Java client.
----
2020-02-05 16:20:46 UTC - Ryan Slominski: @Ryan Slominski has joined the channel
----
2020-02-05 16:27:07 UTC - Ryan Slominski: Hi - just trying out Pulsar and following the docs' standalone example, and I hit a snag on the second step when trying to subscribe:

``` -bash-4.2$ ./pulsar-client consume my-topic -s "first-subscription"
cat: /opt/pulsar/distribution/server/target/classpath.txt: No such file or directory
Error: Could not find or load main class org.apache.pulsar.client.cli.PulsarClientTool```

----
2020-02-05 16:27:58 UTC - Matteo Merli: is that from a binary distribution?
----
2020-02-05 16:28:29 UTC - Ryan Slominski: Yeah, just downloaded it.  I think I figured it out - looks like symbolic links in path are not allowed
----
2020-02-05 16:29:07 UTC - Matteo Merli: yes, it's better to just rename the directory
----
2020-02-05 16:33:44 UTC - Sijie Guo: For each client, you can just have one separate subscription. A subscription will receive all the data.
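
A toy model in plain Java may help picture this: each subscription name keeps its own read position, so two clients using different subscription names both see every message. `ToyTopic` is a hypothetical stand-in and not the Pulsar client API, and it assumes a new subscription starts reading from the earliest message in the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy topic with one read cursor per subscription name; NOT the Pulsar client API.
class ToyTopic {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    void publish(String msg) {
        log.add(msg);
    }

    // Returns the next unread message for this subscription, or null if caught up.
    // Assumption: a new subscription starts at the earliest message in the log.
    String receive(String subscription) {
        int pos = cursors.getOrDefault(subscription, 0);
        if (pos >= log.size()) {
            return null;
        }
        cursors.put(subscription, pos + 1);
        return log.get(pos);
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic();
        topic.publish("m1");
        topic.publish("m2");
        // Two clients, two subscription names: each one reads m1 and then m2.
        System.out.println(topic.receive("client-a") + " " + topic.receive("client-a"));
        System.out.println(topic.receive("client-b") + " " + topic.receive("client-b"));
    }
}
```

If both clients shared one subscription name in this model, they would share one cursor and split the messages between them instead.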
----
2020-02-05 16:53:44 UTC - Sam Leung: :thumbsup:
----
2020-02-05 16:59:08 UTC - Mikhail Veygman: For each client or for each topic?
----
2020-02-05 17:12:06 UTC - Fernando: I’m trying to offload topics to S3 but I keep getting `No ledgers to offload` Is it supposed to offload only messages that are not acked? Or am I missing some configuration?
----
2020-02-05 17:41:47 UTC - Mikhail Veygman: @Sijie Guo Is this per client or per topic?  I can't seem to receive all messages for one of the clients.  Does this need to be a Reader or will Consumer do just fine?
----
2020-02-05 17:44:56 UTC - Sijie Guo: it only offloads messages that are not deleted (i.e. messages that are not acked, or that are kept by a retention policy)
----
2020-02-05 17:49:11 UTC - Sijie Guo: your requirement is to have each client subscribed to the topic, with each client receiving all messages, no?
----
2020-02-05 17:49:21 UTC - Sijie Guo: do I misunderstand your requirements?
----
2020-02-05 18:01:19 UTC - Clemens Vasters: @Clemens Vasters has joined the channel
----
2020-02-05 18:28:08 UTC - Pradeesh: @Sijie Guo ^^ can you help us out with this error
----
2020-02-05 18:37:58 UTC - Guilherme Perinazzo: is there a way to force the client to send more than one message in a batch?
----
2020-02-05 18:38:13 UTC - Guilherme Perinazzo: I'm trying to test something, but it always seems to send 1 message batches
----
2020-02-05 18:38:44 UTC - Matteo Merli: it depends on the batching max delay time
----
2020-02-05 18:39:12 UTC - Guilherme Perinazzo: i set it to 10000ms
----
2020-02-05 18:39:33 UTC - Matteo Merli: are you setting delays/keys?
----
2020-02-05 18:40:10 UTC - Matteo Merli: also, are you calling send() or sendAsync() :
----
2020-02-05 18:40:12 UTC - Matteo Merli: ?
----
2020-02-05 18:40:46 UTC - Guilherme Perinazzo: Oh, i'm doing send, yeah, makes sense
----
2020-02-05 18:41:21 UTC - Matteo Merli: you should change to:
• producer.sendAsync()
• producer.sendAsync()
• producer.flush()
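
To illustrate why blocking sends produced one-message batches, here is a toy model in plain Java with no Pulsar dependency. `ToyBatchingProducer` and its methods are hypothetical stand-ins, not the Pulsar client API, and the assumption that a blocking `send` pushes the pending batch out immediately is a simplification of the behaviour discussed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy stand-in for a batching producer; NOT the Pulsar client API.
class ToyBatchingProducer {
    private final List<String> pending = new ArrayList<>();
    private final List<CompletableFuture<Void>> pendingAcks = new ArrayList<>();
    // Every flushed batch is recorded here so its size can be inspected.
    final List<List<String>> sentBatches = new ArrayList<>();

    // Queue the message and return immediately; nothing goes out yet.
    CompletableFuture<Void> sendAsync(String msg) {
        CompletableFuture<Void> ack = new CompletableFuture<>();
        pending.add(msg);
        pendingAcks.add(ack);
        return ack;
    }

    // Ship everything queued so far as one batch and acknowledge it.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sentBatches.add(new ArrayList<>(pending));
        pending.clear();
        List<CompletableFuture<Void>> acks = new ArrayList<>(pendingAcks);
        pendingAcks.clear();
        acks.forEach(a -> a.complete(null));
    }

    // Assumption: a blocking send pushes its message out and waits for the ack,
    // so the batch never has a chance to grow beyond one message.
    void send(String msg) {
        CompletableFuture<Void> ack = sendAsync(msg);
        flush();
        ack.join();
    }

    public static void main(String[] args) {
        ToyBatchingProducer blocking = new ToyBatchingProducer();
        blocking.send("a");
        blocking.send("b");
        System.out.println("blocking batches: " + blocking.sentBatches);

        ToyBatchingProducer async = new ToyBatchingProducer();
        async.sendAsync("a");
        async.sendAsync("b");
        async.flush();
        System.out.println("async batches: " + async.sentBatches);
    }
}
```

In this model the blocking producer records two batches of one message each, while the asynchronous one records a single batch holding both messages.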
----
2020-02-05 18:43:04 UTC - Guilherme Perinazzo: yeah, using async worked, thanks!
----
2020-02-05 18:46:49 UTC - Mikhail Veygman: That is correct
----
2020-02-05 18:47:11 UTC - Mikhail Veygman: I think I misunderstood the issue I was having.
----
2020-02-05 19:21:35 UTC - Mikhail Veygman: Thank you.
----
2020-02-05 20:31:15 UTC - Ryan Slominski: Hi - I'm experimenting with pulsar 2.5.0 and noticed bin/pulsar-daemon stop standalone results in some exceptions in the log file like:

```15:24:50.952 [Thread-1] ERROR org.apache.distributedlog.BKAbstractLogWriter - Completing Log segments encountered exception
java.io.IOException: Failed to close ledger for streams_000000000000000001_000000000000000001_000000000000000000:<default>:inprogress_000000000000000002 : BookKeeper client is closed
        at org.apache.distributedlog.BKLogSegmentWriter$6.closeComplete(BKLogSegmentWriter.java:660) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$0(LedgerHandle.java:552) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232]
        at org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$3(LedgerHandle.java:614) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232]
        at org.apache.bookkeeper.client.MetadataUpdateLoop.lambda$writeLoop$1(MetadataUpdateLoop.java:146) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232]
        at org.apache.bookkeeper.meta.CleanupLedgerManager.lambda$close$1(CleanupLedgerManager.java:246) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) ~[?:1.8.0_232]
        at org.apache.bookkeeper.meta.CleanupLedgerManager.close(CleanupLedgerManager.java:246) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.client.BookKeeper.close(BookKeeper.java:1410) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at org.apache.distributedlog.BookKeeperClient.close(BookKeeperClient.java:271) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]```
Doesn't exactly inspire confidence in a tool specializing in concurrency.   Is this a known issue?
----
2020-02-05 20:33:19 UTC - Sijie Guo: I think the BookKeeper client was forced to close while there were still cleanup tasks pending. I don’t think it impacts anything, but it is something that can be improved.
----
2020-02-05 20:35:37 UTC - Ryan Slominski: Is this a problem only with standalone or does this affect production cluster code too?   I'm simply evaluating the software and actually ran the stop command on a completely idle instance (no clients connected).   Not getting a good feeling about the software stability.
----
2020-02-05 20:48:09 UTC - Antti Kaikkonen: When I'm creating a source connector with --processing-guarantees EFFECTIVELY_ONCE I can only achieve throughput of ~100 msg/s. The ack latency is about 10ms so it seems that every read happens only after the previous message has been acknowledged. Is there any way to get higher throughput with effectively once guarantees?
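
The numbers above are consistent with a simple back-of-the-envelope bound: if only one message is in flight at a time and each acknowledgement takes about 10 ms, throughput cannot exceed roughly 1000 / 10 = 100 msg/s. A minimal sketch of that arithmetic follows; the `maxInFlight` parameter is an assumption added for illustration and is not a Pulsar setting.

```java
// Back-of-the-envelope throughput bound; not tied to any Pulsar API.
class ThroughputBound {
    // Upper bound on messages per second when at most maxInFlight messages
    // may be awaiting acknowledgement and each ack takes ackLatencyMillis.
    static double maxMessagesPerSecond(double ackLatencyMillis, int maxInFlight) {
        if (ackLatencyMillis <= 0 || maxInFlight <= 0) {
            throw new IllegalArgumentException("latency and in-flight count must be positive");
        }
        return maxInFlight * 1000.0 / ackLatencyMillis;
    }

    public static void main(String[] args) {
        // One message at a time with a 10 ms ack: about 100 msg/s.
        System.out.println(maxMessagesPerSecond(10.0, 1));
        // Allowing 1000 unacknowledged messages raises the bound accordingly.
        System.out.println(maxMessagesPerSecond(10.0, 1000));
    }
}
```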
----
2020-02-05 20:54:24 UTC - Sijie Guo: It is standalone. Standalone enables all the components. Some components are still in developer preview (e.g. the function state related components).
----
2020-02-05 20:55:32 UTC - Joe Francis: There was an unclean shutdown, sure. But does it recover? That's the real test. A clean shutdown should not be a prerequisite for system stability.
----
2020-02-05 21:09:07 UTC - Alexander Ursu: @Alexander Ursu has joined the channel
----
2020-02-05 21:11:11 UTC - Alexander Ursu: New to Pulsar, is there some sort of guide to setting up a multi-node cluster using Docker Swarm?
----
2020-02-05 21:12:02 UTC - Sijie Guo: What is your source connector? I think the throughput here mainly depends on how the connector implements the #read method. If you can’t change how the connector reads the data, you can scale up the throughput by increasing the parallelism of the functions.
----
2020-02-05 21:13:27 UTC - Sijie Guo: I don’t think there is a guide specific to Docker Swarm. You can try reading the general guide for on-prem deployment, and maybe the Kubernetes deployment guide.

<http://pulsar.apache.org/docs/en/deploy-bare-metal/>
<http://pulsar.apache.org/docs/en/deploy-kubernetes/>
----
2020-02-05 21:22:43 UTC - Antti Kaikkonen: I created my own that instantly returns a dummy record. When I tested the same connector with --processing-guarantees ATLEAST_ONCE, I got such high throughput that I started to get Java heap space out-of-memory errors, so I had to introduce Thread.sleep in the read method.

But I'm wondering if ATLEAST_ONCE should achieve the same guarantee as long as de-duplication is enabled and I'm implementing the getPartitionId and getRecordSequence methods of the Record interface?
----
2020-02-05 21:25:12 UTC - Antti Kaikkonen: I'm using 2.4.2 standalone mode.
----
2020-02-05 21:34:32 UTC - Sijie Guo: Exactly-once is implemented as at-least-once with broker de-duplication. You have to make sure your connector implementation returns the partition id and record sequence correctly and consistently.
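
A toy model of the idea, in plain Java: at-least-once delivery may redeliver a record, and a de-duplicating receiver drops anything whose sequence id is not higher than the last one it accepted for that partition. `ToyDedupLog` is hypothetical and is not Pulsar's broker code; it only illustrates why the partition id and record sequence have to be stable across retries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy de-duplicating log keyed by partition id; NOT Pulsar's broker implementation.
class ToyDedupLog {
    private final Map<String, Long> highestSeen = new HashMap<>();
    final List<String> accepted = new ArrayList<>();

    // Returns true if the record was stored, false if it was dropped as a duplicate.
    boolean publish(String partitionId, long sequence, String payload) {
        Long last = highestSeen.get(partitionId);
        if (last != null && sequence <= last) {
            return false; // already seen: a retry of an earlier record
        }
        highestSeen.put(partitionId, sequence);
        accepted.add(payload);
        return true;
    }

    public static void main(String[] args) {
        ToyDedupLog log = new ToyDedupLog();
        log.publish("1", 1, "first");
        log.publish("1", 2, "second");
        log.publish("1", 2, "second"); // redelivered after a retry, dropped
        log.publish("1", 3, "third");
        System.out.println(log.accepted);
    }
}
```

If a retry carried a different sequence number than the original attempt, this model would store the record twice, which is the failure mode the advice above guards against.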
----
2020-02-05 21:58:12 UTC - Alexander Ursu: Might there be a reason why, or is it just a not-so-popular choice? I'm led to believe there's some other reason, since it's almost never mentioned when I try to search for one.
----
2020-02-05 22:35:53 UTC - Antti Kaikkonen: Yes, I have implemented those. My getPartitionId always returns `Optional.of("1")` since there is only a single partition in the source. I don't think that I can use parallelism to increase performance, since there is only a single source partition and I need to retain ordering.
----
2020-02-05 22:36:45 UTC - Antti Kaikkonen: I tested with ATLEAST_ONCE and the deduplication stopped working so that doesn't seem to be an option either.
----
2020-02-05 22:51:50 UTC - Roman Popenov: Do functions leverage pulsar proxy in any way? Is there some kind of internal load-balancing mechanism between proxies and brokers?
----
2020-02-05 23:23:29 UTC - Sijie Guo: Oh, it is just because most of the committers haven’t used Docker Swarm before, and we don’t see many requests about deploying to Docker Swarm. But we are definitely happy to see contributions for deploying to Docker Swarm. Maybe you can create a GitHub issue requesting this feature, so people in the community can help out.
----
2020-02-05 23:24:57 UTC - Sijie Guo: Functions don’t rely on brokers or proxies directly. Functions can talk to a Pulsar cluster via a broker service URL or a proxy service URL.

The function worker implementation leverages Pulsar topics and subscriptions for load balancing and message routing.

hope that clarifies your questions.
thanks : Roman Popenov
----
2020-02-05 23:25:21 UTC - Roman Popenov: It does. Thank you
----
2020-02-06 00:08:40 UTC - Guilherme Perinazzo: does the c client expose any way to free a string pointer it allocated?
----
2020-02-06 00:09:08 UTC - Matteo Merli: just regular `free()`
----
2020-02-06 00:11:30 UTC - Guilherme Perinazzo: Okay, guess i'll have to find how to call libc from rust, thanks
----
2020-02-06 00:47:18 UTC - Antti Kaikkonen: Just tested with 2.5.0 standalone and got ~90 msg/s with EFFECTIVELY_ONCE and over 300 000 msg/s with ATLEAST_ONCE.

Here is the source connector that I'm using for testing: <https://pastebin.com/rRkV0mTs>
----
2020-02-06 03:08:24 UTC - Antti Kaikkonen: <https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/sink/PulsarSink.java#L237>
I think that `future.join();` is causing this behavior. Is it required for the effectively-once guarantee? Would it be possible for a message to be added to a topic after a failed message if the messages were sent asynchronously?
----
2020-02-06 07:56:20 UTC - Fernando: How can I migrate my pulsar cluster from one k8s cluster to another? I’ve tried copying the files in the mounted volumes but that doesn’t work. Also tried offloading topics to S3 but that doesn’t work either. It seems there’s some ephemeral data that might be missing in the new cluster, for it to recognize the ledgers and the topics correctly. Any advice?
----
2020-02-06 08:25:05 UTC - Yuvaraj Loganathan: <https://pulsar.apache.org/> website is down ?
----
2020-02-06 08:43:51 UTC - Martin Skogevall: I just tried it, and it seems to work.
----