Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/02/22 09:11:04 UTC

Slack digest for #general - 2020-02-22

2020-02-21 09:21:00 UTC - Steven Op de beeck: @Devin G. Bost There's a way around it, but I fear that's a debezium-kafka thing, and not supported in Pulsar. <https://debezium.io/documentation/reference/1.0/configuration/outbox-event-router.html>
----
2020-02-21 10:16:56 UTC - Jon Bennett: @Jon Bennett has joined the channel
----
2020-02-21 10:19:57 UTC - Jon Bennett: hi - anyone using the official (cgo) Go client in Docker-deployed apps?
----
2020-02-21 13:31:45 UTC - Manuel Mueller: hello,
we are currently looking into functions - especially "state" functions. We are experiencing some weird results, where the system is running but after a couple of hours it starts freezing and breaks the whole system. Logs start to show that the health check goes bad - it starts to refuse the connections and such. It would be great if any of you could share your feedback on "how to enable state functions" or your experience in general, maybe we missed something crucial.
We already ran into the python3 vs python2 "bug" which we fixed (symlinked). The current tests are being run in a kubernetes setup as well as standalone.
In addition, the REST API goes down as well and becomes unresponsive (still not sure how they are connected).
----
2020-02-21 13:32:13 UTC - Manuel Mueller:
----
2020-02-21 13:32:45 UTC - Roman Popenov: Are those function logs? Are you running out of heap or memory?
----
2020-02-21 13:36:35 UTC - Manuel Mueller: currently it does not feel like it would be a memory related issue, so far the logs did not indicate anything specific
----
2020-02-21 13:36:56 UTC - Roman Popenov: Perhaps stale client connections?
----
2020-02-21 13:37:12 UTC - Roman Popenov: jps/ps showing any hanging processes?
----
2020-02-21 13:38:27 UTC - Roman Popenov: Do you have grafana enabled to see some metrics?
----
2020-02-21 13:39:59 UTC - Manuel Mueller: in the kubernetes setup we have it - would you recommend that we check for memory things or just check overall system performance in general?
----
2020-02-21 13:40:29 UTC - Roman Popenov: I would check to see what is happening with the Pulsar cluster itself first
----
2020-02-21 13:40:48 UTC - Mikhail Veygman: Is there a way to configure Pulsar to allow message forwarding before it is stored to disk on a persistent topic?
----
2020-02-21 13:46:12 UTC - Manuel Mueller: so far the grafana dashboard is rather inconclusive for us. After the function is deployed, it seems to communicate on its port, and at some point it stops with the message "serving on port ..". At the same time, we cannot use the REST API any more, which hangs as well
----
2020-02-21 13:48:06 UTC - Roman Popenov: Sounds like a networking issue :thinking_face:
----
2020-02-21 13:48:20 UTC - Manuel Mueller: My hunch tells me - it is somehow the "table / state service" bugging out - but I am not sure how to debug this
----
2020-02-21 13:51:38 UTC - Roman Popenov: And what is the status of the function if you check through admin-cli?
----
2020-02-21 13:52:40 UTC - Roman Popenov: There are also function logs somewhere
----
2020-02-21 13:53:06 UTC - Roman Popenov: `workerId/logs/functions/` don’t remember where it is exactly now
----
2020-02-21 13:54:41 UTC - Roman Popenov: <http://pulsar.apache.org/docs/en/functions-debugging/#debug-with-localrun-mode>
----
2020-02-21 14:03:56 UTC - Chris Bartholomew: In k8s, the function logs are under `/pulsar/logs/functions` where the functions are running (broker or function worker). If the function is having trouble connecting to the state server (bookkeeper) then you will likely see that in the logs for the function. The log message from above (in broker or function worker) indicates the health check could not connect to the function, which probably means it is not running (crashed). The function log should shed some light on that.
----
2020-02-21 14:18:34 UTC - Ming: We have apps using both cgo and native go library deployed in docker/k8s. We'll migrate cgo to native go. Native go may lack some features that cgo and java client provide.
----
2020-02-21 14:27:16 UTC - Jon Bennett: hi @Ming sorry for the delay!
----
2020-02-21 14:27:53 UTC - Jon Bennett: I’m trying to build the container, but keep getting missing header file errors, is there a trick I’m missing?

```# github.com/apache/pulsar/pulsar-client-go/pulsar
In file included from ../pkg/mod/github.com/apache/pulsar/pulsar-client-go@v0.0.0-20200128093721-d42cfa15ab11/pulsar/c_client.go:24:0:
./c_go_pulsar.h:22:29: fatal error: pulsar/c/client.h: No such file or directory
#include <pulsar/c/client.h>```

----
2020-02-21 14:28:33 UTC - Jon Bennett: I’ve tried vendoring on/off, and using the 3rd party `vend` application, same error each time.
----
2020-02-21 14:31:08 UTC - Ming: You need to add the C library. Here is our Dockerfile: <https://github.com/kafkaesque-io/pulsar-beam/blob/master/Dockerfile>
----
2020-02-21 14:31:46 UTC - Jon Bennett: @Ming for native, are you using the one from Comcast? I was looking at that. With the official cpp-based client, you set a `message.Key` and `message.Properties`, which I’ve not been able to find a way to do with the native client.
Without a key, it’s unclear whether things like topic compaction would work. How would Pulsar know that a message is a newer version?
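
For illustration, a minimal Python sketch of why compaction depends on the key. It is a toy model, not Pulsar's implementation: `compact` is a hypothetical helper, messages are plain `(key, value)` pairs, and keyless messages are left out of the sketch entirely.

```python
def compact(messages):
    """Return the latest value for each key, ordered by each key's last write.

    `messages` is an iterable of (key, value) pairs in publish order.
    Toy model only: a real broker also has to handle keyless messages,
    batching and deletions, none of which are modelled here.
    """
    latest = {}
    for key, value in messages:
        # A later message with the same key supersedes the earlier one.
        # Removing the key first moves it to the end of the dict.
        latest.pop(key, None)
        latest[key] = value
    return list(latest.items())


backlog = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2")]
print(compact(backlog))
```

Only the key ties `("user-1", "v2")` to `("user-1", "v1")`; without one there is nothing to mark the second message as a newer version of the first.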
----
2020-02-21 14:32:25 UTC - Jon Bennett: @Ming thanks for Dockerfile, I’ll read and be back to you shortly.
----
2020-02-21 14:37:29 UTC - Devin G. Bost: You could easily create a sink that writes the data to a desired location before storing on a persistent topic. What exactly is meant by "forwarding"?
----
2020-02-21 14:39:04 UTC - Devin G. Bost: Functions with state are in developer preview and are not ready for production use. There are a few tests that fail intermittently and are currently under investigation.
----
2020-02-21 14:41:05 UTC - Devin G. Bost: My team uses a cache layer (Apache Ignite) or stateful compute (Akka or Apache Flink, depending on the workload) to work around the issue.
----
2020-02-21 14:43:49 UTC - Ming: The example I gave uses pulsar's cgo library. The native go library supports ProducerMessage's key and properties. <https://github.com/apache/pulsar-client-go/blob/fc390a6a37f3cbd94ac46b3b5e4239b3ca5df875/pulsar/message.go#L31>
----
2020-02-21 14:44:41 UTC - Devin G. Bost: It would be extremely helpful if the logs and details here could be put into a Github Issue so we can have a permanent record and track progress on the issue and link your experience to any current issues.
----
2020-02-21 14:55:30 UTC - Chris Bartholomew: @Devin G. Bost in the case where you use Apache Ignite for state storage, do you package Ignite client in with the Pulsar function?
----
2020-02-21 15:05:31 UTC - Devin G. Bost: We created an Ignite Sink for that.
----
2020-02-21 15:05:47 UTC - Devin G. Bost: I'm working on getting it open sourced.
+1 : Chris Bartholomew, David Kjerrumgaard
----
2020-02-21 16:38:04 UTC - Mikhail Veygman: Message is sent to Pulsar. Pulsar forwards it then records it.
----
2020-02-21 16:38:24 UTC - Mikhail Veygman: So that if you look to replay topic from the beginning you can do it.
----
2020-02-21 17:07:49 UTC - David Kjerrumgaard: @Pushkar Sawant it depends on the underlying issue for the write quorum failure. Did you lose a bookie? Is one of the bookie disks full, etc?
----
2020-02-21 17:58:54 UTC - Pushkar Sawant: I have a cluster with Write Quorum set to 2. We had an issue with one of the bookies, whose journal directory filled up. I was in the process of recovering the data from that node. While it was recovering, another node went down with its ledger directory full. It could not come back up, failing with the error “Exception while replaying journals, shutting down”. I effectively lost all ledgers that were shared between these two nodes. To recover the topics that were stored on these two nodes, I tried to delete the topics but always received a 500 internal server error. The only way I could recover the cluster was to create a new cluster and migrate to it.
----
2020-02-21 18:30:10 UTC - David Kjerrumgaard: So you lost two bookies in succession which left you with one active bookie and a write quorum of 2.
----
2020-02-21 18:34:29 UTC - David Kjerrumgaard: In such a scenario, exceptions will be raised to the producers in order to make them stop sending messages, which is what we want in order to prevent data loss. As far as "fixing" the issue, there are a few options; the first was/is increasing the size of the journal directory if you are using expandable storage such as Logical Volume Management (LVM) or Amazon EBS.
----
2020-02-21 18:34:31 UTC - Rolf Arne Corneliussen: @Antti Kaikkonen Yes, you are right, I can imagine concurrency could be problematic with a heap.

Anyway, I just tried out subscribing to a partitioned topic with the Pulsar Java client, subscription type = Failover, and then the partitions were distributed among the different consumers (with same subscription), resembling a Kafka consumer group. If you register a `ConsumerEventListener` when building the `Consumer`, you will get callbacks when partitions get active/inactive for a consumer. The callbacks will be on a listener thread (e.g. 'pulsar-external-listener-3-4'). So I was wrong - 'consumer groups' with callbacks can be run on Pulsar.
ok : Antti Kaikkonen
+1 : Antti Kaikkonen
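
To make the distribution concrete, here is a toy model in Python. It is not the broker's code: it assumes, purely for illustration, that each partition's active consumer is picked round-robin over the consumer names in sorted order, and `assign_partitions` and `changes_for` are hypothetical helpers. The diff computed by `changes_for` mimics the moments at which a `ConsumerEventListener` would be told that a partition became active or inactive.

```python
def assign_partitions(num_partitions, consumer_names):
    """Map each consumer name to the partitions it is active for.

    Assumption for this sketch: round-robin over sorted consumer names.
    """
    consumers = sorted(consumer_names)
    if not consumers:
        return {}
    assignment = {name: [] for name in consumers}
    for partition in range(num_partitions):
        active = consumers[partition % len(consumers)]
        assignment[active].append(partition)
    return assignment


def changes_for(name, before, after):
    """Return (became_active, became_inactive) partitions for one consumer."""
    old = set(before.get(name, []))
    new = set(after.get(name, []))
    return sorted(new - old), sorted(old - new)


before = assign_partitions(4, ["consumer-a", "consumer-b"])
after = assign_partitions(4, ["consumer-b"])  # consumer-a has disconnected
print(before)
print(after)
print(changes_for("consumer-b", before, after))
```

In this model, when `consumer-a` disconnects, `consumer-b` takes over its partitions, which is the point where the became-active callbacks would fire.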
----
2020-02-21 18:36:33 UTC - David Kjerrumgaard: If you fail to do that in time, you can/should introduce a new bookie into the cluster to provide additional storage capacity (this is especially true when you lost the second bookie). That would really be your only course of action to prevent any further downtime.
----
2020-02-21 18:40:01 UTC - David Kjerrumgaard: once you got into a single bookie state, you would need to add more bookies, such that `EnsembleSize >= Write Quorum >= Ack Quorum` . Then you could start decommissioning the "lost" bookie nodes to offload the data onto the newly added bookie(s). <https://bookkeeper.apache.org/docs/latest/admin/decomission/>
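
As a rough sketch of the constraint above, in Python: `check_quorum` and `can_create_ledger` are hypothetical helper names, not BookKeeper APIs. The second check assumes that a new ledger needs at least `ensemble_size` writable bookies.

```python
def check_quorum(ensemble_size, write_quorum, ack_quorum):
    """True when EnsembleSize >= Write Quorum >= Ack Quorum >= 1."""
    return ensemble_size >= write_quorum >= ack_quorum >= 1


def can_create_ledger(writable_bookies, ensemble_size):
    """Assumption: a new ledger needs `ensemble_size` writable bookies."""
    return writable_bookies >= ensemble_size


# Settings where all three values are 2 are consistent...
print(check_quorum(2, 2, 2))
# ...but a single writable bookie cannot host a new ensemble of 2.
print(can_create_ledger(1, 2))
# Adding bookies back restores the ability to write.
print(can_create_ledger(3, 2))
```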
----
2020-02-21 19:03:30 UTC - Jon Bennett: @Ming ahh, great. Do you have an example of creating a ProducerMessage with Key/Props using the native client?
----
2020-02-21 19:25:13 UTC - Devin G. Bost: You could write the message to two topics, one that sends to your desired location and one that writes to persistent storage via a sink. You could also set up retention with tiered storage.
----
2020-02-21 21:13:05 UTC - Ruwen: @Ruwen has joined the channel
----
2020-02-21 21:24:04 UTC - Ruwen: Hi. Are there any plans for a Java 11 Docker image? Asking because I tried to upload a function/jar compiled with Java 11 which (obviously) failed
----
2020-02-21 21:27:33 UTC - Ali Ahmed: it’s a matter of time, we will probably move to default java11 base image and deprecate java8, we can consider this for the 2.7 release
----
2020-02-21 21:28:03 UTC - Roman Popenov: Are there instructions on how to build with Java 11? I am trying to do this now
----
2020-02-21 21:29:34 UTC - Roman Popenov: Anyone successfully built a docker image using Java 11?
----
2020-02-21 21:29:50 UTC - Ruwen: if you try it anyway: <https://github.com/apache/pulsar/blob/master/docker/pulsar/Dockerfile> swap out the base image
----
2020-02-21 21:29:58 UTC - Ali Ahmed: we use jdk11 internally
----
2020-02-21 21:30:05 UTC - Ruwen: and let me know how it turned out :wink:
----
2020-02-21 21:33:35 UTC - Jon Bennett: Are there plans to patch the Homebrew formula for local MacOS installs?
----
2020-02-21 21:34:40 UTC - Ali Ahmed: which Homebrew formula, this one?
<https://github.com/streamlio/homebrew-formulae/blob/master/pulsar.rb>
----
2020-02-21 21:38:46 UTC - Devin G. Bost: What's up with it? Is it not working again?
----
2020-02-21 22:06:28 UTC - Roman Popenov: I keep seeing the following error while trying to run build.sh in <https://github.com/apache/pulsar/blob/master/docker/build.sh>
----
2020-02-22 00:04:39 UTC - Roman Popenov: Seems like this cannot be run: <https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/docker/build-wheel-file-within-docker.sh#L25>
```Permission denied```
Do I need to run the `docker run` command in privileged mode, or is it something else?
----
2020-02-22 00:17:37 UTC - Jon Bennett: no, apologies, the libpulsar package <https://formulae.brew.sh/formula/libpulsar>
----
2020-02-22 00:21:40 UTC - Ali Ahmed: will try to get it updated
----
2020-02-22 00:40:24 UTC - Mikhail Veygman: That flies in the face of what I am doing. I am trying to make it faster instead of doing double the work.
----
2020-02-22 00:56:47 UTC - Pushkar Sawant: I have 6 bookies in total. EnsembleSize, Write Quorum and Ack Quorum are all 2. I expanded the storage on the two nodes, but the bookies on them didn’t start, failing with the error “Exception while replaying journals, shutting down”. Also, 3 new bookies were added to the cluster. After the expansion, a small subset of topics has a “bookie handle not available” error; I believe those were on the two nodes that couldn’t start. As both copies were inaccessible, the decommission command did not work for me.
----
2020-02-22 00:57:48 UTC - Roman Popenov: It appears that my `Permission denied` happened because I had manually built the python and cpp clients, and the files were created with different permissions/users. Running `git clean -fdx` solved the issue
+1 : Devin G. Bost
----
2020-02-22 01:03:33 UTC - Pushkar Sawant: I was trying to delete and recreate the topics but that also resulted in 500 internal server error because of “bookie handle not available”
----
2020-02-22 02:44:29 UTC - Joe Francis: What are you doing? :slightly_smiling_face: What happens if the message is forwarded and Pulsar cannot persist it? Pulsar guarantees delivery. That is, Pulsar acknowledges that a message is published only after it's persisted.
----
2020-02-22 06:06:58 UTC - Devin G. Bost: @Mikhail Veygman

By "faster," are you talking about latency? If so, my recommendation still holds. We have long pipelines in production with many steps that are processing tens of thousands of messages per second, and adding a sink to a function like that will hardly add 15 milliseconds to your total latency. If your application is really that latency sensitive, then you are going to have a lot more to rework beyond adding a sink.

There are many good blog posts and videos about Pulsar's architecture. I recommend that you study them. It should help your understanding a lot.
----
2020-02-22 06:07:48 UTC - Devin G. Bost: @Joe Francis "Persistence" could refer to two different things in this context.
----
2020-02-22 06:12:38 UTC - Devin G. Bost: @Mikhail Veygman

It seems like you're expecting all of these operations to be synchronous and thread blocking. Pulsar is designed with parallelism and asynchronous I/O in mind. Think of it like a tree of dominoes falling. You can kick off many operations in parallel from a single event.
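
A small Python sketch of that idea, using only `asyncio`. The names `write_to_sink` and `fan_out` are made up for illustration and the sleeps stand in for real I/O; none of this is Pulsar code.

```python
import asyncio


async def write_to_sink(name, message, delay):
    """Pretend to write `message` somewhere; the sleep stands in for I/O."""
    await asyncio.sleep(delay)
    return f"{name} stored {message}"


async def fan_out(message):
    """Start both writes at once and wait until both have finished."""
    return await asyncio.gather(
        write_to_sink("cache", message, 0.02),
        write_to_sink("archive", message, 0.01),
    )


# gather() returns results in the order the awaitables were passed in,
# regardless of which one finished first.
print(asyncio.run(fan_out("event-42")))
```

Because the two writes overlap, the total wait is roughly that of the slower write rather than the sum of both.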
----
2020-02-22 06:18:28 UTC - Devin G. Bost: If you want to replay, there are multiple ways of doing that. For example, you could persist messages in external storage like Apache Ignite. Or, you could set up data retention with Apache BookKeeper. You can also tier storage, but it sounds like you need all your data storage to be very fast. At the latencies it sounds like you're needing, I'd think that disk I/O would be too slow for you and that you'd need hot storage in a memory-only cache.
----