Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/04/26 09:11:04 UTC

Slack digest for #general - 2019-04-26

2019-04-25 09:11:20 UTC - Romain Castagnet: Hi, did you try "namespaces create" without cluster and next "namespaces set-clusters TENANT/NS -c cluster1,cluster2" ?
----
2019-04-25 09:14:15 UTC - stefan: hi, yes this is what i did
----
2019-04-25 09:16:04 UTC - stefan: i tried both ways
----
2019-04-25 09:23:24 UTC - Romain Castagnet: hum strange
----
2019-04-25 09:32:28 UTC - Matti-Pekka Laaksonen: Today I noticed one of our client applications had died, seemingly due to a lost connection. The last log message is:
{"timestamp":"2019-04-25T06:33:19.900Z","level":"WARN","thread":"pulsar-client-io-1-1","logger":"org.apache.pulsar.client.impl.ClientCnx","message":"[10.223.2.164/10.223.2.164:6650] Got exception NativeIoException : syscall:read(..) failed: Connection reset by peer","context":"default"}
----
2019-04-25 09:32:38 UTC - Matti-Pekka Laaksonen: This leads me to <https://github.com/apache/pulsar/blob/branch-2.2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientCnx.java#L219>
----
2019-04-25 09:37:15 UTC - Matti-Pekka Laaksonen: I don't quite understand this case. Normally when the Pulsar connection is lost we catch the exception, close down the application gracefully, and the orchestration service restarts the container after a delay. In this case, however, there is no error or a caught exception, simply a WARN level log message. I'm not familiar with the execution path of the ClientCnx, should the connection die after the state is set to State.Failed?
----
2019-04-25 10:15:13 UTC - Yuvaraj Loganathan: Because the client will retry and establish the connection :thinking_face:
----
2019-04-25 10:35:49 UTC - songxinlei: @songxinlei has joined the channel
----
2019-04-25 11:39:58 UTC - Matti-Pekka Laaksonen: Hmm, might be that the Pulsar client was able to reconnect, but the non-Pulsar parts of the client failed. I'll look into it
----
2019-04-25 12:28:59 UTC - Chris Bartholomew: I wanted to let everyone know that I've built a service based on Pulsar. You can see it here: <https://kafkaesque.io> I am really hoping it helps people get started with Pulsar, testing their client code, etc. A basic account is free and includes an integrated dashboard for admin and monitoring of topics, namespaces, clusters, geo-replication. Would love it if everyone could try it out and give me some feedback. Thanks.
+1 : Sijie Guo, Ezequiel Lovelle, Guy Feldman, Karthik Ramasamy, DT, Ruud Kamphuis
----
2019-04-25 16:00:14 UTC - Grant Wu: What’s the current status of the Docker/k8s runtime for PFs?
----
2019-04-25 16:07:45 UTC - Sijie Guo: @Grant Wu k8s runtime is supported since 2.3.0. The documentation is still missing though :disappointed:
----
2019-04-25 16:08:29 UTC - Grant Wu: :disappointed:
----
2019-04-25 17:43:08 UTC - Devin G. Bost: We increased parallelism for a high-traffic Pulsar function (from 3 to 5), but the data shows that the new function instances aren't getting any traffic. How would we figure out why these instances aren't getting any of the load?
----
2019-04-25 17:44:13 UTC - Matteo Merli: Can you share the topics stats for the topic these are consuming from?

`pulsar-admin topics stats $TOPIC`
----
2019-04-25 17:54:10 UTC - Thor Sigurjonsson: Devin is getting the topic stats ready...
----
2019-04-25 17:55:59 UTC - Thor Sigurjonsson: I guess to add a little color to the conversation: we noticed that the metrics in Grafana showed higher latency at the 0.999 quantile and wanted to see if we could bring that down. When we deployed parallelism 5 (up from 3), we noticed the 2 new functions share hosts with 2 "older ones", and no metrics are shown for them in Grafana and the pulsar-admin functions stats call returns 0 metrics.
----
2019-04-25 17:56:34 UTC - Devin G. Bost: I noticed that the instance with instance_id: "2" is missing from the list of subscriptions.
----
2019-04-25 17:56:49 UTC - Thor Sigurjonsson: those .999 quantile ones are around 100-125ms.
----
2019-04-25 17:57:23 UTC - Matteo Merli: It seems all 5 consumers (1 per function instance) are consuming at ~32 msg/s
----
2019-04-25 17:59:03 UTC - Matteo Merli: In the stats JSON, you have the `msgRateOut` for each consumer and the overall for the subscription
----
2019-04-25 18:02:36 UTC - Thor Sigurjonsson: when I do pulsar-admin functions status on that function I get this for instance 2:
```
{
  "instanceId" : 2,
  "status" : {
    "running" : true,
    "error" : "",
    "numRestarts" : 0,
    "numReceived" : 0,
    "numSuccessfullyProcessed" : 0,
    "numUserExceptions" : 0,
    "latestUserExceptions" : [ ],
    "numSystemExceptions" : 0,
    "latestSystemExceptions" : [ ],
    "averageLatency" : 0.0,
    "lastInvocationTime" : 0,
    "workerId" : "REDACTED-8080"
  }
}
```
----
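Note: the status output above shows an instance that reports `running: true` yet has `numReceived: 0`. A minimal sketch of flagging such instances, assuming each instance object has the shape shown in the thread (`instanceId` plus a `status` object). The helper name `idle_instances` and the second sample entry are hypothetical, not taken from the thread.

```python
def idle_instances(instances):
    """Return ids of instances that report running but have received no messages."""
    return [
        inst["instanceId"]
        for inst in instances
        if inst["status"]["running"] and inst["status"]["numReceived"] == 0
    ]


# Trimmed sample: instance 2 mirrors the output above; instance 3 is made up.
instances = [
    {"instanceId": 2, "status": {"running": True, "numReceived": 0}},
    {"instanceId": 3, "status": {"running": True, "numReceived": 1200}},
]

print(idle_instances(instances))  # [2]
```
----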
2019-04-25 18:04:36 UTC - Devin G. Bost: I also only count 4 consumers.
----
2019-04-25 18:05:32 UTC - Thor Sigurjonsson: Also instance 3 and instance 2 are on the same host, and instance 2 shows no metrics from prometheus-grafana and instance 4 has ~100ms .999 quantile latency (and no data showing for instance 2). It's roughly twice what other functions report... Made me guess maybe they were being rolled up for the host or something...
----
2019-04-25 18:07:45 UTC - Devin G. Bost: In the JSON output from `pulsar-admin topics stats $TOPIC`, I only see consumers with these instance_id values: 0, 3, 1, 4. (2 is missing.)
----
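Note: the check described above (which `instance_id` values have a consumer on the topic) can be scripted. This is only a sketch. It assumes the stats JSON has a `subscriptions` map whose entries carry a `consumers` list, and that each consumer exposes `instance_id` under a `metadata` object. Those field locations are assumptions and may differ by Pulsar version; the sample data is a hypothetical, trimmed stand-in for real output.

```python
import json


def missing_instances(stats, parallelism):
    """Return the instance ids in range(parallelism) that have no consumer."""
    seen = set()
    for sub in stats.get("subscriptions", {}).values():
        for consumer in sub.get("consumers", []):
            # Assumed location of the function instance id.
            instance_id = consumer.get("metadata", {}).get("instance_id")
            if instance_id is not None:
                seen.add(int(instance_id))
    return sorted(set(range(parallelism)) - seen)


# Hypothetical sample with the ids reported in the thread: 0, 3, 1, 4.
sample = json.loads("""
{
  "subscriptions": {
    "example-sub": {
      "consumers": [
        {"metadata": {"instance_id": "0"}},
        {"metadata": {"instance_id": "3"}},
        {"metadata": {"instance_id": "1"}},
        {"metadata": {"instance_id": "4"}}
      ]
    }
  }
}
""")

print(missing_instances(sample, 5))  # [2]
```

With real data, the output of `pulsar-admin topics stats $TOPIC` could be saved to a file and loaded with `json.load` in place of the inline sample.
----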
2019-04-25 18:12:51 UTC - David Kjerrumgaard: Are there any errors in the log for instance 2?
----
2019-04-25 18:16:07 UTC - Thor Sigurjonsson: Full contents of log from instance 2 at about the time of the parallelism update.
----
2019-04-25 18:20:33 UTC - Ruud Kamphuis: Interesting! Question: why does the name include kafka?
----
2019-04-25 18:21:18 UTC - Jerry Peng: @Thor Sigurjonsson there are no more logs for instance-2? If so, it seems to be getting stuck.
----
2019-04-25 18:21:37 UTC - Jerry Peng: @Thor Sigurjonsson do you guys have function state enabled?
----
2019-04-25 18:22:38 UTC - Thor Sigurjonsson: Hmm, we do have a log topic set...
----
2019-04-25 18:23:01 UTC - Thor Sigurjonsson: working to see about function state being enabled..
----
2019-04-25 18:25:06 UTC - Thor Sigurjonsson: would that be `stateStorageServiceUrl` in functions_worker.yml? (It's commented out).
----
2019-04-25 18:25:19 UTC - Jerry Peng: yes and gotcha
----
2019-04-25 18:25:47 UTC - Jerry Peng: you guys are running functions via Thread Runtime?
----
2019-04-25 18:26:09 UTC - Thor Sigurjonsson: Yes
----
2019-04-25 18:26:59 UTC - Thor Sigurjonsson: (kerberos jvm params plumbing the kafka connector made us go there for now)
----
2019-04-25 18:29:25 UTC - Jerry Peng: Give me a second to investigate
+1 : Thor Sigurjonsson, Devin G. Bost
----
2019-04-25 18:33:31 UTC - Thor Sigurjonsson: That's like OpenOffice calling their thing Microsofty. :slightly_smiling_face:
----
2019-04-25 18:38:23 UTC - Thor Sigurjonsson: Sorry about being flippant. :slightly_smiling_face: I get that there is a good search marketing angle to get streaming customers.. I'll give it a spin this week and see if I can give useful feedback.
----
2019-04-25 18:48:02 UTC - Ruud Kamphuis: Yeah, to me the name is only gonna confuse people. People who are looking for Pulsar will see the name and think: nope, this is not what I want. People who want hosted Kafka will find this and think: nope, this is not what I want. Just my 2 cents
----
2019-04-25 18:48:26 UTC - Ruud Kamphuis: Great to have a hosted pulsar offering tho! :raised_hands:
----
2019-04-25 18:53:50 UTC - Chethan UK: @Chethan UK has joined the channel
----
2019-04-25 19:04:50 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson I have reproduced the issue. There was a PR that went in earlier this year that might be causing race conditions when using the same pulsar client to create consumers, as is the case for running functions via ThreadRuntime. I am looking for a fix for the issue.

In the meantime, do you guys want to try running with process runtime? We have added the ability to add runtime flags so that your kerberos configs can get passed in. The functionality is not in an official release, so you guys can either 1) try to build your own pulsar release from master or 2) try a streamlio pulsar release that contains the functionality, since we create releases more often than Apache does.
+1 : Thor Sigurjonsson
----
2019-04-25 19:05:21 UTC - Devin G. Bost: > I have reproduced the issue. There was a PR that went in earlier this year that might be causing race conditions when using the same pulsar client to create consumers, as is the case for running functions via ThreadRuntime. I am looking for a fix for the issue.

Very impressive.
----
2019-04-25 19:06:51 UTC - Jerry Peng: Thanks! Can’t take all the credit. @Matteo Merli also helped
+1 : Devin G. Bost
----
2019-04-25 19:13:11 UTC - Devin G. Bost: Is there a temporary workaround?

We did have some concerns about the memory utilization that might be associated with the process runtime (with parallelization). Would we increase our memory requirements if we parallelized with the process runtime instead of parallelizing with the threading runtime?
----
2019-04-25 19:15:46 UTC - Thor Sigurjonsson: I think there are a few things that go into the decision for us: 1) bug fixes we need, 2) what runtime to "settle on" (and in which cluster maybe) 3) functions support for publishing properties and then the timing and how we roll out the prod env right now being used. We can roll faster in lower environment, but it would be good to pick a release soon that gets the most bang for the buck.
----
2019-04-25 19:16:22 UTC - Thor Sigurjonsson: This parallelism issue is not critical just yet, but it is part of 1) above I think.
----
2019-04-25 19:16:57 UTC - Thor Sigurjonsson: I guess we should consider the streamlio build also going forward.
----
2019-04-25 19:17:51 UTC - Thor Sigurjonsson: I'm guessing much of what we'd need would be in 2.3.2 (I may be wrong).
----
2019-04-25 19:20:08 UTC - Thor Sigurjonsson: Would we be getting those from here <https://hub.docker.com/r/streamlio/pulsar/tags> ? if we were rolling with docker?
----
2019-04-25 19:27:29 UTC - Jerry Peng: @Thor Sigurjonsson yes, but it currently does not have the latest image. We are actually in the process of doing another release. A new image should be up in the next half an hour
+1 : Thor Sigurjonsson
----
2019-04-25 19:28:22 UTC - Jerry Peng: @Devin G. Bost I am not sure of a temporary workaround at this moment, but this issue doesn’t happen every time. It is a race condition. I was only able to reproduce it once out of the many times I have tried.
----
2019-04-25 19:34:46 UTC - Chethan UK: Has anyone used MongoDB Source connector?
----
2019-04-25 19:45:57 UTC - David Kjerrumgaard: Not yet, are you having issues?
----
2019-04-25 19:47:22 UTC - Chethan UK: <https://pulsar.apache.org/docs/en/io-cdc/>

is there a good tutorial on MongoDB *Source*?
----
2019-04-25 19:47:57 UTC - Ali Ahmed: <https://github.com/bbonnin/pulsar-io-mongo>
----
2019-04-25 19:48:11 UTC - Chethan UK: It's a sink, I want a source
----
2019-04-25 19:48:49 UTC - Ali Ahmed: sorry the source is just debezium you can try debezium docs
----
2019-04-25 19:54:51 UTC - Chethan UK: Where is the helm chart <https://pulsar.apache.org/docs/en/deploy-kubernetes/#deploying-pulsar-components-helm> ?
----
2019-04-25 19:59:16 UTC - David Kjerrumgaard: @Chethan UK It is bundled with the code. If you clone the pulsar repo, then go to /apache/pulsar/deployment/kubernetes/helm/pulsar
----
2019-04-25 19:59:52 UTC - Devin G. Bost: Gotcha.
----
2019-04-25 20:32:34 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson a new docker release is now available:
<https://hub.docker.com/r/streamlio/pulsa>
----
2019-04-25 20:32:46 UTC - Jerry Peng: sorry: <https://hub.docker.com/r/streamlio/pulsar/tags>
----
2019-04-25 20:43:03 UTC - Devin G. Bost: Thanks!
----
2019-04-25 22:28:54 UTC - Steven Le Roux: Hi, I've deployed a local instance of pulsar, but with separated components (zk, bk)
----
2019-04-25 22:29:44 UTC - Steven Le Roux: Bk seems ok so far (bk shell listbookies is listing bookies), they're registered into zk properly under /ledgers/available
----
2019-04-25 22:30:16 UTC - Steven Le Roux: but when starting pulsar, it connects to zk, then :
22:03:43.753 [main] ERROR org.apache.bookkeeper.client.BookieWatcherImpl - Failed to get bookie list :
----
2019-04-25 22:30:55 UTC - Steven Le Roux: I can't find where to configure the ledger zk path, but anyway, it defaults to /ledgers which should be fine :
----
2019-04-25 22:30:56 UTC - Steven Le Roux: 22:03:43.583 [main] INFO org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase - Initialize zookeeper metadata driver with external zookeeper client : ledgersRootPath = /ledgers.
----
2019-04-25 22:31:13 UTC - Steven Le Roux: any idea what I'm missing ?
----
2019-04-25 22:31:53 UTC - Matteo Merli: What’s the `zookeeperServers` settings in `broker.conf`?
----
2019-04-25 22:33:24 UTC - Steven Le Roux: zookeeperServers=10.0.0.2:2181/pulsar-local
----
2019-04-25 22:33:54 UTC - Steven Le Roux: I've reduced to one for testing but there are three of them
----
2019-04-25 22:33:55 UTC - Matteo Merli: I see, you’re using a chroot for ZK
----
2019-04-25 22:34:05 UTC - Steven Le Roux: yes
----
2019-04-25 22:34:14 UTC - Matteo Merli: is BK also using the same chroot?
----
2019-04-25 22:34:36 UTC - Steven Le Roux: also, I'm testing to chroot zk so that I can co-locate local zk and global zk for testing purposes
----
2019-04-25 22:35:25 UTC - Steven Le Roux: ok from what you're saying, pulsar is expecting to read /ledgers at /pulsar-local/ledgers then ?
----
2019-04-25 22:35:31 UTC - Matteo Merli: You can co-locate them without needing the chroot
----
2019-04-25 22:35:47 UTC - Matteo Merli: the “global” zk is only using `/admin/` prefix
----
2019-04-25 22:36:02 UTC - Steven Le Roux: ok perfect
----
2019-04-25 22:36:23 UTC - Matteo Merli: > ok from what you’re saying, pulsar is expecting to read /ledgers at /pulsar-local/ledgers then ?

Yes, both Pulsar and BK should share the same chroot
----
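Note: the mismatch discussed above comes from how a ZooKeeper connect string with a chroot suffix changes where a client-side path actually lives. A minimal sketch of that path arithmetic, using the `zookeeperServers` value from the thread. The helper names are hypothetical, and the bookie connect string without a chroot is an assumption inferred from the bookies registering under `/ledgers/available`.

```python
def split_connect_string(connect):
    """Split 'host1:2181,host2:2181/chroot' into (list of hosts, chroot or '')."""
    hosts, _, chroot = connect.partition("/")
    chroot = "/" + chroot.strip("/") if chroot.strip("/") else ""
    return hosts.split(","), chroot


def effective_path(connect, path):
    """Return the absolute znode path that a client-side path resolves to."""
    _, chroot = split_connect_string(connect)
    return chroot + path


broker_zk = "10.0.0.2:2181/pulsar-local"  # value quoted in the thread
bookie_zk = "10.0.0.2:2181"               # assumed: bookies configured without a chroot

print(effective_path(broker_zk, "/ledgers"))  # /pulsar-local/ledgers
print(effective_path(bookie_zk, "/ledgers"))  # /ledgers
```

Under these assumptions the broker looks for bookies under `/pulsar-local/ledgers` while the bookies registered under `/ledgers`, which would explain the empty bookie list until both use the same chroot.
----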
2019-04-25 22:36:36 UTC - Steven Le Roux: ok that's why
----
2019-04-25 22:36:39 UTC - Steven Le Roux: thx, testing ;:)
----
2019-04-25 22:36:42 UTC - Matteo Merli: :slightly_smiling_face:
----
2019-04-25 22:43:08 UTC - Steven Le Roux: Far better :wink: thx @Matteo Merli!
+1 : Matteo Merli
----
2019-04-26 00:10:48 UTC - Grant Wu: @Sijie Guo did you figure anything out about <https://github.com/apache/bookkeeper/issues/1970> ?
----
2019-04-26 01:05:37 UTC - Jerry Peng: @Grant Wu I talked with a few users that saw this problem; they all had errors in their bookies when this error was occurring. There weren’t enough non-faulty bookies in the cluster, and that is what is causing this exception.
----
2019-04-26 01:06:58 UTC - Grant Wu: Interesting
----
2019-04-26 01:09:37 UTC - durga: Ok. Thanks @Matteo Merli
----
2019-04-26 02:26:48 UTC - Sijie Guo: @Grant Wu I was looking into that issue before but I didn’t get to the root cause yet. It is still on my backlog. Even if, as @Jerry Peng said, there weren’t enough non-faulty bookies in the cluster, BookKeeper should handle that. The ArrayIndexOutOfBoundsException doesn’t sound right to me.

but anyway I will look into it whenever I have time.
----