You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/10/12 09:11:02 UTC

Slack digest for #general - 2018-10-12

2018-10-11 09:23:14 UTC - Martin Svensson: @Martin Svensson has joined the channel
----
2018-10-11 09:26:08 UTC - Martin Svensson: is there a way to create a new subscription to read from the start of the topic like kafka auto.offset.reset earliest?
----
2018-10-11 09:32:18 UTC - Jean-Bernard van Zuylen: @Matteo Merli Unfortunately in my case there is indeed no other way of triggering `PulsarClient.close()` than using a Shutdown Hook when shutting down Tomcat. Shutdown Hooks are only executed by the JVM when all non-daemon threads are stopped and that's where the threads from the Pulsar client executed as non-daemon are getting in the way
----
2018-10-11 09:44:23 UTC - Sijie Guo: @Martin Svensson yes. you can build a consumer with SubscriptionInitialPosition == MessageId.Earliest
----
2018-10-11 09:45:18 UTC - Sijie Guo: <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ConsumerBuilder.html#subscriptionInitialPosition-org.apache.pulsar.client.api.SubscriptionInitialPosition->
----
2018-10-11 09:45:37 UTC - Sijie Guo: <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/SubscriptionInitialPosition.html>
----
2018-10-11 09:45:43 UTC - Martin Svensson: that’s great! is this functionality going to extended to the kafka wrapper do you think? also, does this work when using tiered storage? that would be truly great
----
2018-10-11 09:48:21 UTC - Sijie Guo: &gt; does this work when using tiered storage?

yes it will work with tiered storage. all the pulsar messaging features would work with tiered storage.

&gt; is this functionality going to extended to the kafka wrapper do you think?

it should be straightforward to be extended to kafka wrapper. feel free to file a github issue about this, then we can pick it up. or if you would like to contribute, you are welcome to contribute :slightly_smiling_face: the change should be pretty trivial and straightforward
----
2018-10-11 09:57:38 UTC - Jean-Bernard van Zuylen: To tell you more about my case: we have code on ColdFusion and are currently using a JMS message broker (HornetQ). We want to move to Pulsar. For this I'm currently developing a event gateway (<https://helpx.adobe.com/coldfusion/developing-applications/using-external-resources/using-event-gateways/about-event-gateways.html>) on top of the Pulsar Java client.
+1 : Sijie Guo
----
2018-10-11 10:03:59 UTC - Martin Svensson: thanks a bunch @Sijie Guo!
----
2018-10-11 10:29:13 UTC - Sijie Guo: @Jean-Bernard van Zuylen @Matteo Merli: maybe we can call it `useDaemonThread`, if that is set, pulsar client would use deamon thread for the threads that are spawned by pulsar.
----
2018-10-11 11:24:07 UTC - Jean-Bernard van Zuylen: @Sijie Guo I think @Matteo Merli preferred avoiding another setting and have it automatic. I've submitted a pull request with this approach (<https://github.com/apache/pulsar/pull/2770>). Feedback is welcome :wink:
----
2018-10-11 12:39:08 UTC - John Davenport: what about async consumer? Is there documentation for that?
----
2018-10-11 13:09:02 UTC - Steve Hicks: @Steve Hicks has joined the channel
----
2018-10-11 13:36:17 UTC - Steve Hicks: Hi all.
I'm struggling to find a way to completely delete a schema for a persistent topic so I can introduce incompatible changes. If I just delete the schema using pulsar-admin, I continue to get "Incompatible schema used" in the Java client. If I delete schemas, topics, namespace and the tenant, then recreate tenant/namespace, I get a NPE when trying to create a producer. Short of deleting the entire data directory, is there a way to cleanly remove the schema?
Any help appreciated, thanks.
(Pulsar version is 2.1.0, Java client is 2.1.1-incubating)
----
2018-10-11 16:22:31 UTC - Julien Nioche: Hi, I am looking at the Functions. In case I send the output to a partitioned topic, is there a way I can control which partition the output gets sent to?
----
2018-10-11 16:31:50 UTC - Matteo Merli: @John Davenport Yes, you can setup a listener callback on the consumer: <http://pulsar.apache.org/api/python/#pulsar.Client.subscribe> and pass the `message_listener` function as shown in the example
----
2018-10-11 18:13:25 UTC - Sijie Guo: @Steve Hicks that sounds like a bug. do you mind filing a github issue with some more details (e.g. stacktraces)?
----
2018-10-11 18:23:32 UTC - Sijie Guo: Currently if your input has message key, it will use that as the key for routing the results; otherwise if you enable effectively-once, it will use message id as the key for routing results; otherwise it falls back to use round-robin.

If you want  explicit controls on how results being written to the output topic, we can consider provide some interfaces to let the function submitter tell pulsar how to route the results.
----
2018-10-11 18:28:05 UTC - Victor Li: My Java `pulsar function` keeps running into exception and restarting:
```"18:13:30.321 [worker-stats-updater-1-1] ERROR org.apache.pulsar.functions.worker.FunctionRuntimeManager - Failed to update stats for public/default/hr:1-io.grpc.StatusRuntimeException: UNAVAILABLE: io exception"
"18:13:39.883 [Timer-0] ERROR org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process death exception"
"java.lang.RuntimeException: "
"u0009at org.apache.pulsar.functions.runtime.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:455) [org.apache.pulsar-pulsar-functions-runtime-2.1.1-incubating.jar:2.1.1-incubating]"
"u0009at org.apache.pulsar.functions.runtime.ProcessRuntime.isAlive(ProcessRuntime.java:442) [org.apache.pulsar-pulsar-functions-runtime-2.1.1-incubating.jar:2.1.1-incubating]"
"u0009at org.apache.pulsar.functions.runtime.RuntimeSpawner$1.run(RuntimeSpawner.java:84) [org.apache.pulsar-pulsar-functions-runtime-2.1.1-incubating.jar:2.1.1-incubating]"
"u0009at java.util.TimerThread.mainLoop(Timer.java:555) [?:1.8.0_181]"
"u0009at java.util.TimerThread.run(Timer.java:505) [?:1.8.0_181]"
```
----
2018-10-11 18:28:23 UTC - Victor Li: Is this a known issue? I am on `2.1.1` release.
----
2018-10-11 18:34:09 UTC - Sijie Guo: @Victor Li it seems like the function worker can’t talk to the forked function instance. it might be something wrong with functions. did you package your function as a fat jar?

there are a couple of options to debug:

- if you are running in standalone, you can check logs/functions/public/default/hr . there are logging files about that function.
- if you are running in cluster mode, you might consider using pulsar-admin to getstatus of that function to find the worker that runs this function and go to that worker to file the log files. or you can consider submit a function with --logTopic to dump the logs to a pulsar topic so we can tail the pulsar topic
- or try using localrun mode to run your functions first.
----
2018-10-11 18:38:26 UTC - Victor Li: I am trying to run a small function, the package is only 11k.
----
2018-10-11 18:38:58 UTC - Victor Li: I am running cluster mode, this is the output:
```
  "functionStatusList": [
    { 
      "numRestarts": "4",
      "instanceId": "1",
      "workerId": "c-pulsar-fw-10.244.2.76-8080"
    },
    { 
      "numRestarts": "4",
      "instanceId": "0",
      "workerId": "c-pulsar-fw-10.244.0.60-8080"
    },
    { 
      "numRestarts": "4",
      "instanceId": "2",
      "workerId": "c-pulsar-fw-10.244.1.51-8080"
    }
  ]
```
----
2018-10-11 18:39:27 UTC - Victor Li: the worker should be the second one, `c-pulsar-fw-10.244.0.60-8080`, how to file the log file?
----
2018-10-11 18:42:03 UTC - Sijie Guo: If you go to that machine, it should be under ${PULSAR_HOME}/logs/functions/&lt;tenant&gt;/&lt;namespace&gt;/&lt;function name&gt;
----
2018-10-11 18:44:46 UTC - Victor Li: I found it. Thanks. It crashed because one `assert` failed.
----
2018-10-11 18:50:16 UTC - Victor Li: In addition, I am deploying everything on kubernetes, is there a way to redirect the function logs to container so I can ship it out of the host? @Sijie Guo
----
2018-10-11 19:20:05 UTC - Matteo Merli: @Julien Nioche You can also use context object and publish directly to a specific partition, eg. `my-topic-partition-0`
----
2018-10-11 21:09:58 UTC - John Davenport: Ok I think I can make that work . . . Just set an asyc callback
----
2018-10-11 22:08:56 UTC - Sijie Guo: @Victor Li :

if you are using threaded mode, it is easier you can change `pulsar.log.appender` to console <https://github.com/apache/pulsar/blob/master/conf/log4j2.yaml#L33>

however if you are using process mode, it is a bit tricky, since functions will be running as separate processes than function workers. I don’t think kubernetes can actually handle stdout/stderr in other process. so you might need integration with some log agent (e.g. on cloud integrating with cloud driver) to ship the log files.

but if you run everything in kubernetes, you might consider using the kubernertes runtime, which is available on master. in that way, each function will be running as one k8s pod, where we can configure dumping the function logs to console by default. the k8s runtime will be available as part of 2.3 release.
----
2018-10-11 22:21:09 UTC - Victor Li: I am on 2.1.1, I did not see anything about thread/process mode. I guess mine is running as process mode as I kind of saw a new java process.
----
2018-10-11 22:21:56 UTC - Victor Li: Sounds good to me, this is not import to me now, I guess running in a pod might make things straightforward.
----
2018-10-11 22:22:25 UTC - Grant Wu: Victor what are you trying to do with functions…
----
2018-10-11 22:22:49 UTC - Victor Li: If I have tens of functions, can I run all of them in one pod in future 2.3 release?
----
2018-10-11 22:22:55 UTC - Victor Li: @Sijie Guo
----
2018-10-11 22:23:32 UTC - Grant Wu: @Victor Li <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1538060927000100> by the way this may be useful to you
----
2018-10-11 22:27:42 UTC - Victor Li: Interesting, this is what I read from the document: ```Message deduplication is a feature of Pulsar that, when enabled, ensures that each message produced on Pulsar topics is persisted to disk only once, even if the message is produced more than once.```
----
2018-10-11 22:28:07 UTC - Victor Li: I thought `Pulsar topics` meant across topics.
----
2018-10-11 22:29:50 UTC - Grant Wu: It should probably be replaced with “a Pulsar topic”
----
2018-10-11 22:30:07 UTC - Sijie Guo: &gt; If I have tens of functions, can I run all of them in one pod in future 2.3 release?

good question, currently we don’t support that mode.
but it is kind of like `FunctionGroup`, grouping functons together to run in a pod? /cc @Sanjeev Kulkarni for his thoughts here
----
2018-10-11 22:30:44 UTC - Sijie Guo: by default, it is process mode :slightly_smiling_face:
----
2018-10-11 22:32:46 UTC - Victor Li: It is fine to make each function a container inside a common pod.
----
2018-10-11 22:37:17 UTC - Sanjeev Kulkarni: @Victor Li Currently each function instance runs as one pod
----
2018-10-11 22:38:24 UTC - Sanjeev Kulkarni: Let me think about FunctionGroup concept that you floated. Essentially the idea is that instead of just one, you could have multiple functions containers in a pod.
----
2018-10-11 22:39:11 UTC - Victor Li: yes
----
2018-10-11 22:39:42 UTC - Sanjeev Kulkarni: Should’nt be too hard, the trick is what happens if we have different pods. Consider --paralllism of 10 but a FunctionGroupSize of 3. 3 pods will have 3 functions, and 1 pod will have 1 function
----
2018-10-11 22:56:59 UTC - Kendall Magesh-Davis: @Kendall Magesh-Davis has joined the channel
----
2018-10-11 23:44:23 UTC - Aaron Langford: I caught a presentation on Pulsar at a Seattle meetup a week or so ago. There was a slide on companies who are using Pulsar in production, wondering if someone knows where that slide is, or if someone knows a solid list of companies who are using pulsar in production?
----
2018-10-12 08:37:44 UTC - Steve Hicks: Will do, thanks.
----