Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/01/09 09:11:03 UTC

Slack digest for #general - 2020-01-09

2020-01-08 09:22:42 UTC - Suhan Duman: @Suhan Duman has joined the channel
----
2020-01-08 13:05:13 UTC - Roman Popenov: I’ve noticed that everywhere in the documentation for Sources/Sinks, the JSON config file examples are given as:
```{
    "bootstrapServers": "pulsar-kafka:9092",
    "groupId": "test-pulsar-io",
    "topic": "my-topic",
    "sessionTimeoutMs": "10000",
    "autoCommitEnabled": false
}```
whereas it should be
```{
  "configs": {
    "bootstrapServers": "pulsar-kafka:9092",
    "groupId": "test-pulsar-io",
    "topic": "my-topic",
    "sessionTimeoutMs": "10000",
    "autoCommitEnabled": false
  }
}```

----
2020-01-08 13:06:57 UTC - Roman Popenov: Did I miss something?
----
2020-01-08 13:25:46 UTC - tuteng: The first form is correct; it can be passed inline like this:
```bin/pulsar-admin source localrun --archive connectors/pulsar-io-debezium-mysql-{{pulsar:version}}.nar --name debezium-mysql-source --destination-topic-name debezium-mysql-topic --tenant public --namespace default --source-config '{"database.hostname": "localhost","database.port": "3306","database.user": "debezium","database.password": "dbz","database.server.id": "184054","database.server.name": "dbserver1","database.whitelist": "inventory","database.history": "org.apache.pulsar.io.debezium.PulsarDatabaseHistory","database.history.pulsar.topic": "history-topic","database.history.pulsar.service.url": "pulsar://127.0.0.1:6650","key.converter": "org.apache.kafka.connect.json.JsonConverter","value.converter": "org.apache.kafka.connect.json.JsonConverter","pulsar.service.url": "pulsar://127.0.0.1:6650","offset.storage.topic": "offset-topic"}'```

thanks : Roman Popenov
----
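The two JSON shapes discussed above differ only in the top-level `configs` wrapper. A minimal Python sketch of converting the flat form into the wrapped one, assuming (as the thread suggests, not verified here) that the flat form is what `--source-config` takes inline while the wrapped form is meant for a config file. `wrap_connector_config` is a hypothetical helper, not part of Pulsar:

```python
import json


def wrap_connector_config(flat):
    """Nest flat connector settings under a top-level "configs" key."""
    if "configs" in flat:
        # Already wrapped: return it untouched.
        return flat
    return {"configs": dict(flat)}


# Values copied from the example in the thread.
flat = {
    "bootstrapServers": "pulsar-kafka:9092",
    "groupId": "test-pulsar-io",
    "topic": "my-topic",
    "sessionTimeoutMs": "10000",
    "autoCommitEnabled": False,
}

wrapped = wrap_connector_config(flat)
print(json.dumps(wrapped, indent=2))
```
----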
2020-01-08 13:26:28 UTC - Roman Popenov: I see! Thanks!
----
2020-01-08 14:25:24 UTC - Kevin DETHELOT: @Kevin DETHELOT has joined the channel
----
2020-01-08 14:40:08 UTC - Fernando: Can I make http requests using the `Pulsar Functions SDK` in python?
----
2020-01-08 14:41:32 UTC - Roman Popenov: Shouldn’t be a problem
----
2020-01-08 14:41:56 UTC - Fernando: but how do I use the requests library?
----
2020-01-08 14:42:26 UTC - Roman Popenov: python or java?
----
2020-01-08 14:42:33 UTC - Fernando: ah sorry, python
----
2020-01-08 14:43:15 UTC - Fernando: this is an external dependency
----
2020-01-08 14:45:55 UTC - Roman Popenov: Yeah, I haven’t tried running requests in Python, sorry
----
2020-01-08 14:47:11 UTC - Roman Popenov: I would assume that importing a package using the relative paths would work
----
2020-01-08 14:47:16 UTC - Roman Popenov: But I haven’t tried that
----
2020-01-08 14:48:17 UTC - Fernando: I find it a bit confusing since pulsar is running on kubernetes so I’d have to install dependencies in the broker container which doesn’t sound like good practice
----
2020-01-08 14:48:38 UTC - Roman Popenov: I wouldn’t install dependencies
----
2020-01-08 14:48:57 UTC - Roman Popenov: If you are running it in a broker context, I would just copy the library itself
----
2020-01-08 14:49:25 UTC - Fernando: I see
----
2020-01-08 14:49:31 UTC - Roman Popenov: and then when importing modules, I would use import lib or refer to the modules using relative paths
----
2020-01-08 14:49:56 UTC - Roman Popenov: It doesn’t seem like a very clean solution, but that’s what I did once
----
2020-01-08 14:50:01 UTC - Fernando: ok so basically like deploying lambdas in AWS
----
2020-01-08 14:57:05 UTC - Roman Popenov: Actually, I think you can just import requests
----
2020-01-08 14:57:35 UTC - Roman Popenov: requests is an already installed module
----
2020-01-08 14:57:45 UTC - Fernando: interesting
----
2020-01-08 14:57:52 UTC - Fernando: this is in the broker
----
2020-01-08 14:58:06 UTC - Roman Popenov: Yeah
----
2020-01-08 14:58:17 UTC - Fernando: I’ll have a look thanks
----
2020-01-08 14:59:04 UTC - Fernando: you’re right, awesome!
+1 : Roman Popenov
----
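The exchange above concludes that `requests` can be imported directly in a Python function running on the broker image. A minimal sketch of what that might look like, assuming the plain `process(input)` function form and that each input message is a URL to fetch. `make_process` and the 5-second timeout are illustrative choices, not something from the thread:

```python
def make_process(http_get):
    """Build a process(input) function around an injectable HTTP GET callable."""
    def process(input):
        # Assumption: the message payload is the URL to fetch.
        response = http_get(input, timeout=5)
        # Return the HTTP status code as the function's output message.
        return str(response.status_code)
    return process


def process(input):
    # Imported lazily so this module still loads where requests is absent.
    import requests
    return make_process(requests.get)(input)
```

Passing the HTTP call in as a parameter keeps the function easy to try out locally with a stub before deploying it.
----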
2020-01-08 16:09:49 UTC - Adam: @Adam has joined the channel
----
2020-01-08 16:10:24 UTC - Adam: Hi! I'm wondering if state storage from pulsar functions is still in developer preview
----
2020-01-08 16:11:00 UTC - Adam: I noticed that this blog post <https://streaml.io/blog/eda-simple-event-processing> started referencing it back in 2018
----
2020-01-08 16:11:49 UTC - Adam: But this doc seems to imply that it's still in developer preview: <https://pulsar.apache.org/docs/en/functions-state/#__docusaurus>
----
2020-01-08 16:16:35 UTC - Adam: Ah, I see an older message explaining that it is still in developer preview. I have a further question then - in Kafka Streams workers, you can have a transaction between the update to a state store and the consumer's progress (since both are writing to Kafka, and Kafka has transactions). I'm curious if there will be a similar capability for Pulsar functions at some point in the future?
----
2020-01-08 16:17:31 UTC - Adam: And one further question - is there a place that documents what work is remaining to take state storage out of developer preview?  I'm curious if there's any way to pitch in on that effort
----
2020-01-08 16:30:08 UTC - Ryan: Is there a specific reason clients are not allowed to skip to a specific message in a topic, whether via ID or otherwise? Currently, it appears clients have the choice of either starting at the beginning of a stream or at the latest message. After taking a look at the source code, there is support for a subscription to keep track of a client's location within a stream if a client connects/disconnects, so the client can resume where it left off, but the initial connection options appear to be an enum with the above two choices.
----
2020-01-08 16:45:22 UTC - Kohei Watanabe: @Kohei Watanabe has joined the channel
----
2020-01-08 21:40:21 UTC - Mathieu Druart: @Pedro Cardoso I tried the 2.5.0-RC2 version and added
```extraServerComponents: "org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent"```
to the values-mini.yaml. After the cluster deployment I verified that the property was correct in every node's `conf/bookkeeper.conf` file, but when I try a function I still get the same exception, `java.lang.IllegalStateException: State is not enabled.`, when I try to access the state. Any ideas? Thanks!
----
2020-01-08 21:43:19 UTC - Pedro Cardoso: @Sijie Guo :point_up: ?
----
2020-01-08 21:55:15 UTC - Julien: @Julien has joined the channel
----
2020-01-08 22:55:06 UTC - Roman Popenov: Does anyone have an explanation for the resource values in `values.yaml` for the Helm chart?
----
2020-01-08 22:56:15 UTC - juraj: like, why proxy has 4 gb of ram? lol, i don't
----
2020-01-08 22:58:02 UTC - Roman Popenov: And why 4 nodes of bookies with 15 Gi of ram and not 6 with 10 Gi
----
2020-01-08 22:59:14 UTC - Roman Popenov: Do you have any performance metrics for Pulsar?
----
2020-01-08 22:59:25 UTC - Roman Popenov: Or any recommendations?
----
2020-01-08 23:25:17 UTC - juraj: nothing that i could say has been thoroughly empirically validated / battle-tested yet
----
2020-01-08 23:26:39 UTC - Roman Popenov: Any future plans?
----
2020-01-08 23:30:47 UTC - juraj: i have estimated how much roughly to give each component ram/cpu based on total aws/eks node ram and cpu specs, and how much the k8s system components already allocated for themselves..

for the future would be nice to have a tool that would do this automatically and then spit out the values.yaml accordingly
----
2020-01-08 23:31:14 UTC - Roman Popenov: What are your estimates?
----
2020-01-08 23:32:23 UTC - Roman Popenov: I was thinking of starting with:
```Scaled Pulsar Cluster without monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---------------------------
Zookeeper:
    3 Nodes:
      memory: 1Gi
      cpu: 1
      volume: 20Gi
---------------------------
---------------------------
Bookkeeper:
    4 Nodes:
      memory: 1Gi
      cpu: 1
      volumes: 
        50Gi ledger 
        50Gi journal
---------------------------
---------------------------
Broker:
    3 Nodes:
      memory: 1Gi
      cpu: 1
---------------------------
---------------------------
Proxy:
    3 Nodes:
      memory: 1Gi
      cpu: 1
---------------------------
---------------------------
Auto-Recovery:
    1 Node:
      memory: 1Gi
      cpu: 250m
---------------------------
---------------------------
Management:
    1 Nodes:
      memory: 250Mi
      cpu: 0.1
---------------------------
---------------------------
Functions +IO:
    1 Nodes:
      memory: 1Gi
      cpu: 0.5
---------------------------
---------------------------
Functions +IO:
    1 Nodes:
      memory: 1Gi
      cpu: 0.5
-------------------------```
----
2020-01-08 23:32:59 UTC - Roman Popenov: Roughly with
RAM ~ 16 Gi
CPU ~ 16
DISK ~50 Gi
----
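The plan above can be tallied mechanically. A minimal Python sketch that sums the listed requests, assuming `1` CPU means 1000 millicores, `0.1` means 100m, and that the Functions + IO entry (listed twice in the plan) is counted once. `PLAN` and `totals` are names made up for this illustration:

```python
# component: (replicas, memory in Mi, CPU in millicores, disk per replica in Gi)
PLAN = {
    "zookeeper": (3, 1024, 1000, 20),
    "bookkeeper": (4, 1024, 1000, 50 + 50),  # ledger + journal volumes
    "broker": (3, 1024, 1000, 0),
    "proxy": (3, 1024, 1000, 0),
    "auto-recovery": (1, 1024, 250, 0),
    "management": (1, 250, 100, 0),
    "functions+io": (1, 1024, 500, 0),  # assumption: counted once
}


def totals(plan):
    """Return (memory in Mi, CPU in millicores, disk in Gi) summed over all replicas."""
    memory_mi = sum(n * mem for n, mem, _, _ in plan.values())
    cpu_milli = sum(n * cpu for n, _, cpu, _ in plan.values())
    disk_gi = sum(n * disk for n, _, _, disk in plan.values())
    return memory_mi, cpu_milli, disk_gi


memory_mi, cpu_milli, disk_gi = totals(PLAN)
print(f"memory: {memory_mi / 1024:.2f} Gi")
print(f"cpu:    {cpu_milli / 1000:.2f} cores")
print(f"disk:   {disk_gi} Gi")
```

Under those assumptions the sum comes to about 15.2 Gi of memory, 13.85 CPU cores and 460 Gi of volume claims, so the disk figure in particular is worth rechecking against the rough numbers quoted above.
----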
2020-01-08 23:33:10 UTC - Roman Popenov: And see how it fares
----
2020-01-08 23:33:25 UTC - juraj: i have planned for an EKS cluster of 4 worker nodes (r5d.xlarge) and 1 system node (r5d.large)
----
2020-01-08 23:34:04 UTC - juraj: i'm placing the components using node taints / tolerations and deployment/statefulset affinity rules
----
2020-01-08 23:35:11 UTC - juraj: (i'm not using functions yet, plus they got broken in 2.4.2)
----
2020-01-08 23:35:21 UTC - Roman Popenov: Oh
----
2020-01-08 23:35:25 UTC - Roman Popenov: What broke?
----
2020-01-08 23:35:45 UTC - juraj: the broker :smile:
sweat_smile : Roman Popenov
----
2020-01-08 23:36:15 UTC - Roman Popenov: What is the issue exactly?
----
2020-01-08 23:36:25 UTC - juraj: <https://github.com/apache/pulsar/issues/5818>
----
2020-01-08 23:38:26 UTC - Roman Popenov: Oh yeah, I was working around it
----
2020-01-08 23:39:08 UTC - Roman Popenov: It shouldn’t prevent you from using functions
----
2020-01-09 02:26:27 UTC - rmb: Hi all, I have some questions about pulsar producers, specifically in the nodejs library:
• if sending a message fails, what are the possible error messages send() could throw?
• if a broker has deduplication turned on, the docs recommend setting the timeout to -1 --- why is that?  and if there's no timeout, is there some other mechanism for the client to decide that a message has failed?
• the nodejs documentation only lists methods send(), flush(), and close() for the producer.  is there a way to extract the producer's configuration data? (for example, if I want to know the producerName or the lastSequenceId)  those functions seem to be implemented in the other client libraries; is there a reason they're not in the nodejs library?
----
2020-01-09 03:59:13 UTC - vikash: Hello All,
I am also facing the same issue through .net client producer(Pulsar.Client.Api Pulsar.Client, Version=0.12.0.0)

here is the issue link
<https://github.com/apache/pulsar/issues/5454>

java.lang.NullPointerException: null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java

That issue was closed for the Python Pulsar client, but I'm hitting it with the .NET Pulsar client while sending a message from a .NET client producer to a Pulsar topic.
Is there any alternative way to pass messages from a .NET client producer to the IO JDBC sink connector, with or without a schema (Avro or JSON)?
----
2020-01-09 04:19:26 UTC - Sijie Guo: this is just fixed in <https://github.com/apache/pulsar/pull/5930>
----
2020-01-09 04:36:47 UTC - vikash: @Sijie Guo I'm looking for a way for a .NET client producer to send messages to the topic, with or without a schema, for the IO JDBC sink. Is there any solution from the .NET client or WebSocket side?
----
2020-01-09 05:35:41 UTC - Sijie Guo: I don’t think WebSocket supports schemas yet.
----
2020-01-09 05:36:01 UTC - Sijie Guo: There are two .NET clients available. I am not sure whether schemas are supported or not.
----