Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/03/12 09:11:04 UTC

Slack digest for #general - 2020-03-12

2020-03-11 09:20:45 UTC - Steven Le Roux: No, since Pulsar needs working software :=) Seriously, if you really don't want to go with ZK, you could try zetcd, which is a ZK API in front of etcd. IMO you shouldn't do this. ZooKeeper is a great piece of software that has survived Jepsen better than etcd. And working on etcd for K8s really makes me want to replace etcd everywhere it's possible.
----
2020-03-11 09:22:02 UTC - Ali Ahmed: @xue sort of. There is work underway to make the metadata store pluggable; it's not complete yet. Once done, etcd or any backing store can be used.
----
2020-03-11 09:22:12 UTC - Ali Ahmed: currently zk is required
----
2020-03-11 09:22:16 UTC - Florentin Dubois: Hi Xue, why do you want to use etcd? When it comes to operating both in production, ZooKeeper is way better in all categories. Besides, it generates no operational tasks compared to etcd.
----
2020-03-11 09:25:14 UTC - xue: I see that bookkeeper supports etcd after version 4.9.0, so I want to confirm whether pulsar cluster supports etcd
----
2020-03-11 09:32:43 UTC - Florentin Dubois: You could use an etcd cluster for one BookKeeper cluster, but Pulsar needs a dedicated ZooKeeper cluster for sharing namespaces, configurations and so on across Pulsar clusters.
----
2020-03-11 09:34:16 UTC - xue: I see, thank you!
----
2020-03-11 10:07:35 UTC - Abhilash Mandaliya: hi
Does a Pulsar connector handle crashes? I mean, let's say I have my own sink connector and an error occurred that crashed my Java client. Will Pulsar know about this and resend all the messages starting from that message, or will it only deliver messages from the time I start my sink again?
----
2020-03-11 11:54:12 UTC - Ildefonso Junquero: @Ildefonso Junquero has joined the channel
----
2020-03-11 12:10:21 UTC - Ian: Thank you @Sijie Guo and @Joe Francis
----
2020-03-11 12:13:52 UTC - Ildefonso Junquero: Hello, I'm starting to play with Pulsar Functions, and I haven't found any way to specify the subscription initial position. I have investigated the source code in FunctionConfigUtils.convert, and I see there is no way to retrieve that configuration information and set up the final PulsarConsumerConfig. I have taken a look at LocalRunner.start, which uses FunctionConfigUtils.convert(functionConfig, classLoader). This method should configure the subscriptionPosition in sourceSpec, which is used later in JavaInstanceRunnable.setupInput to set up pulsarSourceConfig.

Now, my question is: is this a missing feature, or is there any suggestion for preventing a function from consuming messages from Latest and forcing it to consume from Earliest?
eyes : Konstantinos Papalias
----
2020-03-11 12:17:48 UTC - Ildefonso Junquero: Another topic I'd like to comment on is that Pulsar SQL (Presto) does not support topics whose names contain uppercase characters. For instance, if you create a topic named MyTopic, you can subscribe and Presto shows there is a mytopic table, but I haven't found a way to query that topic because it always returns a "table not found" error. If I create the topic mytopic (all lowercase) it works.

I have tried different options in the SQL syntax, with no success. Any suggestion?
----
2020-03-11 14:11:54 UTC - Michael Kaufman: @Michael Kaufman has joined the channel
----
2020-03-11 15:05:53 UTC - David Kjerrumgaard: I would recommend using unit tests for most of the testing, with local development. This allows me to set breakpoints inside the debugger. Once I move to localrun mode, I use LOG statements to trace the flow of messages that are problematic.
----
2020-03-11 15:29:19 UTC - Ming: I can only speak from MySQL experience. MySQL can support mixed case. This document describes how MySQL supports mixed case: <https://dev.mysql.com/doc/refman/5.6/en/identifier-case-sensitivity.html> If you read carefully, MySQL case sensitivity support depends on the underlying OS. If you would like to make your SQL portable, I would strongly advise against camel case; that is why SQL databases commonly use underscores. In Pulsar SQL, I switch everything to lowercase.
+1 : Ildefonso Junquero
----
2020-03-11 16:27:45 UTC - Pierre-Yves Lebecq: When localrun mode works correctly, which is not my case when trying to use state. :sweat_smile: Unit testing is a good suggestion though. Thank you for your help.
----
2020-03-11 16:29:29 UTC - David Kjerrumgaard: I haven't tried debugging with localrun mode, so I wouldn't be of much help in fixing that for you. However, I am confident that it can be done, and that it is more of a documentation issue than a technical issue. :smiley:
----
2020-03-11 16:34:38 UTC - Pierre-Yves Lebecq: For sure. I'm not familiar with the Java world; it's the first time I've tried to run something in Java. I know I'm missing some knowledge about running Java code, packaging it in a jar file, etc. I find it quite difficult to get into. There are a lot of things to learn. It's not Pulsar's fault, but for sure the docs are not beginner friendly! Anyway, I really appreciate that you took some time to help me. Cheers!
+1 : David Kjerrumgaard
----
2020-03-11 17:00:11 UTC - John G: @John G has joined the channel
----
2020-03-11 17:08:57 UTC - Ildefonso Junquero: Understood and agreed. Anyway, in my mind I created a topic, not a table, and I never read anything saying that a topic shouldn't use capital letters in its name due to a "conflict" with Pulsar SQL (Presto). But in the end, I reached the conclusion that I should avoid capital letters in topic names. I think this could be explained in the Pulsar docs to warn future users.
----
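Aside (not part of the original thread): a minimal sketch of the lowercase-only convention reached above. The helper names `is_sql_safe_topic` and `to_sql_safe_topic` are hypothetical and not part of any Pulsar API; the sketch only assumes, as reported in the thread, that Pulsar SQL lists the table under the lowercased topic name.

```python
def is_sql_safe_topic(topic: str) -> bool:
    """Return True if the topic name contains no uppercase characters."""
    return topic == topic.lower()


def to_sql_safe_topic(topic: str) -> str:
    """Lowercase a topic name before the topic is created."""
    return topic.lower()


print(is_sql_safe_topic("MyTopic"))  # False
print(to_sql_safe_topic("MyTopic"))  # mytopic
```

A check like this could run wherever topic names are chosen, so that mixed-case names are caught before any topic is created.
----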
2020-03-11 17:35:34 UTC - Antti Kaikkonen: I had this same issue some weeks ago and found this solution:
1. `./bin/pulsar-admin topics create-subscription --messageId earliest --subscription testsub persistent://public/default/topicname`
2. Create your function using `--subs-name testsub`
+1 : Ildefonso Junquero
----
2020-03-11 17:36:06 UTC - Antti Kaikkonen: --messageId can also be 'latest' or (ledgerId:entryId)
----
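Aside (not part of the original thread): the workaround above can be scripted. This is a minimal sketch that only builds the argument list for the command quoted in step 1; the helper name `create_subscription_cmd` is hypothetical, and the assumption that `ledgerId:entryId` is a pair of non-negative integers is mine, not the thread's.

```python
import re
import shlex

# Assumed shape of the (ledgerId:entryId) form mentioned above, e.g. "12:34".
_LEDGER_ENTRY = re.compile(r"^\d+:\d+$")


def create_subscription_cmd(topic, subscription, message_id="earliest"):
    """Build the pulsar-admin argument list for pre-creating a subscription."""
    if message_id not in ("earliest", "latest") and not _LEDGER_ENTRY.match(message_id):
        raise ValueError(f"unsupported message id: {message_id!r}")
    return [
        "./bin/pulsar-admin", "topics", "create-subscription",
        "--messageId", message_id,
        "--subscription", subscription,
        topic,
    ]


cmd = create_subscription_cmd("persistent://public/default/topicname", "testsub")
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to actually execute it
```

The function is then created with `--subs-name testsub`, as in step 2, so that it attaches to the pre-created subscription.
----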
2020-03-11 17:40:41 UTC - Antti Kaikkonen: > Now, my question is: is this a missing feature or is there any suggestion to avoid a function to consume messages since Earliest, and force it to consume since Latest?
Isn't latest the default?
----
2020-03-11 18:01:05 UTC - Ildefonso Junquero: Yes, my mistake. It should say avoid ... Latest and force it to consume Earliest. Thank you. Original message corrected.
----
2020-03-11 18:12:43 UTC - Ildefonso Junquero: I have tested your workaround and it works! Thank you. :star-struck:
----
2020-03-11 18:33:20 UTC - Antti Kaikkonen: No problem. --subs-name may not be needed if you create the subscription with the name that the function uses by default, but I'm not sure what that is.
----
2020-03-11 19:17:50 UTC - Alexander Ursu: Been running mysql jdbc sink connectors on some topics, but one seems to be showing this error in the logs, not sure what it means.
```19:16:25.352 [pool-5-thread-1] ERROR org.apache.pulsar.io.jdbc.JdbcAbstractSink - Got exception 
java.lang.NullPointerException: null
	at org.apache.pulsar.client.impl.schema.generic.GenericJsonRecord.getField(GenericJsonRecord.java:49) ~[pulsar-client-original-2.5.0.jar:2.5.0]
	at org.apache.pulsar.io.jdbc.JdbcAutoSchemaSink.bindValue(JdbcAutoSchemaSink.java:63) ~[pulsar-io-jdbc-2.5.0.nar-unpacked/:?]
	at org.apache.pulsar.io.jdbc.JdbcAbstractSink.flush(JdbcAbstractSink.java:200) ~[pulsar-io-jdbc-2.5.0.nar-unpacked/:?]
	at org.apache.pulsar.io.jdbc.JdbcAbstractSink.lambda$open$0(JdbcAbstractSink.java:108) ~[pulsar-io-jdbc-2.5.0.nar-unpacked/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_232]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_232]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_232]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
19:16:25.352 [pool-5-thread-1] ERROR org.apache.pulsar.io.jdbc.JdbcAbstractSink - Update count 0  not match total number of records 7```
----
2020-03-11 19:18:39 UTC - Alexander Ursu: If helpful, these are the stats of the sink
```{
  "numInstances" : 1,
  "numRunning" : 1,
  "instances" : [ {
    "instanceId" : 0,
    "status" : {
      "running" : true,
      "error" : "",
      "numRestarts" : 0,
      "numReadFromPulsar" : 28267,
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "numSinkExceptions" : 0,
      "latestSinkExceptions" : [ ],
      "numWrittenToSink" : 28267,
      "lastReceivedTime" : 1583954291838,
      "workerId" : "c-pulsar-cluster-1-fw-475eab276fee-8080"
    }
  } ]
}```
----
2020-03-11 19:27:33 UTC - Antti Kaikkonen: I'm trying to deploy a single node bare metal cluster and I'm getting
```Exception in thread "Thread-3" java.lang.IllegalStateException: State is not enabled.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
        at org.apache.pulsar.functions.instance.ContextImpl.ensureStateEnabled(ContextImpl.java:262)
        at org.apache.pulsar.functions.instance.ContextImpl.putStateAsync(ContextImpl.java:299)
...```
I also have `extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent`
in bookkeeper.conf
----
2020-03-11 19:43:27 UTC - Tim Corbett: @Tim Corbett has joined the channel
----
2020-03-11 20:50:07 UTC - David Kjerrumgaard: @Alexander Ursu It appears that there is a mismatch between the DB schema and the incoming JSON based data.
----
2020-03-11 21:03:55 UTC - Kirill Merkushev: Hello, does anybody know how to clean up stale subscriptions from the Pulsar stats? I've already unsubscribed them and cleared the backlog, but they're still there when calling `./pulsar-admin topics stats` or the `/metrics` endpoint. How do I get rid of them?
----
2020-03-11 21:07:10 UTC - Kirill Merkushev: Also, maybe someone knows whether there is a way to configure Pulsar namespaces/tenants/topics via code with a Terraform/YAML-style config, so it could work like a control plane for Envoy: stored in GitHub, versioned, and automated at the template generation layer, while handling the differences automatically?
----
2020-03-11 21:10:21 UTC - Chris Bartholomew: Did you delete them with the admin CLI?
```bin/pulsar-admin topics unsubscribe
The following option is required: -s, --subscription 

Delete a durable subscriber from a topic. 
		The subscription cannot be deleted if there are any active consumers attached to it 

Usage: unsubscribe [options] <persistent://tenant/namespace/topic>
  Options:
  * -s, --subscription
       Subscription to be deleted```

----
2020-03-11 21:10:39 UTC - Kirill Merkushev: yep
----
2020-03-11 21:11:41 UTC - Chris Bartholomew: And do you have consumers connected to the topic that might be recreating them?
----
2020-03-11 21:12:52 UTC - Kirill Merkushev: no, that's for sure
----
2020-03-11 21:13:45 UTC - Kirill Merkushev: actually those subs were created by a script with uuid as a name of a consumer, which then disconnected at some point
----
2020-03-11 21:14:15 UTC - Kirill Merkushev: so I cleaned the traces with cli, but can’t get rid of the stats
----
2020-03-11 21:15:26 UTC - Chris Bartholomew: Hmm...that's strange. That works to get subscriptions out of the stats output for me. I use the REST API, but that is what the CLI is doing behind the scenes.
----
2020-03-11 21:15:47 UTC - Kirill Merkushev: btw topic is partitioned and these subscriptions are invisible on the topic name without partition
----
2020-03-11 21:18:23 UTC - Kirill Merkushev: okay, found the issue, I should unsubscribe it from each partition individually, thanks, good to have a place to ask questions :smile:
----
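Aside (not part of the original thread): a minimal sketch of the per-partition cleanup described above, assuming the partitions of a partitioned topic are named `<topic>-partition-<N>`. The topic and subscription names are hypothetical, and the sketch only builds the `pulsar-admin topics unsubscribe` calls shown earlier without running them.

```python
def partition_topics(topic, num_partitions):
    """List the per-partition topic names behind a partitioned topic."""
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


def unsubscribe_cmds(topic, subscription, num_partitions):
    """Build one `pulsar-admin topics unsubscribe` call per partition."""
    return [
        ["bin/pulsar-admin", "topics", "unsubscribe", "-s", subscription, part]
        for part in partition_topics(topic, num_partitions)
    ]


# Hypothetical topic with 3 partitions and a stale subscription.
for cmd in unsubscribe_cmds("persistent://public/default/events", "stale-sub", 3):
    print(" ".join(cmd))
```

Each generated command targets a single partition, which matches the observation that the stale subscriptions were only visible on the partition topics.
----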
2020-03-11 21:29:52 UTC - Alexander Ursu: The only issue I can think of is maybe null values being sent for some keys in the JSON data, but every column in the MySQL schema for the table is nullable by default, so this should be fine, right? It doesn't seem to mention what specific column or key is causing the issue, so it's hard to say.
----
2020-03-11 21:41:47 UTC - Greg Methvin: Can pulsar do deduplication based on a message key? I essentially want topic compaction but where it discards subsequent messages if the key is the same as an existing message. Perhaps what I’m describing can be done in some other way though.
----
2020-03-11 21:49:17 UTC - David Kjerrumgaard: Do you have the types defined as "optional" in the JSON schema?  (I am assuming it is JSON and not Avro converted to JSON)
----
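Aside (not part of the original thread): since the stack trace does not name the column or key involved, one way to narrow it down is to compare a sample of messages against the table's columns offline. This is a minimal sketch assuming flat JSON objects; the helper `find_problem_fields` and the column names are hypothetical, it does not use the connector's code, and the thread does not confirm that missing or null keys are the actual cause.

```python
import json


def find_problem_fields(raw_message, expected_columns):
    """Return (missing, null) key lists for one JSON message."""
    record = json.loads(raw_message)
    missing = [c for c in expected_columns if c not in record]
    null = [c for c in expected_columns if c in record and record[c] is None]
    return missing, null


columns = ["id", "name", "price"]  # hypothetical table columns
print(find_problem_fields('{"id": 1, "name": null}', columns))
# (['price'], ['name'])
```

Running a check like this over the messages read around the time of the error would show whether any keys are absent or null in the incoming data.
----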
2020-03-11 22:14:36 UTC - Kirill Merkushev: hash map? :slightly_smiling_face:
----
2020-03-11 22:23:58 UTC - Greg Methvin: yes, you could use some kind of distributed hash map, or redis
----
2020-03-11 22:25:25 UTC - Greg Methvin: but it’d be convenient to have it done by pulsar so we don’t have to coordinate state in two different places
----
2020-03-11 22:36:26 UTC - Kirill Merkushev: bookkeeper directly then? Pretty sure it can serve this purpose, since in pulsar functions there is a context which handles key-value case
----
2020-03-11 23:07:07 UTC - Sijie Guo: it seems that load balancing was triggered and namespace bundles are offloaded.
----
2020-03-11 23:07:24 UTC - Sijie Guo: Can you check your cpu usage?
----
2020-03-11 23:08:39 UTC - Andy Papia: Yeah I think it was an overload. I've moved on.
----
2020-03-11 23:09:30 UTC - Sijie Guo: <https://github.com/streamnative/terraform-provider-pulsar>
----
2020-03-11 23:10:04 UTC - Sijie Guo: We have developed a terraform provider for provisioning tenants/namespaces/topics. Not sure if that is something you are looking for.
----
2020-03-11 23:13:34 UTC - Sijie Guo: This sounds like a simple feature to add to topic compaction. The current implementation of topic compaction overwrites keys, and what you want is to drop a message if its key already exists.

It should be simple to introduce a flag to control duplicated key behavior in topic compaction.

• overwrite (current behavior)
• drop (new behavior)
Then users can configure the compaction behavior through namespace settings.

maybe raise a github issue?
----
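Aside (not part of the original thread): a minimal sketch of the two duplicate-key behaviors discussed above, modeled on an in-memory list of (key, value) pairs. The `compact` function and its `on_duplicate` parameter are hypothetical; they only illustrate the proposal and are not an existing Pulsar setting or API.

```python
def compact(messages, on_duplicate="overwrite"):
    """Reduce (key, value) messages to one value per key.

    "overwrite" keeps the latest value per key (the current behavior
    described above); "drop" keeps the first value and discards later ones.
    """
    if on_duplicate not in ("overwrite", "drop"):
        raise ValueError(f"unknown mode: {on_duplicate!r}")
    result = {}
    for key, value in messages:
        if on_duplicate == "drop" and key in result:
            continue  # key already seen: discard the newer message
        result[key] = value
    return result


log = [("a", 1), ("b", 2), ("a", 3)]
print(compact(log, "overwrite"))  # {'a': 3, 'b': 2}
print(compact(log, "drop"))       # {'a': 1, 'b': 2}
```

The only difference between the modes is whether a later message with an existing key replaces the stored value or is skipped, which is why a single flag is enough to express both.
----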
2020-03-11 23:13:52 UTC - Sijie Guo: okay
----
2020-03-11 23:25:57 UTC - Antti Kaikkonen: Is ECC memory recommended for pulsar deployments?
----
2020-03-12 01:33:07 UTC - Andy Papia: I've mostly run stateless apps in K8s up to now.  How should I think about Pulsar on K8s in AWS?  If I'm trying to optimize for cost, I assume that with persistent volume claims I'll need to run the bookie and ZK pods 24x7 in order to keep their volumes available.  Since the brokers are stateless, I assume they can be dynamically scaled with the autoscaler based on some metric.  So will I have a static ZK and bookie cluster that I can scale out when I need more throughput?  The volumes themselves can be resized if I need more storage.  Is it possible to use a distributed filesystem like EFS with BookKeeper?
----
2020-03-12 04:27:45 UTC - Jeon.DeukJin: Hello, this page doesn't display:
<https://pulsar.apache.org/docs/en/reference-connector-admin/#sinks>
----
2020-03-12 04:28:19 UTC - Jeon.DeukJin: empty page.
----
2020-03-12 04:28:31 UTC - Jeon.DeukJin: also, <https://pulsar.apache.org/docs/en/reference-connector-admin/#sources>
----
2020-03-12 04:29:09 UTC - Jeon.DeukJin: and, then, Korean page ~!!
<https://pulsar.apache.org/docs/ko-KR/standalone>
----
2020-03-12 04:29:31 UTC - Jeon.DeukJin: Not Found error.
----
2020-03-12 04:29:42 UTC - Jeon.DeukJin: Please fix it.
----
2020-03-12 04:57:06 UTC - Greg Methvin: sounds good. I reported an issue here: <https://github.com/apache/pulsar/issues/6526>
----
2020-03-12 05:35:56 UTC - tuteng: This doc moved to <https://pulsar.apache.org/docs/en/io-use/#sink-2>
----
2020-03-12 06:53:58 UTC - Devin G. Bost: @Jeon.DeukJin I've already reported this in a Github issue, so we're aware of it. Thanks for letting us know it's still an issue.
----
2020-03-12 06:54:44 UTC - Devin G. Bost: @tuteng I have an open pulsar issue for this. None of the foreign language pages are working.
FYI @jia zhai
----
2020-03-12 06:55:49 UTC - Devin G. Bost: <https://github.com/apache/pulsar/issues/6470>
----
2020-03-12 07:22:53 UTC - Aravindhan: @Aravindhan has joined the channel
----
2020-03-12 09:10:04 UTC - Aravindhan: Hi all, I am using a Pulsar IO source connector to pull messages from Kafka into a Pulsar topic. I need to do some data transformation before handing the messages to the application for processing. One way is to write a Pulsar function for the transformation and get the required messages in the function's destination topic.

Is it possible to override the Pulsar IO source connector so that it does the transformation as well? That would remove the intermediate topic (the one that is the input to the Pulsar function).
----