Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/03/04 09:11:04 UTC

Slack digest for #general - 2020-03-04

2020-03-03 10:01:00 UTC - Kannan: This is because the broker advertises the k8s service name (which is pulsaristio-broker), and the broker znode session gets constructed using this service name. So with 1 broker it works, but when adding another broker, it uses the same service name and hence fails with the error "broker-znode owned by different session."
----
2020-03-03 10:21:20 UTC - eilonk: Has anyone here deployed Pulsar with Helm using TLS authentication?
While the chart allows you to create a TLS secret for the admin certs, what happens when I want to use different ones for the consumers? For the brokers?
(In a different type of deployment, I would give the file path to these certificates, but that's not how Helm works.)
----
2020-03-03 13:13:03 UTC - Santiago Del Campo: It was a problem with K8s and Linux host networking. For some reason the bookkeeper pod is accessible from outside, but the pod can't access itself... as far as I know, it seems to be a bug with k8s and iptables.
----
2020-03-03 13:20:37 UTC - Greg: Hello there, we are currently facing an issue with Pulsar 2.5.0 (it was working fine in 2.4.2). At some point we are no longer able to subscribe to a given topic, with this error sent by the server: [<non-persistent://public/default/cluster][Proxy-vm-k8s-worker-infinity-2]> Failed to create consumer: Topic does not have schema to check
----
2020-03-03 13:21:56 UTC - Greg: We suspect the topic gets cleaned up at some point because of inactivity, and when the client automatically reconnects, the recreated topic no longer has a schema linked to it
----
2020-03-03 13:23:32 UTC - Greg: Any hint on how to work around this, or what information is needed to pinpoint the issue?
----
2020-03-03 13:26:15 UTC - Greg: I just realized we are using the 2.4.2 Java client on top of the 2.5.0 server version of Pulsar; I will try with the 2.5.0 Java client
----
2020-03-03 13:32:11 UTC - Ryan Slominski: Thanks for pointing out the getCurrentRecord method.  I missed that as I thought the input of the function was always passed in as a parameter in the method signature (in all the examples I've seen it's a String named "input").   I guess the input parameter appears in two places then?
----
2020-03-03 13:34:15 UTC - Penghui Li: Have you deleted the schema of this topic before?
----
2020-03-03 13:34:56 UTC - Antti Kaikkonen: <https://pulsar.apache.org/docs/en/functions-develop/#available-apis>

You have to use the Pulsar Functions SDK instead of the language-native function interface to get access to the context object.
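
For illustration, a minimal Java SDK function might look like the sketch below (the class name is made up; `getCurrentRecord()` is the context method discussed here, and the same input is available both as the method parameter and through the context):
```
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Hypothetical example: read the input from the parameter and from the context.
public class EchoFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) throws Exception {
        Object viaContext = context.getCurrentRecord().getValue();
        context.getLogger().info("received {} (via context: {})", input, viaContext);
        return input;
    }
}
```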
----
2020-03-03 13:38:32 UTC - Greg: no, this topic is created by the first publisher with a schema (STRING)
----
2020-03-03 13:39:15 UTC - Greg: and at some point this topic is removed automatically
----
2020-03-03 13:39:22 UTC - Ryan Slominski: Yeah, it looks like you can get the input from the method signature AND from the context object.   In other words, the SDK version should not include the input as a method parameter - all you need is the context object.
+1 : Antti Kaikkonen
----
2020-03-03 13:39:41 UTC - Greg: and we cannot reconnect to it manually because the schema is not available
----
2020-03-03 13:40:24 UTC - Greg: I think the topic is recreated automatically by the Java client, right?
----
2020-03-03 13:40:26 UTC - Penghui Li: OK, so it happens when the topic is removed and then a consumer tries to connect to it?
----
2020-03-03 13:40:34 UTC - Greg: yes
----
2020-03-03 13:41:40 UTC - Greg: because it looks like it has been recreated automatically (client reconnection mechanism?) without a schema
----
2020-03-03 13:43:47 UTC - Penghui Li: No, `Topic does not have schema to check` means the broker can get a schema from the consumer but can't find any schema on this topic. So, it's better to check whether the topic has lost its schema
----
2020-03-03 13:44:16 UTC - Greg: Sorry, that was what I meant: the topic has lost the schema
----
2020-03-03 13:44:24 UTC - Penghui Li: yes
----
2020-03-03 13:44:53 UTC - Greg: We verified using pulsar-admin and can confirm the topic has lost the schema, but I cannot explain why
----
2020-03-03 13:44:55 UTC - Penghui Li: Here is the source code:
```return getSchema(schemaId).thenCompose(existingSchema -> {
            if (existingSchema != null && !existingSchema.schema.isDeleted()) {
                    if (strategy == SchemaCompatibilityStrategy.BACKWARD ||
                            strategy == SchemaCompatibilityStrategy.FORWARD ||
                            strategy == SchemaCompatibilityStrategy.FORWARD_TRANSITIVE ||
                            strategy == SchemaCompatibilityStrategy.FULL) {
                        return checkCompatibilityWithLatest(schemaId, schemaData, SchemaCompatibilityStrategy.BACKWARD);
                    } else {
                        return checkCompatibilityWithAll(schemaId, schemaData, strategy);
                    }
            } else {
                return FutureUtil.failedFuture(new IncompatibleSchemaException("Topic does not have schema to check"));
            }
        });```
----
2020-03-03 13:45:58 UTC - Penghui Li: > We verified using pulsar-admin and can confirm the topic has lost the schema, but I cannot explain why
Ok, thanks.
----
2020-03-03 13:47:00 UTC - Greg: and I also see that the schema has been cleaned: [<non-persistent://public/default/cluster>] Topic deleted successfully due to inactivity
----
2020-03-03 13:47:34 UTC - Greg: so I don't know how the topic was recreated (something automatic somewhere :wink:)
----
2020-03-03 13:49:07 UTC - Greg: sorry, I meant the topic has been cleaned, not the schema...
----
2020-03-03 13:49:44 UTC - Penghui Li: Pulsar enables topic auto-creation by default. So, if a producer or a consumer tries to connect to a topic that does not exist, the broker will create the topic.
----
2020-03-03 13:50:13 UTC - Greg: but in this case the topic is indeed created with the schema
----
2020-03-03 13:51:03 UTC - Ryan Slominski: FYI - looks like syntax highlighting of method getCurrentRecord is broken in the 2.5.0 docs.  Not sure why it isn't blue like the rest of the methods.  Probably why I didn't see it.
----
2020-03-03 13:51:16 UTC - Greg: so I suppose it's a client that was already connected trying to use a topic that has been cleaned
----
2020-03-03 13:51:20 UTC - Greg: is that possible?
----
2020-03-03 13:52:50 UTC - Greg: Can a topic be cleaned when there are publishers/consumers connected to it?
----
2020-03-03 13:53:05 UTC - Penghui Li: A topic is only deleted when it has no publishers, no consumers, and no subscriptions.
----
2020-03-03 13:53:18 UTC - Penghui Li: > Can a topic be cleaned when there are publishers/consumers connected to it? (edited)
NO
----
2020-03-03 13:53:43 UTC - Greg: We are using a pulsar cluster with 3 instances
----
2020-03-03 13:54:23 UTC - Greg: I think we don't have the issue with only one Pulsar instance, but I need to verify to confirm
----
2020-03-03 14:00:03 UTC - Penghui Li: OK, I will try to reproduce it. Looks like this is a bug related to schema deletion.
----
2020-03-03 14:00:18 UTC - Alex Yaroslavsky: Hi,

Can't figure out how to deal with this one

I get the following error in the function log:
```[ERROR] python_instance.py: Exception while executing user method
Traceback (most recent call last):
  File "/opt/pulsar/instances/python-instance/python_instance.py", line 242, in actual_execution
    output_object = self.function_class.process(input_object, self.contextimpl)
  File "/tmp/pulsar_functions/c1-d1/in/in_router/0/in_router.py", line 8, in process
    context.publish(self.out_topic, item, properties={ 'tenant' : context.get_function_tenant() })
  File "/opt/pulsar/instances/python-instance/contextimpl.py", line 190, in publish
    output_bytes, partial(self.callback_wrapper, callback, topic_name, self.get_message_id()), **message_conf)
  File "/opt/pulsar/instances/python-instance/pulsar/__init__.py", line 894, in send_async
    replication_clusters, disable_replication, event_timestamp)
  File "/opt/pulsar/instances/python-instance/pulsar/__init__.py", line 928, in _build_msg
    mb.property(k, v)
ArgumentError: Python argument types in
    MessageBuilder.property(MessageBuilder, str, unicode)
did not match C++ signature:
    property(pulsar::MessageBuilder {lvalue}, std::string, std::string)```

The function code is this:
```from pulsar import Function

class RoutingFunction(Function):
    def __init__(self):
        self.out_topic = "<persistent://internal/emsaas/out>"

    def process(self, item, context):
        context.publish(self.out_topic, item, properties={ 'tenant' : context.get_function_tenant() })```
----
2020-03-03 14:01:07 UTC - Penghui Li: @Greg Can you help create an issue on GitHub, and provide the reproduction steps if possible?
----
2020-03-03 14:01:15 UTC - Greg: OK, thanks Penghui, tell me if you need more information, traces or tests from me!
+1 : Penghui Li
----
2020-03-03 14:01:29 UTC - Greg: yes
----
2020-03-03 14:01:36 UTC - Penghui Li: Thanks
----
2020-03-03 14:03:46 UTC - Ryan Slominski: Looks like Docusaurus code tabs interfere with the markdown, because the raw markdown works fine. Might be related to this bug: <https://github.com/facebook/docusaurus/issues/1260>
----
2020-03-03 14:08:45 UTC - Tobias Macey: @Sijie Guo is there anywhere that I can track progress on the Kafka protocol layer that you are working on?
----
2020-03-03 14:10:59 UTC - Penghui Li: @Greg I can't reproduce it on master branch
----
2020-03-03 14:11:04 UTC - Penghui Li: ```{
  "version": 2,
  "schemaInfo": {
    "name": "test",
    "schema": "",
    "type": "STRING",
    "properties": {}
  }
}```
----
2020-03-03 14:11:56 UTC - Greg: I'll try to see how to reproduce it easily, and will also see if I have the issue with only one broker
----
2020-03-03 14:12:01 UTC - Penghui Li: This is the schema created by the consumer. I created the consumer after the topic was deleted.
----
2020-03-03 14:12:39 UTC - Greg: but in my case the consumer is already connected when the topic is deleted
----
2020-03-03 14:13:03 UTC - Greg: I don't know why it is deleted; I thought it was because no messages were sent
----
2020-03-03 14:13:04 UTC - Penghui Li: Ok, let me try
----
2020-03-03 14:17:45 UTC - Penghui Li: ```22:14:20.129 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConnectionHandler@114] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Closed connection [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650] -- Will try again in 0.1 s
22:14:20.230 [pulsar-timer-4-1:org.apache.pulsar.client.impl.ConnectionHandler@117] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Reconnecting after timeout
22:14:20.236 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@550] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribing to topic on cnx [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650]
22:14:20.270 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@662] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribed to topic on /127.0.0.1:6650 -- consumer: 0
22:15:00.982 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ClientCnx@629] INFO  org.apache.pulsar.client.impl.ClientCnx - [/127.0.0.1:6650] Broker notification of Closed consumer: 0
22:15:00.982 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConnectionHandler@114] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Closed connection [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650] -- Will try again in 0.1 s
22:15:01.084 [pulsar-timer-4-1:org.apache.pulsar.client.impl.ConnectionHandler@117] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Reconnecting after timeout
22:15:01.085 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@550] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribing to topic on cnx [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650]
22:15:01.120 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@662] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribed to topic on /127.0.0.1:6650 -- consumer: 0
22:15:30.509 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ClientCnx@629] INFO  org.apache.pulsar.client.impl.ClientCnx - [/127.0.0.1:6650] Broker notification of Closed consumer: 0
22:15:30.510 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConnectionHandler@114] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Closed connection [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650] -- Will try again in 0.1 s
22:15:30.611 [pulsar-timer-4-1:org.apache.pulsar.client.impl.ConnectionHandler@117] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Reconnecting after timeout
22:15:30.613 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@550] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribing to topic on cnx [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650]
22:15:30.650 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@662] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribed to topic on /127.0.0.1:6650 -- consumer: 0```
Looks like it's not easy to reproduce. I tried deleting the topic only, and deleting both the topic and the schema. The consumer can still reconnect successfully. And when I delete the schema, I checked that the last schema version changed.
----
2020-03-03 14:18:05 UTC - Penghui Li: ```lipenghui@lipenghuideMacBook-Pro-2 apache-pulsar-2.6.0-SNAPSHOT % bin/pulsar-admin schemas get test
{
  "version": 4,
  "schemaInfo": {
    "name": "test",
    "schema": "",
    "type": "STRING",
    "properties": {}
  }
}
lipenghui@lipenghuideMacBook-Pro-2 apache-pulsar-2.6.0-SNAPSHOT % bin/pulsar-admin topics delete test -f -d
lipenghui@lipenghuideMacBook-Pro-2 apache-pulsar-2.6.0-SNAPSHOT % bin/pulsar-admin schemas get test        
{
  "version": 6,
  "schemaInfo": {
    "name": "test",
    "schema": "",
    "type": "STRING",
    "properties": {}
  }
}```
----
2020-03-03 14:20:32 UTC - Greg: wow ok :disappointed: I will try to find the exact way to reproduce, thanks for the help
----
2020-03-03 14:29:39 UTC - Geetish Sanjeeb Nayak: @Geetish Sanjeeb Nayak has joined the channel
----
2020-03-03 14:31:54 UTC - Greg: So here is a reproduction with 2 Pulsar nodes:
```Node 1 :
14:23:30.460 [ForkJoinPool.commonPool-worker-1] WARN  org.apache.pulsar.broker.service.AbstractTopic - [<non-persistent://public/default/cluster>] Error getting policies java.util.concurrent.CompletableFuture cannot be cast to org.apache.pulsar.common.policies.data.Policies and publish throttling will be disabled
14:23:30.461 [ForkJoinPool.commonPool-worker-1] INFO  org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for <non-persistent://public/default/cluster>
14:23:30.477 [ForkJoinPool.commonPool-worker-1] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic NonPersistentTopic{topic=<non-persistent://public/default/cluster>}
14:25:20.219 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic - [<non-persistent://public/default/cluster>] Topic deleted
14:25:20.219 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic - [<non-persistent://public/default/cluster>] Topic deleted successfully due to inactivity
14:28:24.086 [pulsar-io-24-4] WARN  org.apache.pulsar.broker.service.AbstractTopic - [<non-persistent://public/default/cluster>] Error getting policies java.util.concurrent.CompletableFuture cannot be cast to org.apache.pulsar.common.policies.data.Policies and publish throttling will be disabled
14:28:24.086 [pulsar-io-24-4] INFO  org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for <non-persistent://public/default/cluster>
14:28:24.087 [pulsar-io-24-4] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic NonPersistentTopic{topic=<non-persistent://public/default/cluster>}

Node 2 :
14:25:20.201 [pulsar-ordered-OrderedExecutor-5-0-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x100d2a693590018 local:/10.200.8.111:41790 remoteserver:zookeeper/10.200.7.69:2181 lastZxid:4294984843 xid:719 sent:719 recv:753 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/schemas/public/default/cluster
14:27:04.984 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.200.8.112:36886] Subscribing on topic <non-persistent://public/default/cluster> / Monitor-istra-monitor-667c9fbbcc-ldlsf
14:27:05.086 [Thread-14] WARN  org.apache.pulsar.broker.service.ServerCnx - [/10.200.8.112:36886][<non-persistent://public/default/cluster][Monitor-istra-monitor-667c9fbbcc-ldlsf]> Failed to create consumer: Topic does not have schema to check```
----
2020-03-03 14:33:21 UTC - Greg: The publisher does not disconnect but does not send any messages; after 1 minute the topic is deleted. Then a new publisher connects on the other node and gets the error message
----
2020-03-03 14:33:41 UTC - Greg: Will try with only one node
----
2020-03-03 14:40:35 UTC - Santiago Del Campo: Hello! Any idea about this?

We're running a Pulsar cluster on top of K8s. The broker takes high-load traffic through WebSocket. Over time, CPU usage rises up to 3 cores on the server. Could this be related to some kind of CPU usage "leak"?

Broker version is 2.4.2
----
2020-03-03 14:48:21 UTC - Greg: @Penghui Li: I cannot reproduce it with only one Pulsar instance; the topic is never cleaned. But with several instances, I can easily reproduce it, as the topic is deleted after 1 minute and then I cannot reconnect to it anymore
----
2020-03-03 14:56:45 UTC - Penghui Li: By instance, do you mean a Pulsar broker?
----
2020-03-03 15:15:51 UTC - Greg: yes
----
2020-03-03 15:34:06 UTC - Pavel Tishkevich: Hello.
We've increased the number of brokers from 3 to 12 to have better availability when a broker fails, as @Sijie Guo suggested.

But after that, latency for some operations increased significantly (for example, `ConsumerBuilder.subscribe()` went from 0.6s to 1.5s).

What is causing this latency growth? Can it be tuned with this number of brokers?

Also, we've noticed that ZooKeeper is more loaded after adding brokers (see attached). Is this normal? What can we do to decrease latency?
----
2020-03-03 17:24:46 UTC - Sijie Guo: We are about to open source KoP in the coming week.
----
2020-03-03 17:27:06 UTC - Sijie Guo: what kind of hardware are you using for your zookeeper cluster?
----
2020-03-03 17:49:45 UTC - Gabriel Paiz III: @Gabriel Paiz III has joined the channel
----
2020-03-03 17:57:53 UTC - Devin G. Bost: I was just going to ask the same question.
----
2020-03-03 17:58:10 UTC - Devin G. Bost: More brokers put greater load on ZooKeeper.
----
2020-03-03 18:03:04 UTC - Devin G. Bost: I know there are folks using helm.
----
2020-03-03 18:03:29 UTC - Devin G. Bost: I don't know specifics though.
----
2020-03-03 18:19:50 UTC - Devin G. Bost: @Penghui Li I'm not sure if it's the same issue, but I know that sometimes the schema isn't created depending on the order in which components are created. For example, if a source is created before a function, or if a function is created before a sink, the schema is not created.
----
2020-03-03 18:20:36 UTC - Devin G. Bost: Do you have logs associated with the event? I think I've seen this happen before, but we need to capture logs around what's happening to identify the cause.
----
2020-03-03 18:24:54 UTC - Santiago Del Campo: Which logs exactly... the broker's? :thinking_face:

I did some research about this and found an issue on GitHub related to a CPU usage leak when the broker was under high load, WebSocket communication was used, and an exclusive subscription to a topic was attempted (which, by the way, is our case due to the app architecture around Pulsar consumption).
----
2020-03-03 18:26:05 UTC - Santiago Del Campo: According to the GitHub issue, and if I understand correctly, a fix was made in 2.4.2 and 2.5.0 :thinking_face:
----
2020-03-03 18:28:26 UTC - Devin G. Bost: Do you have the issue number? I can check on this.
----
2020-03-03 18:28:38 UTC - Devin G. Bost: Yes, I meant broker logs.
----
2020-03-03 18:29:42 UTC - Devin G. Bost: Is the issue intermittent? If so, it may be tricky to reproduce.
----
2020-03-03 18:30:09 UTC - Santiago Del Campo: issue: <https://github.com/apache/pulsar/issues/5200>

apparently fixed with: <https://github.com/apache/pulsar/pull/5204>
----
2020-03-03 18:31:50 UTC - Devin G. Bost: Thanks
----
2020-03-03 18:33:57 UTC - Santiago Del Campo: Thanks to you :slightly_smiling_face:
----
2020-03-03 18:41:47 UTC - Santiago Del Campo: This is the exception received from stdout:
Tell me if you need anything else.

----
2020-03-03 18:41:47 UTC - Santiago Del Campo: ```18:38:41.381 [pulsar-web-32-7] WARN org.apache.pulsar.websocket.ConsumerHandler - [172.31.49.0:33282] Failed in creating subscription xxxxxxxxxxxxxxxxx on topic <non-persistent://public/default/xxxxxxxxxxxxxxxxxxx>
org.apache.pulsar.client.api.PulsarClientException$ConsumerBusyException: Exclusive consumer is already connected
at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:276) ~[org.apache.pulsar-pulsar-client-api-2.4.2.jar:2.4.2]
at org.apache.pulsar.client.impl.ConsumerBuilderImpl.subscribe(ConsumerBuilderImpl.java:91) ~[org.apache.pulsar-pulsar-client-original-2.4.2.jar:2.4.2]
at org.apache.pulsar.websocket.ConsumerHandler.<init>(ConsumerHandler.java:111) ~[org.apache.pulsar-pulsar-websocket-2.4.2.jar:2.4.2]
at org.apache.pulsar.websocket.WebSocketConsumerServlet.lambda$configure$0(WebSocketConsumerServlet.java:44) ~[org.apache.pulsar-pulsar-websocket-2.4.2.jar:2.4.2]
at org.eclipse.jetty.websocket.server.WebSocketServerFactory.acceptWebSocket(WebSocketServerFactory.java:229) [org.eclipse.jetty.websocket-websocket-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.websocket.server.WebSocketServerFactory.acceptWebSocket(WebSocketServerFactory.java:214) [org.eclipse.jetty.websocket-websocket-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.websocket.servlet.WebSocketServlet.service(WebSocketServlet.java:160) [org.eclipse.jetty.websocket-websocket-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-javax.servlet-api-3.1.0.jar:3.1.0]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852) [org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604) [org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.apache.pulsar.broker.web.ResponseHandlerFilter.doFilter(ResponseHandlerFilter.java:53) [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591) [org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:542) [org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482) [org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:173) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.Server.handle(Server.java:494) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268) [org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) [org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) [org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) [org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) [org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) [org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) [org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367) [org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]```
----
2020-03-03 19:11:04 UTC - eilonk: If anyone was wondering, this isn't related to Helm but to Kubernetes itself; it just requires mounting volumes with the matching secrets and then editing the chart templates to use those.
----
2020-03-03 19:16:54 UTC - Devin G. Bost: Zookeeper needs to be properly tuned to function at its best.
----
2020-03-03 19:17:04 UTC - Devin G. Bost: It needs very fast disk storage.
----
2020-03-03 19:30:55 UTC - Alexander Ursu: Quick question, has anyone tried to put the Pulsar brokers directly behind a reverse proxy, specifically Traefik in my case, without first standing behind the Pulsar proxy? Are there any inherent problems with this approach, or are there any opinions on it?

I know the Pulsar proxy can do service discovery, but so can Traefik, since I have all my components running in docker containers in a swarm.
----
2020-03-03 19:33:00 UTC - Rolf Arne Corneliussen: Simple question: for a _partitioned_ topic, if you set a *compaction-threshold*, will it apply to the topic as a whole or for each individual partition? I assume the latter, but I could not find the answer in the documentation.
----
2020-03-03 19:37:42 UTC - Addison Higham: @Alexander Ursu It is a bit more complex than that. Topics belong to a broker; for admin calls, HTTP redirects are used and the proxy follows those for you, so you could likely do that with Traefik fairly easily. But for the TCP protocol (which actually sends/receives messages), you need to ensure you are being routed to the proper broker. When you try to connect a consumer/producer via a proxy, it looks up the broker for a topic (either by talking to ZooKeeper directly or by making an HTTP call to the brokers, depending on how your proxy is configured) and then forwards requests. There is additional complexity if you are doing authz/authn there. Making Traefik do all that would be a fair bit more complicated and requires domain-specific knowledge of Pulsar's protocol
----
2020-03-03 19:40:11 UTC - Alexander Ursu: Thanks for the input! So I guess using the Pulsar proxy is still a much more sane approach, and then putting that behind Traefik would be straightforward?
----
2020-03-03 19:40:40 UTC - Addison Higham: should be, in our k8s we use an AWS NLB -> Pulsar proxy
----
2020-03-03 21:31:53 UTC - Ryan Slominski: `java.lang.IllegalStateException: State is not enabled.`

I turned on debugging and I see this. That explains why my Pulsar Function that attempts to use state is not working. So, how do I enable state in a Pulsar Function?
----
2020-03-03 21:45:47 UTC - Ryan Slominski: Tried updating conf/bookkeeper.conf with:
`extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent`
Still no luck.
----
2020-03-03 22:04:54 UTC - Alexander Ursu: Might anyone have an idea/opinion on connecting a Pulsar cluster to a sink that handles an arbitrary number of topics with arbitrary schemas, and that can be connected to Grafana for graphing, preferably through SQL or an SQL-like query language?
----
2020-03-03 22:12:38 UTC - Alexander Ursu: Or has anyone had any experience with putting any sort of visualizations over Presto after configuring it to work with Pulsar
----
2020-03-03 22:18:42 UTC - Ali Ahmed: @Alexander Ursu you can try this
<http://rickbergfalk.github.io/sqlpad/>
----
2020-03-03 22:42:23 UTC - Ryan Slominski: FYI - This does work if you deploy the function to the cluster.  Just doesn't work with Local Function Runner
----
2020-03-03 23:59:09 UTC - Derek Moore: @Derek Moore has joined the channel
----
2020-03-04 02:31:48 UTC - Eugen: I'm currently testing a Pulsar cluster and I experience message loss when producing from a single thread to a single topic (asynchronously). The following list shows the sequence numbers that are missing when sending at 200,000 msg/sec. It does not matter if I consume in a tailing fashion or if I seek to the beginning and receive everything - the result is the same - which leads me to believe the producing side is the problem:
```11671-11840=169
23172-23397=225
23660-24127=467
24866-26299=1433
28529-28619=90
30519-30552=33
41185-41933=748
57788-58270=482
58534-58778=244
...```
Under what circumstances can message loss occur? I was under the impression that that's pretty much impossible in pulsar.
----
2020-03-04 02:33:13 UTC - Joe Francis: It is impossible, if the message is acked to the producer
----
2020-03-04 02:33:59 UTC - Eugen: FWIW, I am using the same benchmark code to produce data for Pulsar and Kafka (which I'm comparing Pulsar to); the Kafka variant uses the same message generation threading and logic, and with Kafka there is no message loss.
----
2020-03-04 02:34:55 UTC - Eugen: I also thought it was impossible, even if brokers and bookies fail, as long as the producer is alive, which it is
----
2020-03-04 02:35:21 UTC - Joe Francis: Published messages are durable. A message is published only after you receive an ack for it.
----
2020-03-04 02:36:14 UTC - Eugen: I'm ignoring acks
----
2020-03-04 02:36:25 UTC - Joe Francis: :slightly_smiling_face:
----
2020-03-04 02:37:00 UTC - Eugen: I'm just sending using producer.sendAsync()
----
2020-03-04 02:37:22 UTC - Eugen: so you are saying that if I do not wait for an ack, messages may be lost?
----
2020-03-04 02:37:29 UTC - Eugen: In other words, I need to send synchronously?
----
2020-03-04 02:37:35 UTC - Joe Francis: Not at all
----
2020-03-04 02:39:21 UTC - Joe Francis: But the ack is the guarantee that your message was persisted.  Asynchronous send allows you to pipeline sends, instead of waiting for the roundtrip before the next send.
----
2020-03-04 02:42:16 UTC - Eugen: I heard (I believe here in Slack) that as long as a producer lives, it will buffer messages in the face of broker failures and make sure there can be no reordering or message loss. As I'm seeing message loss (always in batches), either my producer is buggy (but it works with Kafka), or something is happening in Pulsar that I thought was impossible. So how could some messages get lost?
----
2020-03-04 02:43:43 UTC - Eugen: Obviously there could be message "loss" at the end, as long as brokers (or bookies) aren't recovering, but I still don't see how that could happen with messages in the middle.
----
2020-03-04 02:48:15 UTC - Joe Francis: That is correct. But if the operation times out, the client returns an error (the future is completed with an error) and the app has to republish
----
2020-03-04 02:49:06 UTC - Joe Francis: The broker retries will  not exceed the timeout window.  Within the timeout window, there will be retries.
----
2020-03-04 02:50:21 UTC - Joe Francis: The timeout window is essentially a limit: the app tells Pulsar, if you cannot publish within this time, kick the error up to me
----
2020-03-04 02:50:52 UTC - Eugen: In which case there will be message loss (or reordering in case I resend), right?
----
2020-03-04 02:51:14 UTC - Eugen: I will look into the timeout setting next. Do you happen to know what the default is?
----
2020-03-04 02:52:13 UTC - Joe Francis: What happens on a timeout is that everything after that message that is pending in the client send buffer is cleaned out. The app has to resend everything starting from the message that returned the error.
----
2020-03-04 02:52:57 UTC - Eugen: So before resending it needs to clear some flag that it received the nack?
----
2020-03-04 02:53:22 UTC - Eugen: Otherwise how would the client know if it's a resend (aware of the problem) or not
----
2020-03-04 02:58:43 UTC - Joe Francis: Let's say you get an error on the 23rd message, after you send 50 async. You will get an error on all 23..50 messages (futures). You will just have to resend starting from 23...
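For illustration only, a rough sketch of that pattern with the Java client (the class and method names are made up; it assumes you keep the payloads around until their futures complete):
```
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;

public class PipelinedSender {
    // Pipelined async sends: keep each payload until its future succeeds,
    // and on a failure resend everything from the first failed index onwards.
    public static void sendAll(Producer<byte[]> producer, List<byte[]> payloads) {
        List<CompletableFuture<MessageId>> futures = new ArrayList<>();
        for (byte[] payload : payloads) {
            futures.add(producer.sendAsync(payload));
        }
        for (int i = 0; i < futures.size(); i++) {
            try {
                futures.get(i).join(); // blocks until the broker acks message i
            } catch (Exception e) {
                // Messages i..N were failed/cleared from the client's send buffer;
                // resend from the first failed one to preserve ordering.
                sendAll(producer, payloads.subList(i, payloads.size()));
                return;
            }
        }
    }
}
```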
----
2020-03-04 03:01:11 UTC - Joe Francis: I am not clear on what happened in your case, but I am pretty confident that if you received the ack for the publish, that message will be delivered.
----
2020-03-04 03:05:11 UTC - Eugen: So in your example, I send msg 50 asynchronously, and then the future for msg 23 returns an error. Everything before that is lost, but everything after that (like msg 51 which follows immediately after the asynchronously sent msg 50) gets sent? Then I don't see how I can prevent message loss / reordering when sending asynchronously.
----
2020-03-04 03:17:30 UTC - Joe Francis: That is a very good question - I have to look at the code and see what happens. @Matteo Merli may know it off the top of his head
----
2020-03-04 03:25:20 UTC - Eugen: I will try to set `brokerDeduplicationEnabled` to true
----
2020-03-04 03:59:45 UTC - Kannan: Broker znode session names are constructed using the broker's advertised address. By default the broker advertises its IP, and in k8s, if we try to advertise the broker service name instead, the other brokers fail to create a session because of the session conflict. How do we resolve this?
----
2020-03-04 04:20:22 UTC - Eugen: adding
```brokerDeduplicationEnabled=true```
to broker.conf did not fix this
----
2020-03-04 04:48:22 UTC - Matteo Merli: @Eugen The most probable cause of the publish failure is the message being rejected directly on the client side because the buffer is full.

The producer has a queue where messages are kept before they are acknowledged by the broker.

When the queue gets full (the default size is 1K messages), the producer will, by default, reject new sendAsync() operations (by failing the associated future objects).

One way to avoid that is to set `blockIfQueueFull=true` on the `ProducerBuilder`. This will make the sendAsync operation throttle (block) when there's no more space.
+1 : Eugen, Ali Ahmed
----
2020-03-04 04:49:48 UTC - Matteo Merli: Another way you can get a publish failure is through timeouts (the default is 30s). You can disable the timeout, making the client keep trying to publish the message forever, by setting `sendTimeout=0`.
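As a sketch of how these settings fit together on the Java producer builder (the service URL and topic name below are placeholders):
```
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class ProducerConfigExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")  // placeholder service URL
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("my-topic")                      // placeholder topic
                .blockIfQueueFull(true)                 // throttle sendAsync() instead of failing when the queue is full
                .maxPendingMessages(1000)               // size of the pending-message queue (default 1000)
                .sendTimeout(0, TimeUnit.SECONDS)       // 0 disables the send timeout (default 30s)
                .create();

        producer.sendAsync("hello")
                .exceptionally(ex -> {                  // failures still surface on the future
                    ex.printStackTrace();
                    return null;
                });

        producer.flush();
        producer.close();
        client.close();
    }
}
```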
----
2020-03-04 04:58:29 UTC - Eugen: Good news - I will give this a shot!
----
2020-03-04 05:13:51 UTC - Eugen: The 30 sec timeout can't be what is causing my problem though, because it occurs after only a couple of seconds
----
2020-03-04 05:30:12 UTC - Sijie Guo: Are you using the same service name among different broker nodes?
----
2020-03-04 05:38:25 UTC - Eugen: Note to self and others following this thread: the queue size can be changed via `ProducerBuilder.maxPendingMessages()`
----
2020-03-04 05:42:36 UTC - Kannan: yes
----
2020-03-04 07:36:35 UTC - Greg: No, I can reproduce it easily, but only with several brokers. If I use a single broker, I can't reproduce it, as the topic is never cleaned. I need to understand why the topic is cleaned even when there are subscribers/consumers connected...
----
2020-03-04 07:39:45 UTC - Greg: We managed to work around the issue by setting brokerDeleteInactiveTopicsEnabled=false, but this is a temporary solution...
----
2020-03-04 07:50:47 UTC - Sijie Guo: each broker should have its own advertised address
----
2020-03-04 07:50:50 UTC - Sijie Guo: you can't assign the same address to different brokers.
----
2020-03-04 07:56:59 UTC - Kannan: By default, if we advertise the broker IP, it's not accessible from its own pod with mTLS (<http://pod-ip:8080>)
----
2020-03-04 08:09:16 UTC - Greg: @Penghui Li: Hi, I just reproduced the issue with 2 brokers. I have a publisher and a consumer connected to the topic on broker 1, and I see that broker 2 deletes the topic:
```Broker 1 :
08:01:56.281 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.200.8.120:51372] Subscribing on topic <non-persistent://public/default/cluster> / Monitor-istra-monitor-667c9fbbcc-9cbns
08:01:56.309 [pulsar-io-24-3] WARN  org.apache.pulsar.broker.service.AbstractTopic - [<non-persistent://public/default/cluster>] Error getting policies java.util.concurrent.CompletableFuture cannot be cast to org.apache.pulsar.common.policies.data.Policies and publish throttling will be disabled
08:01:56.309 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for <non-persistent://public/default/cluster>
08:01:56.314 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic NonPersistentTopic{topic=<non-persistent://public/default/cluster>}
08:01:57.199 [Thread-9] INFO  org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic - [<non-persistent://public/default/cluster][Monitor-istra-monitor-667c9fbbcc-9cbns]> Created new subscription for 0
08:01:57.199 [Thread-9] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.200.8.120:51372] Created subscription on topic <non-persistent://public/default/cluster> / Monitor-istra-monitor-667c9fbbcc-9cbns
08:01:57.888 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.200.8.120:51372][<non-persistent://public/default/cluster>] Creating producer. producerId=0
08:01:58.025 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.200.8.120:51372] Created new producer: Producer{topic=NonPersistentTopic{topic=<non-persistent://public/default/cluster>}, client=/10.200.8.120:51372, producerName=Monitor-istra-monitor-667c9fbbcc-9cbns, producerId=0}
08:03:10.130 [pulsar-ordered-OrderedExecutor-5-0-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x100d2a69359002d local:/10.200.3.150:50400 remoteserver:zookeeper/10.200.7.69:2181 lastZxid:4294987042 xid:600 sent:600 recv:631 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/schemas/public/default/cluster

Broker 2 :
08:01:55.954 [ForkJoinPool.commonPool-worker-0] WARN  org.apache.pulsar.broker.service.AbstractTopic - [<non-persistent://public/default/cluster>] Error getting policies java.util.concurrent.CompletableFuture cannot be cast to org.apache.pulsar.common.policies.data.Policies and publish throttling will be disabled
08:01:55.955 [ForkJoinPool.commonPool-worker-0] INFO  org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for <non-persistent://public/default/cluster>
08:01:55.959 [ForkJoinPool.commonPool-worker-0] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic NonPersistentTopic{topic=<non-persistent://public/default/cluster>}
08:03:10.129 [pulsar-ordered-OrderedExecutor-4-0-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x200eccd91c10031 local:/10.200.7.133:36742 remoteserver:zookeeper/10.200.8.87:2181 lastZxid:4294987046 xid:186 sent:186 recv:202 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/schemas/public/default/cluster
08:03:10.132 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic - [<non-persistent://public/default/cluster>] Topic deleted
08:03:10.133 [pulsar-io-24-3] INFO  org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic - [<non-persistent://public/default/cluster>] Topic deleted successfully due to inactivity```
----
2020-03-04 08:11:58 UTC - Greg: Is it normal that the topic is created on both brokers?
----
2020-03-04 08:14:20 UTC - Penghui Li: Ok, let me take a look.
----
2020-03-04 08:19:52 UTC - Greg: thanks
----
2020-03-04 08:31:09 UTC - Vincent LE MAITRE: Hi, I would like to develop Pulsar Functions with state. On my local standalone Pulsar installation it works fine (state is enabled by default). On a Kubernetes Pulsar installation, deployed using the Pulsar Helm templates, the state feature seems to be disabled by default, and I am not able to enable it. Has anyone succeeded in activating function state on a Kubernetes deployment? Does the provided Helm template allow enabling Pulsar Functions state? Thanks
----
2020-03-04 08:42:02 UTC - Pavel Tishkevich: Each ZooKeeper node has 12 CPU cores, 64 GB of RAM, and CPU idle > 75%.
On each ZooKeeper node there are also 1 broker and 1 bookie deployed.

System load is not high: after adding more brokers it is never more than 5 (less than 50%).
There are almost no processes blocked by I/O.

Also, we have disabled fsync in ZooKeeper.

All this suggests the problem is not related to hardware/disk speed.
----
2020-03-04 08:42:21 UTC - Pavel Tishkevich: 
----