You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/04/24 09:11:06 UTC

Slack digest for #general - 2020-04-24

2020-04-23 09:23:59 UTC - Sijie Guo: It seems that the maven asserts haven’t been synced to maven central.
----
2020-04-23 09:24:04 UTC - Sijie Guo: It will take a while.
----
2020-04-23 09:54:17 UTC - tuteng: <https://repository.apache.org/content/repositories/releases/org/apache/pulsar/>
----
2020-04-23 10:46:00 UTC - Fernando Miguélez: thx  @tuteng
----
2020-04-23 11:36:08 UTC - Grzegorz Wojnarowski: @Grzegorz Wojnarowski has joined the channel
----
2020-04-23 12:19:26 UTC - Grzegorz Wojnarowski: Hallo all. I have *Topic Compaction* problem in Pulsar 2.5.1
----
2020-04-23 12:20:05 UTC - Grzegorz Wojnarowski: I expected that sending empty message deletes message with given key. In my example there is only one message per key but there are empty messages.
Is it my fault or misunderstanding?
Below is code fragment and program output.
----
2020-04-23 12:20:33 UTC - Grzegorz Wojnarowski: `topic_name = 'public/default/compacted'`
    `for i in range(10):`
        `producer.send("msg {}".format(str(i)).encode(), partition_key=str(i).strip())`
    `for i in range(1, 10, 2):`
        `producer.send(b'', partition_key=str(i).strip())`
    `run_compation(topic_name)`
    `state = get_compaction_state(topic_name)`
    `while state['status'] == "RUNNING":`
        `time.sleep(1)`
        `state = get_compaction_state(topic_name)`
    `reader = pulsar_client.create_reader(topic_name, pulsar.MessageId.earliest, is_read_compacted=True)`
    `while reader.has_message_available():`
        `msg = reader.read_next()`
        `print(msg.message_id(), msg.partition_key(), msg.data().decode())`


    `(52,130,-1,-1) 0 msg 0`
    `(63,1,-1,-1) 2 msg 2`
    `(63,3,-1,-1) 4 msg 4`
    `(63,5,-1,-1) 6 msg 6`
    `(63,7,-1,-1) 8 msg 8`
    `(63,8,-1,-1) 9 msg 9`
    `(63,9,-1,-1) 1`
    `(63,10,-1,-1) 3`
    `(63,11,-1,-1) 5`
    `(63,12,-1,-1) 7`
    `(63,13,-1,-1) 9`
----
2020-04-23 13:32:47 UTC - Christophe Bornet: Hi everyone, from the docs on the binary protocol
&gt; Message payloads are passed in raw format rather than protobuf format for efficiency reasons.
Can someone give more details on this ? Where are the performance gains ?
+1 : Frank Kelly
----
2020-04-23 13:36:39 UTC - Huanli Meng: @Huanli Meng has joined the channel
----
2020-04-23 14:20:58 UTC - Manjunath Ghargi: Hi, I was looking for some reference examples code of how Mutual Auth can be enabled from client and the server-side? Can someone kindly help?
+1 : Frank Kelly
----
2020-04-23 14:55:04 UTC - Huanli Meng: @Huanli Meng set the channel topic: - Pulsar 2.5.1 released! <https://pulsar.apache.org/blog/2020/04/23/Apache-Pulsar-2-5-1/>
- 2020 Pulsar User Survey Report: <https://bit.ly/3d1KsGG>
- Pulsar Summit SF 2020: <https://pulsar-summit.org/>
+1 : matt_innerspace.io, Kirill Kosenko, rani, Rahul, Ben, Yuvaraj Loganathan, Sijie Guo
tada : Sijie Guo
heart : Sijie Guo
----
2020-04-23 14:59:47 UTC - Huanli Meng: Hi everyone, Pulsar 2.5.1 is released. For details, check out the release notes: <https://pulsar.apache.org/release-notes/#251-mdash-2020-04-20-a-id251a>
heart : Frank Kelly, Yuvaraj Loganathan, Sijie Guo
----
2020-04-23 15:06:20 UTC - rani: *[Function Worker Rest API Calls]* Rest API calls to the following endpoints return `Error 500 Internal Server Error`
```GET /admin/v3/functions/my_tenant_name/my_namespace_name/my_function_name (Listing functions deployed in given namespace)
GET /admin/v3/functions/connectors (Listing supported pulso IO connectors)```
*Pulsar Component Version*: `2.5.1-candidate-3`
*Architecture*: Function workers running on k8s separately from remaining components that are managed on EC2 via ASGs. Pulsar proxy was configured to be aware of the separate function worker cluster by configuring `functionWorkerWebServiceURL` appropriately.

My understanding is that these issues were fixed through this PR: <https://github.com/apache/pulsar/pull/6486>. Am I missing something here?
----
2020-04-23 15:52:12 UTC - Gabriel Volpe: @Gabriel Volpe has joined the channel
----
2020-04-23 15:52:19 UTC - Kai Levy: Thanks. It looks like the python client does not have that method available?
----
2020-04-23 15:53:41 UTC - Gabriel Volpe: Hi all! Quick question related to <https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/SubscriptionInitialPosition.html>
----
2020-04-23 15:53:58 UTC - Gabriel Volpe: The `Earliest` option says "the earliest position which means the start consuming position will be the first message"
----
2020-04-23 15:54:42 UTC - Gabriel Volpe: Does that mean the first message ever, including those who have been acknowledged? Because that's the behavior I'm seeing
----
2020-04-23 15:55:04 UTC - Rahul: Yes
----
2020-04-23 15:55:23 UTC - Gabriel Volpe: Thanks for your quick answer!
----
2020-04-23 16:07:19 UTC - Sijie Guo: I think there are some issues related to compacting empty messages: <https://github.com/apache/pulsar/pull/6795>

It would come as part of 2.5..2. @Penghui Li can you double check?
----
2020-04-23 16:08:14 UTC - Sijie Guo: It avoids serialization and deserialization. Memory copy kills performance.
----
2020-04-23 16:08:55 UTC - Sijie Guo: <http://pulsar.apache.org/docs/en/security-tls-authentication/>
----
2020-04-23 16:09:04 UTC - Sijie Guo: This is the documentation for enabling m-auth.
----
2020-04-23 16:10:00 UTC - Mohit Manaskant: @Mohit Manaskant has joined the channel
----
2020-04-23 16:14:46 UTC - Sijie Guo: list functions in /v3 endpoint has a problem - <http://pulsar.apache.org/docs/en/security-tls-authentication/>

but the error code is 404. 500 seems to indicate a different error. did you see any errors in the proxy or in the broker?
----
2020-04-23 16:25:25 UTC - Mohit Manaskant: Hello,  I had the following use case which wish to try solving using Pulsar. We will be having `clicks` and `impressions` data flowing through two different topics in Pulsar. Data is in JSON, and both have `user` field in it and wish to join the data across two topics. With my limited knowledge of Pulsar and experience using Pulsar, I could not find a clear answer as to whether it can be achieved using :
1. Apache Pulsar SQL ==&gt; Can i write a join query which will produce the joined output to an output topic
2. Pulsar Functions ==&gt; Can i model this problem using it? In the common use cases of Functions like Filtering/Content based routing etc, we don't find any help article where an example of stateful processing of joining across two topics can be achieved.
Thanks in advance.
eyes : Konstantinos Papalias
----
2020-04-23 16:51:11 UTC - Christophe Bornet: You mean the ser/deser done by the protobuf lib ? I think bytes type are encoded as-is by protobuf. So it should be possible to access them with zero-copy, isn't it ?
----
2020-04-23 17:20:18 UTC - Sijie Guo: No it is not zero-copy in protobuf
----
2020-04-23 17:23:27 UTC - rani: these are the only errors i could find. Not sure how helpful they are
```function-worker-pod 13:08:35.499 [function-web-21-5] INFO  org .eclipse.jetty.server.RequestLog - 10.242.17.7 - - [23/Apr/202  0:13:08:35 +0000] "GET /admin/v3/functions/my_tenant/my_namesoace/exclamation HTTP/1.1" 500 388 "-" "PostmanRuntime/7.24.0" 6       
function-worker-pod 13:08:43.438 [function-web-21-5] ERROR org.glassfish.jersey.message.internal.WriterInterceptorExecutor -   MessageBodyWriter not found for media type=application/xml,```
----
2020-04-23 17:40:09 UTC - Sijie Guo: @rani: It seems that “GET /admin/v3/functions/my_tenant/my_namesoace/exclamation HTTP/1.1” 500 “.

Did you check the logs at function worker side?
----
2020-04-23 17:42:00 UTC - rani: hmm, these logs I shared are actually from the function worker pod itself :disappointed: Not sure where else to look
----
2020-04-23 17:43:58 UTC - rani: On a slightly separate note @Sijie Guo, i’m invoking the pulsar proxy REST interface with an admin token and and i’m receiving the following on `/admin/v2/brokers/health`
```Message: org.apache.pulsar.client.api.PulsarClientException$AuthorizationException: Authorization failed pulsar-role-anonymous on topic <persistent://pulsar/pulsar-staging/10.242.160.125:8080/healthcheck> with error Don't have permission to administrate resources on this tenant```
*proxy.conf:*
`anonymousUserRole`=pulsar-role-anonymous
`authenticationProviders`=org.apache.pulsar.broker.authentication.AuthenticationProviderToken
`brokerClientAuthenticationParameters`=token:aaaaaa _(JWT_ corresponding to _pulsar-role-proxy_)

*broker.conf*
`proxyRoles`=pulsar-role-proxy
`anonymousUserRole`=pulsar-role-anonymous
`authenticateOriginalAuthData`=false

1. Why am I being perceived as `pulsar-role-anonymous` even though I am using authenticating with a JWT corresponding to a specific role (in this case `pulsar-role-admin`)?
2. How can I get the the health check endpoint to work?

----
2020-04-23 18:58:50 UTC - Tolulope Awode: Hi guys
----
2020-04-23 18:59:30 UTC - Tolulope Awode: After a successful deployment of pulsar on kubernetes cluster, deploying the node pulsar-client has been failing. The Dockerfile composition failed as cpp needs to be installed and its not available either as npm or package nugget
----
2020-04-23 18:59:35 UTC - Tolulope Awode: Please help
----
2020-04-23 19:39:48 UTC - Ryan Brereton: Hi everyone-- I’m wanting to negatively acknowledge a message but in a Pulsar function. I’m wondering if `context.getCurrentRecord().fail`  (Pulsar Functions API) has the same functionality/result as `consumer.negativeAcknowledge(message)` (Client API) instead.

Basically does <http://pulsar.apache.org/api/pulsar-functions/2.4.2/org/apache/pulsar/functions/api/Record.html#fail--> do the same thing as <http://pulsar.apache.org/api/client/2.4.2/org/apache/pulsar/client/api/Consumer.html#negativeAcknowledge-org.apache.pulsar.client.api.Message-> ?

If so, what’s the redelivery delay when using .fail? I know the default when n-acking in my consumer is 1 min and can be set via .negativeAckRedeliveryDelay.
----
2020-04-23 20:29:45 UTC - Sijie Guo: &gt; these logs I shared are actually from the function worker pod itself
How many function worker pods are you running? Did you check all of the them?

&gt; Why am I being perceived as `pulsar-role-anonymous` even
It seems that you didn’t configure this properly. Hence the anonymous role is used.

```     * If authentication and authorization is enabled(and not sasl) and if the authRole is one of proxyRoles we want to enforce
     * - the originalPrincipal is given while connecting
     * - originalPrincipal is not blank
     * - originalPrincipal is not a proxy principal ```

----
2020-04-23 21:14:30 UTC - Vladimir Shchur: It's the earliest ever only for a new subscription, if consumer just reconnects, it will start from the last acknowledged
----
2020-04-23 21:18:07 UTC - Sijie Guo: It is easier to do in Pulsar SQL. It is available at this moment.

It is doable in Pulsar Functions as well. But you might have to write some fair amount of code to achieve that.
----
2020-04-23 21:20:05 UTC - Sijie Guo: record.fail() is basically a wrapper of `consumer.negativeAcknowledge()`
----
2020-04-23 21:20:42 UTC - Sijie Guo: So the delay is the consumer settings of negativeAckRedeliveryDelay
----
2020-04-23 22:01:10 UTC - Ryan Brereton: Thank you. Seemed like it, but wanted to make sure. Appreciate the help!
+1 : Sijie Guo
----
2020-04-23 22:04:52 UTC - Konstantinos Papalias: @Sijie Guo are there any examples on Pulsar SQL joins?
----
2020-04-23 22:28:02 UTC - Sijie Guo: <https://www.javahelps.com/2019/11/presto-sql-types-of-joins.html>
----
2020-04-23 22:28:20 UTC - Sijie Guo: You can find quite a lot of examples of Presto SQL joins.
----
2020-04-24 01:10:24 UTC - Penghui Li: @Grzegorz Wojnarowski The problem is fixed by <https://github.com/apache/pulsar/pull/6795> and I have add unit test for the case that you described. You can try it on the master branch.
----
2020-04-24 01:17:15 UTC - Ebere Abanonu: @Sijie Guo if I had my own Security token service server using IdentityServer4, how can I integrate that with Pulsar for Authentication and Authorization?
----
2020-04-24 01:59:25 UTC - Sijie Guo: Typically you need to implement a broker side oauth authentication provider that you can integrate with IdentityServer4.

For the client side, there are two approaches:

1. generate an access_token and use the standard token provider.
2. implement your own authentication provider that integrate with IdentityServer4.
For authorization, you can use the default zookeeper-based authorization provider. If you want use IdentityServer4 for authorization, you can implement your own one as well.
----
2020-04-24 02:07:41 UTC - Ebere Abanonu: Can a demo be given? or any tutorial on how to achieve this
----
2020-04-24 03:27:32 UTC - Mohit Manaskant: Hey @Sijie Guo, Does `Apache Pulsar SQL` behaves like `KSQL` , which allows a registered query to run forever (as KSQL internally turns into Kafka Streams application) and create a stream out of it or it behaves like a traditional SQL engine, wherein the query executes only once when fired and returns the data present, but it will not keep on updating itself automatically.
Thanks in advance.
----
2020-04-24 04:04:40 UTC - Addison Higham: anyone have tips on debugging bookkeeper heap and direct memory OOMs? have a bookkeeper with `-Xms512m -Xmx1g -XX:MaxDirectMemorySize=1g` and (using k8s) set with a 2.5Gi limit. `dbStorage_readAheadCacheMaxSizeMb="256",  dbStorage_writeCacheMaxSizeMb="256", journalMaxSizeMB="2048"` for other relevant memory settings, rest are default. I was getting heap errors, resized it up to a bit bigger on heap and then got a direct memory oom, finally worked once I went 2 gb heap and 2 gb direct mem
----
2020-04-24 04:18:54 UTC - magrain: @magrain has joined the channel
----
2020-04-24 05:06:14 UTC - Fayce: Another rookie question: The pulsar documentation says that the Pulsar schema registry is currently available only for the Java client. Does it mean that we cannot use schema (Avro or else) if we have producers/consumers in C++ and Python? i.e we have to serialize/de-serialize every message sent to/received from Pulsar???
----
2020-04-24 05:09:36 UTC - Sijie Guo: both c++/python support schema registry now
----
2020-04-24 05:09:45 UTC - Sijie Guo: cgo client also supports schema
----
2020-04-24 05:09:51 UTC - Fayce: Hi Sijeg, thanks for the feedback.

c++/python support schema in version 2.5.0?
----
2020-04-24 05:12:02 UTC - Sijie Guo: I think we added the support since 2.3 or 2.4.
----
2020-04-24 05:12:16 UTC - Sijie Guo: The documentation is probably not updated yet.
----
2020-04-24 05:12:49 UTC - Fayce: ah ok thanks, I was probably reading an old documentation then. Thanks a lot
----
2020-04-24 05:22:13 UTC - Fayce: I can see python example of how to create a schema with python but not with c++. Is there some examples in c++ on how to create a producer with a avro schema for instance?
----
2020-04-24 06:17:45 UTC - Grzegorz Wojnarowski: Thanks
----
2020-04-24 07:36:43 UTC - wuYin: hello guys, I’m a beginner with Pulsar
I want know, how pulsar store topic partition metadata, such as partition count?
I just find bundle owner in  /namespace/public/default/0x40000000_0x80000000
but can’t found topic partition count
----
2020-04-24 07:40:49 UTC - wuYin: sorry for this stupid question
I just create partitioned-topic, and find it in /admin/partitioned-topics/public/default/persistent/tp-1
+1 : Penghui Li
----
2020-04-24 07:40:56 UTC - Penghui Li: partitioned topic metadata is stored in the Apache Zookeeper.  You can try to use  `$ bin/pulsar-admin topics get-partitioned-topic-metadata`  to get the metadata.
----
2020-04-24 07:42:37 UTC - wuYin: thanks for reply
----
2020-04-24 07:50:23 UTC - Gabriel Volpe: Mmm this is not the behavior I'm seeing I think, I'll need to double-check
----
2020-04-24 08:10:00 UTC - Gabriel Volpe: Effectively, what I'm seeing is that when I restart my application (using `Earliest`) the consumer is closed (I call `unsubscribe` and `close`) and upon restarting it consumes the messages that have already been acknowledged. Is this the expected behavior?
----
2020-04-24 08:12:08 UTC - Rahul: @Gabriel Volpe Are you using same subscription while restarting your application
----
2020-04-24 08:12:45 UTC - Gabriel Volpe: Yes, the subscription is the same
----
2020-04-24 08:14:32 UTC - Ebere Abanonu: @Sijie Guo first attempt - <https://github.com/eaba/PulsarSTSOAuthProvider/blob/master/src/main/java/org/apache/pulsar/broker/authentication/AuthenticationProviderIdentityServer4.java>
----
2020-04-24 08:14:47 UTC - Rahul: In that scenario it shouldn't happen. Please verify that the subscription is still present for that topic before restating your application.
----
2020-04-24 08:15:25 UTC - Gabriel Volpe: You mean the subscription still being present in Pulsar?
----
2020-04-24 08:15:30 UTC - Ebere Abanonu: That seems to be the way to go to support more than IdentityServer4 as Token Authentication Provider
----
2020-04-24 08:16:29 UTC - Rahul: Yeah
----
2020-04-24 08:17:10 UTC - Gabriel Volpe: Let me check
----
2020-04-24 08:20:04 UTC - Gabriel Volpe: So yes, I'm looking in Pulsar Express and the subscription seems to still be around `<persistent://public/default/auth-subscription>`
----
2020-04-24 08:20:30 UTC - Rahul: ```pulsar-admin topics subscriptions &lt;topic&gt;```
----
2020-04-24 08:20:35 UTC - Gabriel Volpe: That means the `unsubscribe` command is not really working right?
----
2020-04-24 08:21:39 UTC - Gabriel Volpe: Can I run that command from within my Pulsar container?
----
2020-04-24 08:24:31 UTC - Gabriel Volpe: Okay got it
----
2020-04-24 08:24:46 UTC - Gabriel Volpe: ```$ pulsar-admin topics subscriptions <persistent://public/default/auth>
"<persistent://public/default/auth-subscription>"```
----
2020-04-24 08:24:55 UTC - Gabriel Volpe: It's still around as Pulsar Express shows
----
2020-04-24 08:32:34 UTC - Gabriel Volpe: Okay I think I found the issue. My application stops, unsubscribes and closes the consumer but there's another application still running which keeps this subscription active.
----
2020-04-24 08:32:38 UTC - Gabriel Volpe: Sorry for the trouble
----
2020-04-24 08:37:16 UTC - Christophe Bornet: Even with the modified Pulsar protoc plugin ?
----
2020-04-24 08:38:15 UTC - Christophe Bornet: I'm trying to see what is possible since with gRPC I can only use protobuf messages
----
2020-04-24 08:43:30 UTC - Sijie Guo: gRPC doesn’t support zero copy
----
2020-04-24 09:02:05 UTC - Fayce: Hello guys, is there anyone who has a sample c++ producer with Avro schema? I looked everywhere in the documentation and c++ pulsar client but can't find anything on how to create a schema from c++ client (only python). Unless I totally misunderstood the documentation and I have to use directly the Avro c++ library to create my schema? Thanks a lot.
----
2020-04-24 09:06:59 UTC - Ali Ahmed: @Fayce you can get started with  this
<https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/tests/SchemaTest.cc>
----
2020-04-24 09:08:30 UTC - Fayce: Hi Ahmed, Thanks a lot. I will have a look at that. Cheers
----
2020-04-24 09:09:59 UTC - Fayce: Is it already available in c++ client for pulsar version 2.5.0?
----