Posted to by Apache Pulsar Slack <> on 2020/08/26 09:11:05 UTC

Slack digest for #general - 2020-08-26

2020-08-25 09:32:28 UTC - Evion Cane: Hi guys. I am trying to configure authentication and authorization in Pulsar to connect PulsarAdmin (Java API) to it.
I have modified the broker.conf file with the following properties ->
The problem is that when I test authentication and authorization, they do not seem to work. When I try to access tenants from Pulsar without authorization as follows:
```PulsarAdmin admin = PulsarAdmin.builder()
List<String> tenants = admin.tenants().getTenants();```
From what I understand from the documentation, you need to authenticate to get the tenants, but I can still access the default tenants without authentication.
2020-08-25 11:38:43 UTC - Ravi Shah: Hi Guys-

We are facing an issue similar to "Consumer stop receive messages from broker(<>)".

We are using logstash and input plugin(<>).
2020-08-25 11:39:05 UTC - Ravi Shah: Any help would be appreciated.
2020-08-25 14:43:40 UTC - Joshua Decosta: I don’t have that enabled but it seems like it is
2020-08-25 14:43:50 UTC - Joshua Decosta: The functions are being created as pods 
2020-08-25 14:52:09 UTC - Joshua Decosta: Is this running in standalone or is this deployed in a cloud cluster? 
2020-08-25 14:53:05 UTC - Joshua Decosta: There seem to be a few missing configurations regardless. You will need the brokerClient configs as well. It also doesn’t look like you’re passing a token from your admin calls, which would lead to an error as well
2020-08-25 15:00:24 UTC - Evion Cane: I am running it in standalone. I was testing authentication, so I left out the token on purpose and expected an error, but it did not throw one. If I am using the Java admin API, do I need to configure brokerClient (client.config) as well?
2020-08-25 15:06:23 UTC - Joshua Decosta: You need to configure all of those aspects in the standalone conf 
2020-08-25 15:06:49 UTC - Joshua Decosta: And yes, you need the brokerClient configs as well since those communicate and authenticate 
2020-08-25 15:26:43 UTC - Chris DiGiovanni: If you have set a retention of `-1` and you are offloading to S3, is there anything that would prevent these messages from getting deleted?  I want to make sure that backlog quotas and/or TTLs can't prevent Pulsar from deleting ledgers from a bookie even when retention is infinite and you are offloading.  Currently my bookie disk space usage doesn't look right.  Pulsar is reporting 54G of ledgers on disk.  I have a 3x replica set and a combined raw storage of 1.3TB, though I'm sitting at roughly 1TB of used disk space...
2020-08-25 15:27:25 UTC - Chris DiGiovanni: This is running 2.5.2 of Pulsar
2020-08-25 15:46:59 UTC - Addison Higham: yeah if you have logs from the stateful sets. The role your functions are running as by default is the role you deployed your functions with. That role needs the "functions" permissions to download the package
2020-08-25 15:48:23 UTC - Joshua Decosta: Do these aspects rely on the AuthZ methods at all? 
2020-08-25 15:49:30 UTC - Joshua Decosta: Is there no way to configure some default functions permission to use with function creation 
2020-08-25 15:52:21 UTC - Addison Higham: you can check which ledgers have been offloaded with `pulsar-admin topics stats-internal <topic>` and that will have details on which ledgers have been offloaded

As far as BookKeeper disk usage goes, see this doc <>, specifically the "disk compaction" section, for details. Essentially, BookKeeper packs multiple ledgers into entry log files. When you delete a ledger, you don't immediately get the space back: the entry log file has to reach a certain ratio of deleted data before it is compacted. You can tune those settings as needed
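
A minimal sketch of the compaction idea described above, not BookKeeper's actual code: an entry log file only becomes a compaction candidate once the share of still-live data in it drops below a threshold. The class name, method name, and the 0.2 threshold are illustrative assumptions.

```java
final class EntryLogCompactionSketch {
    /**
     * Returns true when the fraction of live (not yet deleted) bytes in an
     * entry log file has fallen below the given threshold.
     * Hypothetical helper for illustration only.
     */
    static boolean isCompactionCandidate(long totalBytes, long liveBytes, double threshold) {
        if (totalBytes <= 0) {
            return false; // nothing to compact
        }
        double liveRatio = (double) liveBytes / totalBytes;
        return liveRatio < threshold;
    }

    public static void main(String[] args) {
        long total = 1_000_000_000L;
        // 60% of the file still belongs to live ledgers: space is not reclaimed yet.
        System.out.println(isCompactionCandidate(total, 600_000_000L, 0.2));
        // Only 10% is still live: the file can be rewritten and the rest reclaimed.
        System.out.println(isCompactionCandidate(total, 100_000_000L, 0.2));
    }
}
```

This is why deleting or offloading ledgers may not show up as freed bookie disk space right away: the space only comes back once enough of each entry log file is dead.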
2020-08-25 15:55:24 UTC - Addison Higham: it does rely on AuthZ, see ``

And yes, you can implement another interface to have much more control over function auth, see `org.apache.pulsar.functions.auth.KubernetesFunctionAuthProvider`
2020-08-25 15:56:54 UTC - Joshua Decosta: Ok, but when creating the function, does it take my client auth data and use that for the function permissions? Considering I’m using tokens that will expire that would be a problem for the functions 
2020-08-25 15:56:58 UTC - Addison Higham: The way the default implementation works:
- it snags the token you used to deploy the function
- it stores that token in a Kubernetes secret
- the statefulset mounts that secret and uses it to interact with the cluster

If you aren't using token auth, you need to implement your own `KubernetesFunctionAuthProvider`
+1 : Joshua Decosta
2020-08-25 15:59:25 UTC - Joshua Decosta: Is there a way to set the function creds within config files? 
2020-08-25 15:59:33 UTC - Joshua Decosta: Similar to the brokerClient, etc?
2020-08-25 15:59:57 UTC - Addison Higham: <> <- this doc isn't in 2.6.1 yet (we should backport docs to release branches) but here is the doc on master
+1 : Joshua Decosta
2020-08-25 16:00:14 UTC - Joshua Decosta: Thanks 
2020-08-25 16:07:24 UTC - Addison Higham: that doc adds a lot more context, so hopefully it helps

I have thought about this problem a fair bit and with the current extension points I think you have 3 primary options:

1. Implement the KubernetesFunctionAuthProvider. It could easily talk to another service to fetch a token, and it can updateTokens, but it doesn't have a built-in way to refresh those automatically. However, you do have access to modify the stateful set in that API, so adding a sidecar container that refreshes tokens and stores them in a shared volume would be doable
2. You could also use the `org.apache.pulsar.functions.runtime.kubernetes.KubernetesManifestCustomizer` to do much the same and modify the statefulset spec to add in a sidecar or init container
3. Use the default implementation, then have some other task outside Pulsar that refreshes tokens by manually updating the secrets. Because the secret is mounted as a volume, it does get updates. So you could simply have a k8s cron job that updates the secrets. The biggest downside here is that you still have the restriction that at least the initial token is the token that was used to upload the function
2020-08-25 16:18:25 UTC - Joshua Decosta: Is it not possible to configure the clientAuth* keys in the function-worker to allow for a default token setup used with the functions? 
2020-08-25 16:43:55 UTC - Ming: Neither does it support topic regex because of restriction of url formatting.
2020-08-25 17:04:36 UTC - Evion Cane: Ok, thank you. I will try that.
2020-08-25 17:05:08 UTC - Joshua Decosta: I’ve managed to get this working in standalone before so feel free to reach back here
2020-08-25 18:20:05 UTC - Tim Corbett: In trying to minimize unnecessary cross-rack/region traffic to minimize bandwidth costs, does the broker have any mechanism to prefer reads from its local bookie?  Does the broker even have a notion of which rack/region it is in?
2020-08-25 19:54:36 UTC - Addison Higham: @Tim Corbett that is possible using `bookkeeperClientReorderReadSequenceEnabled` and by enabling a rackAware/regionAware placement policy: see <> and the original issue for more details
+1 : Tim Corbett
2020-08-25 20:03:20 UTC - Aaron: What draft of Json schema does Pulsar use?
2020-08-25 20:57:25 UTC - Addison Higham: I think it might be possible if you overwrite the start command, but I am not completely sure if there would be a more fine grained way of doing it. See <> for where that command is built
2020-08-25 20:58:12 UTC - Joshua Decosta: Is the function-worker.conf only meant for standalone? 
2020-08-25 21:04:56 UTC - Addison Higham: no, it gets loaded by the functions-worker regardless of how the functions-worker is run.

However, it is important to remember that functions executed in their own pod *do not* run the function-worker. It is a bit poorly named, but is an outgrowth of when functions started first with modes that either were in VM or forked a VM.

With kubernetes, you might think of it as "function-coordinator" as all it does is basically manage the creation and deletion of stateful sets.
2020-08-25 21:05:51 UTC - Pushkar Sawant: Is anyone running pulsar-manager with JWT authentication? I am getting this error when i try to list tenants on a 2.6.0 cluster.
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Authentication required</title>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /admin/v2/tenants/public. Reason:
<pre>    Authentication required</pre></p><hr><a href="<>">Powered by Jetty:// 9.4.20.v20190813</a><hr/>

+1 : Ryan Nowacoski
2020-08-25 21:06:56 UTC - Joshua Decosta: If I don’t have “functionAsPods” enabled. How should the pods operate? 
2020-08-25 21:07:14 UTC - Joshua Decosta: Cause i don’t have it enabled but i keep seeing function pods showing up 
2020-08-25 21:07:30 UTC - Addison Higham: what is the way they are named?
2020-08-25 21:07:55 UTC - Joshua Decosta: I’ll have to respond next week to that cause i destroyed my cluster 
2020-08-25 21:08:12 UTC - Addison Higham: intentionally I hope :wink:
2020-08-25 21:08:26 UTC - Joshua Decosta: For sure 
2020-08-25 21:08:49 UTC - Joshua Decosta: Currently trying to determine best way to run those token scripts during a cluster config 
2020-08-25 21:09:38 UTC - Addison Higham: oh it looks like both `functions` and `functionsAsPods` both configure the functions worker to execute functions as pods
2020-08-25 21:09:50 UTC - Addison Higham: (I haven't worked with the chart a ton recently)
2020-08-25 21:10:31 UTC - Addison Higham: they should be created like `pf-<functionname>-<parallel number>`
2020-08-25 21:10:48 UTC - Joshua Decosta: That looks like the naming convention i did see
2020-08-25 21:11:09 UTC - Joshua Decosta: Is that for when they aren’t running as pods?
2020-08-25 21:12:08 UTC - Addison Higham: they are running as pods, I was just mistaken, `functionsAsPods: true` and `functions: true` configure it the exact same way now to both have them run as pods
2020-08-25 21:13:21 UTC - Joshua Decosta: Oh, is that on purpose? 
2020-08-25 21:15:54 UTC - Addison Higham: yes
2020-08-25 21:16:25 UTC - Addison Higham: we basically assume that if you want to run functions on k8s, you want to run as pods, otherwise it runs the functions as processes in the broker, which isn't ideal
2020-08-25 21:17:06 UTC - Joshua Decosta: That makes sense although the custom class is tossing a wrench into my function stuff 
2020-08-25 21:17:24 UTC - Joshua Decosta: Every time i think I’ve finished my classes i then need to implement another class 
2020-08-25 21:18:08 UTC - Tim Corbett: In trying to validate some performance characteristics of the .Net pulsar client, we're comparing against the pulsar-perf tool.  However, we're seeing results that don't match, and one large difference is the pulsar-perf tool does not seem to set/vary its routing keys, so all messages produced are going to only one consumer, even when setting the consumers to key-shared mode.  Is there a way to force it to set random keys, or some other way to validate performance with a key routed workload?
2020-08-25 21:29:31 UTC - Addison Higham: I assume you are talking about the performance producer? It does not set any keys, but it does use round-robin, meaning it will spread load evenly across partitioned topics.

With a consumer against partitioned topics, that is actually multiple consumers.

I would assume what you need to do is enable round-robin in your sending mode
2020-08-25 21:33:04 UTC - Tim Corbett: Specifically talking about the included pulsar-perf tool, and yes, in producer mode.  Round-robin for partitions is fine and all, but what we also need is random or incrementing routing keys so KeyShared subscriptions can route to all available consumers?
2020-08-25 21:33:32 UTC - Tim Corbett: For any given partition, if we have say 200 consumers connected, currently all messages are going to only one of them.
2020-08-25 21:36:47 UTC - Addison Higham: pulsar-perf does not currently have a way to set keys on messages, but I am sure it would be a useful feature!

If you wanted to open an issue for it, that would be wonderful. If you are interested in contributing it, it should be fairly straightforward: we already generate a payload, so it would make sense to have an argument that hashes the payload to set a key
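
A rough sketch of the "hash the payload to set a key" idea, assuming nothing about pulsar-perf internals; `PayloadKeySketch` and `keyFor` are hypothetical names. Note that this only spreads messages across Key_Shared consumers if the payloads themselves differ between messages, since identical payloads hash to the same key.

```java
import java.util.Arrays;

final class PayloadKeySketch {
    /** Derives a routing key from the payload bytes; equal payloads yield equal keys. */
    static String keyFor(byte[] payload) {
        return Integer.toHexString(Arrays.hashCode(payload));
    }

    public static void main(String[] args) {
        byte[] first = {1, 2, 3};
        byte[] second = {1, 2, 4};
        // Two different payloads, so two different keys are printed here.
        System.out.println(keyFor(first));
        System.out.println(keyFor(second));
    }
}
```

The resulting string would then be passed as the message key when building each message in the producer.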
2020-08-25 21:37:37 UTC - Tim Corbett: Alright, just wanted to make sure it wasn't something I was missing.  Thanks!
2020-08-26 06:14:45 UTC - Guillaume Audic: Hi, we are faced with an issue with event processing.
We have a batch job which produces a lot of messages (thousands of messages) to a topic and a scaled microservice which consumes them.
To trigger another event, we have to know whether all of the messages of the batch job have been consumed.

I found this approach with Kafka:  <> in the subpart 6.

I'm looking for something similar with Pulsar, but I don't know if it exists

We are thinking of using a KV store to store the state of the job (job id, number of messages produced, number of messages consumed, status)

When the producer starts producing events, it puts the number of produced messages in the KV store
The consumer increments the number of consumed messages in the KV store

If the number of produced messages equals the number of consumed messages, we set the status in the KV store to complete.

If there is no native approach with Pulsar, can you give me some advice?
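
The KV-store bookkeeping described above could be sketched roughly as follows. This is only an in-memory stand-in: `JobProgressStore` and its methods are hypothetical, a `ConcurrentHashMap` takes the place of a real KV store, and a redelivered message would be counted twice, so it assumes each message is counted exactly once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class JobProgressStore {
    private static final class JobState {
        final long produced;
        final AtomicLong consumed = new AtomicLong();

        JobState(long produced) {
            this.produced = produced;
        }
    }

    private final ConcurrentHashMap<String, JobState> jobs = new ConcurrentHashMap<>();

    /** Called by the producer before it starts publishing the batch. */
    void registerJob(String jobId, long producedCount) {
        jobs.put(jobId, new JobState(producedCount));
    }

    /**
     * Called by a consumer after it has processed one message.
     * Returns true only for the increment that reaches the produced count.
     */
    boolean recordConsumed(String jobId) {
        JobState state = jobs.get(jobId);
        if (state == null) {
            throw new IllegalStateException("unknown job " + jobId);
        }
        return state.consumed.incrementAndGet() == state.produced;
    }

    public static void main(String[] args) {
        JobProgressStore store = new JobProgressStore();
        store.registerJob("job-1", 2);
        System.out.println(store.recordConsumed("job-1")); // first of two messages
        System.out.println(store.recordConsumed("job-1")); // second message completes the job
    }
}
```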
2020-08-26 07:04:28 UTC - Julius S: A (possible?) generic solution: Instead of storing external state, put the state incrementally into each message. If your producer knows enough about the batch it is working on, then the producer could place some metadata into each message about job_id and the number of the message within the job. E.g. (job_id xyz, msg 3/10), (job_id xyz, msg 4/10)...  Your consumer can publish to another topic to update its status on how the job is going. Once the consumer hits the msg with 10/10 it can act accordingly. 

Re: your question about a native solution: the Pulsar Functions framework and the state store it offers via Pulsar’s own BookKeeper are definitely something to look at if you really need to store state. 

Other ideas: Pulsar also has message chunking so depending on the size of the batch you might actually be able to put the entire job into a single message. 

Also, Pulsar’s individual message ack (instead of just high water mark like Kafka) will help you with a cleaner solution to a queuing task like this. 
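
One way to picture the per-message metadata idea from the start of this reply (job_id plus "msg n/total") is the sketch below. It is a variation rather than exactly what was described: because a scaled set of consumers may process message 10/10 before 9/10, it marks each sequence number as seen and reports completion only when all of them have arrived. `JobCompletionTracker` is a hypothetical name, and the tracker is assumed to live in whichever single component reads the status updates.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class JobCompletionTracker {
    private final Map<String, BitSet> seenByJob = new HashMap<>();

    /**
     * Records message number seq (1-based) out of total for the given job.
     * Returns true once every sequence number has been seen; duplicates are
     * harmless because setting an already-set bit changes nothing.
     */
    synchronized boolean record(String jobId, int seq, int total) {
        if (seq < 1 || seq > total) {
            throw new IllegalArgumentException("seq out of range: " + seq + "/" + total);
        }
        BitSet seen = seenByJob.computeIfAbsent(jobId, id -> new BitSet(total));
        seen.set(seq - 1);
        return seen.cardinality() == total;
    }

    public static void main(String[] args) {
        JobCompletionTracker tracker = new JobCompletionTracker();
        System.out.println(tracker.record("xyz", 3, 3)); // last message arrives first
        System.out.println(tracker.record("xyz", 1, 3));
        System.out.println(tracker.record("xyz", 2, 3)); // all three seen now
    }
}
```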
2020-08-26 07:35:32 UTC - Olivier Chicha: @Addison Higham No, sorry, I am kind of overloaded. Now that we have identified the issue, we have implemented a workaround and I had to move on to something else.