Posted to by Apache Pulsar Slack <> on 2019/04/16 09:11:04 UTC

Slack digest for #general - 2019-04-16

2019-04-15 09:27:17 UTC - chlee: @chlee has joined the channel
2019-04-15 12:25:37 UTC - Alexandre DUVAL: Hi, I'm not sure I understand this error from `./bin/pulsar broker` on v2.3.1: ```2019-04-15 12:22:11,047 [sun.misc.Launcher$AppClassLoader@18b4aac2] error Uncaught exception in thread main: Cannot deserialize instance of `java.util.HashSet` out of VALUE_STRING token
 at [Source: (File); line: 113, column: 26] (through reference chain: org.apache.pulsar.functions.worker.WorkerConfig["authenticationProviders"])```
2019-04-15 12:36:38 UTC - Alexandre DUVAL: I currently have `authenticationProviders:` in functions_worker.yml
2019-04-15 13:29:50 UTC - Martin Ashby: Hi all. Does pulsar keep the file system in a consistent state?

I ask because I'd like to take cold backups. There's no documentation for this, so I was hoping to use zfs or btrfs to take filesystem-level backups. However, that is only worth doing if it will actually work.
2019-04-15 13:30:14 UTC - Martin Ashby: I can't use geo-replication at the moment unfortunately (only 1 site, can't have a second site, can't use public cloud at the moment)
2019-04-15 13:46:22 UTC - Martin Ashby: I can ask the same question in the apache-bookkeeper slack if it's more relevant there.
2019-04-15 15:25:31 UTC - chris: @Alexandre DUVAL yes, that’s not a great error message and the docs are lacking on this. This is how you can configure the worker for authentication providers (the authenticationProviders key takes a list of class names)
2019-04-15 15:28:02 UTC - Alexandre DUVAL: Ahah, thanks !
2019-04-15 16:30:24 UTC - Fredrick P Eisele: I see pulsar-dashboard displays various metrics about tenants/namespaces/topics/clusters. Mine all report zeros, what do I need to do to enable the generation and presentation of these statistics?
2019-04-15 16:36:48 UTC - Adam P.: @Adam P. has joined the channel
2019-04-15 17:03:03 UTC - David Kjerrumgaard: @Fredrick P Eisele No additional configuration is required to get the statistics AFAIK.
2019-04-15 17:20:15 UTC - Kevin Brown: Hi All, quick question. Does the Cassandra Sink connector have the ability to pass credentials to Cassandra? I do not see them listed in the link below. Is there a workaround? <>
2019-04-15 18:03:11 UTC - Alexandre DUVAL: Is there a way to clean up BookKeeper? I ran the cluster initialization several times when I started using Pulsar, and now I have stuck ledgers.
2019-04-15 18:09:41 UTC - David Kjerrumgaard: @Alexandre DUVAL You can try `bin/bookkeeper shell metaformat` to reformat at the BK level.
2019-04-15 18:09:55 UTC - David Kjerrumgaard: Not sure what you mean by "stuck ledgers"
2019-04-15 18:12:29 UTC - Alexandre DUVAL: I read earlier in this chat that re-initializing the cluster metadata removes the existing references to the ledgers, so the data is "lost". Is that right?
2019-04-15 18:12:49 UTC - Alexandre DUVAL: It was a test phase, and I want to clean up these ledgers.
2019-04-15 18:18:16 UTC - Matteo Merli: <!here> Pulsar 2.3.1 is officially released! <>
tada : Karthik Ramasamy, Jerry Peng, Yuvaraj Loganathan, Alexandre DUVAL, Devin G. Bost, Byron, Sébastien de Melo
bananadance : Yuvaraj Loganathan, Alexandre DUVAL
100 : Yuvaraj Loganathan, Alexandre DUVAL
party-parrot : Yuvaraj Loganathan, Alexandre DUVAL, Kevin Brown
2019-04-15 18:18:36 UTC - Matteo Merli: @Matteo Merli set the channel topic: Pulsar graduated as a Top-Level project :tada: - Pulsar release 2.3.1 - <>
clap : David Kjerrumgaard, Guy Feldman, Jerry Peng, Ali Ahmed, Alexandre DUVAL
2019-04-15 18:23:13 UTC - David Kjerrumgaard: @Alexandre DUVAL That is correct. If you re-initialize Pulsar's metadata, all the information about which ledgers the messages are stored in will be lost. However, the raw data stored in the ledgers, and the associated ledger metadata in BookKeeper's metadata store, will still be there.
2019-04-15 18:24:41 UTC - David Kjerrumgaard: @Alexandre DUVAL Are you running a standalone cluster or a multi-node cluster?
2019-04-15 18:25:24 UTC - Alexandre DUVAL: Multi-node cluster
2019-04-15 18:27:52 UTC - David Kjerrumgaard: @Alexandre DUVAL If your goal is to completely wipe the "old" data, then I suggest you run the pulsar cluster initialization (which you have done), the BK metadata format (as shown above), AND format the local filesystem data on a bookie using the bookieformat command on each bookie; `bin/bookkeeper shell bookieformat`
2019-04-15 18:28:32 UTC - David Kjerrumgaard: the last command removes the existing ledgers from local storage on the bookies.
2019-04-15 18:42:06 UTC - Alexandre DUVAL: Ok, thanks David.
2019-04-15 19:20:47 UTC - Sree Vaddi: It's tonight, Team.
2019-04-15 21:06:39 UTC - roman: @roman has joined the channel
2019-04-15 22:46:05 UTC - Maxime Martineau: @Maxime Martineau has joined the channel
2019-04-15 22:48:49 UTC - Maxime Martineau: Hi all, we have decided to use Apache Pulsar for our processing pipeline. I have some questions and am wondering whether this is the right place to ask them?
2019-04-15 22:58:06 UTC - David Kjerrumgaard: @Maxime Martineau Yes it is!
2019-04-15 23:00:35 UTC - Maxime Martineau: Alright, I’m using Pulsar 2.2.0 through Docker, set up locally on Minikube.

We use a custom-built image based on pulsar-all that has Python 3.7 installed (because we wanted Python 3.7 for the functions).

All is working great so far.

For easier debugging, I would like to enable the WebSocket service on the broker, but I have no idea how to do that properly and easily.
Since the configuration files are in a volume declared in the Docker image, I cannot change that configuration at build time.
2019-04-15 23:01:58 UTC - Maxime Martineau: By the way, I looked at the documentation page, which says to modify broker.conf
2019-04-15 23:02:42 UTC - David Kjerrumgaard: @Maxime Martineau One option would be to use the --mount or -v option that comes with Docker, which would allow you to map a local directory on your machine to the directory inside the container that contains the configuration file you wish to edit
2019-04-15 23:03:11 UTC - Maxime Martineau: Thank you very much for taking the time to help me with this, I very much appreciate this.
2019-04-15 23:03:27 UTC - David Kjerrumgaard: No problem at all.
2019-04-15 23:04:22 UTC - Maxime Martineau: I did the setup in minikube using Helm. I know I could use a persistent volume (+claim) to mount those config files, but I would have to modify the Helm chart, I guess, and I’m not super familiar with that
2019-04-15 23:06:02 UTC - David Kjerrumgaard: I am not a Helm expert either, but I do believe that is the best path forward.  The other option is to create your own docker image based off of the one you are using now, and then modify the Dockerfile to overwrite the broker.conf file with one that has the changes you want.
2019-04-15 23:06:18 UTC - David Kjerrumgaard: But that is a little more messy to maintain.
2019-04-15 23:08:42 UTC - Maxime Martineau: yeah, I was actually trying to do the latter, but since the config files are in a volume declared in apachepulsar/pulsar:2.2.0, I cannot modify them past the volume declaration.

I tried to recreate the entire Dockerfile (merging parents), but I am missing very important parts like the Pulsar tarball, connectors, and offloaders
2019-04-15 23:09:13 UTC - Maxime Martineau: and I found nowhere in the docs where to get those. If I had them, though, I could almost certainly get something working quickly
2019-04-15 23:11:45 UTC - David Kjerrumgaard: Does your image have `FROM apachepulsar/pulsar-all` ?
2019-04-15 23:12:00 UTC - Maxime Martineau: (that said, I actually don’t need the offloaders for now, nor the connectors, at this point)
2019-04-15 23:12:43 UTC - Maxime Martineau: at first it did, up until I found out about the volume declaration preventing me from modifying the configuration files in the build
2019-04-15 23:12:47 UTC - David Kjerrumgaard: <>
2019-04-15 23:12:58 UTC - David Kjerrumgaard: has the links to all the Pulsar tarballs
2019-04-15 23:13:36 UTC - Maxime Martineau: hoo nice
2019-04-15 23:13:38 UTC - David Kjerrumgaard: including the offloaders, and the connectors
2019-04-15 23:15:02 UTC - Chris Bartholomew: Jumping in here to hopefully be helpful on running Websocket in helm. The pulsar helm chart defines the broker.conf settings in a configmap. You can pass variables to the config map in this section of values.yaml: ```  ## Broker configmap
  ## templates/broker-configmap.yaml
    PULSAR_MEM: "\"-Xms15g -Xmx15g -XX:MaxDirectMemorySize=15g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem\""
    PULSAR_GC: "\"-XX:+UseG1GC -XX:MaxGCPauseMillis=10\""
    managedLedgerDefaultEnsembleSize: "3"
    managedLedgerDefaultWriteQuorum: "3"
    managedLedgerDefaultAckQuorum: "2"
    deduplicationEnabled: "false" 
    exposeTopicLevelMetricsInPrometheus: "true"
``` You should be able to enable Websocket on the broker by adding this to the values.yaml: ```
webSocketServiceEnabled: true ``` then do a helm upgrade.
+1 : David Kjerrumgaard
2019-04-15 23:15:34 UTC - Maxime Martineau: humm nice ! let me try this
2019-04-15 23:16:15 UTC - Chris Bartholomew: You need to put quotes around "true". Slack seems to have stripped that out from my reply.
2019-04-15 23:27:38 UTC - Maxime Martineau: ok, so I tried updating the values-mini.yaml, adding `webSocketServiceEnabled: "true"` to the section of the Broker config map
then deleted my deployment for the broker(only)
then ran this  : `helm upgrade --values pulsar/values-mini.yaml piquant-penguin  ./pulsar`
(which recreated the broker deployment)
and then tried to connect through the websocket. still no success.

I went into the pod for one of the brokers and the config file ‘broker.conf’ still has `webSocketServiceEnabled=false` :disappointed:
2019-04-15 23:29:33 UTC - Maxime Martineau: 
2019-04-15 23:29:39 UTC - Maxime Martineau: just to make sure I have put it at the right place
+1 : Chris Bartholomew
2019-04-15 23:30:16 UTC - Chris Bartholomew: Looks good to me
2019-04-16 00:04:16 UTC - Maxime Martineau: sadly it still doesn’t work :disappointed:
2019-04-16 00:07:44 UTC - Chris Bartholomew: You can check if the setting was applied with this command: ```kubectl exec <broker_pod> grep webSocket /pulsar/conf/broker.conf ```
2019-04-16 00:09:18 UTC - Maxime Martineau: ```
perfectmight : /work/github/data-processing/dockerfiles/pulsar-all $ kubectl exec piquant-penguin-pulsar-broker-7cd57496f9-zfq46 grep webSocket /pulsar/conf/broker.conf --namespace pulsar
```
2019-04-16 00:16:36 UTC - Chris Bartholomew: Hmmm...I run the websocket proxy as a separate pod, so I don't use this exact config, but I use this same method to update my broker.conf all the time. Did you do a helm upgrade, or did you delete and reinstall the helm chart?
2019-04-16 00:17:23 UTC - Maxime Martineau: i did a helm upgrade
2019-04-16 00:17:34 UTC - Maxime Martineau: using this command :
`helm upgrade --values pulsar/values-mini.yaml piquant-penguin  ./pulsar`
2019-04-16 00:17:59 UTC - Maxime Martineau: I could also try running it separately, if that’s easier
2019-04-16 00:18:50 UTC - Maxime Martineau: I only found what was in the doc about it, so not much on how to configure helm to get the websocket service running
2019-04-16 00:23:31 UTC - Nick Rivera: @Nick Rivera has joined the channel
2019-04-16 00:23:32 UTC - Chris Bartholomew: Try deleting your broker pod to force it to rebuild your config from the configmap. I think that's the issue. ```kubectl delete pod piquant-penguin-pulsar-broker-7cd57496f9-zfq46``` The configmap sets environment variables that are read at startup and applied to the broker.conf. If the pod doesn't restart, then broker.conf doesn't get updated.
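To illustrate why the restart matters, here is a rough Java sketch of the mechanism Chris describes: environment variables (populated from the configmap) are matched against `key=value` lines in the conf file and override them, once, when the container starts. This is not the startup script the Pulsar image actually ships; `ConfFromEnv` and `applyOverrides` are made-up names, and the real logic may differ in its details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the script used by the Pulsar image.
public class ConfFromEnv {

    /**
     * Returns a copy of the conf lines in which every key=value line whose
     * key also appears in env has its value replaced by the env value.
     * Comments, blank lines, and keys without an override are left untouched.
     */
    public static List<String> applyOverrides(List<String> confLines, Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String line : confLines) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (trimmed.isEmpty() || trimmed.startsWith("#") || eq < 0) {
                result.add(line);
                continue;
            }
            String key = trimmed.substring(0, eq).trim();
            result.add(env.containsKey(key) ? key + "=" + env.get(key) : line);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> conf = Arrays.asList(
                "# WebSocket service on the broker",
                "webSocketServiceEnabled=false",
                "deduplicationEnabled=false");
        // Stand-in for the container environment populated from the configmap.
        Map<String, String> env = Collections.singletonMap("webSocketServiceEnabled", "true");
        applyOverrides(conf, env).forEach(System.out::println);
    }
}
```

Because a rewrite like this only runs at startup, changing the configmap with `helm upgrade` has no effect on the broker.conf inside a pod that is already running.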
2019-04-16 00:36:29 UTC - Devin G. Bost: I'm attempting to create components (e.g. Functions, Sinks, and Sources) via the Java Admin API, and I'm getting:

org.apache.pulsar.client.admin.PulsarAdminException: osp/campaigns/campaign-kafka-source (No such file or directory)
	at org.apache.pulsar.client.admin.internal.BaseResource.getApiException(
	at org.apache.pulsar.client.admin.internal.SourceImpl.createSource(
. . . 

Caused by: osp/obfuscatedNamespace/my-kafka-source (No such file or directory)

I'm guessing that I'm calling the create function incorrectly.
I tried passing just the component name, as well as the fully qualified name (tenant/namespace/name), as the second parameter; and, in both cases, I got similar exceptions.
Here's my code:
2019-04-16 00:36:36 UTC - Devin G. Bost: ```

import lombok.*;
import lombok.experimental.Accessors;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.admin.PulsarAdminException;
import org.apache.pulsar.common.functions.FunctionConfig;

import java.util.List;

@Accessors(chain = true)
public @Data
class Function extends Component {

    public FunctionConfig getFunctionConfig() {
        FunctionConfig functionConfig = new FunctionConfig(this.getTenant(), this.getNamespace(),
                this.getName(), this.getClassName(), this.getInputs(), null,
                null, null, null, this.getOutput(), null, null, this.getLogTopic(),
                null, null, this.getUserConfig(), null, null, null, null, null,
                null, null, null, this.getArtifactFileName(), null, null, this.getArtifactFileName(),
                null, null);
        return functionConfig;
    }

    public void upsert(PulsarAdmin adminClient) {
        System.out.println("Calling upsert on Function.");
        try {
            List<String> sources = adminClient.source().listSources(this.getTenant(), this.getNamespace());
            String fullyQualifiedName = this.getTenant() + "/" + this.getNamespace() + "/" + this.getName();
            if (sources.contains(this.getName())) {
                adminClient.functions().updateFunction(this.getFunctionConfig(), fullyQualifiedName);
            } else {
                adminClient.functions().createFunction(this.getFunctionConfig(), fullyQualifiedName);
            }
        } catch (PulsarAdminException e) {
            // ...
        }
    }
}
```
2019-04-16 00:36:40 UTC - Devin G. Bost: Any ideas?
2019-04-16 02:15:48 UTC - Jerry Peng: @Devin G. Bost the path to the archive needs to be an absolute path
2019-04-16 02:18:09 UTC - Jerry Peng: @Devin G. Bost actually, the error
``` osp/obfuscatedNamespace/my-kafka-source ```
suggests that you are passing the wrong argument as the archive path.  “osp/obfuscatedNamespace/my-kafka-source” is the fully qualified name for your source, right?
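Following Jerry's two replies, the second argument of the create/update calls is treated as the location of the archive on disk, so a fully qualified component name ends up being opened as a file and fails with "No such file or directory". A minimal sketch of a pre-check that could run before calling the admin API, using only the Java standard library; `ArchivePathCheck` and `resolveArchive` are hypothetical names, not part of the Pulsar client:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the Pulsar admin client.
public class ArchivePathCheck {

    /**
     * Resolves a user-supplied archive location to an absolute path and
     * fails early if no regular file exists there.
     */
    public static String resolveArchive(String archive) throws FileNotFoundException {
        Path path = Paths.get(archive).toAbsolutePath().normalize();
        if (!Files.isRegularFile(path)) {
            throw new FileNotFoundException(path + " (No such file or directory)");
        }
        return path.toString();
    }

    public static void main(String[] args) throws IOException {
        // A fully qualified component name is not a file on disk, so it is rejected.
        try {
            resolveArchive("osp/obfuscatedNamespace/my-kafka-source");
        } catch (FileNotFoundException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // A temporary file stands in for the real connector archive here.
        Path tmp = Files.createTempFile("my-kafka-source", ".nar");
        System.out.println("accepted: " + resolveArchive(tmp.toString()));
        Files.delete(tmp);
    }
}
```

With a check like this, the value returned by `resolveArchive` (rather than `fullyQualifiedName`) would be what gets passed as the second argument in the `upsert` method above.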
2019-04-16 05:20:12 UTC - Jacob O'Farrell: Hi all - loving our experience with Pulsar so far!

We've got Pulsar deployed on top of EKS, and I'm in a position where I need to reconfigure the disks (provision more IOPS) used by BookKeeper for its journals. Is there a recommended way to go about this?
2019-04-16 05:48:42 UTC - Sijie Guo: @Jacob O'Farrell

a common guideline for this process: 1) drop in new bookies with new disk configurations; 2) turn old bookies to readonly; 3) decommission old bookies. but it might be tricky to do it within one k8s job, because a single k8s job doesn’t allow different configurations to coexist.

so there is one approach :

1) create a new bookkeeper job with the new disk configuration.
2) you can turn the old bookies to readonly and wait for the data to be consumed or expired.
3) decommission the old bookies one by one by running ` bookie shell decommission`.

2) is optional. you can decommission bookies immediately when 1) is done, although if you already have accumulated data there, the decommission can take a while since the data has to be copied.