Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/06/07 09:11:05 UTC

Slack digest for #general - 2020-06-07

2020-06-06 09:24:11 UTC - Liam Clarke: More fun, I'm debugging, and noticed this in the logs:

```09:21:46.710 [main] INFO  org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader - Constructor offload driver: aws-s3, host: null, container: test, region: ap-southeast-2 ```
So the Jcloud offloader got the region okay - but the OffloadPolicies in BrokerService#getManagedLedgerConfig is still missing the necessary values:

```OffloadPolicies{managedLedgerOffloadDriver=aws-s3, managedLedgerOffloadMaxThreads=2, managedLedgerOffloadPrefetchRounds=1, managedLedgerOffloadThresholdInBytes=-1, managedLedgerOffloadDeletionLagInMillis=60000, s3ManagedLedgerOffloadRegion=null, s3ManagedLedgerOffloadBucket=null, s3ManagedLedgerOffloadServiceEndpoint=null, s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864, s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576, s3ManagedLedgerOffloadRole=null, s3ManagedLedgerOffloadRoleSessionName=pulsar-s3-offload, gcsManagedLedgerOffloadRegion=null, gcsManagedLedgerOffloadBucket=null, gcsManagedLedgerOffloadMaxBlockSizeInBytes=67108864, gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576, gcsManagedLedgerOffloadServiceAccountKeyFile=null, fileSystemProfilePath=null, fileSystemURI=null}```
----
2020-06-06 09:30:55 UTC - Ebere Abanonu: Hi, I have been able to look into this. PatternMultiTopicConsumer supports auto-discovery of new topics. You can configure that with ConsumerBuilder.
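Roughly something like this with the Java client (just a sketch - the service URL, topic pattern, and subscription name are placeholders):
```import java.util.regex.Pattern;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

public class PatternConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Subscribe to every topic in the namespace that matches the pattern;
        // newly created matching topics are picked up automatically.
        Consumer<byte[]> consumer = client.newConsumer()
                .topicsPattern(Pattern.compile("persistent://public/default/events-.*"))
                .patternAutoDiscoveryPeriod(1)           // re-scan for new matching topics every minute
                .subscriptionName("pattern-sub")         // placeholder subscription name
                .subscribe();

        // ... consume messages ...
        consumer.close();
        client.close();
    }
}```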
----
2020-06-06 09:44:14 UTC - Liam Clarke: Okay, so using `pulsar-admin namespaces set-offload-policies --driver aws-s3 --region ap-southeast-2 --bucket test ... test-tenant/test-namespace`  to set an explicit offload policy on the namespace worked, so I guess my question is - is this because I was using `standalone.conf`  vs `broker.conf`? Or will I have to set a per-namespace offload policy for a production cluster also?
----
2020-06-06 10:16:58 UTC - Adriaan de Haan: Hi, I am trying to get the jdbc io connector working, but I keep getting the following:
```07:11:25.185 [main] INFO  org.apache.pulsar.functions.utils.io.ConnectorUtils - Searching for connectors in /home/adriaan/apache-pulsar-2.5.2/connectors
07:11:26.013 [main] INFO  org.apache.pulsar.functions.utils.io.ConnectorUtils - Found connector ConnectorDefinition(name=jdbc, description=Jdbc sink, sourceClass=null, sinkClass=org.apache.pulsar.io.jdbc.JdbcAutoSchemaSink) from /home/adriaan/apache-pulsar-2.5.2/connectors/pulsar-io-jdbc-2.5.2.nar
Exception in thread "main" java.lang.NullPointerException
        at org.apache.pulsar.functions.LocalRunner.startThreadedMode(LocalRunner.java:421)
        at org.apache.pulsar.functions.LocalRunner.start(LocalRunner.java:319)
        at org.apache.pulsar.functions.LocalRunner.main(LocalRunner.java:152)```
NullPointerException is not very helpful in trying to debug the issue... any advice on how I can determine what is wrong?
----
2020-06-06 10:25:01 UTC - Liam Clarke: Hi Adriaan, line 421 is

`instanceConfig.setMaxPendingAsyncRequests(functionConfig.getMaxPendingAsyncRequests());`

maxPendingAsyncRequests in InstanceConfig is an `int` while in FunctionConfig it's an `Integer` - if it was set to `null` in the function config, it will throw an NPE on unboxing to an `int`.
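To illustrate (standalone, not Pulsar code - just the Java unboxing rule that causes it):
```public class UnboxingNpeDemo {
    public static void main(String[] args) {
        Integer maxPendingAsyncRequests = null; // boxed value, like FunctionConfig's field
        // Auto-unboxing calls maxPendingAsyncRequests.intValue(), so assigning a
        // null Integer to a primitive int throws a NullPointerException here.
        int unboxed = maxPendingAsyncRequests;
        System.out.println(unboxed);
    }
}```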
----
2020-06-06 10:27:33 UTC - Liam Clarke: In both *Config classes it defaults to 1000. Are you setting it explicitly to null?
----
2020-06-06 10:28:52 UTC - Adriaan de Haan: I don't set it at all
----
2020-06-06 10:42:27 UTC - Liam Clarke: Try setting it to 1000, can't hurt and might resolve the issue
----
2020-06-06 12:39:22 UTC - Adriaan de Haan: so the null pointer exception at that line would imply that functionConfig is null
----
2020-06-06 12:42:14 UTC - Aaron Batilo: @Aaron Batilo has joined the channel
----
2020-06-06 12:46:28 UTC - Aaron Batilo: :wave: Hi everyone. I'm Aaron. I came across Pulsar a few weeks ago and have been trying to push it on my organization because I think it solves a lot of our use cases.
+1 : Enrico Olivelli, Karthik Ramasamy
----
2020-06-06 12:46:46 UTC - Adriaan de Haan: Since this is a Sink, it has a SinkConfig and not a FunctionConfig, I believe... so it seems that might be why it's failing
----
2020-06-06 12:56:14 UTC - Adriaan de Haan: Hi, can anybody please confirm that sinks still work in v2.5.x?
----
2020-06-06 12:57:37 UTC - Adriaan de Haan: It seems that this commit:
<https://github.com/apache/pulsar/commit/55d5430701d41d92ce290d838e332eb9d9154b9e>
might have introduced a bug that will result in a null pointer exception - since functionConfig is null for a sink, but it is using functionConfig without checking for null
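Something with this shape of null-check in LocalRunner would avoid it (just a sketch with stand-in classes, not the actual Pulsar code):
```public class SinkNpeGuardSketch {
    // Minimal stand-ins for the real config classes named in the stack trace.
    static class FunctionConfig {
        Integer getMaxPendingAsyncRequests() { return 1000; }
    }
    static class InstanceConfig {
        void setMaxPendingAsyncRequests(int v) { /* ... */ }
    }

    public static void main(String[] args) {
        FunctionConfig functionConfig = null;           // what a sink run apparently ends up with
        InstanceConfig instanceConfig = new InstanceConfig();

        // Guarding both the config object and the boxed Integer means the copy
        // is simply skipped instead of throwing an NPE on auto-unboxing.
        if (functionConfig != null && functionConfig.getMaxPendingAsyncRequests() != null) {
            instanceConfig.setMaxPendingAsyncRequests(functionConfig.getMaxPendingAsyncRequests());
        }
    }
}```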
----
2020-06-06 13:01:17 UTC - alex kurtser: Hi @Sijie Guo

We set it up as a separate StatefulSet (separate from the brokers) with `bin/pulsar proxy` as the entrypoint command for the container.
We also provide a function_worker.yaml config file with parameters like this:
processContainerFactory:
  extraFunctionDependenciesDir: null
  javaInstanceJarLocation: null
  logDirectory: null
  pythonInstanceLocation: null
----
2020-06-06 13:03:36 UTC - alex kurtser: Of course, we have other parameters like the Pulsar endpoints and so on. It's important to note that the functions are actually working fine. The only issue is with metrics. As I mentioned earlier, each function instance inside the container opens a random port to expose its metrics, so we cannot know which port it will expose and can't set it in the annotations or in the Prometheus config file.
----
2020-06-06 14:45:12 UTC - YounggyuChun: @YounggyuChun has joined the channel
----
2020-06-06 15:55:12 UTC - Amit Pal: @Amit Pal has joined the channel
----
2020-06-06 16:40:47 UTC - Asaf Mesika: @Asaf Mesika has joined the channel
----
2020-06-06 16:48:02 UTC - Asaf Mesika: I’ve got a couple of questions on that:
1. I searched a lot in the documentation and on the internet to answer this exact question. Is it documented somewhere and I missed it?
2. Does the default behaviour mean I can acknowledge to the broker, the broker acks back, and I can still lose that information (meaning the messages acked within that 1 sec will be redelivered)? From your information, is that different from Kafka's design (out of curiosity, comparing the two)?
----
2020-06-06 17:11:28 UTC - Asaf Mesika: I'm reading a lot about Apache Pulsar to understand how it works and to understand its failure modes. One failure I couldn't understand yet: if I experience a complete data loss (all machines terminated, or some corruption ruined the data dir of all ZK nodes), then other than backing up the ZK disks and recovering by restoring them, is there any other way to recover, or is Pulsar + BookKeeper essentially useless without the ZK data?
----
2020-06-06 17:19:07 UTC - Matteo Merli: Yes. ZK stores the metadata, so the pointers to the data. If that is missing, the data is not accessible.

Though....

ZK availability is determined by the number of nodes. E.g. in a normal production environment one would run 5 ZK nodes.

On a bare-metal deployment, that would mean that 5 disks would have to physically break down in a very short amount of time to lose this data.
It would be **very** unlikely to happen. Sure, there's still a chance, but in any storage system the durability guarantee can never be 100% - you can only approach it through more redundancy.

On a cloud deployment, the local VM disks are ephemeral, so it's not a good idea to use them for ZK. Rather, you would use EBS volumes (or similar). At that point, the data on each EBS volume is already replicated 2-way, and the volume can be remounted on a different VM.

Finally, it's certainly possible to take offline backups of the ZK snapshots and txn-logs. You can restore ZK nodes from those.
+1 : Asaf Mesika
----
2020-06-06 22:48:17 UTC - Nicolas Ha: the json seems fixed, but I still can’t get to the page

<http://pulsar.apache.org/functions-rest-api/?version=2.5.1>
----