Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/10/22 09:11:06 UTC

Slack digest for #general - 2020-10-22

2020-10-21 09:26:46 UTC - Johannes Wienke: Hi again. Currently trying to evaluate the Pulsar Debezium integration with MongoDB. For that purpose I have created a docker-compose file:
```version: '3'

services:

  mongo:
    image: mongo:4.2
    ports:
      - 27017:27017
    command: ["mongod", "--replSet", "rep0", "--bind_ip", "localhost,mongo"]
    depends_on:
      # Crude hack to ensure that everyone gets a working MongoDB replica set.
      # The init container tries to connect until it succeeds.
      - mongo-repl-init

  mongo-repl-init:
    image: mongo:4.2
    command: |
      sh -c "while ! mongo mongodb://mongo --eval 'rs.initiate({ _id: \"rep0\", version: 1, members: [{ _id: 0, host: \"mongo\" }] });'; do sleep 2; done"

  pulsar:
    image: apachepulsar/pulsar:2.6.1
    command:
      - bin/pulsar
      - standalone
    ports:
      - 6650:6650
      - 8080:8080

  cdc:
    image: apachepulsar/pulsar-all:2.6.1
    command:
      - bin/pulsar-admin
      - source
      - localrun
      - --source-config-file
      - /config.yml
    volumes:
      - ./config.yml:/config.yml:ro```
With the respective debezium config:
```tenant: "public"
namespace: "default"
name: "debezium-mongodb-source"
topicName: "debezium-mongodb-topic"
archive: "connectors/pulsar-io-debezium-mongodb-2.6.1.nar"
parallelism: 1

configs:

    mongodb.hosts: "rep0/mongo:27017"
    mongodb.name:
    database.whitelist: "cdcevents"
    database.history.pulsar.service.url: "pulsar://pulsar:6650"

    pulsar.service.url: "pulsar://pulsar:6650"```
This always tries to connect to localhost for the pulsar broker:
> org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: localhost/127.0.0.1:6650
What am I missing here?
----
2020-10-21 10:13:26 UTC - Lari Hotari: I pushed the PR now and it's ready for review. Please review <https://github.com/apache/pulsar/pull/8326>
----
2020-10-21 11:36:29 UTC - Praveen Sannadi: @Addison Higham What are the ideal resource requests and limits for a moderate cluster setup? Is there a way to decide on these resource requests and limits for all pulsar components in the cluster setup? If you have any docs to figure this out please help us on this.
----
2020-10-21 13:06:26 UTC - Praveen Sannadi: Hi all,

Can anyone help me with the ideal resource requests and limits for a moderate cluster setup? Is there a way to decide on these resource requests and limits for all Pulsar components in the cluster? If you have any docs that would help figure this out, please share them. For example, in the Helm chart's BookKeeper deployment we have:

resources:
  requests:
   memory: 512Mi
   cpu: 0.2

The values above are from the Apache Pulsar Helm charts repo. I am trying to set these for our clusters, so I am looking for any docs on which parameters should drive these values.
----
2020-10-21 14:07:12 UTC - Alexandre DUVAL: Does `Schema.JSON(MyClass.class)` work with `java.util.Optional`? I get `{"present":true}` as the field value instead of the Optional's value.
----
2020-10-21 14:42:54 UTC - Alexandre DUVAL: Schema.JSON would need this:
```
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new Jdk8Module());```

----
2020-10-21 15:05:26 UTC - Lari Hotari: Pulsar 2.6.0+ has support for a custom ObjectMapper instance. There's a Kotlin code example in <https://github.com/apache/pulsar/issues/6528#issuecomment-701410483> showing how to use it. I haven't tried it myself, but there's a note that it doesn't work with the shaded client and that you should use the pulsar-client-original dependency instead of pulsar-client.
----
2020-10-21 15:14:24 UTC - Milos Matijasevic: Hello, I'm trying to copy messages from one Pulsar cluster to another, and it is pretty slow. Here is the code for that: pulsar-topic-replicator (<https://gist.github.com/milos-matijasevic/e29d00e279b5d1a540552864c7ce1321>). Do you have any idea why it is so slow?
----
2020-10-21 15:18:13 UTC - Joshua Decosta: Functions is for sure turned off 
----
2020-10-21 15:41:53 UTC - Joshua Decosta: There definitely seems to be something off. My metrics show that I don't have any topics, but I can see via pulsar-admin that I do have some.
----
2020-10-21 15:42:18 UTC - Joshua Decosta: Is there somewhere I could look in the source code to see how Prometheus is configured?
----
2020-10-21 16:02:37 UTC - Joshua Decosta: Does disabling the metrics auth on proxy trickle down to broker? 
----
2020-10-21 16:17:14 UTC - Addison Higham: Are you going over an unstable network connection somewhere? a few things that could be causing it:

1) unstable network connection and just high amounts of packet loss
2) your pulsar broker is overloaded, see if you are getting into GC problems or have high CPU
3) issues with your ZooKeeper: if it is underpowered, opening a new ledger can cause higher latency, resulting in timeouts
----
2020-10-21 16:19:58 UTC - Addison Higham: @VanderChen

1. Yes, you can use the Java admin lib, see <http://pulsar.apache.org/api/admin/2.6.0-SNAPSHOT/org/apache/pulsar/client/admin/Functions.html#uploadFunction-java.lang.String-java.lang.String->
2. Nothing is preventing you from using threads. Note that in 2.6.x you also have the option of returning a CompletableFuture.
----
2020-10-21 16:22:05 UTC - Addison Higham: Hi @Konrad Łyś the log4j.yaml file controls the log format, what I would suggest doing is either customizing your image by just replacing the built in log4j.yaml or you can use a combination of `PULSAR_LOG_CONF` environment variable and a kubernetes mount to point at a custom log4j.yaml file.

AFAIK, the same log4j conf should be used for both BK and brokers
----
2020-10-21 16:22:34 UTC - Konrad Łyś: ok
----
2020-10-21 16:24:12 UTC - Addison Higham: @Johannes Wienke the pulsar standalone takes a minute to start up, it is possible that the localrun is trying to run before pulsar finishes, you can use something like <https://github.com/vishnubob/wait-for-it> to ensure pulsar is ready before starting local run
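
A minimal sketch of such a readiness wait without an extra tool, assuming the container has bash (the `/dev/tcp` redirection is a bash feature) and that the broker is reachable as `pulsar:6650`, as in the compose file above. The function name `wait_for_port` is made up for illustration. An open port only shows that the broker is listening, not necessarily that it has finished initialising.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Polls once per second until a TCP connection to HOST:PORT succeeds.
# Returns 0 on success, 1 if the timeout (default 120 s) expires first.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-120}"
  local deadline=$(( $(date +%s) + timeout ))
  # The subshell opens the connection on fd 3; the fd closes when the subshell exits.
  while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Usage (hypothetical entrypoint for the cdc container):
#   wait_for_port pulsar 6650 180 && \
#     exec bin/pulsar-admin source localrun --source-config-file /config.yml
```

The wait-for-it script linked above does the same job with more options; this variant only avoids adding a file to the image.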
----
2020-10-21 16:27:00 UTC - Addison Higham: You should be using async methods for both receiving and sending; throughput will be *much* higher.

But one thing: You can use replication to do this, you don't need to have a global zookeeper if you manually set up the metadata on each cluster, see <https://gist.github.com/sijie/79364497eaa349bf58d9fb760561f930> for details on that
----
2020-10-21 16:28:12 UTC - Guillaume: Hi,
I deployed Pulsar on Kubernetes using the official Helm chart, but I do not have Proxy Metrics in Grafana. Everything seems to be working except for Proxy Metrics. Is there something to activate to get it working?
Thank you.
----
2020-10-21 16:39:41 UTC - Pushkar Sawant: Thanks for your response.
1. Both the producer and the Pulsar cluster are on the same cluster, separated by namespaces
2. The GC pauses are usually 1 sec or less. Max CPU utilization is at around 20%.
3. Zookeeper cluster memory utilization is around 50%, cpu utilization is 5% or less. Peak GC pauses are at around 500ms.
----
2020-10-21 16:41:59 UTC - Addison Higham: that is somewhat of a high GC pause for the broker, have you tried adding more memory?
----
2020-10-21 16:57:21 UTC - Milos Matijasevic: Thank you! I also thought about calling ack in a separate goroutine:
```go consumer.Ack()```
By the way, we are moving from an old v1.0 cluster to the newest version and we use the Helm chart, so we can't do it manually. We are also refactoring namespaces, which is why we use this approach.
----
2020-10-21 17:21:04 UTC - Johannes Wienke: I know, the test case is not perfect there and I managed that manually. Still the option for the pulsar broker doesn't seem to have any effect. (127.0.0.1 != pulsar). I was able to work around this by providing the `--broker-service-url` command line flag. Is that really required despite the different declarations in the config file?
----
2020-10-21 17:35:44 UTC - Addison Higham: ah apologies, didn't notice that. You may need it because with localrun it may connect in 2 different places
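
The workaround from this thread, sketched as the command the `cdc` container would run. The `--broker-service-url` flag is the one named above. The wrapper function and the `PULSAR_ADMIN` and `BROKER_URL` variables are hypothetical conveniences for this sketch, not something Pulsar itself reads.

```shell
# Starts the Debezium source in localrun mode against an explicit broker URL
# instead of relying on the URLs declared inside the source config file.
run_source_localrun() {
  local config_file="${1:-/config.yml}"
  "${PULSAR_ADMIN:-bin/pulsar-admin}" source localrun \
    --source-config-file "$config_file" \
    --broker-service-url "${BROKER_URL:-pulsar://pulsar:6650}"
}

# Usage:
#   run_source_localrun /config.yml
```

In the compose file this would correspond to appending `- --broker-service-url` and `- pulsar://pulsar:6650` to the `command` list of the `cdc` service.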
----
2020-10-21 17:50:38 UTC - Joshua Decosta: Are you using authentication at all?
----
2020-10-21 17:51:00 UTC - Guillaume: Yes I am
----
2020-10-21 17:52:43 UTC - Guillaume: Is there something specific to configure when using authentication?
----
2020-10-21 17:58:35 UTC - Pushkar Sawant: Our memory usage is at 50%. The GC times are 10 minutes cumulative
----
2020-10-21 18:21:45 UTC - Joshua Decosta: @Addison Higham perhaps there are ghost topics being created in the metrics? I'm seeing the topic count climb with each additional topic created
----
2020-10-21 23:28:10 UTC - Addison Higham: apologies @Joshua Decosta was speaking at a conference so haven't had a chance to be online as much.

TBH, I am not as familiar with how we generate Prometheus metrics. Have you tried looking directly at the Prometheus metrics coming out of the metrics endpoint to see if some of the metrics that are keyed per topic are set?

As far as missing metrics, one thing to note is that each broker reports metrics for a different set of topics. It only reports for the topics that it currently owns
----
2020-10-21 23:28:31 UTC - Addison Higham: This might be a case of where 30 minutes of time on a call would be helpful as well
----
2020-10-21 23:41:58 UTC - Devin G. Bost: I configured retention on a topic that's getting a heartbeat so I could test the retention setting. However, it appears from the graph that the topic's storage is clearing every 4 hours. I'd expect it to not drop to 0 until the retention period has passed. When I get the retention for the topic, it looks like this:

```bin/pulsar-admin namespaces get-retention public/default
{
  "retentionTimeInMinutes" : 43200,
  "retentionSizeInMB" : 0
}```
Are the messages not actually getting retained?
----
2020-10-22 00:30:55 UTC - Addison Higham: you need to set both
----
2020-10-22 00:31:06 UTC - Addison Higham: time and size
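
For reference, a hedged sketch of setting both limits with `pulsar-admin namespaces set-retention`. It assumes the documented semantics that 0 for either limit means no retention and -1 means unlimited, so `--size -1` keeps the policy purely time-based; `30d` equals the 43200 minutes shown above. The wrapper function name and the `PULSAR_ADMIN` override variable are made up for illustration.

```shell
# Sets a time-only retention policy on a namespace.
# PULSAR_ADMIN is a hypothetical override for the CLI path, useful for dry runs
# (e.g. PULSAR_ADMIN=echo); Pulsar itself does not read it.
set_time_based_retention() {
  local namespace="$1" time="$2"
  # --size -1: no size limit; --time accepts suffixes such as m, h, d, w.
  "${PULSAR_ADMIN:-bin/pulsar-admin}" namespaces set-retention "$namespace" \
    --size -1 --time "$time"
}

# Usage:
#   set_time_based_retention public/default 30d
#   bin/pulsar-admin namespaces get-retention public/default
```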
----
2020-10-22 07:25:14 UTC - Johannes Wienke: Ok, maybe that should be added to the docs. That was really unexpected with all the existing broker URI declarations already contained in the config files.
----
2020-10-22 07:56:26 UTC - Konrad Łyś: Here are my logs from pulsar broker
```07:51:45.452 [pulsar-web-42-1] INFO org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [22/Oct/2020:07:51:45 +0000] "GET /admin/v2/persistent/spain/_system/_signals_ingest/stats HTTP/1.1" 200 5654 "-" "curl/7.64.0" 3
07:51:49.348 [pulsar-load-manager-3-1] INFO org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Only 1 broker available: no load shedding will be performed
07:51:49.550 [pulsar-load-manager-3-1] INFO org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to ZooKeeper because maximum change 14.762450912068523% exceeded threshold 10%; time since last report written is 60.0 seconds
07:51:49.576 [pulsar-ordered-OrderedExecutor-5-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1001dc776af0015 local:/10.8.35.4:37244 remoteserver:spain-zookeeper-0.spain-zookeeper-headless/10.8.137.137:2181 lastZxid:1781 xid:512 sent:512 recv:532 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/spain-pulsar-broker-0.spain-pulsar-broker.spain.svc.cluster.local:8080```
Namely from
`kubectl logs spain-pulsar-broker-0 -c spain-pulsar-broker -n spain`
----
2020-10-22 07:56:35 UTC - Konrad Łyś: spain is my namespace
----
2020-10-22 07:57:20 UTC - Konrad Łyś: As you can see some logs are following the general format and some are not
----
2020-10-22 07:57:34 UTC - Konrad Łyś: Is this the desired behaviour?
----