Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/12/11 09:11:04 UTC

Slack digest for #general - 2019-12-11

2019-12-10 13:43:44 UTC - juraj: build of `master` in a `pulsar-build` container fails with:
----
2019-12-10 14:25:59 UTC - juraj: same on the `maven:3.6.3-jdk-8` image
----
2019-12-10 14:38:06 UTC - Martin Kunev: Hi,
I have a question regarding message ordering on a single persistent topic. I couldn't figure it out from reading the documentation.
A topic has replication clusters: cluster0, cluster1 and cluster2. There is a subscriber for the topic on each cluster. In the following scenario:

* cluster0 publishes messageA
* the subscriber on cluster1 receives messageA and publishes messageB as a result

Is it possible that the subscriber on cluster2 receives messageB before messageA?
----
2019-12-10 14:51:47 UTC - Joe Francis: Yes. Message ordering is guaranteed only per topic (partition) per producer.
----
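A minimal sketch of Joe's point using the Java client (the service URL, topic, and property name below are placeholders, not anything from the thread): ordering is only guaranteed for messages sent by the same producer to the same topic partition, so if the cluster2 subscriber must see messageA before it acts on messageB, the application has to carry that dependency itself, for example as a message property.

```import org.apache.pulsar.client.api.*;

public class OrderingSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();

        // Messages sent by THIS producer to THIS topic (partition) are delivered
        // to subscribers in send order.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/events")
                .create();

        MessageId idA = producer.send("messageA".getBytes());

        // messageB is really produced elsewhere (another producer on another
        // cluster), so Pulsar gives no A-before-B guarantee there. One
        // application-level option: record the dependency so a consumer can
        // detect that messageA has not arrived yet.
        producer.newMessage()
                .value("messageB".getBytes())
                .property("depends-on", idA.toString()) // illustrative property name
                .send();

        producer.close();
        client.close();
    }
}```
----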
2019-12-10 15:25:23 UTC - Daniel Ferreira Jorge: Hello guys... I'm trying to configure the GCS Offloader for a new deployment... I have the `tiered-storage-jcloud-2.4.2.nar` inside the `offloaders` directory (I'm using the `pulsar-all` docker image), but I keep getting `No offloader found for driver 'google-cloud-storage. Please make sure you dropped the offloader nar packages under `${PULSAR_HOME}/offloaders'` and the broker won't initialize... Below is my `broker.conf` offloading config.... Am I missing something?

```### --- Ledger Offloading --- ###

# The directory for all the offloader implementations
offloadersDirectory=./offloaders

# Driver to use to offload old data to long term storage (Possible values: S3, aws-s3, google-cloud-storage)
# When using google-cloud-storage, Make sure both Google Cloud Storage and Google Cloud Storage JSON API are enabled for
# the project (check from Developers Console -> Api&auth -> APIs).
managedLedgerOffloadDriver=google-cloud-storage

# Maximum number of thread pool threads for ledger offloading
managedLedgerOffloadMaxThreads=2

# Use Open Range-Set to cache unacked messages
managedLedgerUnackedRangesOpenCacheSetEnabled=true

# For Amazon S3 ledger offload, AWS region
s3ManagedLedgerOffloadRegion=

# For Amazon S3 ledger offload, Bucket to place offloaded ledger into
s3ManagedLedgerOffloadBucket=

# For Amazon S3 ledger offload, Alternative endpoint to connect to (useful for testing)
s3ManagedLedgerOffloadServiceEndpoint=

# For Amazon S3 ledger offload, Max block size in bytes. (64MB by default, 5MB minimum)
s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864

# For Amazon S3 ledger offload, Read buffer size in bytes (1MB by default)
s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576

# For Google Cloud Storage ledger offload, region where offload bucket is located.
# reference this page for more details: <https://cloud.google.com/storage/docs/bucket-locations>
gcsManagedLedgerOffloadRegion=us-central1

# For Google Cloud Storage ledger offload, Bucket to place offloaded ledger into
gcsManagedLedgerOffloadBucket=pulsar-topic-offload

# For Google Cloud Storage ledger offload, Max block size in bytes. (64MB by default, 5MB minimum)
gcsManagedLedgerOffloadMaxBlockSizeInBytes=67108864

# For Google Cloud Storage ledger offload, Read buffer size in bytes (1MB by default)
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576

# For Google Cloud Storage, path to json file containing service account credentials.
# For more details, see the "Service Accounts" section of <https://support.google.com/googleapi/answer/6158849>
gcsManagedLedgerOffloadServiceAccountKeyFile=/tmp/gcp_access.json```
----
2019-12-10 15:40:20 UTC - Alexandre DUVAL: @Daniel Ferreira Jorge Hi, you need to add <https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=pulsar/pulsar-2.4.2/apache-pulsar-offloaders-2.4.2-bin.tar.gz>
----
2019-12-10 15:57:42 UTC - Nick Ruhl: @Nick Ruhl has joined the channel
----
2019-12-10 15:57:43 UTC - Daniel Ferreira Jorge: @Alexandre DUVAL Thanks for the answer. Where should I add this file? I'm already using the `pulsar-all` docker image, which is supposed to contain all the necessary files...
----
2019-12-10 16:24:39 UTC - Alexandre DUVAL: In /lib
----
2019-12-10 16:27:26 UTC - Nick Ruhl: Hi Pulsar Community. I am new to Pulsar but just stood up a K8S cluster and plan to use it heavily in production soon. I am currently chaos testing it to help my understanding of managing the cluster and fixing issues when they arise. A few questions I have concern the procedure for what to do if/when the ledger and/or journal disks fill up, as this has been my largest issue and so far has required me to rebuild the cluster.

```- How can I get things back on track and/or clear the ledger and/or journal so the cluster is functional? (If persistence is not required)
- Can I extend the volumes and perform some actions to get things realigned? (If persistence is required)
- Is there any documentation on these topics and what to do when things go wrong?```
Thank you all and happy holidays!
----
2019-12-10 16:31:42 UTC - Sijie Guo: Can you provide more output of that command? I can’t see anything from the screenshot.
----
2019-12-10 16:37:33 UTC - Sijie Guo: Can you create a GitHub issue or a Stack Overflow question for this? It is better answered there, where it can benefit the whole community.
+1 : Nick Ruhl
----
2019-12-10 16:38:15 UTC - Nick Ruhl: @Sijie Guo No problem. Thank you
----
2019-12-10 16:53:42 UTC - juraj: can you see it better here?
----
2019-12-10 16:53:47 UTC - juraj: ```[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:exec (rename-epoll-library) on project managed-ledger: Command execution failed.: Process exited with an error: 127 (Exit value: 127) -> [Help 1]```
----
2019-12-10 16:54:34 UTC - juraj: found the same issue here, but I don't know how it was solved: <http://mail-archives.apache.org/mod_mbox/pulsar-dev/201807.mbox/%3C680602365.2282.1532567683130.JavaMail.jenkins@jenkins01%3E>
----
2019-12-10 17:02:10 UTC - juraj: more context
----
2019-12-10 17:02:21 UTC - juraj: ohhhh `zip: command not found`
----
2019-12-10 17:03:01 UTC - juraj: `apt-get install zip` and trying again
----
2019-12-10 17:18:48 UTC - Daniel Ferreira Jorge: @Alexandre DUVAL This file you told me to download only contains the `tiered-storage-jcloud-2.4.2.nar`, which is already inside my `/pulsar/offloaders` directory... anyway, I did download the file and put it inside `lib` as you told me, and I get the same result....
----
2019-12-10 17:24:13 UTC - Alexandre DUVAL: my bad, you have to put it in pulsar/offloaders
----
2019-12-10 17:24:32 UTC - Alexandre DUVAL: ```~pulsar/offloaders # ls
tiered-storage-jcloud-2.4.0.nar```
----
2019-12-10 17:24:44 UTC - Alexandre DUVAL: @Daniel Ferreira Jorge
----
2019-12-10 17:24:56 UTC - Daniel Ferreira Jorge: it already is
----
2019-12-10 17:25:16 UTC - juraj: that worked, i'm on another issue involving the docker-maven-plugin, will post later
----
2019-12-10 17:25:30 UTC - Daniel Ferreira Jorge: @Alexandre DUVAL
----
2019-12-10 17:35:28 UTC - Daniel Ferreira Jorge: From the logs, I can see an exception when it tries to load the nar file:
----
2019-12-10 17:35:33 UTC - Daniel Ferreira Jorge: ```java.io.IOException: /tmp/pulsar-nar/tiered-storage-jcloud-2.4.2.nar-unpacked/META-INF could not be created
	at org.apache.pulsar.common.nar.FileUtils.ensureDirectoryExistAndCanReadAndWrite(FileUtils.java:51) ~[org.apache.pulsar-pulsar-common-2.4.2.jar:2.4.2]
	at org.apache.pulsar.common.nar.NarUnpacker.unpack(NarUnpacker.java:106) ~[org.apache.pulsar-pulsar-common-2.4.2.jar:2.4.2]
	at org.apache.pulsar.common.nar.NarUnpacker.unpackNar(NarUnpacker.java:66) ~[org.apache.pulsar-pulsar-common-2.4.2.jar:2.4.2]
	at org.apache.pulsar.common.nar.NarClassLoader.getFromArchive(NarClassLoader.java:141) ~[org.apache.pulsar-pulsar-common-2.4.2.jar:2.4.2]
	at org.apache.bookkeeper.mledger.offload.OffloaderUtils.getOffloaderDefinition(OffloaderUtils.java:109) ~[org.apache.pulsar-managed-ledger-original-2.4.2.jar:2.4.2]
	at org.apache.bookkeeper.mledger.offload.OffloaderUtils.lambda$searchForOffloaders$1(OffloaderUtils.java:130) ~[org.apache.pulsar-managed-ledger-original-2.4.2.jar:2.4.2]
	at java.lang.Iterable.forEach(Iterable.java:75) [?:1.8.0_232]
	at org.apache.bookkeeper.mledger.offload.OffloaderUtils.searchForOffloaders(OffloaderUtils.java:128) [org.apache.pulsar-managed-ledger-original-2.4.2.jar:2.4.2]
	at org.apache.pulsar.broker.PulsarService.createManagedLedgerOffloader(PulsarService.java:728) [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
	at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:382) [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
	at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273) [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
	at org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:332) [org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]```
----
2019-12-10 17:38:08 UTC - Daniel Ferreira Jorge: apparently it is trying to unpack the nar into a temp folder, but it is not able to for some reason
----
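Since the thread ends without a resolution: the stack trace points at FileUtils.ensureDirectoryExistAndCanReadAndWrite failing on /tmp/pulsar-nar, which suggests the broker user cannot create or write under that directory (permissions, a read-only /tmp in the container, or a full disk). A tiny standalone sketch of the same check, run as the broker user, assuming /tmp/pulsar-nar is the extraction directory shown in the trace:

```import java.io.File;

public class NarDirCheck {
    public static void main(String[] args) {
        // The extraction directory from the stack trace above.
        File dir = new File("/tmp/pulsar-nar");

        if (!dir.exists() && !dir.mkdirs()) {
            System.out.println("could not create " + dir
                    + " (permissions, read-only /tmp, or full disk?)");
            return;
        }
        System.out.println(dir + ": readable=" + dir.canRead()
                + " writable=" + dir.canWrite());
    }
}```
----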
2019-12-10 17:46:44 UTC - Brian Doran: Sorry @Sijie Guo I completely missed your reply to this.
----
2019-12-10 17:47:05 UTC - Brian Doran: we are looking at improving the throughput from our Pulsar Producer clients to a 3 node Pulsar cluster...

We cannot get the message throughput beyond about 230K/sec no matter how many changes we make to the producer client settings:

Current client settings are:

```2019-12-10 15:35:48.347Z INFO  [Export-Pipeline-Queue-9] o.a.p.c.i.ProducerStatsRecorderImpl - Pulsar client config: {
  "serviceUrl" : "pulsar://prod-fx3s1c.s.com:6650,prod-fx3s1a.s.com:6650,prod-fx3s1b.s.com:6650",
  "authPluginClassName" : null,
  "authParams" : null,
  "operationTimeoutMs" : 30000,
  "statsIntervalSeconds" : 60,
  "numIoThreads" : 50,
  "numListenerThreads" : 1,
  "connectionsPerBroker" : 15,
  "useTcpNoDelay" : true,
  "useTls" : false,
  "tlsTrustCertsFilePath" : "",
  "tlsAllowInsecureConnection" : false,
  "tlsHostnameVerificationEnable" : false,
  "concurrentLookupRequest" : 5000,
  "maxLookupRequest" : 50000,
  "maxNumberOfRejectedRequestPerConnection" : 50,
  "keepAliveIntervalSeconds" : 30,
  "connectionTimeoutMs" : 10000,
  "requestTimeoutMs" : 60000,
  "defaultBackoffIntervalNanos" : 100000000,
  "maxBackoffIntervalNanos" : 30000000000
}```

We have 13 partitioned topics with 10 partitions each

```bin/pulsar-admin topics list-partitioned-topics public/default
<persistent://public/default/TestTopic1>
<persistent://public/default/TestTopic2>
<persistent://public/default/TestTopic3>
<persistent://public/default/TestTopic4>
<persistent://public/default/TestTopic5>
<persistent://public/default/TestTopic6>
<persistent://public/default/TestTopic6>
<persistent://public/default/TestTopic7>
<persistent://public/default/TestTopic8>
<persistent://public/default/TestTopic9>
<persistent://public/default/TestTopic10>
<persistent://public/default/TestTopic11>
<persistent://public/default/TestTopic12>
<persistent://public/default/TestTopic13>
<persistent://public/default/TestTopic14>
<persistent://public/default/TestTopic15>
<persistent://public/default/TestTopic10>
<persistent://public/default/TestTopic11>
<persistent://public/default/TestTopic12>
<persistent://public/default/TestTopic13>```


As you can see from the picture we have lots of producers: 16 threads consuming data, each one with a producer per partition; so it's quite a high producer count.

```2019-12-10 15:35:48.343Z INFO  [Export-Pipeline-Queue-9] o.a.p.c.i.ProducerStatsRecorderImpl - Starting Pulsar producer perf with config: {
  "topicName" : "<persistent://public/default/TestTopic1>",
  "producerName" : "<persistent://public/default/TestTopic1[Export-Pipeline-Queue-9]>",
  "sendTimeoutMs" : 30000,
  "blockIfQueueFull" : true,
  "maxPendingMessages" : 5000,
  "maxPendingMessagesAcrossPartitions" : 50000,
  "messageRoutingMode" : "RoundRobinPartition",
  "hashingScheme" : "JavaStringHash",
  "cryptoFailureAction" : "FAIL",
  "batchingMaxPublishDelayMicros" : 200000,
  "batchingMaxMessages" : 1000,
  "batchingEnabled" : true,
  "batcherBuilder" : { },
  "compressionType" : "LZ4",
  "initialSequenceId" : null,
  "autoUpdatePartitions" : true,
  "properties" : { }
}```
----
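For reference, a sketch of how the client and producer settings pasted above map onto the Java client builders (values copied from the logs; the topic name, payload, and the flush at the end are only for illustration). This is not a tuning recommendation, just a starting point for experimenting with batching and pending-message limits:

```import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class ProducerConfigSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://prod-fx3s1c.s.com:6650,prod-fx3s1a.s.com:6650,prod-fx3s1b.s.com:6650")
                .ioThreads(50)
                .connectionsPerBroker(15)
                .statsInterval(60, TimeUnit.SECONDS)
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/TestTopic1")
                .blockIfQueueFull(true)
                .maxPendingMessages(5000)
                .maxPendingMessagesAcrossPartitions(50000)
                .messageRoutingMode(MessageRoutingMode.RoundRobinPartition)
                .enableBatching(true)
                .batchingMaxPublishDelay(200, TimeUnit.MILLISECONDS) // 200000 micros in the log
                .batchingMaxMessages(1000)
                .compressionType(CompressionType.LZ4)
                .sendTimeout(30, TimeUnit.SECONDS)
                .create();

        // sendAsync keeps the pipeline full; a blocking send() waits for each ack.
        producer.sendAsync("payload".getBytes());

        producer.flush();
        producer.close();
        client.close();
    }
}```
----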
2019-12-10 17:47:49 UTC - Brian Doran: 
----
2019-12-10 17:49:34 UTC - Brian Doran: We run the same data through with our Kafka broker as the destination (we've been doing this for a long time, so we know what to expect there), but we've only really started benchmarking Pulsar over the last few weeks and we're having trouble replicating the Kafka throughput.
----
2019-12-10 18:06:36 UTC - Sijie Guo: Can you share more details about the Pulsar setup (e.g. # brokers, # bookies, bookie disk and configuration)?
----
2019-12-10 19:06:32 UTC - Ryan Samo: Hey guys, is there a limit to the number of subscriptions that can be active on a topic?
----
2019-12-10 19:18:56 UTC - Joe Francis: Not really. Keep in mind that every sub adds 1X dispatch load. So you will need to consider scaling
----
2019-12-10 19:20:00 UTC - Addison Higham: no hard limit and they should be fairly cheap in the case of tailing reads, but there is obviously bandwidth as well as some CPU and mem used.
----
2019-12-10 19:20:58 UTC - Ryan Samo: Awesome, just thinking about having devices subscribe to receive push notifications etc. thanks!
----
2019-12-10 19:21:34 UTC - Ryan Samo: I didn’t want to reach some arbitrary cap
----
2019-12-10 19:23:06 UTC - Joe Francis: Depends on the numbers. What numbers are we talking about? 100s? 1000s? Millions?
----
2019-12-10 19:23:19 UTC - Ryan Samo: 1000s
----
2019-12-10 19:24:20 UTC - Ryan Samo: Loading a config to each client on connection via Pulsar, and then all of the clients update with that config, for example.
----
2019-12-10 19:26:07 UTC - Ryan Samo: Trying to keep many clients in sync
----
2019-12-10 19:27:38 UTC - Addison Higham: since topics are tied to a single broker, subscriptions don't split across multiple brokers, so your mechanism to scale that out is to just use bigger brokers. You could consider fanning out the data to multiple topics and keying the data, or, if not, one common technique in systems like RabbitMQ (that should work here as well) is to do a "tiered" fanout. For example, if you had a single topic you produce to, you could have a Pulsar Function that fans out to 10 other topics (with each topic having a copy of the data); your clients would then randomly choose one of the 10 topics and subscribe to that. That way, instead of 1000 subscriptions on a single broker, you get 200 subscriptions on each of 5 brokers, for example
----
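A rough sketch of the tiered fan-out Addison describes, written here with a plain Java consumer plus a set of producers instead of a Pulsar Function (topic names, the fan-out factor, and the subscription names are made up for illustration); each client then subscribes to one randomly chosen copy topic.

```import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.pulsar.client.api.*;

public class FanOutSketch {
    private static final int FAN_OUT = 10; // illustrative fan-out factor

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();

        // One producer per copy topic; every copy receives all the data.
        List<Producer<byte[]>> copies = new ArrayList<>();
        for (int i = 0; i < FAN_OUT; i++) {
            copies.add(client.newProducer()
                    .topic("persistent://public/default/config-fanout-" + i)
                    .create());
        }

        // The fan-out stage: consume the source topic and re-publish to every copy.
        // (A production version would wait for the sendAsync futures before acking.)
        Consumer<byte[]> source = client.newConsumer()
                .topic("persistent://public/default/config-source")
                .subscriptionName("fanout")
                .subscribe();
        while (true) {
            Message<byte[]> msg = source.receive();
            for (Producer<byte[]> copy : copies) {
                copy.sendAsync(msg.getValue());
            }
            source.acknowledge(msg);
        }
    }

    // Each device subscribes to one randomly chosen copy, so the subscription
    // load is spread over the brokers that own the copy topics.
    static Consumer<byte[]> subscribeToRandomCopy(PulsarClient client, String deviceId) throws Exception {
        int pick = ThreadLocalRandom.current().nextInt(FAN_OUT);
        return client.newConsumer()
                .topic("persistent://public/default/config-fanout-" + pick)
                .subscriptionName("device-" + deviceId)
                .subscribe();
    }
}```
----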
2019-12-10 19:28:47 UTC - Ryan Samo: Is this true for partitioned topics as well?
----
2019-12-10 19:29:00 UTC - Ryan Samo: I thought they spanned brokers
----
2019-12-10 19:31:19 UTC - Addison Higham: they do, but a subscription also spans multiple brokers (in the case of shared, exclusive, and failover), so you don't buy yourself a ton there
----
2019-12-10 19:31:45 UTC - Addison Higham: not sure about the details of key shared
----
2019-12-10 19:32:26 UTC - Joe Francis: It all depends on the throughput you plan to push. 1X in means 1000sX out. If it's a config once in a while, it should be possible, but test it out
----
2019-12-10 19:33:51 UTC - Joe Francis: For some context, I run subs in the 100s without issues
----
2019-12-10 19:54:53 UTC - juraj: almost there, but have hit this one now:
----
2019-12-10 19:56:36 UTC - juraj: seems like starting the whole build from within docker will not work
----
2019-12-10 19:57:46 UTC - Ryan Samo: I see, ok thanks!
----
2019-12-10 19:58:36 UTC - juraj: actually, seems like some images were built!
tada : Roman Popenov
----
2019-12-10 19:58:41 UTC - Roman Popenov: Can someone go into more detail about key shared subscription?
----
2019-12-10 19:58:54 UTC - Roman Popenov: Or point me to the right KBA?
----
2019-12-10 20:00:49 UTC - Brian Doran: We have 3 brokers running in docker / 3 bookies running in docker / 3 zookeepers running in docker
----
2019-12-10 20:01:30 UTC - Brian Doran: which configurations are you looking for.. bookie.conf?
----
2019-12-10 20:14:55 UTC - David Kjerrumgaard: <https://github.com/apache/pulsar/wiki/PIP-34%3A-Add-new-subscribe-type-Key_shared>
thanks : Roman Popenov
----
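Alongside the PIP, a minimal sketch of a Key_Shared consumer with the Java client (available since Pulsar 2.4.0; topic, subscription, and key names are placeholders): messages with the same key go to the same consumer on the subscription, in order, while different keys are spread across all consumers sharing it.

```import org.apache.pulsar.client.api.*;

public class KeySharedSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();

        // Several consumers can share this subscription; each key sticks to one of them.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/orders")
                .subscriptionName("order-processors")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        // Producers set a key so related messages stay together and stay ordered.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/orders")
                .create();
        producer.newMessage()
                .key("customer-42")
                .value("order created".getBytes())
                .send();

        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);

        producer.close();
        consumer.close();
        client.close();
    }
}```
----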
2019-12-10 20:30:34 UTC - Brian Doran: *CPU*: 2 x Intel Xeon E5-2695 v3 2.3GHz,35M Cache,9.60GT/s QPI,Turbo,HT,14C/28T
*RAM*: 128GB RAM
*OS Disk*: 2 x 200GB SSD
*Network*: 10Gbps

FD332 Specs
16 x 1.2TB 10K SAS 6Gbps 2.5"
----
2019-12-10 20:31:34 UTC - Brian Doran: very few changes to the bookie config
----
2019-12-11 08:40:58 UTC - Jens Fosgerau: @Jens Fosgerau has joined the channel
----
2019-12-11 08:41:29 UTC - Dan Koopman: @Dan Koopman has joined the channel
----