Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/01/29 17:44:18 UTC

Slack digest for #general - 2018-01-29

2018-01-29 01:13:29 UTC - Daniel Ferreira Jorge: Hi, I am still really confused regarding message retention and expiry in Pulsar. This particular page in the documentation is really confusing. For instance, the first paragraph states: "by default brokers immediately deletes all messages that have been acknowledged by a consumer". Does this mean that the acked message is physically deleted from the bookie ledger? Isn't the ack of a message tied to a particular SUBSCRIPTION? If I have 10 subscriptions consuming from a topic, will the first subscription that acks the message cause it to be DELETED and not consumed by the other 9 subscriptions? Or does the "delete" in the docs really mean "marked as consumed for a particular subscription"? On that same page, the "Retention policies" section also states "By default, when a Pulsar message arrives at a broker it will be stored until it has been acknowledged by a consumer, at which point it will be deleted". What is really confusing to me is this: since a topic can have MANY subscriptions, an acknowledgement is not a GLOBAL event for a message, but a per-SUBSCRIPTION event. Am I completely misguided here?
----
2018-01-29 01:26:02 UTC - jia zhai: Yes, the ack is tied to the subscription. Your second interpretation is the right one: `the "delete" in the docs really mean "marked as consumed for a particular subscription"`.
The message becomes deletable once all the subscriptions have consumed it.
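
To make that rule concrete, here is a toy Python model. It is only a sketch of the behaviour described above, not Pulsar's actual implementation; the `Topic` class and its method names are hypothetical.

```python
class Topic:
    """Toy model: acks are tracked per subscription, not globally."""

    def __init__(self, subscriptions):
        # One set of acknowledged message ids per subscription name.
        self.acked = {name: set() for name in subscriptions}

    def ack(self, subscription, message_id):
        # An ack only marks the message as consumed for this subscription.
        self.acked[subscription].add(message_id)

    def is_deletable(self, message_id):
        # The message can be deleted only when every subscription has acked it.
        return all(message_id in ids for ids in self.acked.values())


topic = Topic(["sub-a", "sub-b"])
topic.ack("sub-a", 1)
print(topic.is_deletable(1))  # False: sub-b has not acked yet
topic.ack("sub-b", 1)
print(topic.is_deletable(1))  # True: all subscriptions have acked
```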
----
2018-01-29 01:29:48 UTC - Daniel Ferreira Jorge: OK, so if I have only one subscription and I ack a message, it will be deleted from the bookie unless I configure a retention policy for the namespace to which that topic belongs?
----
2018-01-29 01:30:44 UTC - jia zhai: :+1:
----
2018-01-29 01:33:32 UTC - Daniel Ferreira Jorge: Thank you! I have another question unrelated to this one. What are bundles? 
----
2018-01-29 01:44:57 UTC - Daniel Ferreira Jorge: Why would I increase the number of bundles in a namespace?
----
2018-01-29 01:49:08 UTC - jia zhai: Oh, sorry, I haven't worked much with bundles. They seem to be related to load balancing between brokers.
----
2018-01-29 01:54:07 UTC - Daniel Ferreira Jorge: Are there any docs about bundles and their purpose? I couldn't find anything related to that...
----
2018-01-29 02:05:55 UTC - jia zhai: @Matteo Merli @Sijie Guo for more info regarding bundles
----
2018-01-29 05:13:36 UTC - Matteo Merli: @Daniel Ferreira Jorge Ok, this is really becoming a FAQ, and there’s not much documentation around bundles. The intention was that in most cases one should not need to worry about them (or even know what they are and what they’re for).

I’ll try to summarize here and we’ll add a properly written page to the docs.

In Pulsar, “namespaces” are the administrative unit: you can configure most options on a namespace and they will be applied to the topics contained in that namespace. This gives you the convenience of applying settings and operations to a group of topics rather than having to do it once per topic.

In general, the pattern is to use a namespace for each user application. So a single user/tenant can create multiple namespaces to manage its own applications.

When it comes to topics, we need a way to assign topics to brokers, control the load and move them if a broker becomes overloaded. Rather than doing these operations (ownership, load-monitoring, assigning) for each individual topic, we do them in _bundles_, or “groups of topics”.

In practical terms, the number of bundles determines “across how many brokers can I spread the topics of a given namespace”.

From the client API or implementation there’s no concept of bundles; clients look up the topics they want to publish to or consume from individually.

On the broker side, the namespace is broken down into multiple _bundles_, and each bundle can be assigned to a different broker. Effectively, bundles are the “unit of assignment” of topics to brokers, and this is what the load-manager uses to track the traffic and decide where to place bundles and whether to offload them to other brokers.

A bundle is represented by a hash-range. The 32-bit hash space is initially divided equally into the requested number of bundles. Topics are matched to a bundle by hashing on the topic name.
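
As a rough illustration of the hash-range idea, the sketch below divides the 32-bit space into equal ranges and finds the range a topic name falls into. It assumes CRC32 as a stand-in hash function (the broker's actual hash may differ), and the function names and the topic name are hypothetical.

```python
import bisect
import zlib

HASH_SPACE = 2 ** 32  # size of the 32-bit hash space


def bundle_boundaries(num_bundles):
    """Divide [0, 2**32) equally; bundle i covers boundaries[i] up to boundaries[i + 1]."""
    return [i * HASH_SPACE // num_bundles for i in range(num_bundles + 1)]


def bundle_for(topic_name, boundaries):
    """Return the index of the bundle whose hash range contains the topic."""
    # Assumption: CRC32 is used here only for illustration.
    h = zlib.crc32(topic_name.encode("utf-8"))
    return bisect.bisect_right(boundaries, h) - 1


boundaries = bundle_boundaries(4)
print([hex(b) for b in boundaries])
# ['0x0', '0x40000000', '0x80000000', '0xc0000000', '0x100000000']

# Hypothetical topic name; the result is an index between 0 and 3.
print(bundle_for("persistent://my-property/my-cluster/my-ns/my-topic", boundaries))
```

The same topic name always hashes to the same value, so it always lands in the same range until that range is split.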

Default number of bundles is configured in `broker.conf`: `defaultNumberOfNamespaceBundles=4`

When the traffic increases on a given bundle, it will be split in 2 and reassigned to a different broker.
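
A minimal sketch of what splitting a hash range in 2 could look like, assuming the split happens at the midpoint of the range (an assumption for illustration; `split_bundle` is a hypothetical helper, not a broker API):

```python
def split_bundle(boundaries, index):
    """Split the bundle at `index` into two halves and return the new boundaries."""
    lo, hi = boundaries[index], boundaries[index + 1]
    mid = lo + (hi - lo) // 2  # assumption: split at the midpoint of the range
    return boundaries[: index + 1] + [mid] + boundaries[index + 1:]


# Four equal bundles over the 32-bit hash space.
boundaries = [0x00000000, 0x40000000, 0x80000000, 0xC0000000, 0x100000000]
after = split_bundle(boundaries, 1)
print([hex(b) for b in after])
# ['0x0', '0x40000000', '0x60000000', '0x80000000', '0xc0000000', '0x100000000']
```

Only the range being split changes; topics hashing into the other ranges stay in their bundles.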

Enable auto-split: `loadBalancerAutoBundleSplitEnabled=true`
Trigger unload and reassignment after splitting: `loadBalancerAutoUnloadSplitBundlesEnabled=true`

If you expect high traffic on a particular namespace, it’s good practice to specify a higher number of bundles when creating the namespace:

`bin/pulsar-admin namespaces create $NS --bundles 64`

This will avoid the initial auto-adjustment phase.

All the thresholds for auto-splitting can be configured in `broker.conf`, e.g. number of topics/partitions, messages in/out, bytes in/out, etc.
----
2018-01-29 05:15:00 UTC - jia zhai: :+1:
----
2018-01-29 07:26:24 UTC - Jaebin Yoon: @Sijie Guo Here is what I did before I started seeing those errors. 
1) brought up new bookies (10)
2) terminated all old bookies  (10) (while no auto-recovery was running)
3) deleted the old partitioned topic
4) created a new partitioned topic (with the same name)
5) started traffic on the new partitioned topic
----
2018-01-29 07:27:51 UTC - Jaebin Yoon: I thought there would be auto-recovery running by default in the bookie cluster but realized it required running a separate auto-recovery service (or embedded option, which by default was off).
----
2018-01-29 07:34:00 UTC - Sijie Guo: @Jaebin Yoon: > realized it required running a separate auto-recovery service (or embedded option, which by default was off)

ah, right. we can make auto recovery on by default /cc @Matteo Merli

@Jaebin Yoon we will try to repeat your sequence to see if we can reproduce this behavior. /cc @Matteo Merli
----
2018-01-29 08:01:45 UTC - Jaebin Yoon: I'm trying to increase the number of partitions on an existing partitioned topic (from 10 to 50, since the traffic was not well distributed across brokers) with the pulsar-admin CLI while messages are being produced and consumed on that topic, but the command gets stuck and nothing seems to happen. Here is the command I used:

```pulsar-admin persistent update-partitioned-topic -p 50 persistent://$PROPERTY/${CLUSTER}/${NS}/${TOPIC}```
----
2018-01-29 09:17:06 UTC - Ivan Kelly: @Jaebin Yoon what does jstack say the jvm is doing?
----
2018-01-29 09:27:00 UTC - Jaebin Yoon: Some brokers' CPUs were hot because of the traffic, which was also causing repeated GC pauses of up to 1.5s. The API call doesn't seem to change CPU usage or GC pauses.
----
2018-01-29 09:32:15 UTC - Ivan Kelly: what version is the cluster running?
----
2018-01-29 14:37:48 UTC - Jesse Thompson: Test-driving Pulsar via the standalone Docker instructions. I’m trying to create a new namespace, but am stuck trying to first create a property. If I try `pulsar-admin create property property-name` I get an error stating that parameter _create_ was passed but no _main_ parameter was defined. Not sure what that means. Additionally, the docs state that the options are `--admin-roles` and `--allowed-clusters`; are either of those required parameters? What are the possible _admin roles_? What would one specify as the cluster when using a standalone cluster? Loopback?
----
2018-01-29 15:52:32 UTC - Ivan Kelly: it should be `pulsar-admin properties create property-name`
----
2018-01-29 15:55:27 UTC - Jesse Thompson: `pulsar-admin --admin-url http://localhost:8080 properties create property-name`

Tells me that I must specify `--admin-roles` and `--allowed-clusters`
What kinds of roles are available? For the clusters, do I just point it at loopback?
----
2018-01-29 16:16:22 UTC - Ivan Kelly: for the clusters, you can get a list with "pulsar-admin clusters list"
----
2018-01-29 16:16:36 UTC - Ivan Kelly: you can just make something up for admin-roles
----
2018-01-29 16:16:40 UTC - Ivan Kelly: like test-admin-role
----