Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/05/04 09:11:03 UTC
Slack digest for #general - 2018-05-04

2018-05-03 11:05:07 UTC - haphut: @haphut has joined the channel
----
2018-05-03 14:56:54 UTC - Vasily Yanov: Hi! We just ran a stress test on our Pulsar cluster. After a few hours I see a lot of log lines like
```
2018-05-03 14:54:28,687 - WARN  [bookkeeper-ml-workers-30-1:PersistentTopic@238] - [persistent://server-ali-t2-1525356990406/prod-pulsar-cluster-1/session_queue/2fb1b643-8c24-48b8-b96b-a882ea96bc59] Error getting policies KeeperErrorCode = NoNode and isEncryptionRequired will be set to false
2018-05-03 14:54:28,727 - WARN  [main-EventThread:AbstractZkLedgerManager$2@287] - Ledger node does not exist in ZooKeeper: ledgerId=1557442
2018-05-03 14:54:28,727 - WARN  [BookKeeperClientWorker-20-1:ManagedCursorImpl@2220] - [server-ali-t2-1525356990406/prod-pulsar-cluster-1/session_queue/persistent/e1f775a1-fc5e-45b2-801e-2bea442a74dc][sender] Failed to delete ledger 1557442: No such ledger exists
2018-05-03 14:54:28,791 - WARN  [bookkeeper-ml-workers-33-1:PersistentTopic@238] - [persistent://server-ali-t2-1525356990406/prod-pulsar-cluster-1/session_queue/51b4972d-9f82-47ed-a8e9-aa2d3a39ed1e] Error getting policies KeeperErrorCode = NoNode and isEncryptionRequired will be set to false
```
What does it mean? The ZooKeeper nodes are up and running with no errors in their logs.
----
2018-05-03 15:50:50 UTC - Matteo Merli: @Vasily Yanov was the namespace explicitly created?
----
2018-05-03 15:52:01 UTC - Matteo Merli: It seems that a new topic was just created and, since there are no policies, it defaults to “encryption not required” for the topic
----
2018-05-03 16:42:02 UTC - Vasily Yanov: @Matteo Merli no, the namespace was automatically created with default settings
----
2018-05-03 16:43:26 UTC - Matteo Merli: Ok, that is the reason then. Since there is no authentication or authorization configured, it lets you publish anyway, but the namespace was not fully created
----
2018-05-03 16:44:59 UTC - Matteo Merli: We’ve been adjusting this in the next release (2.0) to require the namespace to be created in all cases, so that it’s less confusing (it also provides a pre-created `public/default` namespace)
----
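A minimal sketch of creating a namespace explicitly before publishing, rather than relying on automatic creation. It only builds the HTTP request and does not send it. The `/admin/v2/namespaces/{tenant}/{namespace}` path is assumed to be the 2.0-style admin REST endpoint; the admin URL, tenant, and namespace names below are hypothetical placeholders.

```python
from urllib.request import Request


def namespace_create_request(admin_url: str, tenant: str, namespace: str) -> Request:
    """Build (but do not send) a PUT request that creates tenant/namespace."""
    # Assumed 2.0-style admin path; older releases use a different layout.
    url = f"{admin_url.rstrip('/')}/admin/v2/namespaces/{tenant}/{namespace}"
    return Request(url, method="PUT")


# Hypothetical admin endpoint and names, for illustration only.
req = namespace_create_request("http://localhost:8080", "my-tenant", "session_queue")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) needs a reachable broker admin endpoint and an existing tenant.
----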
2018-05-03 17:42:06 UTC - Karthik Palanivelu: Hi all, I am looking at the multi-tenancy feature. At the broker level, data is isolated between tenants. Can you please let me know whether the same isolation applies at the bookie level when storing data?
----
2018-05-03 17:45:19 UTC - Matteo Merli: @Karthikeyan Palanivelu Yes, that is possible as well through the rack-aware policy configuration. 

In addition to selecting the bookies to write to based on their rack, you can configure different groups of bookies. Brokers can then be configured to use just a subset of the bookies.

For example, you can isolate a tenant to a subset of brokers and then have these brokers use a subset of the bookies, to get complete end-to-end isolation
----
2018-05-03 17:47:04 UTC - Karthik Palanivelu: Cool, is this in 2.0?
----
2018-05-03 17:48:01 UTC - Matteo Merli: Yes, that has been there for a long time. We just don’t currently have a lot of tooling or documentation around rack-awareness, though that would be one of the focuses of the upcoming 2.1 release
----
2018-05-03 17:49:01 UTC - Karthik Palanivelu: Ok, documentation helps.
----
2018-05-03 17:49:41 UTC - Karthik Palanivelu: My next question: how do I install brokers and bookies in different ASGs within AWS? Basically, how do I let the broker know how to connect to the bookies?
----
2018-05-03 17:50:07 UTC - Karthik Palanivelu: The single-cluster documentation describes running them on the same instance.
----
2018-05-03 17:51:37 UTC - Matteo Merli: Sure, the first step is to actually start VMs in different AZs. Then you need to configure the BookKeeper client (in the Pulsar brokers) so that it knows which “rack” each bookie is located in
----
2018-05-03 17:52:28 UTC - Karthik Palanivelu: Can bookies be behind a load balancer?
----
2018-05-03 17:52:33 UTC - Matteo Merli: That can be done in a few ways. The default for Pulsar is to read that (bookie -> rack) mapping from a z-node
----
2018-05-03 17:53:15 UTC - Matteo Merli: No, there is no load balancer involved. Since it’s a stateful system, we connect directly to the bookies that have a particular ledger/entry
----
2018-05-03 17:57:15 UTC - Karthik Palanivelu: Please pardon me and correct my understanding; I am connecting the dots for my deployment architecture. I stand up 3 ZK nodes in AWS. Similarly, I stand up 3 bookies in AWS, configured with the ZK nodes, so ZK now has the bookies' information. Then I install the Pulsar broker on 3 different nodes with the ZK configs. When I start a broker, it reads the list of bookies from ZK and connects to them. This way, whenever a new bookie comes up, the broker retrieves the latest list from ZK and connects to it.
----
2018-05-03 17:57:47 UTC - Karthik Palanivelu: All these brokers will have the same cluster name.
----
2018-05-03 18:05:48 UTC - Karthik Palanivelu: In this scenario, if I create a tenant and namespace through one broker, they are recognized by the other brokers within the same cluster. Please confirm.
----
2018-05-03 18:34:31 UTC - Matteo Merli: That is correct
----
2018-05-03 18:35:09 UTC - Matteo Merli: The only thing missing for rack-awareness is the information on which “AZ” each bookie is located in
----
2018-05-03 18:35:46 UTC - Karthik Palanivelu: How do I configure it? Any reference you can help with?
----
2018-05-03 18:38:09 UTC - Matteo Merli: As I mentioned, the docs for that part are not really there yet. You can take a look at <https://github.com/apache/incubator-pulsar/issues/151> and ask questions here about what’s missing
----
2018-05-03 18:39:51 UTC - Matteo Merli: Short summary: the mapping is kept in ZK at `/bookies`. The content is JSON and you can see the format in the example linked in the issue
----
2018-05-03 18:40:40 UTC - Matteo Merli: whenever the z-node is updated, brokers will automatically update the mapping
----
2018-05-03 18:42:10 UTC - Matteo Merli: the top-level grouping in the JSON content (“us-east-1” in the example) is the bookie group that can be selected in `broker.conf`:

```
# Enable bookie isolation by specifying a list of bookie groups to choose from. Any bookie
# outside the specified groups will not be used by the broker
bookkeeperClientIsolationGroups=
```
----
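A minimal sketch of the bookie-group mapping and of how an isolation-group setting narrows the usable bookies. The `rack` and `hostname` field names, the bookie addresses, and the comma-separated group list are assumptions for illustration; the example in the linked issue is the reference for the real format.

```python
import json

# Hypothetical mapping: group -> bookie address -> placement info.
# The "rack" and "hostname" field names are assumptions, not confirmed here.
bookie_mapping = {
    "us-east-1": {
        "bookie-1.example.com:3181": {"rack": "us-east-1a", "hostname": "bookie-1.example.com"},
        "bookie-2.example.com:3181": {"rack": "us-east-1b", "hostname": "bookie-2.example.com"},
    },
    "us-west-2": {
        "bookie-3.example.com:3181": {"rack": "us-west-2a", "hostname": "bookie-3.example.com"},
    },
}


def bookies_in_groups(mapping, isolation_groups):
    """Return the sorted bookie addresses that belong to the given groups."""
    allowed = set()
    for group in isolation_groups:
        # Unknown groups contribute no bookies.
        allowed.update(mapping.get(group, {}))
    return sorted(allowed)


# JSON payload that would be stored in the /bookies z-node.
znode_content = json.dumps(bookie_mapping, indent=2)

# Assumed comma-separated value of bookkeeperClientIsolationGroups.
isolation_groups = "us-east-1".split(",")
print(bookies_in_groups(bookie_mapping, isolation_groups))
```

With the hypothetical data above, a broker restricted to `us-east-1` would only consider the two bookies in that group.
----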
2018-05-03 18:48:05 UTC - Karthik Palanivelu: Cool, that's a great start. Thanks so much!
----
2018-05-04 04:23:52 UTC - Vikas Lalwani: @Vikas Lalwani has joined the channel
----
2018-05-04 05:52:56 UTC - Vasily Yanov: @Matteo Merli in addition to yesterday's outage, I see a lot of messages like:
```
2018-05-03 20:35:21,641 - WARN  [main-EventThread:AbstractZkLedgerManager$2@287] - Ledger node does not exist in ZooKeeper: ledgerId=1726440
2018-05-03 20:35:21,641 - WARN  [main-EventThread:AbstractZkLedgerManager$2@287] - Ledger node does not exist in ZooKeeper: ledgerId=1737402
2018-05-03 20:35:21,641 - WARN  [main-EventThread:AbstractZkLedgerManager$2@287] - Ledger node does not exist in ZooKeeper: ledgerId=1750016
```
and when I tried to restart 1 of the 3 ZooKeeper nodes I got:
```
2018-05-03 16:42:46,793 - ERROR [BookKeeperClientWorker-21-1:LedgerHandle@845] - Closing ledger 1569995 due to error -9
2018-05-03 16:42:46,793 - ERROR [BookKeeperClientWorker-21-1:PendingAddOp@270] - Write of ledger entry to quorum failed: L1569995 E72
```
----
2018-05-04 06:11:29 UTC - Sijie Guo: @Vasily Yanov 

“Ledger node does not exist in ZooKeeper: ledgerId=1726440” => this log message is fine. It is a warning that appears when Pulsar attempts to delete a ledger that has already been deleted, so it should be safe, just noisy.

“Closing ledger 1569995 due to error -9” => -9 means a ZK exception. It is possible that you were restarting the ZooKeeper nodes while brokers were attempting to write metadata. The following “Write of ledger entry to quorum failed” is related to that.

Do you see any effects from those log messages?
----
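A minimal sketch of separating the noisy warning from the ZooKeeper-related error when scanning broker logs. The patterns come from the log lines quoted in this thread; the `triage` helper is hypothetical, and the code table lists only the one code explained above.

```python
import re

# Patterns taken from the log lines quoted in this thread.
BENIGN = re.compile(r"Ledger node does not exist in ZooKeeper: ledgerId=(\d+)")
CLOSE_ERROR = re.compile(r"Closing ledger (\d+) due to error (-?\d+)")

# Only the code discussed above is listed; anything else is reported as unknown.
ERROR_NAMES = {-9: "ZooKeeper exception"}


def triage(lines):
    """Split log lines into benign ledger ids and (ledger id, error name) pairs."""
    benign, errors = [], []
    for line in lines:
        match = BENIGN.search(line)
        if match:
            benign.append(int(match.group(1)))
            continue
        match = CLOSE_ERROR.search(line)
        if match:
            code = int(match.group(2))
            name = ERROR_NAMES.get(code, f"unknown code {code}")
            errors.append((int(match.group(1)), name))
    return benign, errors


sample = [
    "2018-05-03 20:35:21,641 - WARN  [main-EventThread:AbstractZkLedgerManager$2@287] - Ledger node does not exist in ZooKeeper: ledgerId=1726440",
    "2018-05-03 16:42:46,793 - ERROR [BookKeeperClientWorker-21-1:LedgerHandle@845] - Closing ledger 1569995 due to error -9",
]
benign, errors = triage(sample)
print(benign, errors)
```

Counting the entries in `errors` per time window is one way to check whether the failures line up with the ZooKeeper restarts.
----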
2018-05-04 06:14:10 UTC - Vasily Yanov: @Sijie Guo yes. Our developers are complaining about "pulsar connection timeout" errors and "Caused by: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at <my pulsar URL>:port"
----
2018-05-04 06:16:34 UTC - Sijie Guo: oh, can you tell me more about your environment setup and your stress testing?
----
2018-05-04 07:05:11 UTC - Vasily Yanov: @Sijie Guo 3x bare metal nodes, with ZooKeeper + BookKeeper + Pulsar on each node. The ZooKeeper dataLogDir is stored on a RAM drive. Pulsar and BookKeeper are on a separate partition. BTW: I'm using ZooKeeper from the Ubuntu repo (Zookeeper version: 3.4.9-3--1, built on Thu, 01 Jun 2017 16:26:44 -0700) and not from the Pulsar bundle
----
2018-05-04 07:11:29 UTC - Sijie Guo: @Vasily Yanov cool. broker configuration is default? when you do load testing, how many requests/second or bytes/second are you sending? what kind of disks are you using?
----
2018-05-04 07:20:20 UTC - Vasily Yanov: @Sijie Guo the broker configuration is almost default, except `exposeTopicLevelMetricsInPrometheus=true` changed to false (thanks to @Matteo Merli, that trick really improved our performance). I don't know exactly what they did during the tests (I will have more details later, as nobody is in the office now), but the Grafana graphs during the tests look like this: <http://prntscr.com/jdjn0i>
----
2018-05-04 07:25:01 UTC - Sijie Guo: @Vasily Yanov the traffic seems to be low, but you have quite a lot of consumers (60k+). It would be good to have more details about the tests, so I can help you understand the behavior.
----
2018-05-04 07:25:49 UTC - Vasily Yanov: @Sijie Guo ok. I will update you as soon as I have more test details
----
2018-05-04 07:34:07 UTC - Sijie Guo: thank you @Vasily Yanov
----