Posted to dev@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/03/30 09:11:02 UTC

Slack digest for #dev - 2018-03-30

2018-03-29 17:05:30 UTC - Sahaya Andrews Albert: What is our approach to getting a stateful bookie where the hostname/IP can be retained and, if the bookie is using EBS volumes, it can lock the volume to a specific container? Going through the StatefulSet docs, the IP address itself is not guaranteed when a container restarts, but it can get the same hostname, which is assigned by appending the container ordinal to the end of the hostname.
----
2018-03-29 17:06:09 UTC - Sahaya Andrews Albert: So I'm thinking of scripting around its hostname to determine which PVs it should use. Is there a better approach than this?
----
2018-03-29 17:06:44 UTC - Matteo Merli: a persistent volume is assigned to a particular instance of the stateful set
----
2018-03-29 17:07:04 UTC - Matteo Merli: there should be nothing additional to be done
----
2018-03-29 17:14:00 UTC - Sahaya Andrews Albert: Does it retain both journal and ledger volumes?
----
2018-03-29 17:14:24 UTC - Matteo Merli: you need to have 2 volumes, though, and that should work
----
2018-03-29 17:15:14 UTC - Sahaya Andrews Albert: ok, What about the IP address? Should we use the hostname instead of IP?
----
2018-03-29 17:16:13 UTC - Matteo Merli: you can enable bookies to use hostname rather than IP to advertise
----
2018-03-29 17:16:20 UTC - Matteo Merli: that would be stable
----
2018-03-29 17:16:24 UTC - Sahaya Andrews Albert: ok
----
2018-03-29 17:17:15 UTC - Sahaya Andrews Albert: Thanks. I'll try it out.
----
2018-03-29 17:18:47 UTC - Sahaya Andrews Albert: Do you know if k8s has a config to get fully qualified host name? metadata.name doesn't append the domain name.
----
2018-03-29 17:20:18 UTC - Matteo Merli: You shouldn’t need that. Just pass `useHostNameAsBookieID=true` in bookie config
----
2018-03-29 17:20:27 UTC - Matteo Merli: <https://github.com/apache/incubator-pulsar/blob/master/deployment/kubernetes/google-kubernetes-engine/bookie.yaml#L53>
----
2018-03-29 17:21:17 UTC - Sahaya Andrews Albert: Um... that's what I tried, and I saw ZK still had IPs. Let me clean them up and retry.
----
2018-03-30 00:18:12 UTC - Matteo Merli: Hi @Rajan Dhabalia, I’m seeing some issues with the failure domain z-node creation
----
2018-03-30 00:19:14 UTC - Matteo Merli: it can create the failure domain z-node before the cluster z-node is created. This can happen when brokers are started while the initialization is still happening
----
2018-03-30 00:19:45 UTC - Matteo Merli: (which is typically the case when the initial cluster is deployed in Kubernetes or similar environments)
----
2018-03-30 00:20:23 UTC - Matteo Merli: so it creates the `/admin/clusters/MY_CLUSTER/failureDomain` with the intermediate z-node paths
----
2018-03-30 00:21:25 UTC - Matteo Merli: the problem is that `/admin/clusters/MY_CLUSTER` gets created empty, and then `initialize-cluster-metadata` will fail because the z-node already exists, so the cluster metadata won’t be created properly
----
2018-03-30 00:26:06 UTC - Rajan Dhabalia: hmm.. so, probably we can avoid creating the failure domain if the cluster is not created yet
----
2018-03-30 00:26:29 UTC - Rajan Dhabalia: let me take a look and I will create a PR
----
2018-03-30 00:27:28 UTC - Matteo Merli: thanks
----
2018-03-30 00:28:13 UTC - Matteo Merli: yes, I think we can try to do the regular zk create without the parent z-nodes creation
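(A minimal sketch of that approach - connection string and paths assumed for illustration; a plain create fails fast instead of materializing an empty cluster z-node:)
```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class FailureDomainCreate {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});
        try {
            // A plain create (no recursive parent creation) throws
            // NoNodeException if /admin/clusters/MY_CLUSTER does not exist,
            // i.e. if initialize-cluster-metadata has not run yet.
            zk.create("/admin/clusters/MY_CLUSTER/failureDomain",
                      new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NoNodeException e) {
            // Cluster metadata not initialized yet: retry later instead of
            // implicitly creating an empty cluster z-node.
        } finally {
            zk.close();
        }
    }
}
```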
----
2018-03-30 00:28:56 UTC - Rajan Dhabalia: yes
----
2018-03-30 00:57:50 UTC - Jai Asher: I am planning to add `publishThrottlingRatePerTopicInMsg` and `publishThrottlingRatePerTopicInByte` to throttle publish rate per topic - similar to this PIP Rajan had worked on

<https://github.com/apache/incubator-pulsar/wiki/PIP-3:-Message-dispatch-throttling>
<https://github.com/apache/incubator-pulsar/commit/f03a8b7fdec8ad3c5c726ffde522aa8da4f7adc8>

Let me know if you see any issues with the suggestion
----
2018-03-30 01:04:24 UTC - Matteo Merli: Great, how do you plan to implement the throttling?
----
2018-03-30 01:33:07 UTC - Jai Asher: Due to ordering guarantees we can't error out the publish, so I was thinking of either:
- Closing the producer with a `PublishRateExceededException`, using a new ProtoBuf command which newer client versions honor -> client does a backoff and reconnects
OR
- Setting auto-read off to create backpressure, but then this will be connection-level throttling
----
2018-03-30 01:34:52 UTC - Matteo Merli: > client does a backoff and reconnects

that’s not really controlled by server though
----
2018-03-30 01:35:23 UTC - Matteo Merli: > Setting auto-read off to create backpressure, but then this will be connection-level throttling

True but I think this is the better option
----
2018-03-30 01:35:54 UTC - Matteo Merli: we should then schedule a task after, say, 10ms and re-evaluate the rate
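(A rough Netty sketch of that idea - the handler and limiter here are assumptions for illustration, not broker code:)
```java
import io.netty.channel.ChannelHandlerContext;
import java.util.concurrent.TimeUnit;

class PublishRateThrottle {
    /** Assumed stand-in for whatever rate limiter the broker keeps. */
    interface Limiter { boolean hasPermits(); }

    void maybeThrottle(ChannelHandlerContext ctx, Limiter limiter) {
        if (!limiter.hasPermits()) {
            // Stop reading from the socket: TCP backpressure on the producer.
            ctx.channel().config().setAutoRead(false);
            // Re-evaluate the rate after ~10ms and resume reads once
            // permits are available again.
            ctx.executor().schedule(() -> {
                if (limiter.hasPermits()) {
                    ctx.channel().config().setAutoRead(true);
                } else {
                    maybeThrottle(ctx, limiter);
                }
            }, 10, TimeUnit.MILLISECONDS);
        }
    }
}
```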
----
2018-03-30 01:45:15 UTC - Rajan Dhabalia: but if we set `setAutoRead` off, doesn't it impact all the topics on that connection, even though the other topics have not reached the limit?
----
2018-03-30 02:08:52 UTC - Matteo Merli: sure, but that’s what we’re doing anyway right now to throttle publishes
----
2018-03-30 02:09:43 UTC - Matteo Merli: for a given client connection, there can only be 1K pending writes to BK; after that we do “auto-read=off” to limit the amount of memory used in the broker
----
2018-03-30 05:18:14 UTC - Rajan Dhabalia: yes, we already have it. So, topic-level throttling may impact other topics on that connection, but it might be helpful when we really want to throttle a specific topic that has a high publish rate from multiple connected producers. So, we can go in this direction :+1:
----
2018-03-30 08:47:16 UTC - Jai Asher: The combination of the two approaches given below will help rate limit the publish per topic.
a. Client Side backoff  
b. Broker side connection level throttling


a. Client Side backoff - soft throttling
	- Create a new command:
	```
	message CommandStopPublishing {
	    required uint64 producer_id = 1;
	    required string producer_name = 2;
	    // time when the next permit will be available + a small random offset
	    required uint64 stopPublishingInMs = 3;
	}
	```
	- The combination of producer id and producer name is unique per Publisher.
	- Once the rate limit is hit, the broker sends this command to all producers connected to the topic and expects clients to reject all newly published messages for `stopPublishingInMs` with a PulsarException("Publish rate exceeded on the topic.") - see the sketch after this list.
	- Broker will send this command only once per 10 seconds to prevent flooding the client. (Configurable)

	Advantages:
	- Throttling is done at the topic level, not the connection level
	- Ordering guarantee is maintained

	Disadvantages:
	- Relies on the client to do the throttling
	- Requires a newer version of the client
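(A minimal sketch of the client-side handling under this proposal - the class and method names are hypothetical, since `CommandStopPublishing` does not exist yet:)
```java
import java.util.concurrent.TimeUnit;

class ProducerStub {
    private volatile long resumeAtNanos = System.nanoTime();

    /** Invoked when the (proposed) CommandStopPublishing arrives. */
    void onStopPublishing(long stopPublishingInMs) {
        resumeAtNanos = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(stopPublishingInMs);
    }

    void send(byte[] payload) {
        if (System.nanoTime() - resumeAtNanos < 0) {
            // Reject locally until the embargo expires; stands in for
            // PulsarException("Publish rate exceeded on the topic.")
            throw new IllegalStateException("Publish rate exceeded on the topic.");
        }
        // ... actual wire send elided ...
    }
}
```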


b. Broker side connection level throttling - server side limit
	- Broker maintains a connection level rateLimiter 
	- When a new topic is added to the connection, the permits on the rateLimiter are increased
	- Once the rate limit is hit, the broker sets `setAutoRead` off to create backpressure on all connections *currently* publishing on the topic.
	- A separate task will be scheduled to set `setAutoRead` on after the estimated time when the next permit will be available + a random offset.

	Advantages:
	- Server-side throttling - doesn't depend on the client to honor the rate limit
	- Ordering guarantee is maintained

	Disadvantage:
	- This is connection-level throttling, not topic-level throttling


================================================================================
The advantages of using both approaches together are:

a. Small spiky loads - small load spikes will not affect other topics, since they will be throttled using "Client Side backoff - soft throttling". The "Broker side connection level throttling - server side limit" may not kick in, since permits from other topics on this connection are still available.

b. Unresponsive and continuously misbehaving client - a client which doesn't honor "Client Side backoff" and continues publishing to the broker will be limited by "Broker side connection level throttling - server side limit"

c. Fairness - the random offset will ensure that all producers and connections get an equal opportunity to publish on a topic
 
d. If a producer is idle on a connection, then it will not be throttled due to another misbehaving producer, since we plan to throttle only connections *currently* publishing on the topic. However, this doesn't guarantee that a misbehaving topic will not affect other topics on the same connection.



======

Let me know what you guys think and whether we are in agreement regarding the approach - if we are, I will create a PIP

@Sahaya Andrews Albert @Rajan Dhabalia @Matteo Merli @Joe Francis ^^^^^^
----