Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/02/03 02:07:41 UTC

[GitHub] jiazhai commented on a change in pull request #1113: BP-28: Etcd as metadata store

URL: https://github.com/apache/bookkeeper/pull/1113#discussion_r165800469
 
 

 ##########
 File path: site/bps/BP-28-etcd-as-metadata-store.md
 ##########
 @@ -0,0 +1,102 @@
+---
+title: "BP-28: use etcd as metadata store"
+issue: https://github.com/apache/bookkeeper/<issue-number>
+state: 'Under Discussion'
+release: "N/A"
+---
+
+### Motivation
+
+Currently bookkeeper uses zookeeper as the metadata store. However, there are a couple of issues with the current approach, especially with using zookeeper.
+
+These issues include:
+
+1. You need to allocate special nodes for zookeeper. These nodes need to be treated specially, and have their own monitoring.
+   Ops need to understand both bookies and zookeeper.
+2. ZooKeeper is the scalability bottleneck. ZooKeeper doesn't scale writes as you add nodes. This means that if your bookkeeper
+   cluster reaches the maximum write throughput that ZK can sustain, you've reached the maximum capacity of your cluster, and there's nothing you
+   can do (except buy bigger hardware for your special nodes).
+3. ZooKeeper forces you into its programming model. In general, its programming model is not too bad. However, it becomes problematic when
+   the scale goes up (e.g. the number of clients and watchers increases). The issues usually come from _session expires_ and _watchers_.
+  - *Session Expires*: For simplicity, ZooKeeper ties session state directly to connection state. So when a connection is broken, the session usually expires (unless the client reconnects before the session expires), and once a session has expired, the underlying connection cannot be used anymore: the application has to close the connection and create a new client (a new connection). It is understandable that this makes zookeeper development easy. However, in reality, it means that if you cannot establish a session, you can't use this connection and you have to create new connections. Once your zookeeper cluster is in a bad state (e.g. network issue or jvm gc), the whole cluster is usually unable to recover because of the connection storm triggered by session expirations.
+  - *Watchers*: A zookeeper watcher is a one-time watcher, so applications can't reliably use it to get updates. In order to set a watcher, you have to read a znode or get its children. Imagine a use case where clients are watching a list of znodes (e.g. list of bookies): when those clients' sessions expire, they have to fetch the list of znodes again in order to rewatch it, even if the list has never changed.
+  - The combination of session expires and watchers is often the root cause of critical zookeeper outages.
+
+This proposal is to explore other existing systems such as etcd as the metadata store. Using Etcd doesn't address concern #1; however, it might potentially
+address concerns #2 and #3 to some extent. And if you are running bookkeeper in k8s, there is already an Etcd instance available. It can become easier to run
+bookkeeper on k8s if we can use Etcd as the metadata store.
 
 Review comment:
   nit: this seems to have picked up some extra line breaks from copy-paste?
