Posted to notifications@accumulo.apache.org by "Christopher Tubbs (JIRA)" <ji...@apache.org> on 2015/05/22 20:49:17 UTC

[jira] [Commented] (ACCUMULO-3842) [UMBRELLA] Remove non-transient data from ZooKeeper

    [ https://issues.apache.org/jira/browse/ACCUMULO-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556605#comment-14556605 ] 

Christopher Tubbs commented on ACCUMULO-3842:
---------------------------------------------

As I see it, we have two main non-transient use cases for ZooKeeper:

# Bootstrap information (primarily, ZooKeeper serves as an entry point for clients communicating with Accumulo).
# Information which needs some degree of distributed consistency.

A lot of the configuration falls into the second category (soon-ish consistency with ZK watchers and ZooCache).

I think there's probably some stuff we could move over, but I also see other ways we can address some of those benefits you mention.

* Loss of ZK configuration could be handled with some sort of snapshotting.
* The number of watchers could be reduced greatly by changing how we store things. We could have a single "watched" node (or one per table) and update that node whenever we make any other change, so servers would only need to watch that one node. We could also serialize table configuration a bit better (one node instead of many), which would also help address ACCUMULO-1568.
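
To put rough numbers on the watcher point, here is a back-of-the-envelope sketch. It takes the estimate from the issue description (watchers=50*num_tables*num_tservers) at face value. The factor of 50 is that description's assumption, not a measurement, and the class name, method names, and cluster sizes are hypothetical.

```java
// Rough watcher-count comparison. Illustrative only: the factor of 50 is the
// assumed figure from the issue description, and the cluster sizes used in
// main() are made up.
final class WatcherEstimate {
    // Assumed number of watched nodes per table, per tserver (from the issue).
    static final long WATCHED_NODES_PER_TABLE = 50;

    // Estimate from the issue description: 50 * num_tables * num_tservers.
    static long perPropertyNode(long numTables, long numTservers) {
        return WATCHED_NODES_PER_TABLE * numTables * numTservers;
    }

    // One "watched" node per table: each tserver sets one watcher per table.
    static long oneNodePerTable(long numTables, long numTservers) {
        return numTables * numTservers;
    }

    // A single "watched" node for everything: one watcher per tserver.
    static long singleNode(long numTservers) {
        return numTservers;
    }

    public static void main(String[] args) {
        long tables = 100;    // hypothetical table count
        long tservers = 200;  // hypothetical tserver count
        System.out.println("per property node:  " + perPropertyNode(tables, tservers));
        System.out.println("one node per table: " + oneNodePerTable(tables, tservers));
        System.out.println("single node:        " + singleNode(tservers));
    }
}
```

The trade-off is that a coarser watched node fires on every change, so each server would have to re-read more configuration per notification.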

I'm not sure how the proposed change would deliver the third benefit you mention (consistent updates of table props). Can you explain that, please? As I understand it, we use ZooKeeper because it has watchers, which we can use to get consistency. I'm not aware of a similar mechanism in any of the alternatives.

> [UMBRELLA] Remove non-transient data from ZooKeeper
> ---------------------------------------------------
>
>                 Key: ACCUMULO-3842
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3842
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, tserver
>            Reporter: Josh Elser
>             Fix For: 1.8.0
>
>
> Wanted to start brainstorming about this.
> We store a lot of persistent data in ZooKeeper that would be better stored in something backed by HDFS. ZooKeeper can be a very convenient place to store persisted data so that it's available to all nodes, but it comes at a price and often must be accessed asynchronously to achieve good performance.
> * Table/Namespace configuration
> * Users/Authorizations
> * Problem reports (maybe?)
> * System configuration overrides (maybe?)
> Some benefits we'd see from this:
> * Loss of ZooKeeper doesn't lose table configuration and users.
> * Greatly reduce ZooKeeper watchers (assume watchers=50*num_tables*num_tservers)
> * Consistent updates of table constraints and all other table properties
> The last note is the most important one IMO. The number of test issues alone that we've had with constraints not being seen on all servers suggests this is bound to affect users.


