Posted to notifications@accumulo.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2015/04/04 04:40:34 UTC

[jira] [Updated] (ACCUMULO-1719) Convenient instanceName to instanceID mapping is unnecessary

     [ https://issues.apache.org/jira/browse/ACCUMULO-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated ACCUMULO-1719:
---------------------------------
    Fix Version/s:     (was: 1.7.0)
                   1.8.0

> Convenient instanceName to instanceID mapping is unnecessary
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-1719
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1719
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>            Reporter: Christopher Tubbs
>             Fix For: 1.8.0
>
>
> The ZooKeeperInstance constructor typically takes two parameters: instanceName and a comma-separated list of ZooKeeper host[:port] entries (there are also other constructors that take a UUID and/or a timeout setting).
> Initialize generates a UUID and associates a user-provided instanceName with it, using the following mapping in ZooKeeper:
> /accumulo/instances/instanceName, which contains a UUID, which points to /accumulo/UUID
> Since the introduction of instance.secret, there are potential problems with this mapping.
> If /accumulo (and /accumulo/instances and /accumulo/instances/instanceName) is created by Initialize in a write-protected way (using instance.secret), then re-initializing with a newly generated instanceID but the same instanceName will not work unless the new instance has the same instance secret. This is very limiting and can be a nightmare for system administrators and developers trying to re-initialize.
> If it is not created in a write-protected way, there's an even bigger problem, because anybody with access to ZooKeeper can overwrite the old mapping to point to a new instance (and we expect all clients to be able to access ZooKeeper). While the old data is still protected, any clients connecting with the instanceName will connect (and ingest to) the new instanceID that the instanceName currently maps to.
> The current implementation appears to be using the former... (the instanceName node itself is protected by the same secret as the instanceId and child nodes). This means that at least the mapping is protected from being overwritten... but it also means that it doesn't provide us with any added value. Even if we count the ability to reinitialize the same instanceName (generating a new instanceID) while leaving the old instance data around for inspection as added value, we still have the problem of ZK filling up, and because the mapping was rewritten, we can't tell which old instanceID was the previous one to inspect.
> A better solution:
> Drop the mapping. It is unnecessarily complex with no added value. Allow the instanceName that users create in new versions to represent the unique ID. Don't generate/use UUIDs anymore... use the provided instanceName. Keep the API for UUID... but just for convenience (treat it like a string internally). We can still prompt to overwrite the old instance... if it exists AND we have the same secret... but when we "overwrite it", we can optionally rename the old instanceName to instanceName_backup_date.
> Dropping the mapping has the benefit of reduced complexity, and is (mostly) backwards-compatible (instances can't have the name "instances"). It is easier for developers to debug their instances, because there's no obscure UUID to deal with (unless they want to use that as the name) and they can find the old versions of their instances if they choose to back up the old data when re-initializing. If not, they can avoid ZK filling up (esp. in dev environments where instanceNames get reused often). And, with a backup naming convention, it's easy for admins to decide which old instance data to keep and which to throw away... without the need of a mapping. The scope for the instance.secret is also well-defined to just the /accumulo/instanceName that created it, and there's no possibility of overwriting the instanceName to instanceID mapping.
> Instance names work best when unique. Instance IDs are guaranteed to be unique. There's no good reason these should be separate things.
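
For illustration only, the two layouts discussed in the issue can be sketched with an in-memory dict standing in for ZooKeeper. This is not Accumulo code: the function names, the `data` payload, and the ISO date suffix for backups are assumptions made for the sketch, and the instance.secret check is left out entirely.

```python
import uuid
from datetime import date


def initialize_with_mapping(znodes, instance_name, data):
    """Current layout: /accumulo/instances/<name> holds a UUID,
    which points to /accumulo/<UUID>. Returns the generated UUID."""
    instance_id = str(uuid.uuid4())
    znodes[f"/accumulo/instances/{instance_name}"] = instance_id
    znodes[f"/accumulo/{instance_id}"] = data
    return instance_id


def resolve_with_mapping(znodes, instance_name):
    """Two-step lookup: name -> UUID -> instance root path."""
    instance_id = znodes[f"/accumulo/instances/{instance_name}"]
    return f"/accumulo/{instance_id}"


def initialize_without_mapping(znodes, instance_name, data, today):
    """Proposed layout: the name is the ID, stored at /accumulo/<name>.
    An existing instance is renamed to <name>_backup_<date> first
    (the date format is an assumption; the issue only says 'date')."""
    if instance_name == "instances":
        # Reserved, since /accumulo/instances exists in the old layout.
        raise ValueError('instance name "instances" is reserved')
    root = f"/accumulo/{instance_name}"
    if root in znodes:
        znodes[f"{root}_backup_{today.isoformat()}"] = znodes.pop(root)
    znodes[root] = data
    return root


if __name__ == "__main__":
    # Current layout: re-initializing rewrites the mapping, and the old
    # UUID node stays behind with nothing pointing to it.
    zk = {}
    old_id = initialize_with_mapping(zk, "dev", "first init")
    new_id = initialize_with_mapping(zk, "dev", "second init")
    print(resolve_with_mapping(zk, "dev") == f"/accumulo/{new_id}")
    print(f"/accumulo/{old_id}" in zk)

    # Proposed layout: the old data is findable by its backup name.
    zk = {}
    initialize_without_mapping(zk, "dev", "first init", date(2015, 4, 4))
    initialize_without_mapping(zk, "dev", "second init", date(2015, 4, 4))
    print(sorted(zk))
```

In this sketch a second backup made on the same day would overwrite the first; a real implementation would need a finer-grained suffix or a collision check.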



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)