Posted to notifications@accumulo.apache.org by "Christopher Tubbs (JIRA)" <ji...@apache.org> on 2016/07/13 22:29:20 UTC

[jira] [Updated] (ACCUMULO-2971) ChangeSecret tool should refuse to run if no write access to HDFS

     [ https://issues.apache.org/jira/browse/ACCUMULO-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-2971:
----------------------------------------
    Description: 
Currently, the ChangeSecret tool does not check that the user running it can write to /accumulo/instance_id.

If an admin knows the instance secret but runs the command as a user who cannot write to the instance_id, the result is an unhelpful error message and a disconnect between HDFS and ZooKeeper.


Example for a cluster with an instance named "foobar":

{code}
[busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
Found 1 items
-rw-r--r--   3 accumulo accumulo          0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
[busbey@edge ~]$ accumulo org.apache.accumulo.server.util.ChangeSecret
old zookeeper password: 
new zookeeper password: 
Thread "org.apache.accumulo.server.util.ChangeSecret" died Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

org.apache.hadoop.security.AccessControlException: Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1489)
	at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:355)
	at org.apache.accumulo.server.util.ChangeSecret.updateHdfs(ChangeSecret.java:150)
	at org.apache.accumulo.server.util.ChangeSecret.main(ChangeSecret.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.accumulo.start.Main$1.run(Main.java:141)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at org.apache.hadoop.ipc.Client.call(Client.java:1238)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at $Proxy16.delete(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:408)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at $Proxy17.delete(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1487)
	... 9 more
[busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
Found 1 items
-rw-r--r--   3 accumulo accumulo          0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
[busbey@edge ~]$ zookeeper-client
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] get /accumulo/instances/foobar
1528cc95-2600-4649-a50e-1645404e9d6c
cZxid = 0xe00034f45
ctime = Wed Jul 02 09:27:58 PDT 2014
mZxid = 0xe00034f45
mtime = Wed Jul 02 09:27:58 PDT 2014
pZxid = 0xe00034f45
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 36
numChildren = 0
[zk: localhost:2181(CONNECTED) 1] ls /accumulo/1528cc95-2600-4649-a50e-1645404e9d6c
[users, monitor, problems, root_tablet, gc, hdfs_reservations, table_locks, namespaces, recovery, fate, tservers, tables, next_file, tracers, config, dead, bulk_failed_copyq, masters]
[zk: localhost:2181(CONNECTED) 2] ls /accumulo/cb977c77-3e13-4522-b718-2b487d722fd4
[users, problems, monitor, root_tablet, hdfs_reservations, gc, table_locks, namespaces, recovery, fate, tservers, tables, next_file, tracers, config, masters, bulk_failed_copyq, dead]

{code}

What's worse, in this condition the cluster will come up cleanly and appear healthy so long as the old instance secret is used.

However, clients and servers that use the corresponding instance secret will now end up looking at different ZooKeeper nodes, depending on whether they read the instance_id from HDFS or looked it up in ZooKeeper by instance name.
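
To make the split concrete, the two resolution paths look roughly like this (a sketch against the 1.6-era Accumulo APIs, run from a server-side classpath; the printed ids are the values from the session above):

{code}
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.server.client.HdfsZooInstance;

public class ShowInstanceIdSplit {
  public static void main(String[] args) {
    // Client-side path: instance name -> instance id, looked up in ZooKeeper.
    Instance byName = new ZooKeeperInstance("foobar", "localhost:2181");
    // Server-side path: instance id read from /accumulo/instance_id in HDFS.
    Instance byHdfs = HdfsZooInstance.getInstance();

    // After the failed ChangeSecret run above, these disagree:
    System.out.println(byName.getInstanceID()); // 1528cc95-2600-4649-a50e-1645404e9d6c
    System.out.println(byHdfs.getInstanceID()); // cb977c77-3e13-4522-b718-2b487d722fd4
  }
}
{code}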

Furthermore, if an admin runs the CleanZooKeeper utility after this failure, it will delete the ZooKeeper nodes that the server processes are actually using.

The utility should sanity-check that /accumulo/instance_id is writable before changing ZooKeeper. It should also wait to update the pointer from instance name to instance_id in ZooKeeper until after HDFS has been updated.
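
One way such a pre-flight check could look (a minimal sketch; the class and method names and the scratch-file probe are illustrative assumptions, not the actual patch):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InstanceIdWriteCheck {
  // Creating and then deleting a scratch file exercises the same WRITE
  // permission on /accumulo/instance_id that the later delete/create of the
  // real instance id file needs, so a permission problem surfaces here,
  // before any ZooKeeper state has been touched.
  public static void verifyWritable(FileSystem fs, Path instanceIdDir) throws IOException {
    Path probe = new Path(instanceIdDir, ".changeSecret-write-check");
    fs.create(probe, false).close(); // throws AccessControlException if not writable
    fs.delete(probe, false);         // clean up the probe file
  }
}
{code}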

Workaround: manually edit the HDFS instance_id to match the new instance id found in ZooKeeper for the instance name, then proceed as though the secret change had succeeded.
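
Using the ids from the session above, the manual repair amounts to something like this (a sketch, not a tested recovery procedure; run as a user who can write to /accumulo):

{code}
# Run as the HDFS user that owns /accumulo (e.g. accumulo).
# Remove the stale id file, then create an empty file named with the new id
# taken from /accumulo/instances/foobar in ZooKeeper.
hdfs dfs -rm /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
hdfs dfs -touchz /accumulo/instance_id/1528cc95-2600-4649-a50e-1645404e9d6c
{code}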

  was:
Currently, the ChangePassword tool doesn't do any check to ensure the user running it has the ability to write to /accumlo/instance_id.


> ChangeSecret tool should refuse to run if no write access to HDFS
> -----------------------------------------------------------------
>
>                 Key: ACCUMULO-2971
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2971
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0
>            Reporter: Sean Busbey
>              Labels: newbie
>             Fix For: 1.8.1
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)