Posted to yarn-dev@hadoop.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2018/02/02 23:39:00 UTC
[jira] [Created] (YARN-7884) Race condition in registering YARN service in ZooKeeper
Eric Yang created YARN-7884:
-------------------------------
Summary: Race condition in registering YARN service in ZooKeeper
Key: YARN-7884
URL: https://issues.apache.org/jira/browse/YARN-7884
Project: Hadoop YARN
Issue Type: Bug
Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Eric Yang
In a Kerberos-enabled cluster, there seems to be a race condition when registering a YARN service.
Creation of the yarn-service znode seems to happen after the AM has started and is already reporting back to update component information. The YARN service should have permission to create the znode, yet for some reason the create call reported NoAuth.
{code}
2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry user accounts: sasl:hbase
2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default system acls:
[1,s{'world,'anyone}
, 31,s{'sasl,'yarn}
, 31,s{'sasl,'jhs}
, 31,s{'sasl,'hdfs-demo}
, 31,s{'sasl,'rm}
, 31,s{'sasl,'hive}
]
2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs
[31,s{'sasl,'hbase}
, 31,s{'sasl,'hbase}
]
2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklocal@EXAMPLE.COM (auth:KERBEROS)
2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - Starting Socket Reader #1 for port 56859
2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server
2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server Responder: starting
2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC Server listener on 56859: starting
2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181"
2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklocal@EXAMPLE.COM, keytab = /etc/security/keytabs/hbase.service.keytab
2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering appattempt_1517611904996_0001_000001, abc into registry
2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 containers from previous attempt.
2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components
2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component sleeper
2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT sleeper]: 2 instances.
2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event.
2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry
org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [
0x01: 'world,'anyone
0x1f: 'sasl,'yarn
0x1f: 'sasl,'jhs
0x1f: 'sasl,'hdfs-demo
0x1f: 'sasl,'rm
0x1f: 'sasl,'hive
0x1f: 'sasl,'hbase
0x1f: 'sasl,'hbase
]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412)
at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637)
at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679)
at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOperationsService.java:116)
at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:195)
at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:210)
at org.apache.hadoop.yarn.service.ServiceScheduler$2.run(ServiceScheduler.java:462)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720)
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474)
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:260)
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:214)
at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:635)
... 12 more
2018-02-02 22:53:33,135 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - 2 containers allocated.
{code}
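As a reading aid for the ACL dumps above: the log prints permissions as decimal masks (31, 1) and the exception prints them in hex (0x1f, 0x01). The sketch below is not part of the original report; it decodes those masks using the bit values defined in ZooKeeper's ZooDefs.Perms (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). The class and method names (AclPerms, decode, allowsCreate) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Decodes ZooKeeper ACL permission bitmasks such as 0x1f and 0x01. */
final class AclPerms {
    // Bit values as defined in ZooKeeper's ZooDefs.Perms.
    static final int READ = 1;
    static final int WRITE = 2;
    static final int CREATE = 4;
    static final int DELETE = 8;
    static final int ADMIN = 16;

    /** Returns the names of the permissions set in the given mask. */
    static List<String> decode(int mask) {
        List<String> names = new ArrayList<>();
        if ((mask & READ) != 0) names.add("READ");
        if ((mask & WRITE) != 0) names.add("WRITE");
        if ((mask & CREATE) != 0) names.add("CREATE");
        if ((mask & DELETE) != 0) names.add("DELETE");
        if ((mask & ADMIN) != 0) names.add("ADMIN");
        return names;
    }

    /** True if the mask grants permission to create child znodes. */
    static boolean allowsCreate(int mask) {
        return (mask & CREATE) != 0;
    }

    public static void main(String[] args) {
        // Masks taken from the ACL list in the NoPathPermissionsException.
        System.out.println("0x1f -> " + decode(0x1f)); // [READ, WRITE, CREATE, DELETE, ADMIN]
        System.out.println("0x01 -> " + decode(0x01)); // [READ]
    }
}
```

By this decoding, the 'sasl,'hbase entries carry every permission, while 'world,'anyone is read-only. Note that ZooKeeper checks the CREATE bit against the parent znode's ACL, so the masks printed with the exception do not by themselves show which ACL produced the NoAuth.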