Posted to yarn-issues@hadoop.apache.org by "Thomas Friedrich (JIRA)" <ji...@apache.org> on 2016/07/02 04:10:10 UTC

[jira] [Created] (YARN-5309) SSLFactory truststore reloader thread leak in TimelineClientImpl

Thomas Friedrich created YARN-5309:
--------------------------------------

             Summary: SSLFactory truststore reloader thread leak in TimelineClientImpl
                 Key: YARN-5309
                 URL: https://issues.apache.org/jira/browse/YARN-5309
             Project: Hadoop YARN
          Issue Type: Bug
          Components: timelineserver, yarn
    Affects Versions: 2.7.1
            Reporter: Thomas Friedrich


We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class creates an instance of SSLFactory in newSslConnConfigurator. When the factory is initialized, it creates a ReloadingX509TrustManager instance, which in turn starts a trust store reloader thread.
However, the SSLFactory is never destroyed, so the trust store reloader threads are never stopped.

This problem was observed by a customer who had SSL enabled in Hadoop and submitted many queries against HiveServer2 (HS2). After a few days, the HS2 instance crashed, and the Java dump showed more than 13000 threads like this:
"Truststore reloader thread" #126 daemon prio=5 os_prio=0 tid=0x00007f680d2e3000 nid=0x98fd waiting on condition [0x00007f67e482c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:225)
        at java.lang.Thread.run(Thread.java:745)
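
To check for the same symptom from inside a running JVM, without taking a full dump, something like the following could be used. This is only a minimal sketch: the class and method names (ReloaderThreadCount, countThreadsNamed) are made up for illustration, and the only detail taken from the dump above is the thread name.

```java
public class ReloaderThreadCount {
    /** Counts the live threads in this JVM whose name equals the given name. */
    public static long countThreadsNamed(String name) {
        // getAllStackTraces() returns a snapshot map keyed by every live thread.
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> name.equals(t.getName()))
                .count();
    }

    public static void main(String[] args) {
        // Thread name as it appears in the dump above.
        long n = countThreadsNamed("Truststore reloader thread");
        System.out.println("Truststore reloader threads: " + n);
    }
}
```

A count that keeps growing with the number of submitted jobs would point to the leak described here.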

HiveServer2 uses the JobClient to submit a job:
Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at line 89 in ReloadingX509TrustManager))
	owns: Object  (id=464)	
	owns: Object  (id=465)	
	owns: Object  (id=466)	
	owns: ServiceLoader<S>  (id=210)	
	ReloadingX509TrustManager.<init>(String, String, String, long) line: 89	
	FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209	
	SSLFactory.init() line: 131	
	TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532	
	TimelineClientImpl.newConnConfigurator(Configuration) line: 507	
	TimelineClientImpl.serviceInit(Configuration) line: 269	
	TimelineClientImpl(AbstractService).init(Configuration) line: 163	
	YarnClientImpl.serviceInit(Configuration) line: 169	
	YarnClientImpl(AbstractService).init(Configuration) line: 163	
	ResourceMgrDelegate.serviceInit(Configuration) line: 102	
	ResourceMgrDelegate(AbstractService).init(Configuration) line: 163	
	ResourceMgrDelegate.<init>(YarnConfiguration) line: 96	
	YARNRunner.<init>(Configuration) line: 112	
	YarnClientProtocolProvider.create(Configuration) line: 34	
	Cluster.initialize(InetSocketAddress, Configuration) line: 95	
	Cluster.<init>(InetSocketAddress, Configuration) line: 82	
	Cluster.<init>(Configuration) line: 75	
	JobClient.init(JobConf) line: 475	
	JobClient.<init>(JobConf) line: 454	
	MapRedTask(ExecDriver).execute(DriverContext) line: 401	
	MapRedTask.execute(DriverContext) line: 137	
	MapRedTask(Task<T>).executeTask() line: 160	
	TaskRunner.runSequential() line: 88	
	Driver.launchTask(Task<Serializable>, String, boolean, String, int, DriverContext) line: 1653	
	Driver.execute() line: 1412	

For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl is created. Because the HS2 process stays up for days, the trust store reloader threads from earlier jobs accumulate in the process and eventually exhaust the available resources.

A fix similar to HADOOP-11368 seems to be needed in TimelineClientImpl, but it does not have a destroy method to begin with.
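
The general shape of such a fix might look like the sketch below. It is not the actual patch and does not use the Hadoop classes: FakeSslFactory and FakeTimelineClient are hypothetical stand-ins, and the LIVE_RELOADERS counter exists only to make the effect visible. The idea is that the client keeps a reference to the factory it creates during init, and destroys it (which stops the reloader thread) when the service is stopped.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReloaderLeakSketch {
    /** Hypothetical stand-in for SSLFactory: init() starts a reloader thread, destroy() stops it. */
    static class FakeSslFactory {
        // Number of reloader threads started and not yet destroyed (illustration only).
        static final AtomicInteger LIVE_RELOADERS = new AtomicInteger();
        private Thread reloader;

        void init() {
            reloader = new Thread(() -> {
                try {
                    while (true) {
                        Thread.sleep(10_000); // pretend to re-check the trust store
                    }
                } catch (InterruptedException e) {
                    // Interrupted by destroy(): leave the loop and let the thread end.
                }
            }, "Truststore reloader thread");
            reloader.setDaemon(true);
            reloader.start();
            LIVE_RELOADERS.incrementAndGet();
        }

        void destroy() {
            if (reloader != null) {
                reloader.interrupt();
                reloader = null;
                LIVE_RELOADERS.decrementAndGet();
            }
        }
    }

    /** Hypothetical stand-in for the client: keeps the factory so it can be destroyed on stop. */
    static class FakeTimelineClient {
        private FakeSslFactory sslFactory;

        void serviceInit() {
            sslFactory = new FakeSslFactory();
            sslFactory.init();
        }

        void serviceStop() {
            // Without this call, every client instance would leave one thread behind.
            if (sslFactory != null) {
                sslFactory.destroy();
                sslFactory = null;
            }
        }
    }

    public static void main(String[] args) {
        // One client per job, as in the HS2 scenario.
        for (int job = 0; job < 5; job++) {
            FakeTimelineClient client = new FakeTimelineClient();
            client.serviceInit();
            client.serviceStop();
        }
        System.out.println("live reloaders: " + FakeSslFactory.LIVE_RELOADERS.get());
    }
}
```

With serviceStop() in place the counter returns to zero after each job. If the serviceStop() call is removed from the loop, the counter grows by one per job, which matches the accumulation seen in HS2.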

One way to avoid this problem is to disable the YARN timeline service (yarn.timeline-service.enabled=false).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
