Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2019/05/20 17:57:00 UTC

[jira] [Comment Edited] (KUDU-2395) Thread spike with all threads blocked in libnss

    [ https://issues.apache.org/jira/browse/KUDU-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844162#comment-16844162 ] 

Alexey Serbin edited comment on KUDU-2395 at 5/20/19 5:56 PM:
--------------------------------------------------------------

[~tlipcon] I think adding a cache for resolved DNS entries should fix this issue; at least with the cache in place, I don't expect the number of threads performing DNS resolution to jump that high.  It would still be nice to add some sort of test for that, or at least to test that scenario manually once the DNS cache is in place.
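
For illustration only, here is a minimal sketch of what a TTL-based cache in front of a blocking resolver could look like. It is not the code under review in the Gerrit change; the class name DnsCache, the Resolver callback, and the TTL parameter are all hypothetical. The resolver is injected so the caching behavior can be tested without touching the real DNS/NSS path.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: none of these names come from the Kudu code base.
// Caches the result of a (potentially slow, blocking) resolver per host name
// and reuses it until the entry's TTL expires.
class DnsCache {
 public:
  using Clock = std::chrono::steady_clock;
  using Addresses = std::vector<std::string>;
  using Resolver = std::function<Addresses(const std::string&)>;

  DnsCache(Resolver resolver, std::chrono::milliseconds ttl)
      : resolver_(std::move(resolver)), ttl_(ttl) {}

  Addresses Resolve(const std::string& host) {
    const auto now = Clock::now();
    {
      std::lock_guard<std::mutex> l(lock_);
      auto it = entries_.find(host);
      if (it != entries_.end() && now < it->second.expires) {
        return it->second.addrs;  // Fresh entry: no resolver call.
      }
    }
    // Miss or expired entry: call the resolver without holding the lock so
    // lookups for other hosts are not blocked. Concurrent misses for the
    // same host may still each call the resolver in this simple version.
    Addresses addrs = resolver_(host);
    std::lock_guard<std::mutex> l(lock_);
    entries_[host] = Entry{addrs, Clock::now() + ttl_};
    return addrs;
  }

 private:
  struct Entry {
    Addresses addrs;
    Clock::time_point expires;
  };

  Resolver resolver_;
  const std::chrono::milliseconds ttl_;
  std::mutex lock_;
  std::unordered_map<std::string, Entry> entries_;
};
```

With a warm cache, repeated lookups of the same peer host return without entering the resolver at all, which is the effect expected to keep threads out of the libnss path. A cold cache can still see several concurrent resolutions of one host, so a test for this scenario would want to cover both cases.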

I'll prioritize revving https://gerrit.cloudera.org/#/c/13266/ this week.  Thank you for the reminder.


was (Author: aserbin):
[~tlipcon] I think adding cache for resolved DNS entries should fix this issue, at least cached DNS names I don't expect the number of threads performing DNS resolution to go that high.  But it would be nice to add some sort of test for that (at least test that scenario once manually).

I'll prioritize revving https://gerrit.cloudera.org/#/c/13266/ this week.  Thank you for the reminder.

> Thread spike with all threads blocked in libnss
> -----------------------------------------------
>
>                 Key: KUDU-2395
>                 URL: https://issues.apache.org/jira/browse/KUDU-2395
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, tserver, util
>            Reporter: Todd Lipcon
>            Priority: Minor
>
> I saw the thread count on a server under a load test spike from 280 threads (fairly constant) to 3400 threads (briefly). I checked the diagnostics log and found that there are several thousand threads in a stack like:
> {code}
> 0x7facce018606 _nss_files_gethostbyname2_r
>   0x345a703645 <unknown>
>   0x345a6d0b3b <unknown>
>   0x345a6d2d80 <unknown>
>      0x1c9366c kudu::(anonymous namespace)::GetAddrInfo()
>      0x1c95fbe kudu::HostPort::ResolveAddresses()
>       0xac4b78 kudu::consensus::(anonymous namespace)::CreateConsensusServiceProxyForHost()
>       0xac5058 kudu::consensus::RpcPeerProxyFactory::NewProxy()
>       0xb0b212 kudu::consensus::LeaderElection::LeaderElection()
>       0xafab80 kudu::consensus::RaftConsensus::StartElection()
>       0xafd20c kudu::consensus::RaftConsensus::ReportFailureDetectedTask()
>      0x1ccf4ed kudu::FunctionRunnable::Run()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)