Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/05/11 00:14:28 UTC
[jira] [Commented] (HBASE-11142) Taking snapshots can leave sockets
on the master stuck in CLOSE_WAIT state
[ https://issues.apache.org/jira/browse/HBASE-11142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993256#comment-13993256 ]
Andrew Purtell commented on HBASE-11142:
----------------------------------------
For example:
{noformat}
$ lsof | grep TCP | grep CLOSE_WAIT
java 793 hbase 286u IPv4 2296149216 0t0 TCP hmaster.domain.local:54849->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 292u IPv4 2296149231 0t0 TCP hmaster.domain.local:54850->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 297u IPv4 2296149243 0t0 TCP hmaster.domain.local:54851->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 298u IPv4 2296149250 0t0 TCP hmaster.domain.local:54852->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 299u IPv4 2296149260 0t0 TCP hmaster.domain.local:45214->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 311u IPv4 2296149268 0t0 TCP hmaster.domain.local:54413->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 312r IPv4 2296149276 0t0 TCP hmaster.domain.local:54414->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 313u IPv4 2296149284 0t0 TCP hmaster.domain.local:54856->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 314u IPv4 2296149294 0t0 TCP hmaster.domain.local:54857->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 315u IPv4 2296149304 0t0 TCP hmaster.domain.local:45219->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 316u IPv4 2296159910 0t0 TCP hmaster.domain.local:54958->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 317r IPv4 2296159918 0t0 TCP hmaster.domain.local:45320->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 318u IPv4 2296159934 0t0 TCP hmaster.domain.local:54961->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 319u IPv4 2296159954 0t0 TCP hmaster.domain.local:54521->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 320u IPv4 2296160007 0t0 TCP hmaster.domain.local:54522->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 321u IPv4 2296160024 0t0 TCP hmaster.domain.local:54523->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 322u IPv4 2296160043 0t0 TCP hmaster.domain.local:54965->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 323u IPv4 2296160051 0t0 TCP hmaster.domain.local:54525->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 324u IPv4 2296160059 0t0 TCP hmaster.domain.local:45328->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 325u IPv4 2296160067 0t0 TCP hmaster.domain.local:54527->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 326u IPv4 2296167761 0t0 TCP hmaster.domain.local:55061->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 327r IPv4 2296167767 0t0 TCP hmaster.domain.local:55062->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 328u IPv4 2296167772 0t0 TCP hmaster.domain.local:45424->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 329u IPv4 2296167775 0t0 TCP hmaster.domain.local:55064->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 330u IPv4 2296167778 0t0 TCP hmaster.domain.local:45426->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 331u IPv4 2296167782 0t0 TCP hmaster.domain.local:45427->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 332u IPv4 2296167786 0t0 TCP hmaster.domain.local:54626->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 333u IPv4 2296167789 0t0 TCP hmaster.domain.local:45429->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 334u IPv4 2296167791 0t0 TCP hmaster.domain.local:55069->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 335u IPv4 2296167795 0t0 TCP hmaster.domain.local:45431->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 336u IPv4 2296198829 0t0 TCP hmaster.domain.local:54964->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 337r IPv4 2296198838 0t0 TCP hmaster.domain.local:45767->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 338u IPv4 2296198847 0t0 TCP hmaster.domain.local:55407->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 339u IPv4 2296198858 0t0 TCP hmaster.domain.local:55408->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 340u IPv4 2296198867 0t0 TCP hmaster.domain.local:55409->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 341u IPv4 2296198880 0t0 TCP hmaster.domain.local:55410->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 342u IPv4 2296198896 0t0 TCP hmaster.domain.local:45772->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 343u IPv4 2296198956 0t0 TCP hmaster.domain.local:54971->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 344u IPv4 2296198979 0t0 TCP hmaster.domain.local:55413->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java 793 hbase 345u IPv4 2296198999 0t0 TCP hmaster.domain.local:45775->regionserver-6.domain.local:50010 (CLOSE_WAIT)
{noformat}
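CLOSE_WAIT means the remote peer has already closed its end of the connection while the local process (here the master, pid 793) still holds the socket open. To see which remote endpoints the stuck sockets point at, and to compare the totals before and after a snapshot, the listing above can be tallied with a small script. This is only a rough sketch: the function name count_close_wait and the regular expression are illustrative assumptions, written against the lsof line layout shown above.

```python
import re
from collections import Counter

# Matches the trailing NAME column of an lsof TCP line, for example:
#   hmaster.domain.local:54849->regionserver-4.domain.local:50010 (CLOSE_WAIT)
# Groups: local host, local port, remote host, remote port, TCP state.
_CONN = re.compile(r"(\S+):(\d+)->(\S+):(\d+) \((\w+)\)\s*$")


def count_close_wait(lsof_output):
    """Return a Counter of CLOSE_WAIT sockets keyed by remote 'host:port'.

    Hypothetical helper: it assumes each connection line ends with
    'local:port->remote:port (STATE)' as in the listing above.
    """
    counts = Counter()
    for line in lsof_output.splitlines():
        match = _CONN.search(line)
        if match and match.group(5) == "CLOSE_WAIT":
            counts[f"{match.group(3)}:{match.group(4)}"] += 1
    return counts
```

Feeding it the output of lsof -p <pid> for the master before and after taking a snapshot, then subtracting the two Counters, should show how many new CLOSE_WAIT sockets each snapshot leaves behind and on which remote endpoints.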
> Taking snapshots can leave sockets on the master stuck in CLOSE_WAIT state
> --------------------------------------------------------------------------
>
> Key: HBASE-11142
> URL: https://issues.apache.org/jira/browse/HBASE-11142
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.2, 0.99.0, 0.96.1.1, 0.98.2
> Reporter: Andrew Purtell
>
> As reported by Hansi Klose on user@.
> {quote}
> We use a script to take snapshots on a regular basis and delete old ones.
> We noticed that the web interface of the HBase master was no longer working because of too many open files.
> The master had reached its open file limit of 32768.
> When I ran lsof I saw that there were a lot of TCP CLOSE_WAIT handles open with the regionserver as target.
> On the regionserver there is just one connection to the HBase master.
> I can see that the count of CLOSE_WAIT handles grows each time
> I take a snapshot. When I delete one, nothing changes.
> Each time I take a snapshot there are 20 to 30 new CLOSE_WAIT handles.
> {quote}