Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/05/11 00:14:28 UTC

[jira] [Commented] (HBASE-11142) Taking snapshots can leave sockets on the master stuck in CLOSE_WAIT state

    [ https://issues.apache.org/jira/browse/HBASE-11142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993256#comment-13993256 ] 

Andrew Purtell commented on HBASE-11142:
----------------------------------------

For example:

{noformat}
$ lsof | grep TCP | grep CLOSE_WAIT
java       793   hbase  286u     IPv4         2296149216      0t0        TCP hmaster.domain.local:54849->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  292u     IPv4         2296149231      0t0        TCP hmaster.domain.local:54850->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  297u     IPv4         2296149243      0t0        TCP hmaster.domain.local:54851->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  298u     IPv4         2296149250      0t0        TCP hmaster.domain.local:54852->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  299u     IPv4         2296149260      0t0        TCP hmaster.domain.local:45214->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  311u     IPv4         2296149268      0t0        TCP hmaster.domain.local:54413->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  312r     IPv4         2296149276      0t0        TCP hmaster.domain.local:54414->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  313u     IPv4         2296149284      0t0        TCP hmaster.domain.local:54856->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  314u     IPv4         2296149294      0t0        TCP hmaster.domain.local:54857->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  315u     IPv4         2296149304      0t0        TCP hmaster.domain.local:45219->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  316u     IPv4         2296159910      0t0        TCP hmaster.domain.local:54958->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  317r     IPv4         2296159918      0t0        TCP hmaster.domain.local:45320->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  318u     IPv4         2296159934      0t0        TCP hmaster.domain.local:54961->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  319u     IPv4         2296159954      0t0        TCP hmaster.domain.local:54521->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  320u     IPv4         2296160007      0t0        TCP hmaster.domain.local:54522->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  321u     IPv4         2296160024      0t0        TCP hmaster.domain.local:54523->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  322u     IPv4         2296160043      0t0        TCP hmaster.domain.local:54965->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  323u     IPv4         2296160051      0t0        TCP hmaster.domain.local:54525->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  324u     IPv4         2296160059      0t0        TCP hmaster.domain.local:45328->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  325u     IPv4         2296160067      0t0        TCP hmaster.domain.local:54527->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  326u     IPv4         2296167761      0t0        TCP hmaster.domain.local:55061->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  327r     IPv4         2296167767      0t0        TCP hmaster.domain.local:55062->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  328u     IPv4         2296167772      0t0        TCP hmaster.domain.local:45424->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  329u     IPv4         2296167775      0t0        TCP hmaster.domain.local:55064->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  330u     IPv4         2296167778      0t0        TCP hmaster.domain.local:45426->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  331u     IPv4         2296167782      0t0        TCP hmaster.domain.local:45427->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  332u     IPv4         2296167786      0t0        TCP hmaster.domain.local:54626->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  333u     IPv4         2296167789      0t0        TCP hmaster.domain.local:45429->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  334u     IPv4         2296167791      0t0        TCP hmaster.domain.local:55069->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  335u     IPv4         2296167795      0t0        TCP hmaster.domain.local:45431->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  336u     IPv4         2296198829      0t0        TCP hmaster.domain.local:54964->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  337r     IPv4         2296198838      0t0        TCP hmaster.domain.local:45767->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  338u     IPv4         2296198847      0t0        TCP hmaster.domain.local:55407->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  339u     IPv4         2296198858      0t0        TCP hmaster.domain.local:55408->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  340u     IPv4         2296198867      0t0        TCP hmaster.domain.local:55409->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  341u     IPv4         2296198880      0t0        TCP hmaster.domain.local:55410->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  342u     IPv4         2296198896      0t0        TCP hmaster.domain.local:45772->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  343u     IPv4         2296198956      0t0        TCP hmaster.domain.local:54971->regionserver-5.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  344u     IPv4         2296198979      0t0        TCP hmaster.domain.local:55413->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  345u     IPv4         2296198999      0t0        TCP hmaster.domain.local:45775->regionserver-6.domain.local:50010 (CLOSE_WAIT)
{noformat}
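
To see which remote endpoints are accumulating these sockets, the lsof output can be tallied per peer. The following is only a minimal sketch, not part of the original report: the function name count_close_wait and the SAMPLE string are made up for illustration, and it assumes the NAME column has the "local:port->remote:port (STATE)" form shown above.

```python
import re
from collections import Counter

# Matches the tail of an lsof TCP row, e.g.
# "TCP hmaster.domain.local:54849->regionserver-4.domain.local:50010 (CLOSE_WAIT)"
_CONN = re.compile(r"TCP\s+(\S+):(\d+)->(\S+):(\d+)\s+\((\w+)\)")


def count_close_wait(lsof_text):
    """Count CLOSE_WAIT sockets per remote 'host:port' (hypothetical helper)."""
    counts = Counter()
    for line in lsof_text.splitlines():
        m = _CONN.search(line)
        if m and m.group(5) == "CLOSE_WAIT":
            counts[f"{m.group(3)}:{m.group(4)}"] += 1
    return counts


# Three rows copied from the listing above, used here as sample input.
SAMPLE = """\
java       793   hbase  286u     IPv4         2296149216      0t0        TCP hmaster.domain.local:54849->regionserver-4.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  299u     IPv4         2296149260      0t0        TCP hmaster.domain.local:45214->regionserver-6.domain.local:50010 (CLOSE_WAIT)
java       793   hbase  292u     IPv4         2296149231      0t0        TCP hmaster.domain.local:54850->regionserver-4.domain.local:50010 (CLOSE_WAIT)
"""

if __name__ == "__main__":
    # Print the busiest remote endpoints first.
    for endpoint, n in count_close_wait(SAMPLE).most_common():
        print(n, endpoint)
```

Running it once before and once after taking a snapshot, on the real lsof output rather than SAMPLE, would show how many new CLOSE_WAIT sockets each snapshot leaves behind and against which hosts.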

> Taking snapshots can leave sockets on the master stuck in CLOSE_WAIT state
> --------------------------------------------------------------------------
>
>                 Key: HBASE-11142
>                 URL: https://issues.apache.org/jira/browse/HBASE-11142
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.99.0, 0.96.1.1, 0.98.2
>            Reporter: Andrew Purtell
>
> As reported by Hansi Klose on user@. 
> {quote}
> We use a script to take snapshots on a regular basis and delete old ones.
> We noticed that the web interface of the HBase master was not working any more because of too many open files.
> The master had reached its open file limit of 32768.
> When I ran lsof I saw that there were a lot of TCP CLOSE_WAIT handles open with the regionserver as target.
> On the regionserver there is just one connection to the HBase master.
> I can see that the count of CLOSE_WAIT handles grows each time
> I take a snapshot. When I delete one, nothing changes.
> Each time I take a snapshot there are 20-30 new CLOSE_WAIT handles.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)