Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/05/11 00:14:31 UTC

[jira] [Comment Edited] (HBASE-11142) Taking snapshots can leave sockets on the master stuck in CLOSE_WAIT state

    [ https://issues.apache.org/jira/browse/HBASE-11142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993262#comment-13993262 ] 

Andrew Purtell edited comment on HBASE-11142 at 5/8/14 11:55 PM:
-----------------------------------------------------------------

The reporting user apparently added these settings to the HBase site configuration, with no effect:

{code}
   <property>
     <name>dfs.client.socketcache.capacity</name>
     <value>0</value>
   </property>
   <property>
     <name>dfs.datanode.socket.reuse.keepalive</name>
     <value>0</value>
   </property>
   <property>
     <name>dfs.client.socketcache.expiryMsec</name>
     <value>900</value>
   </property>
{code}



was (Author: apurtell):
Apparently these settings were added to the HBase site configuration without apparent effect:

{code}
   <property>
     <name>dfs.client.socketcache.capacity</name>
     <value>0</value>
   </property>
   <property>
     <name>dfs.datanode.socket.reuse.keepalive</name>
     <value>0</value>
   </property>
   <property>
     <name>dfs.client.socketcache.expiryMsec</name>
     <value>900</value>
   </property>
{code}


> Taking snapshots can leave sockets on the master stuck in CLOSE_WAIT state
> --------------------------------------------------------------------------
>
>                 Key: HBASE-11142
>                 URL: https://issues.apache.org/jira/browse/HBASE-11142
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.99.0, 0.96.1.1, 0.98.2
>            Reporter: Andrew Purtell
>
> As reported by Hansi Klose on user@. 
> {quote}
> We use a script to take snapshots on a regular basis and delete old ones.
> We noticed that the web interface of the HBase master was not working any more because of too many open files.
> The master reached its open file limit of 32768.
> When I ran lsof I saw that there were a lot of TCP CLOSE_WAIT handles open with the regionserver as target.
> On the regionserver there is just one connection to the HBase master.
> I can see that the count of CLOSE_WAIT handles grows each time
> I take a snapshot. When I delete one, nothing changes.
> Each time I take a snapshot there are 20 - 30 new CLOSE_WAIT handles.
> {quote}
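
To track the growth described in the report across snapshots, one option is to tally CLOSE_WAIT sockets per remote endpoint from saved lsof output. The following is only a rough sketch: it assumes the usual lsof layout in which a TCP line ends with "TCP local->remote (STATE)", and the function name, addresses, ports, and sample lines are all made up for illustration rather than taken from the reporter's cluster.

```python
import re
from collections import Counter

# Matches the tail of an lsof TCP line, e.g.
#   TCP 10.0.0.1:51234->10.0.0.2:50010 (CLOSE_WAIT)
# Groups: local endpoint, remote endpoint, TCP state.
_CONN = re.compile(r"TCP\s+(\S+)->(\S+)\s+\((\w+)\)")


def count_close_wait(lsof_output):
    """Count CLOSE_WAIT sockets per remote endpoint in lsof-style text."""
    counts = Counter()
    for line in lsof_output.splitlines():
        match = _CONN.search(line)
        if match and match.group(3) == "CLOSE_WAIT":
            counts[match.group(2)] += 1
    return counts


# Hypothetical sample input; real input would be captured lsof output
# for the master process.
SAMPLE = """\
java 1234 hbase 100u IPv4 111 0t0 TCP 10.0.0.1:51234->10.0.0.2:50010 (CLOSE_WAIT)
java 1234 hbase 101u IPv4 112 0t0 TCP 10.0.0.1:51235->10.0.0.2:50010 (CLOSE_WAIT)
java 1234 hbase 102u IPv4 113 0t0 TCP 10.0.0.1:51236->10.0.0.3:60020 (ESTABLISHED)
"""

if __name__ == "__main__":
    for peer, n in count_close_wait(SAMPLE).most_common():
        print(peer, n)
```

Running this before and after each snapshot and comparing the per-endpoint totals would show which remote port the leaked sockets point at, which may help narrow down where the connections are opened.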



--
This message was sent by Atlassian JIRA
(v6.2#6252)