Posted to user@hbase.apache.org by Hansi Klose <ha...@web.de> on 2014/04/17 15:51:40 UTC

taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Hi,

we use a script to take snapshots on a regular basis and to delete old ones.

We noticed that the web interface of the HBase master was no longer working
because of too many open files.

The master had reached its open file limit of 32768.

When I ran lsof I saw that there were a lot of TCP CLOSE_WAIT handles open
with the regionserver as the target.

On the regionserver there is just one connection to the HBase master.

I can see that the count of CLOSE_WAIT handles grows each time
I take a snapshot. When I delete one, nothing changes.
Each time I take a snapshot there are 20-30 new CLOSE_WAIT handles.

Why does the master not close these handles? Is there a parameter
with a timeout we can use?

We use HBase 0.94.2-cdh4.2.0.

Regards Hansi
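
A minimal sketch of the kind of snapshot rotation script described above, using
standard hbase shell commands; the table name, snapshot naming scheme, and
7-day retention window are illustrative assumptions, not taken from the
original script:

  #!/bin/bash
  # Take a dated snapshot of one table and prune snapshots older than 7 days.
  TABLE=mytable                          # illustrative table name
  TODAY=$(date +%Y%m%d)
  CUTOFF=$(date -d '7 days ago' +%Y%m%d)

  # Create today's snapshot.
  echo "snapshot '${TABLE}', '${TABLE}_snap_${TODAY}'" | hbase shell

  # Delete snapshots whose date suffix is older than the cutoff.
  echo "list_snapshots" | hbase shell \
    | grep -o "${TABLE}_snap_[0-9]\{8\}" | sort -u \
    | while read SNAP; do
        if [ "${SNAP##*_}" -lt "$CUTOFF" ]; then
          echo "delete_snapshot '${SNAP}'" | hbase shell
        fi
      done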

Re: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Andrew Purtell <ap...@apache.org>.
Thanks for the detail.

Unless you've changed it, port 50010 is the *DataNode* data transfer
socket. I'm surprised the HDFS tunings suggested by others on this thread
have not had an impact.

I filed https://issues.apache.org/jira/browse/HBASE-11142 to track this
report.
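
A quick way to double-check what the stuck sockets point at is to group the
master's CLOSE_WAIT connections by remote address and port. This is only a
sketch; the HMaster process lookup is an assumption about how the daemon
appears in the process list:

  # Group the HBase master's CLOSE_WAIT sockets by remote endpoint.
  # Port 50010 on the remote side means a DataNode data transfer socket;
  # 60020 (the 0.94 default) would mean a regionserver RPC port.
  PID=$(pgrep -f HMaster | head -1)
  lsof -nP -p "$PID" 2>/dev/null \
    | awk '/CLOSE_WAIT/ {split($(NF-1), a, "->"); print a[2]}' \
    | sort | uniq -c | sort -rn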



On Mon, May 5, 2014 at 5:19 PM, Hansi Klose <ha...@web.de> wrote:

> Hi Andrew,
>
> here is the output from our testing environment.
> There we can see the same behavior like in our production environment.
>
> Sorry if my description was not clear.
> The connection source is the hbase master process PID 793 and the target
> are
> the datanode port of our 3 regionserver.
>
> hbase master:   lsof | grep TCP | grep CLOSE_WAIT
>
> http://pastebin.com/BTyiVgb2
>
> Here are 40 connection in state CLOSE_WAIT to our 3 region server.
> This connection are there since last week.
>
> Regards Hansi
>
> > Sent: Wednesday, 30 April 2014 at 18:48
> > From: "Andrew Purtell" <ap...@apache.org>
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > Subject: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT
> handles on the HBase master server
> >
> > Let's circle back to the original mail:
> >
> > > When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles
> > open with the regionserver as target.
> >
> > Is that right? *Regionserver*, not another process (datanode or
> whatever)?
> > Or did I miss where somewhere along this thread there was evidence
> > confirming a datanode was the remote?
> >
> > If you are sure that the stuck connections are to the regionserver
> process
> > (maybe pastebin lsof output so we can double check the port numbers
> > involved?) then the regionserver is closing the connection but the master
> > is not somehow, by definition of what CLOSE_WAIT means. HDFS settings
> won't
> > matter if it is the master is failing to close a socket, maybe this is an
> > IPC bug.
> >
> >
> >
> > On Wed, Apr 30, 2014 at 12:38 AM, Hansi Klose <ha...@web.de>
> wrote:
> >
> > > Hi,
> > >
> > > sorry i missed that  :-(
> > >
> > > I tried that parameter in my hbase-site.xml and restartet the hbase
> master
> > > and all regionserver.
> > >
> > >   <property>
> > >     <name>dfs.client.socketcache.expiryMsec</name>
> > >     <value>900</value>
> > >   </property>
> > >
> > > No change, the ClOSE_WAIT sockets still persists on the hbase master
> to the
> > > regionserver's datanode after taking snapshots.
> > >
> > > Because it was not clear for me where to the setting has to go
> > > i put it in our hdfs-site.xml too and restarted all datanodes.
> > > I thought that settings with dfs.client maybe have to go there.
> > > But this did not change the behavior either.
> > >
> > > Regards Hansi
> > >
> > > > Sent: Tuesday, 29 April 2014 at 19:21
> > > > From: Stack <st...@duboce.net>
> > > > To: Hbase-User <us...@hbase.apache.org>
> > > > Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT
> > > handles on the HBase master server
> > > >
> > > > On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <ha...@web.de>
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > sorry for the late answer.
> > > > >
> > > > > I configured the hbase-site.conf like this
> > > > >
> > > > >   <property>
> > > > >     <name>dfs.client.socketcache.capacity</name>
> > > > >     <value>0</value>
> > > > >   </property>
> > > > >   <property>
> > > > >     <name>dfs.datanode.socket.reuse.keepalive</name>
> > > > >     <value>0</value>
> > > > >   </property>
> > > > >
> > > > > and restarted the hbase master and all regionservers.
> > > > > I still can see the same behavior. Each snapshot creates
> > > > > new CLOSE_WAIT Sockets which stay there till hbase master restart.
> > > > >
> > > > > I there any other setting I can try?
> > > > >
> > > >
> > > > You saw my last suggestion about
> "...dfs.client.socketcache.expiryMsec to
> > > > 900 in your HBase client configuration.."?
> > > >
> > > > St.Ack
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Hansi Klose <ha...@web.de>.
Hi Andrew,

here is the output from our testing environment.
There we can see the same behavior as in our production environment.

Sorry if my description was not clear.
The connection source is the HBase master process (PID 793) and the targets are
the datanode ports of our 3 regionservers.

hbase master:   lsof | grep TCP | grep CLOSE_WAIT

http://pastebin.com/BTyiVgb2

Here are 40 connections in state CLOSE_WAIT to our 3 regionservers.
These connections have been there since last week.

Regards Hansi

> Sent: Wednesday, 30 April 2014 at 18:48
> From: "Andrew Purtell" <ap...@apache.org>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Subject: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server
>
> Let's circle back to the original mail:
> 
> > When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles
> open with the regionserver as target.
> 
> Is that right? *Regionserver*, not another process (datanode or whatever)?
> Or did I miss where somewhere along this thread there was evidence
> confirming a datanode was the remote?
> 
> If you are sure that the stuck connections are to the regionserver process
> (maybe pastebin lsof output so we can double check the port numbers
> involved?) then the regionserver is closing the connection but the master
> is not somehow, by definition of what CLOSE_WAIT means. HDFS settings won't
> matter if it is the master is failing to close a socket, maybe this is an
> IPC bug.
> 
> 
> 
> On Wed, Apr 30, 2014 at 12:38 AM, Hansi Klose <ha...@web.de> wrote:
> 
> > Hi,
> >
> > sorry i missed that  :-(
> >
> > I tried that parameter in my hbase-site.xml and restartet the hbase master
> > and all regionserver.
> >
> >   <property>
> >     <name>dfs.client.socketcache.expiryMsec</name>
> >     <value>900</value>
> >   </property>
> >
> > No change, the ClOSE_WAIT sockets still persists on the hbase master to the
> > regionserver's datanode after taking snapshots.
> >
> > Because it was not clear for me where to the setting has to go
> > i put it in our hdfs-site.xml too and restarted all datanodes.
> > I thought that settings with dfs.client maybe have to go there.
> > But this did not change the behavior either.
> >
> > Regards Hansi
> >
> > > Sent: Tuesday, 29 April 2014 at 19:21
> > > From: Stack <st...@duboce.net>
> > > To: Hbase-User <us...@hbase.apache.org>
> > > Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT
> > handles on the HBase master server
> > >
> > > On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <ha...@web.de> wrote:
> > >
> > > > Hi all,
> > > >
> > > > sorry for the late answer.
> > > >
> > > > I configured the hbase-site.conf like this
> > > >
> > > >   <property>
> > > >     <name>dfs.client.socketcache.capacity</name>
> > > >     <value>0</value>
> > > >   </property>
> > > >   <property>
> > > >     <name>dfs.datanode.socket.reuse.keepalive</name>
> > > >     <value>0</value>
> > > >   </property>
> > > >
> > > > and restarted the hbase master and all regionservers.
> > > > I still can see the same behavior. Each snapshot creates
> > > > new CLOSE_WAIT Sockets which stay there till hbase master restart.
> > > >
> > > > I there any other setting I can try?
> > > >
> > >
> > > You saw my last suggestion about "...dfs.client.socketcache.expiryMsec to
> > > 900 in your HBase client configuration.."?
> > >
> > > St.Ack
> > >
> >
> 
> 
> 
> -- 
> Best regards,
> 
>    - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
> 

Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Andrew Purtell <ap...@apache.org>.
Let's circle back to the original mail:

> When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles
open with the regionserver as target.

Is that right? *Regionserver*, not another process (datanode or whatever)?
Or did I miss evidence somewhere along this thread confirming that a
datanode was the remote end?

If you are sure that the stuck connections are to the regionserver process
(maybe pastebin the lsof output so we can double-check the port numbers
involved?), then the regionserver is closing the connection but the master
somehow is not, by definition of what CLOSE_WAIT means. HDFS settings won't
matter if it is the master that is failing to close a socket; maybe this is
an IPC bug.



On Wed, Apr 30, 2014 at 12:38 AM, Hansi Klose <ha...@web.de> wrote:

> Hi,
>
> sorry i missed that  :-(
>
> I tried that parameter in my hbase-site.xml and restartet the hbase master
> and all regionserver.
>
>   <property>
>     <name>dfs.client.socketcache.expiryMsec</name>
>     <value>900</value>
>   </property>
>
> No change, the ClOSE_WAIT sockets still persists on the hbase master to the
> regionserver's datanode after taking snapshots.
>
> Because it was not clear for me where to the setting has to go
> i put it in our hdfs-site.xml too and restarted all datanodes.
> I thought that settings with dfs.client maybe have to go there.
> But this did not change the behavior either.
>
> Regards Hansi
>
> > Sent: Tuesday, 29 April 2014 at 19:21
> > From: Stack <st...@duboce.net>
> > To: Hbase-User <us...@hbase.apache.org>
> > Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT
> handles on the HBase master server
> >
> > On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <ha...@web.de> wrote:
> >
> > > Hi all,
> > >
> > > sorry for the late answer.
> > >
> > > I configured the hbase-site.conf like this
> > >
> > >   <property>
> > >     <name>dfs.client.socketcache.capacity</name>
> > >     <value>0</value>
> > >   </property>
> > >   <property>
> > >     <name>dfs.datanode.socket.reuse.keepalive</name>
> > >     <value>0</value>
> > >   </property>
> > >
> > > and restarted the hbase master and all regionservers.
> > > I still can see the same behavior. Each snapshot creates
> > > new CLOSE_WAIT Sockets which stay there till hbase master restart.
> > >
> > > I there any other setting I can try?
> > >
> >
> > You saw my last suggestion about "...dfs.client.socketcache.expiryMsec to
> > 900 in your HBase client configuration.."?
> >
> > St.Ack
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Hansi Klose <ha...@web.de>.
Hi,

sorry, I missed that :-(

I tried that parameter in my hbase-site.xml and restarted the HBase master and all regionservers.

  <property>
    <name>dfs.client.socketcache.expiryMsec</name>
    <value>900</value>
  </property>

No change; the CLOSE_WAIT sockets still persist on the HBase master towards the
regionservers' datanodes after taking snapshots.

Because it was not clear to me where the setting has to go,
I put it in our hdfs-site.xml too and restarted all datanodes.
I thought that dfs.client settings might have to go there.
But this did not change the behavior either.

Regards Hansi
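
One way to verify whether a configuration change has any effect is to count the
master's CLOSE_WAIT sockets immediately before and after a snapshot. A rough
sketch, with an illustrative table/snapshot name and an arbitrary 30-second
settle time:

  # Compare the master's CLOSE_WAIT count before and after one snapshot.
  PID=$(pgrep -f HMaster | head -1)
  BEFORE=$(lsof -nP -p "$PID" 2>/dev/null | grep -c CLOSE_WAIT)
  echo "snapshot 'mytable', 'cw_test_snap'" | hbase shell
  sleep 30                                   # let short-lived sockets drain
  AFTER=$(lsof -nP -p "$PID" 2>/dev/null | grep -c CLOSE_WAIT)
  echo "CLOSE_WAIT before=${BEFORE} after=${AFTER}"
  echo "delete_snapshot 'cw_test_snap'" | hbase shell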

> Sent: Tuesday, 29 April 2014 at 19:21
> From: Stack <st...@duboce.net>
> To: Hbase-User <us...@hbase.apache.org>
> Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server
>
> On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <ha...@web.de> wrote:
> 
> > Hi all,
> >
> > sorry for the late answer.
> >
> > I configured the hbase-site.conf like this
> >
> >   <property>
> >     <name>dfs.client.socketcache.capacity</name>
> >     <value>0</value>
> >   </property>
> >   <property>
> >     <name>dfs.datanode.socket.reuse.keepalive</name>
> >     <value>0</value>
> >   </property>
> >
> > and restarted the hbase master and all regionservers.
> > I still can see the same behavior. Each snapshot creates
> > new CLOSE_WAIT Sockets which stay there till hbase master restart.
> >
> > I there any other setting I can try?
> >
> 
> You saw my last suggestion about "...dfs.client.socketcache.expiryMsec to
> 900 in your HBase client configuration.."?
> 
> St.Ack
> 

Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Stack <st...@duboce.net>.
On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <ha...@web.de> wrote:

> Hi all,
>
> sorry for the late answer.
>
> I configured the hbase-site.conf like this
>
>   <property>
>     <name>dfs.client.socketcache.capacity</name>
>     <value>0</value>
>   </property>
>   <property>
>     <name>dfs.datanode.socket.reuse.keepalive</name>
>     <value>0</value>
>   </property>
>
> and restarted the hbase master and all regionservers.
> I still can see the same behavior. Each snapshot creates
> new CLOSE_WAIT Sockets which stay there till hbase master restart.
>
> I there any other setting I can try?
>

You saw my last suggestion about "...dfs.client.socketcache.expiryMsec to
900 in your HBase client configuration.."?

St.Ack

Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Hansi Klose <ha...@web.de>.
Hi all,

sorry for the late reply.

I configured hbase-site.xml like this:

  <property>
    <name>dfs.client.socketcache.capacity</name>
    <value>0</value>
  </property>
  <property>
    <name>dfs.datanode.socket.reuse.keepalive</name>
    <value>0</value>
  </property>

and restarted the HBase master and all regionservers.
I can still see the same behavior. Each snapshot creates
new CLOSE_WAIT sockets which stay there until the HBase master is restarted.

Is there any other setting I can try?

An upgrade is not possible at the moment.

Regards Hansi

> Sent: Sunday, 20 April 2014 at 02:05
> From: Stack <st...@duboce.net>
> To: Hbase-User <us...@hbase.apache.org>
> Subject: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server
>
> On Thu, Apr 17, 2014 at 9:50 PM, Stack <st...@duboce.net> wrote:
> 
> > On Thu, Apr 17, 2014 at 6:51 AM, Hansi Klose <ha...@web.de> wrote:
> >
> >> Hi,
> >>
> >> we use a script to take on a regular basis snapshot's and delete old
> >> one's.
> >>
> >> We recognizes that the web interface of the hbase master was not working
> >> any more becaues of too many open files.
> >>
> >> The master reaches his number of open file limit of 32768
> >>
> >> When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles
> >> open
> >> with the regionserver as target.
> >>
> >> On the regionserver there is just one connection to the hbase master.
> >>
> >> I can see that the count of the CLOSE_WAIT handles grow each time
> >> i take a snapshot. When i delete on nothing changes.
> >> Each time i take a snapshot  there are 20 - 30 new CLOSE_WAIT handles.
> >>
> >> Why does the master do not close the handles? Is there a parameter
> >> with a timeout we can use?
> >>
> >> We use hbase 0.94.2-cdh4.2.0.
> >>
> >
> > Does
> > https://issues.apache.org/jira/browse/HBASE-9393?jql=text%20~%20%22CLOSE_WAIT%22
> > help?  In particular, what happens if you up the socket cache as suggested
> > on the end of the issue?
> >
> > HDFS-4911 may help (the CLOSE_WAIT is against local/remote DN, right?) or
> quoting one of our lads off an internal issue, "You could get most of the
> same benefit of HDFS-4911...by setting dfs.client.socketcache.expiryMsec to
> 900 in your HBase client configuration. The goal is that the client should
> not hang on to sockets longer than the DataNode does...."
> 
> Or, can you upgrade?
> 
> Thanks,
> 
> St.Ack
> 

Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Stack <st...@duboce.net>.
On Thu, Apr 17, 2014 at 9:50 PM, Stack <st...@duboce.net> wrote:

> On Thu, Apr 17, 2014 at 6:51 AM, Hansi Klose <ha...@web.de> wrote:
>
>> Hi,
>>
>> we use a script to take on a regular basis snapshot's and delete old
>> one's.
>>
>> We recognizes that the web interface of the hbase master was not working
>> any more becaues of too many open files.
>>
>> The master reaches his number of open file limit of 32768
>>
>> When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles
>> open
>> with the regionserver as target.
>>
>> On the regionserver there is just one connection to the hbase master.
>>
>> I can see that the count of the CLOSE_WAIT handles grow each time
>> i take a snapshot. When i delete on nothing changes.
>> Each time i take a snapshot  there are 20 - 30 new CLOSE_WAIT handles.
>>
>> Why does the master do not close the handles? Is there a parameter
>> with a timeout we can use?
>>
>> We use hbase 0.94.2-cdh4.2.0.
>>
>
> Does
> https://issues.apache.org/jira/browse/HBASE-9393?jql=text%20~%20%22CLOSE_WAIT%22
> help?  In particular, what happens if you up the socket cache as suggested
> on the end of the issue?
>
> HDFS-4911 may help (the CLOSE_WAIT is against local/remote DN, right?) or
quoting one of our lads off an internal issue, "You could get most of the
same benefit of HDFS-4911...by setting dfs.client.socketcache.expiryMsec to
900 in your HBase client configuration. The goal is that the client should
not hang on to sockets longer than the DataNode does...."

Or, can you upgrade?

Thanks,

St.Ack
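
The relationship Stack describes can be checked by comparing the two timeouts:
the client-side socket cache expiry the master actually uses, and the DataNode
keepalive. A sketch, assuming a CDH-style /etc/hbase/conf layout and that your
Hadoop build ships the hdfs getconf tool; note that getconf reads the plain
HDFS client configuration, not hbase-site.xml, so the client-side value is
checked in the HBase config file directly:

  # DataNode side: how long the DataNode keeps an idle reused socket alive.
  hdfs getconf -confKey dfs.datanode.socket.reuse.keepalive

  # Client side (the master): the socket cache expiry it was given, which
  # should be shorter than the keepalive above. Path is CDH-style and may vary.
  grep -A1 'dfs.client.socketcache.expiryMsec' /etc/hbase/conf/hbase-site.xml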

Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Stack <st...@duboce.net>.
On Thu, Apr 17, 2014 at 6:51 AM, Hansi Klose <ha...@web.de> wrote:

> Hi,
>
> we use a script to take on a regular basis snapshot's and delete old one's.
>
> We recognizes that the web interface of the hbase master was not working
> any more becaues of too many open files.
>
> The master reaches his number of open file limit of 32768
>
> When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles open
> with the regionserver as target.
>
> On the regionserver there is just one connection to the hbase master.
>
> I can see that the count of the CLOSE_WAIT handles grow each time
> i take a snapshot. When i delete on nothing changes.
> Each time i take a snapshot  there are 20 - 30 new CLOSE_WAIT handles.
>
> Why does the master do not close the handles? Is there a parameter
> with a timeout we can use?
>
> We use hbase 0.94.2-cdh4.2.0.
>

Does
https://issues.apache.org/jira/browse/HBASE-9393?jql=text%20~%20%22CLOSE_WAIT%22
help?  In particular, what happens if you up the socket cache as suggested
on the end of the issue?

St.Ack

Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Ted Yu <yu...@gmail.com>.
I went over the jstack output.
There were 53 IPC Server handler threads, mostly in the WAITING state.

Please try Stack's suggestion and see if the problem gets resolved.

Cheers
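
For reference, a rough way to reproduce the summary Ted gives from a jstack
dump; idle IPC handlers normally sit in WAITING, so this mostly confirms the
handlers themselves are not stuck. The HMaster process lookup is an assumption:

  # Tally the thread states of the master's "IPC Server handler" threads.
  PID=$(pgrep -f HMaster | head -1)
  jstack "$PID" \
    | grep -A2 '"IPC Server handler' \
    | grep 'java.lang.Thread.State' \
    | sort | uniq -c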


On Tue, Apr 22, 2014 at 3:18 AM, Hansi Klose <ha...@web.de> wrote:

> Hi Ted,
>
> I inserted the output at pastebin
>
> http://pastebin.com/n3mMPxBA
>
> At the moment the hbase master process holds 10716 handles.
> We stopped making snapshots last week.
> After 4 days the count is still the same.
>
> Regards Hansi
>
> > Sent: Thursday, 17 April 2014 at 19:09
> > From: "Ted Yu" <yu...@gmail.com>
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > Subject: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on
> the HBase master server
> >
> > Can you take jstack of master process and pastebin it ?
> >
> > Thanks
> >
> >
> > On Thu, Apr 17, 2014 at 6:51 AM, Hansi Klose <ha...@web.de> wrote:
> >
> > > Hi,
> > >
> > > we use a script to take on a regular basis snapshot's and delete old
> one's.
> > >
> > > We recognizes that the web interface of the hbase master was not
> working
> > > any more becaues of too many open files.
> > >
> > > The master reaches his number of open file limit of 32768
> > >
> > > When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles
> open
> > > with the regionserver as target.
> > >
> > > On the regionserver there is just one connection to the hbase master.
> > >
> > > I can see that the count of the CLOSE_WAIT handles grow each time
> > > i take a snapshot. When i delete on nothing changes.
> > > Each time i take a snapshot  there are 20 - 30 new CLOSE_WAIT handles.
> > >
> > > Why does the master do not close the handles? Is there a parameter
> > > with a timeout we can use?
> > >
> > > We use hbase 0.94.2-cdh4.2.0.
> > >
> > > Regards Hansi
> > >
> >
>

Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Hansi Klose <ha...@web.de>.
Hi Ted,

I put the output on pastebin:

http://pastebin.com/n3mMPxBA

At the moment the HBase master process holds 10716 handles.
We stopped taking snapshots last week.
After 4 days the count is still the same.

Regards Hansi
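
The handle count can be cross-checked without counting lsof lines; a sketch,
again assuming the master can be found by its HMaster class name in the
process list:

  # Open file descriptors held by the master, its configured limit, and how
  # many of its lsof entries are CLOSE_WAIT sockets.
  PID=$(pgrep -f HMaster | head -1)
  ls /proc/"$PID"/fd | wc -l
  grep 'Max open files' /proc/"$PID"/limits
  lsof -nP -p "$PID" 2>/dev/null | grep -c CLOSE_WAIT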

> Sent: Thursday, 17 April 2014 at 19:09
> From: "Ted Yu" <yu...@gmail.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Subject: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server
>
> Can you take jstack of master process and pastebin it ?
> 
> Thanks
> 
> 
> On Thu, Apr 17, 2014 at 6:51 AM, Hansi Klose <ha...@web.de> wrote:
> 
> > Hi,
> >
> > we use a script to take on a regular basis snapshot's and delete old one's.
> >
> > We recognizes that the web interface of the hbase master was not working
> > any more becaues of too many open files.
> >
> > The master reaches his number of open file limit of 32768
> >
> > When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles open
> > with the regionserver as target.
> >
> > On the regionserver there is just one connection to the hbase master.
> >
> > I can see that the count of the CLOSE_WAIT handles grow each time
> > i take a snapshot. When i delete on nothing changes.
> > Each time i take a snapshot  there are 20 - 30 new CLOSE_WAIT handles.
> >
> > Why does the master do not close the handles? Is there a parameter
> > with a timeout we can use?
> >
> > We use hbase 0.94.2-cdh4.2.0.
> >
> > Regards Hansi
> >
> 

Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the HBase master server

Posted by Ted Yu <yu...@gmail.com>.
Can you take a jstack of the master process and pastebin it?

Thanks


On Thu, Apr 17, 2014 at 6:51 AM, Hansi Klose <ha...@web.de> wrote:

> Hi,
>
> we use a script to take on a regular basis snapshot's and delete old one's.
>
> We recognizes that the web interface of the hbase master was not working
> any more becaues of too many open files.
>
> The master reaches his number of open file limit of 32768
>
> When I run lsof I saw that there where a lot of TCP CLOSE_WAIT handles open
> with the regionserver as target.
>
> On the regionserver there is just one connection to the hbase master.
>
> I can see that the count of the CLOSE_WAIT handles grow each time
> i take a snapshot. When i delete on nothing changes.
> Each time i take a snapshot  there are 20 - 30 new CLOSE_WAIT handles.
>
> Why does the master do not close the handles? Is there a parameter
> with a timeout we can use?
>
> We use hbase 0.94.2-cdh4.2.0.
>
> Regards Hansi
>