Posted to user@hbase.apache.org by Bryan Keller <br...@gmail.com> on 2011/04/16 00:41:25 UTC

Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

I am having this same problem. After every run of my map-reduce job that uses TableInputFormat, I am leaking one ZK connection. The connections that are not being cleaned up originate from the node that submitted the job, not from the cluster nodes.

I tried explicitly cleaning up the connection using HConnectionManager.deleteConnection(config, true) after the job runs, but this has no effect. ZK still retains one connection per job run and never releases it. Eventually I run out of ZK connections even if I set maxCnxns very high (e.g. 600).

This happened for me with CDH3B4 and is still happening with the CDH3 release.
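
For reference, one run plus the attempted cleanup looks roughly like this ("mytable" and MyMapper are placeholders for my actual job setup, not the real code):

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan-mytable");
    TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
        MyMapper.class, ImmutableBytesWritable.class, Result.class, job);
    job.waitForCompletion(true);
    // Attempted cleanup after the run -- no effect, one ZK connection
    // per run still lingers on the submitting node:
    HConnectionManager.deleteConnection(job.getConfiguration(), true);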



On Mar 23, 2011, at 3:27 PM, Todd Lipcon wrote:

> Hi Dmitriy,
> 
> Are you submitting these MR jobs on a cluster? Which machines are
> leaking the connections? Is it the cluster nodes or the node where you
> submitted the job?
> 
> After a job is complete, the JVMs that ran the tasks should be
> completely torn down and thus should not be able to hang onto a
> connection.
> 
> -Todd
> 
> On Wed, Mar 23, 2011 at 2:24 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>> Yes, I am passing destroyProxy=true. According to the code, that
>> doesn't affect closing the ZooKeeper connection (it should be closed
>> anyway), yet I still have +1 ZK connection each time I run the MR job.
>> 
>> -d
>> 
>> On Wed, Mar 23, 2011 at 2:22 PM, Ted Yu <yu...@gmail.com> wrote:
>>> I assume you passed true as the second parameter to deleteConnection().
>>> 
>>> On Wed, Mar 23, 2011 at 1:54 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am experiencing a severe connection leak in my MR client that
>>>> uses HBase as input/output. Every job that uses TableInputFormat
>>>> leaks one ZooKeeper connection per run, as evidenced by netstat.
>>>> 
>>>> I understand that the way HTable manages connections now, it creates
>>>> a new HBase (and ZooKeeper) connection for each instance of
>>>> Configuration it is initialized with. Looking at the code of the
>>>> TableInputFormat class, I see that it creates an HTable on the
>>>> client side during configuration (it probably needs it to determine
>>>> region splits).
>>>> 
>>>> Since I have to configure each job individually, I must create a new
>>>> instance of Configuration. Thus, I am not able to use shared HBase
>>>> connections (which I would prefer, but there seems to be no way to
>>>> do that now).
>>>> 
>>>> So after I run an MR job, the HBase connection seems to be leaked,
>>>> and a ZooKeeper connection with it. This is a problem since
>>>> ZooKeeper limits how many connections can be made from the same IP;
>>>> eventually the client cannot create any new HTables because it
>>>> can't establish any new ZooKeeper connections.
>>>> 
>>>> I tried to do explicit cleanup by calling
>>>> HConnectionManager.deleteConnection(Configuration), passing in the
>>>> configuration I used to create the MR job. It doesn't seem to work.
>>>> 
>>>> So, is there a way to run an MR job with TableInputFormat without
>>>> leaking a connection? I am pretty sure I am not creating any HTables
>>>> on the client side. Or is it a bug? I have spent several days
>>>> investigating this issue but am still not able to come up with a
>>>> workaround for the ZooKeeper connection leaks in HBase MR jobs.
>>>> 
>>>> Thank you very much.
>>>> -Dmitriy
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

Posted by Bryan Keller <br...@gmail.com>.
I opened a bug for this.
https://issues.apache.org/jira/browse/HBASE-3792



Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

Posted by Ted Yu <yu...@gmail.com>.
Bryan:
Thanks for the analysis.

>> The TableInputFormat creates an HTable using a new Configuration object
...
Karthick and I have been tackling the root cause of this problem in
HBASE-3777 this week.
Our current plan is to combine his work in HBASE-3766 with HBASE-3777.
In short, after the combined patch goes in, the following call would
reuse the connection from a previous invocation (keyed on the
connection-specific properties) instead of creating a new one:
      setHTable(new HTable(new Configuration(conf), tableName));
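
Illustratively, the idea is to key cached connections on the handful of
ZK-related settings rather than on the Configuration instance itself, so
that a copied Configuration still hits the cache. A rough sketch only --
the names below are made up, not the actual patch:

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;

    // Hypothetical connection-cache key; see HBASE-3777/HBASE-3766 for
    // what actually goes in.
    class ConnectionKey {
      private final String[] props;

      ConnectionKey(Configuration conf) {
        // Only properties that actually identify a cluster connection:
        props = new String[] {
            conf.get("hbase.zookeeper.quorum"),
            conf.get("hbase.zookeeper.property.clientPort"),
            conf.get("zookeeper.znode.parent"),
        };
      }

      @Override
      public boolean equals(Object o) {
        return o instanceof ConnectionKey
            && Arrays.equals(props, ((ConnectionKey) o).props);
      }

      @Override
      public int hashCode() {
        return Arrays.hashCode(props);
      }
    }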

Your proposal in HBASE-3792 is valuable. It suits the needs of people who
reuse the JVM across their map/reduce job runs.
Good job.


Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

Posted by Bryan Keller <br...@gmail.com>.
I did more research and found the issue.

The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
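
A minimal sketch of that override pair, written against 0.90 (untested as
pasted here, and assuming TableInputFormatBase's protected
setTableRecordReader() hook):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableRecordReader;

    // Record reader that releases the connection itself instead of
    // relying on the task JVM exiting.
    class ClosingTableRecordReader extends TableRecordReader {
      private HTable htable;

      @Override
      public void setHTable(HTable htable) {
        this.htable = htable;
        super.setHTable(htable);
      }

      @Override
      public void close() {
        super.close();
        // Drop the HConnection (and its ZK connection) keyed by this
        // table's Configuration.
        HConnectionManager.deleteConnection(htable.getConfiguration(), true);
      }
    }

    // Input format that plugs in the reader above.
    class ClosingTableInputFormat extends TableInputFormat {
      @Override
      public void setConf(Configuration conf) {
        super.setConf(conf);
        setTableRecordReader(new ClosingTableRecordReader());
      }
    }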

The leak occurs when the JobClient is initializing and needs to retrieve the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die, so the ZK connections accumulate.

I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.
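
In outline, the getSplits() part looks like this. This sketch leans on
TableInputFormatBase to do the actual split math, so it differs a bit
from my version (which drops the HTable member entirely); tableName is a
field of the custom input format, and error handling is elided:

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
      // Build the HTable only for the duration of the split calculation.
      Configuration conf = new Configuration(context.getConfiguration());
      setHTable(new HTable(conf, tableName));
      try {
        return super.getSplits(context);
      } finally {
        // Release the connection once the splits are in hand.
        HConnectionManager.deleteConnection(conf, true);
      }
    }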

Calling HConnectionManager.deleteAllConnections() is not desirable in my case, as I may have some connections that I do not want deleted.




Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

Posted by Ted Yu <yu...@gmail.com>.
I think you should call this method of HTablePool:
  public void closeTablePool(final String tableName) throws IOException {

Actually, you only use HTablePool in populateTable(); a plain HTable should
be enough for you.

I have logged https://issues.apache.org/jira/browse/HBASE-3791 for ease of
debugging.

I think if you place this call:
 HConnectionManager.deleteAllConnections(true);
on line 52, before calling obj.wait(), the situation should be different.
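
That is, roughly (runJob() and lock are my shorthand for whatever the
test program actually names them):

    for (int i = 0; i < 10; i++) {
      runJob();                                    // MR job using TableInputFormat
    }
    HConnectionManager.deleteAllConnections(true); // drop cached connections here
    synchronized (lock) {
      lock.wait();                                 // then park the process
    }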

Cheers


Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

Posted by Bryan Keller <br...@gmail.com>.
FWIW, I created a test program that demonstrates the issue. The program creates an HBase table, populates it with 10 rows, runs a simple map-reduce job 10 times in succession, and then goes into a wait state. The test uses Gradle, so you'll need to download that.
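
In outline, the driver does the following (class, table, and column names
here are my placeholders; the actual code is in the zip below):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class ZkLeakDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Create a small table and load 10 rows.
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("leaktest");
        desc.addFamily(new HColumnDescriptor("f"));
        admin.createTable(desc);
        HTable table = new HTable(conf, "leaktest");
        for (int i = 0; i < 10; i++) {
          Put put = new Put(Bytes.toBytes("row" + i));
          put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i));
          table.put(put);
        }
        table.close();

        // Run the same scan job 10 times; each run leaks one ZK connection.
        for (int i = 0; i < 10; i++) {
          Job job = new Job(new Configuration(conf), "leaktest-" + i);
          TableMapReduceUtil.initTableMapperJob("leaktest", new Scan(),
              IdentityTableMapper.class, ImmutableBytesWritable.class,
              Result.class, job);
          job.setNumReduceTasks(0);
          job.setOutputFormatClass(NullOutputFormat.class);
          job.waitForCompletion(true);
        }

        // Park so the leaked connections stay visible in ZooKeeper.
        Object lock = new Object();
        synchronized (lock) {
          lock.wait();
        }
      }
    }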

Before running, telnet to ZooKeeper and type 'stat' to list the client connections. Then run the program using 'gradle run'. Finally, telnet to ZooKeeper again and type 'stat' to compare.

I'd be interested to see if others are seeing the same behavior I am.

You can download the code here:
http://www.vancameron.net/HBaseMR.zip

I'll open a JIRA issue after I do a little more research into the problem.



Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

Posted by Ted Yu <yu...@gmail.com>.
Bryan:
Thanks for reporting this issue.
TableOutputFormat.TableRecordWriter calls the following in close():
      HConnectionManager.deleteAllConnections(true);
But there is no such call in TableInputFormat / TableInputFormatBase /
TableRecordReader.

Do you mind filing a JIRA?
