You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Bryan Keller (JIRA)" <ji...@apache.org> on 2011/04/16 14:09:06 UTC

[jira] [Created] (HBASE-3792) TableInputFormat leaks ZK connections

TableInputFormat leaks ZK connections
-------------------------------------

                 Key: HBASE-3792
                 URL: https://issues.apache.org/jira/browse/HBASE-3792
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
    Affects Versions: 0.90.1
         Environment: Java 1.6.0_24, Mac OS X 10.6.7
            Reporter: Bryan Keller


The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.

The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.

I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Bryan Keller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141296#comment-13141296 ] 

Bryan Keller commented on HBASE-3792:
-------------------------------------

I was planning on uploading a patch later today in case it would be useful.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Bryan Keller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145191#comment-13145191 ] 

Bryan Keller commented on HBASE-3792:
-------------------------------------

BTW, I am having trouble running the tests on trunk so I wasn't able to verify this patch, I'll work on getting my dev environment more functional.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Bryan Keller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199488#comment-13199488 ] 

Bryan Keller commented on HBASE-3792:
-------------------------------------

The latest Cloudera code introduced the new reference counting connection management. There is a reference counter leak it appears in the HTable constructor, thus you'll see connection leaks again and my patch doesn't fix it. As a hack for now I force the connection to close by using reflection, setting the ref counter to 1, and calling close() on the connection. I do this after calling table.close() in TableInputFormat, TableRecordReader, and TableOutputFormat. I think I should log another bug, as the leak is not in the map reduce classes.

                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029142#comment-13029142 ] 

stack commented on HBASE-3792:
------------------------------

So, HBASE-3777 was committed to trunk.  Can we close this issue now as fixed in TRUNK?

> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Terry Siu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155973#comment-13155973 ] 

Terry Siu commented on HBASE-3792:
----------------------------------

Bryan, would be able to post a patch of the changes you are using for 0.90.4? I applied the trunk patch to 0.90.4 and aside from one minor flub, the patch was very clean. I left my mapreduce jobs to run overnight and am seeing ZK connections accummulating again, but at a slower rate, so now I'm wondering what differences exist between the changes you made for 0.90.4 versus the one you posted. Thanks!
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020620#comment-13020620 ] 

Jean-Daniel Cryans commented on HBASE-3792:
-------------------------------------------

The main issue with not using a new connection is that it breaks CopyTable or any other job that wants to use one cluster then another. I think you're workaround is the way to go until we fix the real source of the problem (HBASE-3777 being one solution to it).

> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Terry Siu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156289#comment-13156289 ] 

Terry Siu commented on HBASE-3792:
----------------------------------

Thanks, Bryan, looking forward to getting the 0.90.4 patch.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Bryan Keller (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Keller updated HBASE-3792:
--------------------------------

    Attachment: patch0.90.4

Here is the patch I'm using in my production system for 0.90.4. (Sorry it took me so long.)
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455358#comment-13455358 ] 

Lars Hofhansl commented on HBASE-3792:
--------------------------------------

What should we do with this?
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Terry Siu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199325#comment-13199325 ] 

Terry Siu commented on HBASE-3792:
----------------------------------

No worries, Bryan. I managed to patch it myself a while back using the tableinput.patch as a guideline. Thanks again!
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Patrick Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258696#comment-13258696 ] 

Patrick Yu commented on HBASE-3792:
-----------------------------------

One possible workaround for this issue is to reuse the connection for all jobs. However, hbase.connection.per.config must be set to false for connection sharing to work. This flag defaults to true up until HBASE-4508, though.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Bryan Keller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029185#comment-13029185 ] 

Bryan Keller commented on HBASE-3792:
-------------------------------------

I think the change should still be made. The HBASE-3777 change adds reference counters to connection, so closing the connection here explicitly where needed will decrement the reference counter and allow for connection cleanup.

> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Bryan Keller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155997#comment-13155997 ] 

Bryan Keller commented on HBASE-3792:
-------------------------------------

Sure, I'll post a patch for 0.90.4 in a bit. There have been quite a few changes to ZK connection handling in trunk (deep compare of configs, reference counting), so it is possible the patch might need to be tweaked or the leak is somewhere else.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258739#comment-13258739 ] 

Lars Hofhansl commented on HBASE-3792:
--------------------------------------

@Patrik: In 0.92+ you can create a standalone HConnection (HConnectionManager.createConnection(...)) and then create HTables using that HConnection. See HBASE-4805.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199528#comment-13199528 ] 

stack commented on HBASE-3792:
------------------------------

@Bryan Yes please.  Thats crazy acrobatics you are at just to close a connection.  Thanks.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3792) TableInputFormat leaks ZK connections

Posted by "Bryan Keller (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Keller updated HBASE-3792:
--------------------------------

    Attachment: tableinput.patch

Here's a patch demonstrating the changes I have implemented in my system, as described above. The patch is for trunk, so the changes are slightly different than what I am using for 0.90.4.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.
> The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira