Posted to issues@hive.apache.org by "Naveen Gangam (JIRA)" <ji...@apache.org> on 2016/04/15 21:25:25 UTC

[jira] [Updated] (HIVE-13527) Using deprecated APIs in HBase client causes zookeeper connection leaks.

     [ https://issues.apache.org/jira/browse/HIVE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naveen Gangam updated HIVE-13527:
---------------------------------
    Attachment: HIVE-13527.patch

Attaching a patch that removes the usage of setHTable() from the TableInputFormatBase. 

> Using deprecated APIs in HBase client causes zookeeper connection leaks.
> ------------------------------------------------------------------------
>
>                 Key: HIVE-13527
>                 URL: https://issues.apache.org/jira/browse/HIVE-13527
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 1.1.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>         Attachments: HIVE-13527.patch
>
>
> When running queries against hbase-backed hive tables, the following log messages are seen in the HS2 log.
> {code}
> 2016-04-11 07:25:23,657 WARN org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
> 2016-04-11 07:25:23,658 INFO org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Creating an additional unmanaged connection because user provided one can't be used for administrative actions. We'll close it when we close out the table.
> {code}
> In one HS2 log file, 1366 ZooKeeper connections were established but only a small fraction of them (54) were closed, so lsof shows 1300+ open TCP connections to ZooKeeper.
> {code}
> grep "org.apache.zookeeper.ClientCnxn: Session establishment complete on server" * | wc -l
> 1366
> grep "INFO org.apache.zookeeper.ZooKeeper: Session:" * | grep closed | wc -l
> 54
> {code}
> According to the comments in TableInputFormatBase, subclasses such as HiveHBaseTableInputFormat should call initializeTable() rather than setHTable(), which HiveHBaseTableInputFormat currently uses.
> "
> Subclasses MUST ensure initializeTable(Connection, TableName) is called for an instance to function properly. Each of the entry points to this class used by the MapReduce framework, {@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)}, will call {@link #initialize(JobContext)} as a convenient centralized location to handle retrieving the necessary configuration information. If your subclass overrides either of these methods, either call the parent version or call initialize yourself.
> "
> setHTable() currently also creates an additional Admin connection, even though it is not needed.
> So the uses of the deprecated APIs should be replaced.
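
To illustrate why the ownership of the connection matters here, below is a minimal sketch in plain Java. FakeConnection and ConnectionOwnershipDemo are hypothetical stand-ins written for this note; they are not HBase classes and this is not the attached patch. The sketch only models the general idea: a connection that is opened on every call with no owner responsible for closing it accumulates, while a connection the caller owns and closes does not.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client connection; NOT an HBase class.
final class FakeConnection implements AutoCloseable {
    // Number of connections currently open, analogous to what lsof would show.
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    FakeConnection() {
        OPEN.incrementAndGet();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

// Hypothetical demo class; the method names are illustrative only.
final class ConnectionOwnershipDemo {
    // Models the leaky pattern: a connection is created per call and never closed.
    static void leakyScan() {
        FakeConnection conn = new FakeConnection();
        // ... compute splits or read rows, then return without closing ...
    }

    // Models the owned pattern: the caller creates the connection and closes it.
    static void ownedScan() {
        try (FakeConnection conn = new FakeConnection()) {
            // ... compute splits or read rows ...
        }
    }

    // Runs the chosen pattern `calls` times and reports connections left open.
    static int openAfter(int calls, boolean leaky) {
        FakeConnection.OPEN.set(0);
        for (int i = 0; i < calls; i++) {
            if (leaky) {
                leakyScan();
            } else {
                ownedScan();
            }
        }
        return FakeConnection.OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("left open (leaky): " + openAfter(5, true));
        System.out.println("left open (owned): " + openAfter(5, false));
    }
}
```

In the real code the equivalent of ownedScan() would presumably be a Connection that the input format creates, passes to initializeTable(), and closes when the table is closed, as the TableInputFormatBase comment quoted above describes.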


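For anyone who wants to repeat the session count from the description without grep, here is a rough Java equivalent. ZkSessionCount and its method names are made up for this sketch; the two marker strings are copied from the grep commands in the description. It counts lines by plain substring match, so it can differ slightly from grep, which treats the pattern as a regular expression.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; counts ZooKeeper session lines in HS2 log files.
final class ZkSessionCount {
    static final String ESTABLISHED =
        "org.apache.zookeeper.ClientCnxn: Session establishment complete on server";
    static final String SESSION =
        "INFO org.apache.zookeeper.ZooKeeper: Session:";

    // Lines reporting that a ZooKeeper session was established.
    static long established(List<String> lines) {
        return lines.stream().filter(l -> l.contains(ESTABLISHED)).count();
    }

    // Lines reporting that a ZooKeeper session was closed.
    static long closed(List<String> lines) {
        return lines.stream()
                .filter(l -> l.contains(SESSION) && l.contains("closed"))
                .count();
    }

    // Sessions established but with no matching "closed" line.
    static long unclosed(List<String> lines) {
        return established(lines) - closed(lines);
    }

    // Usage: java ZkSessionCount <log file> [<log file> ...]
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String file : args) {
            lines.addAll(Files.readAllLines(Paths.get(file)));
        }
        System.out.println("established: " + established(lines));
        System.out.println("closed:      " + closed(lines));
        System.out.println("unclosed:    " + unclosed(lines));
    }
}
```

Run against the log file from the description, the first two numbers should correspond to the 1366 and 54 reported there, assuming the log uses a character encoding that Files.readAllLines can decode as UTF-8.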

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)