Posted to dev@hive.apache.org by "Naveen Gangam (JIRA)" <ji...@apache.org> on 2016/04/15 21:06:25 UTC

[jira] [Created] (HIVE-13527) Using deprecated APIs in HBase client causes zookeeper connection leaks.

Naveen Gangam created HIVE-13527:
------------------------------------

             Summary: Using deprecated APIs in HBase client causes zookeeper connection leaks.
                 Key: HIVE-13527
                 URL: https://issues.apache.org/jira/browse/HIVE-13527
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 1.1.0
            Reporter: Naveen Gangam
            Assignee: Naveen Gangam


When running queries against hbase-backed hive tables, the following log messages are seen in the HS2 log.
{code}
2016-04-11 07:25:23,657 WARN org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
2016-04-11 07:25:23,658 INFO org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Creating an additional unmanaged connection because user provided one can't be used for administrative actions. We'll close it when we close out the table.
{code}

In one HS2 log file, 1366 ZooKeeper sessions were established but only 54 of them were closed, so lsof shows 1300+ open TCP connections to ZooKeeper.
{code}
grep "org.apache.zookeeper.ClientCnxn: Session establishment complete on server" * | wc -l
1366
grep "INFO org.apache.zookeeper.ZooKeeper: Session:" * | grep closed | wc -l
54
{code}
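
The two counts above can also be taken in a single pass over the logs. The following is a minimal sketch, not part of Hive or HBase: the class name ZkSessionTally and its methods are hypothetical, and the match strings are copied from the grep patterns above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: tallies ZooKeeper session opens and closes in HS2 logs.
public class ZkSessionTally {
    // Same substrings the grep commands match on.
    static final String ESTABLISHED =
        "org.apache.zookeeper.ClientCnxn: Session establishment complete on server";
    static final String SESSION = "INFO org.apache.zookeeper.ZooKeeper: Session:";

    /** Returns {established, closed} counts for the given log lines. */
    public static long[] tally(List<String> lines) {
        long established = 0;
        long closed = 0;
        for (String line : lines) {
            if (line.contains(ESTABLISHED)) {
                established++;
            }
            if (line.contains(SESSION) && line.contains("closed")) {
                closed++;
            }
        }
        return new long[] {established, closed};
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one log file.
        List<String> lines = new ArrayList<>();
        for (String path : args) {
            lines.addAll(Files.readAllLines(Paths.get(path)));
        }
        long[] counts = tally(lines);
        System.out.println("established=" + counts[0]
            + " closed=" + counts[1]
            + " never closed=" + (counts[0] - counts[1]));
    }
}
```

With the counts reported above, 1366 - 54 leaves 1312 sessions that were never closed, which is consistent with the 1300+ connections seen in lsof.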

According to the comments in TableInputFormatBase, subclasses such as HiveHBaseTableInputFormat should call initializeTable() rather than setHTable(), which HiveHBaseTableInputFormat currently uses:
"
Subclasses MUST ensure initializeTable(Connection, TableName) is called for an instance to function properly. Each of the entry points to this class used by the MapReduce framework, {@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)}, will call {@link #initialize(JobContext)} as a convenient centralized location to handle retrieving the necessary configuration information. If your subclass overrides either of these methods, either call the parent version or call initialize yourself.
"

Currently setHTable() also creates an additional Admin connection, even though it is not needed.

The deprecated API calls should therefore be replaced.
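
To illustrate why connection ownership matters here, the following is a minimal, self-contained sketch. It does not use the HBase API: FakeConnection, leakyScan and ownedScan are hypothetical stand-ins, assuming each connection holds one ZooKeeper session until it is closed. It only models the difference between a connection that nobody closes and one that its creator closes explicitly.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionOwnershipSketch {
    /** Hypothetical stand-in for a connection that holds a ZooKeeper session. */
    static final class FakeConnection implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();
        private boolean closed;

        FakeConnection() {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Leaky pattern: a connection is opened per call and never closed. */
    public static void leakyScan() {
        FakeConnection conn = new FakeConnection();
        // ... the connection is used here, then the method returns without close()
    }

    /** Owned pattern: the caller creates the connection and closes it when done. */
    public static void ownedScan() {
        try (FakeConnection conn = new FakeConnection()) {
            // ... the connection is used here and closed on exit from the block
        }
    }

    /** Runs the action the given number of times and returns the change in open connections. */
    public static int openAfter(Runnable action, int calls) {
        int before = FakeConnection.OPEN.get();
        for (int i = 0; i < calls; i++) {
            action.run();
        }
        return FakeConnection.OPEN.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + openAfter(ConnectionOwnershipSketch::leakyScan, 5));
        System.out.println("owned: " + openAfter(ConnectionOwnershipSketch::ownedScan, 5));
    }
}
```

In the leaky variant every call leaves one more connection open, which matches the gap between established and closed sessions in the log. The actual fix would go through initializeTable(Connection, TableName) as the TableInputFormatBase comments describe, with the subclass closing what it created.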


