Posted to dev@hive.apache.org by "Kevin Minder (JIRA)" <ji...@apache.org> on 2018/06/26 14:19:00 UTC
[jira] [Created] (HIVE-19996) Beeline performance poor with drivers
having slow DatabaseMetaData.getPrimaryKeys impl
Kevin Minder created HIVE-19996:
-----------------------------------
Summary: Beeline performance poor with drivers having slow DatabaseMetaData.getPrimaryKeys impl
Key: HIVE-19996
URL: https://issues.apache.org/jira/browse/HIVE-19996
Project: Hive
Issue Type: Improvement
Components: Beeline
Affects Versions: 1.2.1
Environment: Issue detected using Beeline with HBase Phoenix thin driver and a result set with many columns.
Reporter: Kevin Minder
Beeline performance is poor with the table output format when both of the following conditions hold for the same result set:
# The result set has a large number of columns.
# The driver being used has a slow implementation of DatabaseMetaData.getPrimaryKeys.
For example, testing has shown that for a query returning ~100 columns through the HBase Phoenix thin driver, execution time drops from ~30 seconds to ~2 seconds when using the CSV output format instead of the table output format. A query that reproduces this: {{select * from system.catalog;}}
This is due to how primary keys are detected. Currently the Rows implementation makes a metadata call for every column to determine whether it is a primary key, purely for display purposes. I propose optimizing this so that a metadata call is made only once for each unique table among the result set's columns.
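A minimal sketch of the proposed optimization, not the actual Beeline code: the class name PrimaryKeyCache, its methods, and the loader function are all hypothetical. It assumes the primary key columns of a table can be fetched once and then reused for every column that belongs to that table. The only driver calls used are the standard JDBC DatabaseMetaData.getPrimaryKeys and the COLUMN_NAME column of its result.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: caches primary key column names per table so the
// (possibly slow) metadata call happens once per table, not once per column.
final class PrimaryKeyCache {
    private final Map<String, Set<String>> keysByTable = new HashMap<>();
    private final Function<String, Set<String>> loader;

    // The loader maps a table name to its primary key column names.
    // It must not return null.
    PrimaryKeyCache(Function<String, Set<String>> loader) {
        this.loader = loader;
    }

    // Invokes the loader only the first time a given table is seen.
    boolean isPrimaryKey(String table, String column) {
        return keysByTable.computeIfAbsent(table, loader).contains(column);
    }

    // One possible loader body using the standard JDBC metadata API.
    // Passing null for catalog and schema (an assumption made here for
    // brevity) means they are not used to narrow the search.
    static Set<String> loadPrimaryKeys(DatabaseMetaData meta, String table)
            throws SQLException {
        Set<String> keys = new HashSet<>();
        try (ResultSet rs = meta.getPrimaryKeys(null, null, table)) {
            while (rs.next()) {
                keys.add(rs.getString("COLUMN_NAME"));
            }
        }
        return keys;
    }
}
```

With a cache like this, a result set whose ~100 columns all come from one table would trigger a single getPrimaryKeys call. The table name for each column could be obtained from ResultSetMetaData.getTableName, and the loader would need to wrap loadPrimaryKeys to handle SQLException, for instance by treating a failed lookup as "no primary keys".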
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)