
Posted to user@hive.apache.org by Charles Yorek <ch...@gmail.com> on 2014/08/15 16:54:43 UTC

ODBC Calls Extremely Slow

Hello All,

I am trying to use the ODBC driver, but ODBC calls that fetch the list of
tables from Hive are extremely slow on HiveServer2. The cluster has a
large number of tables, and extracting the list of tables via ODBC calls
takes more than an hour.

Are there any known workarounds for this issue?

Thanks

Re: ODBC Calls Extremely Slow

Posted by Charles Yorek <ch...@gmail.com>.

Thanks for the replies. We are using Derby for the metastore and there is
a large number of tables, so this may be part of the issue. We assumed the
issue is with the driver because queries against the database run without
problems, even queries such as 'show tables'. I do believe part of the
problem is that, as was mentioned, the drivers fetch the table properties
for the entire schema, and there is no way yet to limit that to a specific
database.


On Fri, Aug 15, 2014 at 3:46 PM, David Morel <dm...@gmail.com> wrote:

> On 15 Aug 2014, at 22:21, Stephen Sprague wrote:
>
>> what container are you using for your metastore? Derby, mysql or postgres?
>> for a large set of tables don't use Derby.
>>
>> So you've confirmed its the ODBC driver and not the metastore itself?
>
> I had the same sort of issue, related to the fact previous versions of
> the ODBC driver (by all vendors since they are all based on the same
> codebase) tries to get table properties for the whole schema when
> connecting.

Re: ODBC Calls Extremely Slow

Posted by David Morel <dm...@gmail.com>.

On 15 Aug 2014, at 22:21, Stephen Sprague wrote:

> what container are you using for your metastore? Derby, mysql or postgres?
> for a large set of tables don't use Derby.
>
> So you've confirmed its the ODBC driver and not the metastore itself?

I had the same sort of issue. Previous versions of the ODBC driver (from
all vendors, since they are all based on the same codebase) try to fetch
table properties for the whole schema when connecting. More recent
versions of the driver (try the latest ones from Cloudera, for instance)
do a better job and defer retrieving table properties until they are
actually needed. In MSQuery, for instance, that happens when you add a
table to the query by drag and drop, highlight it in the list, etc. We run
the metastore on MySQL, so metastore speed was not an issue. To speed up
the whole process, instead of building the queries in the available GUIs
(standalone SQL/ODBC tools or MSQuery), I ended up saving the queries
(producing .qry files in MS, iirc) and editing those text files directly,
thus avoiding the schema scanning time. Note that, regardless of speed,
you will also run into problems when using placeholders, as the parser at
the driver level does, or did, a rather poor job of translating the query
into Hive semantics.
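The difference between older and newer drivers described here, eager versus deferred retrieval of table properties, can be modelled in a few lines of Python. This is a toy model under stated assumptions, not any vendor's driver code: `SchemaBrowser`, `list_names` and `describe` are hypothetical names, and each `describe` call stands in for one metadata round trip to the metastore.

```python
from typing import Callable, Dict, List


class SchemaBrowser:
    """Toy model of how an ODBC client might load table metadata.

    list_names: returns all table names (one cheap call).
    describe:   returns one table's properties (one round trip each).
    Both callables are hypothetical stand-ins for driver internals.
    """

    def __init__(self,
                 list_names: Callable[[], List[str]],
                 describe: Callable[[str], Dict[str, str]],
                 eager: bool) -> None:
        self._describe = describe
        self._cache: Dict[str, Dict[str, str]] = {}
        self.names = list_names()
        if eager:
            # Older behaviour: fetch properties for every table in the
            # schema up front, at connect time.
            for name in self.names:
                self.properties(name)

    def properties(self, name: str) -> Dict[str, str]:
        # Deferred behaviour: fetch on first use (e.g. when a table is
        # dragged into a query), then reuse the cached result.
        if name not in self._cache:
            self._cache[name] = self._describe(name)
        return self._cache[name]
```

With N tables, the eager mode performs N `describe` round trips before the user can do anything, which is consistent with the long connect-time waits reported in this thread; the deferred mode only pays for the tables actually touched, and repeat lookups are served from the cache.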

David

Re: ODBC Calls Extremely Slow

Posted by Stephen Sprague <sp...@gmail.com>.

What are you using as the database for your metastore: Derby, MySQL or
PostgreSQL? For a large set of tables, don't use Derby.

So you've confirmed it's the ODBC driver and not the metastore itself?
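One way to answer that question is to time the two paths side by side from the same client: the driver's catalog call (ODBC `SQLTables`, which ODBC tools use to enumerate tables) and a plain `SHOW TABLES` statement that HiveServer2 answers from the metastore. The sketch assumes a pyodbc-style cursor, and the function names `time_call` and `compare_listing_paths` are hypothetical. Whether a given Hive ODBC driver maps Hive databases to the ODBC schema filter, and whether it applies that filter before fetching table properties, varies by driver version and is an assumption here.

```python
import time
from typing import Any, Callable, Dict


def time_call(fn: Callable[[], Any]) -> float:
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def compare_listing_paths(cursor: Any, database: str) -> Dict[str, float]:
    """Time the ODBC catalog call against a plain SHOW TABLES query.

    `cursor` is assumed to behave like a pyodbc cursor: tables() issues the
    ODBC SQLTables catalog call and execute() sends HiveQL to HiveServer2.
    Both return the cursor, so the rows can be drained with fetchall().
    """
    timings: Dict[str, float] = {}
    # Path 1: the driver's metadata path, as used by GUI/ODBC tools.
    timings["catalog_call"] = time_call(
        lambda: cursor.tables(schema=database).fetchall()
    )
    # Path 2: an ordinary HiveQL statement answered from the metastore.
    timings["show_tables"] = time_call(
        lambda: cursor.execute(f"SHOW TABLES IN {database}").fetchall()
    )
    return timings
```

With pyodbc installed and a DSN configured (the DSN name `Hive` is hypothetical), it could be called as `compare_listing_paths(pyodbc.connect("DSN=Hive", autocommit=True).cursor(), "default")`. If the catalog call is far slower than `SHOW TABLES`, the time is most likely spent in the driver's metadata path; if both are slow, the metastore (for example a Derby-backed one) is the more likely bottleneck.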


On Fri, Aug 15, 2014 at 8:54 AM, Bradley Wright <Bradley.Wright@progress.com> wrote:

> Try an eval of our commercial ODBC driver for Hive:
>
> http://www.progress.com/products/datadirect-connect/odbc-drivers/data-sources/hadoop-apache-hive
>
> It will perform better!

Re: ODBC Calls Extremely Slow

Posted by Bradley Wright <Br...@progress.com>.

Try an eval of our commercial ODBC driver for Hive:

http://www.progress.com/products/datadirect-connect/odbc-drivers/data-sources/hadoop-apache-hive

It will perform better!