Posted to user@hive.apache.org by Aditya Rao <ad...@gmail.com> on 2013/02/18 15:15:11 UTC

Using HiveJDBC interface

Hi,

I've just recently started using Hive and I'm particularly interested in
the capabilities of the HiveJDBC interface. I'm writing a simple
application that uses the Hive JDBC driver to submit Hive queries.
My end goal is to be able to create multiple connections using the
Hive JDBC driver and submit queries concurrently.

I came across a few issues in the mailing list and in JIRA related to
issuing concurrent requests to the Hive server (explained here:
https://cwiki.apache.org/Hive/hiveserver2-thrift-api.html). Does anyone
have suggestions or guidelines on best practices for working around this
problem? Apart from restricting clients to a single query at a time, are
there any other known pitfalls to watch out for when using the HiveJDBC
interface?
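
For readers following the thread, the goal described above (several connections, queries submitted concurrently) might be sketched roughly as follows. This is only a sketch under stated assumptions, not a recommended implementation: the class `ConcurrentHiveQueries`, the `QueryExecutor` interface and the JDBC URL passed in are hypothetical names introduced for illustration. It assumes the Hive JDBC driver jar is on the classpath and registered with `DriverManager`, and it opens a fresh `Connection` per query so that no connection is shared between threads.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentHiveQueries {

    /** Hypothetical abstraction: runs one query and returns the first column of the first row. */
    interface QueryExecutor {
        String execute(String sql) throws Exception;
    }

    /**
     * JDBC-backed executor. The URL is an assumption supplied by the caller,
     * for example "jdbc:hive://host:10000/default" for the original HiveServer
     * or "jdbc:hive2://host:10000/default" for HiveServer2.
     */
    static QueryExecutor jdbcExecutor(String jdbcUrl) {
        return sql -> {
            // One connection per query: nothing is shared between threads.
            try (Connection conn = DriverManager.getConnection(jdbcUrl);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                return rs.next() ? rs.getString(1) : null;
            }
        };
    }

    /** Submits every query to a fixed-size pool and returns results in submission order. */
    static List<String> runAll(QueryExecutor executor, List<String> queries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String sql : queries) {
                Callable<String> task = () -> executor.execute(sql);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get()); // blocks until that query finishes
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("query failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass something like `jdbcExecutor("jdbc:hive2://host:10000/default")` to `runAll`. Keeping the executor behind an interface also lets the pooling logic be exercised without a running Hive server.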

Thanks,

Aditya

Re: Using HiveJDBC interface

Posted by Aditya Rao <ad...@gmail.com>.

Thanks for the tips. I would think #2 works well when you are setting
hiveconf variables that are isolated to your query. I have instances in my
scripts where I need to set Hadoop properties before executing a query, for
example setting the number of reducers using

set mapred.reduce.tasks=50

Without the concept of a session in HiveServer, won't setting Hadoop
configurations like the one above affect all queries that are being
submitted concurrently?
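
To make the scenario concrete, a client that needs a Hadoop property for one query would typically send the `set` statement and the query over the same connection, roughly as sketched below. Whether the setting stays local to that client depends on the server: HiveServer2 is understood to keep a separate session per connection, while the concern raised here is that the original HiveServer does not. The class and method names are hypothetical, and it is an assumption that the driver in use accepts `set` statements through `Statement.execute`.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PerQuerySettings {

    /** Builds the statements to send, in order: one "set key=value" per property, then the query. */
    static List<String> buildStatements(Map<String, String> properties, String query) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            statements.add("set " + entry.getKey() + "=" + entry.getValue());
        }
        statements.add(query);
        return statements;
    }

    /** Executes the settings and the query on the same connection, in order. */
    static void runWithSettings(Connection conn, Map<String, String> properties, String query)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : buildStatements(properties, query)) {
                stmt.execute(sql);
            }
        }
    }
}
```

With a property map containing `mapred.reduce.tasks` mapped to `50`, the first statement sent is `set mapred.reduce.tasks=50`, followed by the query itself.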

Also, how do you tackle conflicts with tables stored in the metastore?

Aditya



On Mon, Feb 18, 2013 at 8:09 PM, Edward Capriolo <ed...@gmail.com> wrote:

> I personally do not find it a large problem.
>
> 1) have multiple backend Hive Thrift servers with HAProxy in front
> 2) don't use variable names like "x"; use "myprocess1.x" to avoid
> possible collisions
> 3) experiment with hivethrift2
> 4) don't use zk locking + thrift (it leaks as far as I can tell, at least
> in older versions)
>
> Really, #2 solves the problem mentioned on the wiki page. There are
> other subtle issues, but all in all it works pretty well.
>
> Edward

Re: Using HiveJDBC interface

Posted by Edward Capriolo <ed...@gmail.com>.

I personally do not find it a large problem.

1) have multiple backend Hive Thrift servers with HAProxy in front
2) don't use variable names like "x"; use "myprocess1.x" to avoid
possible collisions
3) experiment with hivethrift2
4) don't use zk locking + thrift (it leaks as far as I can tell, at least
in older versions)

Really, #2 solves the problem mentioned on the wiki page. There are
other subtle issues, but all in all it works pretty well.
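
On the client side, tip #2 could look roughly like the sketch below. It is illustrative only: the class and helper names are hypothetical, and the `myprocess1` prefix is simply the example used above. It relies on Hive's `${hiveconf:name}` variable substitution syntax to reference the variable inside a query.

```java
class NamespacedVariables {

    /** Prefixes a variable name with a per-process namespace, e.g. "myprocess1.x". */
    static String qualify(String processId, String name) {
        return processId + "." + name;
    }

    /** Builds the "set" statement that defines the namespaced variable. */
    static String setStatement(String processId, String name, String value) {
        return "set " + qualify(processId, name) + "=" + value;
    }

    /** Builds the reference to place inside a query, e.g. "${hiveconf:myprocess1.x}". */
    static String reference(String processId, String name) {
        return "${hiveconf:" + qualify(processId, name) + "}";
    }
}
```

Two processes that both want a variable called "x" then define `myprocess1.x` and `myprocess2.x`, so neither overwrites the other's value on a shared server.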

Edward
