
Posted to user@hive.apache.org by "V.Senthil Kumar" <va...@yahoo.com> on 2011/05/03 00:41:28 UTC

HIVE Server multiple instances

Hello, 

I have one instance of the Hive JDBC server running on port 10000. Can I run another 
instance on a different port? Would it cause a concurrency issue on the 
underlying data warehouse files? Please clarify.

Thanks,
V.Senthil Kumar

Re: HIVE Server multiple instances

Posted by "V.Senthil Kumar" <va...@yahoo.com>.

Thanks Paul. That is really useful information. 



----- Original Message ----
From: Matthew Rathbone <ma...@foursquare.com>
To: user@hive.apache.org
Sent: Tue, May 3, 2011 11:18:17 AM
Subject: Re: HIVE Server multiple instances

Hey Paul,

I'd be very interested in reading about your Hadoop/Hive setup. Do you have a 
blog post or anything describing it, or some of the issues you've had 
with Hive?

-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma | 4sq


Re: HIVE Server multiple instances

Posted by Marcos Ortiz <ml...@uci.cu>.

On 5/4/2011 7:48 AM, Paul Ingles wrote:
> For future reference I've posted a little more about our setup here: 
> http://oobaloo.co.uk/multiple-connections-with-hive
>
>
Wow, good piece of information.
Thanks for sharing it.

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer (Large-Scaled Distributed Systems)
  University of Information Sciences,
  La Habana, Cuba
  Linux User # 418229
  http://about.me/marcosortiz

Re: HIVE Server multiple instances

Posted by "V.Senthil Kumar" <va...@yahoo.com>.

This is great info. Thanks a lot for sharing :)




________________________________
From: Paul Ingles <pa...@oobaloo.co.uk>
To: user@hive.apache.org
Sent: Wed, May 4, 2011 4:48:20 AM
Subject: Re: HIVE Server multiple instances


For future reference I've posted a little more about our setup 
here: http://oobaloo.co.uk/multiple-connections-with-hive



Re: HIVE Server multiple instances

Posted by Paul Ingles <pa...@oobaloo.co.uk>.

For future reference I've posted a little more about our setup here:
http://oobaloo.co.uk/multiple-connections-with-hive


On Tue, May 3, 2011 at 8:01 PM, Paul Ingles <pa...@oobaloo.co.uk> wrote:

> Nothing specifically about our Hive setup although some of us at Forward
> have blogged bits and pieces about Hive + Hadoop and have a few Hadoop/Hive
> related libs on our GitHub account: https://github.com/forward.
>
> I've blogged a few bits (http://www.oobaloo.co.uk/) as has one of my
> colleagues (http://blog.fingertap.org/post/1255463384/hive-thrift-client).
>
> Another colleague also presented a little about our setup during a Hadoop
> meetup last summer (
> http://skillsmatter.com/podcast/home/hadoop-in-context-1591). The numbers
> Andy mentioned will be a little out of date but it does include some
> screenshots of a few of the surrounding apps we built that connect to Hive
> and Hadoop (including a web based Hive query tool + work queue).
>
> I had a quick search through the mailing lists when we had connection
> problems but I think most of it was discussed/resolved during a chat I had
> with Shevek from Karmasphere at a London pub following a Hadoop meetup :)
>
> If you're interested, I've posted a gist (https://gist.github.com/953926)
> that contains our HAProxy config; clients connect to 10000 and are balanced
> between :10001 and :10005 on 2 servers (so actually 10 backend servers).
>
> Be happy to talk more about our experience- feel free to ping me an email
> off list if you'd like.
>
>

Re: HIVE Server multiple instances

Posted by Paul Ingles <pa...@oobaloo.co.uk>.

Nothing specifically about our Hive setup although some of us at Forward have blogged bits and pieces about Hive + Hadoop and have a few Hadoop/Hive related libs on our GitHub account: https://github.com/forward.

I've blogged a few bits (http://www.oobaloo.co.uk/) as has one of my colleagues (http://blog.fingertap.org/post/1255463384/hive-thrift-client).

Another colleague also presented a little about our setup during a Hadoop meetup last summer (http://skillsmatter.com/podcast/home/hadoop-in-context-1591). The numbers Andy mentioned will be a little out of date but it does include some screenshots of a few of the surrounding apps we built that connect to Hive and Hadoop (including a web based Hive query tool + work queue).

I had a quick search through the mailing lists when we had connection problems but I think most of it was discussed/resolved during a chat I had with Shevek from Karmasphere at a London pub following a Hadoop meetup :)

If you're interested, I've posted a gist (https://gist.github.com/953926) that contains our HAProxy config; clients connect to port 10000 and are balanced across ports 10001 through 10005 on each of 2 servers (so 10 backend HiveServer instances in total).
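
As a rough illustration of the layout described above (this is not the config from the gist), the Python sketch below enumerates the backends and rotates through them the way a round-robin balancer would. The host names `hive1` and `hive2` are hypothetical, and round-robin is an assumption; the real config may balance differently.

```python
from itertools import cycle

# Hypothetical host names; the message only says "2 servers".
HOSTS = ["hive1", "hive2"]
# Backend ports 10001 through 10005, as described in the message.
PORTS = range(10001, 10006)

# Every (host, port) pair is one HiveServer instance: 2 x 5 = 10 backends.
BACKENDS = [(host, port) for host in HOSTS for port in PORTS]


def round_robin(backends):
    """Return an endless iterator that hands out backends in rotation."""
    return cycle(backends)


if __name__ == "__main__":
    picker = round_robin(BACKENDS)
    # Show where the first three client connections to :10000 would land.
    for _ in range(3):
        host, port = next(picker)
        print(f"client on :10000 -> {host}:{port}")
```

Each new client connection is handed to the next instance in the rotation, so no single HiveServer process carries all of the concurrent sessions.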

Happy to talk more about our experience; feel free to send me an email off-list if you'd like.


On 3 May 2011, at 19:18, Matthew Rathbone wrote:

> Hey Paul,
> 
> I'd be very interested in reading about your Hadoop/Hive setup. Do you have a blog post or anything describing it, or some of the issues you've had with Hive?
> 
> -- 
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matthew@foursquare.com | @rathboma | 4sq

Re: HIVE Server multiple instances

Posted by Matthew Rathbone <ma...@foursquare.com>.

Hey Paul,

I'd be very interested in reading about your Hadoop/Hive setup. Do you have a blog post or anything describing it, or some of the issues you've had with Hive?

-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma | 4sq

On Tuesday, May 3, 2011 at 2:15 PM, Paul Ingles wrote:
> HiveServer does seem to support multiple connections but I think it still has thread-safety problems (https://issues.apache.org/jira/browse/HIVE-80).
> 
> We've (www.forward.co.uk) certainly had instability problems with the thrift server in the past and now run 5 or so instances behind the HAProxy load-balancer (http://haproxy.1wt.eu/). Since we did that it's been significantly better. 
> 
> I think the JDBC server still operates using thrift to connect to the HiveServer so I would expect it to have similar problems (but I may have got that wrong :)

Re: HIVE Server multiple instances

Posted by Paul Ingles <pa...@oobaloo.co.uk>.

HiveServer does seem to support multiple connections but I think it still has thread-safety problems (https://issues.apache.org/jira/browse/HIVE-80).

We've (www.forward.co.uk) certainly had instability problems with the thrift server in the past and now run 5 or so instances behind the HAProxy load-balancer (http://haproxy.1wt.eu/). Since we did that it's been significantly better. 

I think the JDBC server still operates using thrift to connect to the HiveServer so I would expect it to have similar problems (but I may have got that wrong :)


On 3 May 2011, at 18:59, Matthew Rathbone wrote:

> Even if it is single threaded it certainly seems to support multiple connections. 
> 
> We run 5 workers all connected at the same time executing a different query each ( with a different connection per worker).
> 
> Hope that helps
> 
> Matthew 

Re: HIVE Server multiple instances

Posted by "V.Senthil Kumar" <va...@yahoo.com>.

Thanks. That really helps and answers my question.  




----- Original Message ----
From: Matthew Rathbone <ma...@foursquare.com>
To: user@hive.apache.org
Sent: Tue, May 3, 2011 10:59:37 AM
Subject: Re: HIVE Server multiple instances

Even if it is single threaded it certainly seems to support multiple 
connections. 


We run 5 workers all connected at the same time executing a different query each 
( with a different connection per worker).

Hope that helps

Matthew 

Re: HIVE Server multiple instances

Posted by Matthew Rathbone <ma...@foursquare.com>.

Even if it is single threaded it certainly seems to support multiple connections. 

We run 5 workers all connected at the same time, each executing a different query (with a different connection per worker).
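
A minimal sketch of that worker arrangement, in Python for brevity: each worker thread opens its own connection and pulls queries from a shared queue, so no connection is ever used by two threads. The `open_connection` helper is a hypothetical stand-in, not a Hive client API; a real worker would open a JDBC or Thrift connection there.

```python
import queue
import threading

NUM_WORKERS = 5  # the five workers mentioned above


def open_connection(worker_id):
    """Hypothetical stand-in for opening one Hive connection per worker."""
    def execute(sql):
        return f"worker {worker_id} ran: {sql}"
    return execute


def run_queries(queries, num_workers=NUM_WORKERS):
    """Drain `queries` with `num_workers` threads, one connection per thread."""
    pending = queue.Queue()
    for sql in queries:
        pending.put(sql)

    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        execute = open_connection(worker_id)  # private to this thread
        while True:
            try:
                sql = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            outcome = execute(sql)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for line in run_queries([f"query-{n}" for n in range(8)]):
        print(line)
```

Keeping the connection private to its thread avoids relying on the server or client library being thread-safe for a shared session.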

Hope that helps

Matthew 
On Tuesday, May 3, 2011 at 1:40 PM, V.Senthil Kumar wrote:
> Thanks Matthew. The wiki page http://wiki.apache.org/hadoop/Hive/HiveServer says 
> it's single threaded. I have a queue of queries that is added to dynamically all 
> the time. By the time I finish running 1 query over 1 JDBC connection, more queries 
> have been added to the queue and a backlog builds up. That's why I was wondering 
> whether I can run two or more instances to avoid a big backlog in the queue.

Re: HIVE Server multiple instances

Posted by "V.Senthil Kumar" <va...@yahoo.com>.

Thanks Matthew. The wiki page http://wiki.apache.org/hadoop/Hive/HiveServer says 
it's single threaded. I have a queue of queries that is added to dynamically all 
the time. By the time I finish running 1 query over 1 JDBC connection, more queries 
have been added to the queue and a backlog builds up. That's why I was wondering 
whether I can run two or more instances to avoid a big backlog in the queue.
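
To make the backlog reasoning concrete, here is a small Python sketch. It assumes queries arrive at a steady rate, each takes about the same time, and parallel connections do not slow each other down; the figures in the example are made up for illustration and are not measurements from any Hive setup.

```python
import math


def min_connections(arrivals_per_minute, minutes_per_query):
    """Smallest number of parallel connections that keeps up with arrivals."""
    # Offered load: minutes of query work arriving per minute of wall time.
    load = arrivals_per_minute * minutes_per_query
    return max(1, math.ceil(load))


def backlog_after(minutes, arrivals_per_minute, minutes_per_query, connections):
    """Approximate queue length after `minutes`, starting from an empty queue."""
    completed_per_minute = connections / minutes_per_query
    growth = arrivals_per_minute - completed_per_minute
    return max(0.0, growth * minutes)


if __name__ == "__main__":
    # Made-up workload: 2 new queries per minute, 2 minutes per query.
    needed = min_connections(2, 2)
    print("connections needed:", needed)
    print("backlog after 1 hour, 1 connection:", backlog_after(60, 2, 2, 1))
    print("backlog after 1 hour, enough connections:",
          backlog_after(60, 2, 2, needed))
```

With a single connection the queue grows for as long as queries arrive faster than they finish; adding connections (or instances) only helps up to the point where the cluster itself becomes the bottleneck.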

----- Original Message ----
From: Matthew Rathbone <ma...@foursquare.com>
To: user@hive.apache.org
Sent: Tue, May 3, 2011 7:46:49 AM
Subject: Re: HIVE Server multiple instances

Why would you want to run two? I think it is multithreaded, so you can query it 
from two different connections

-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma | 4sq


Re: HIVE Server multiple instances

Posted by Matthew Rathbone <ma...@foursquare.com>.

Why would you want to run two? I think it is multithreaded, so you can query it from two different connections

-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma | 4sq

On Monday, May 2, 2011 at 6:41 PM, V.Senthil Kumar wrote:
> Hello, 
> 
> I have one instance of the Hive JDBC server running on port 10000. Can I run another 
> instance on a different port? Would it cause a concurrency issue on the 
> underlying data warehouse files? Please clarify.
> 
> Thanks,
> V.Senthil Kumar
>