Posted to user@hive.apache.org by Manish Malhotra <ma...@gmail.com> on 2012/11/08 08:28:48 UTC

Hive Meta Server (Thrift Server) Failover / Redundancy / Load Balancing

Hi,

I need to build a failover / load-balancing (LB) solution for Hive services.
The MySQL DB is fine; I can work that part out.
But for the Hive Metastore service, can I simply put a load balancer such as
HAProxy between the clients and the metastore servers to achieve this?
Thrift servers are stateless by default, but I'm not sure about the Hive one.

I have read very few comments on this problem.
A similar approach is blogged at:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/running_a_hive_thrift_cluster

A very important point from an HCatalog mailing list thread:
http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201109.mbox/%3CDCAE2122-F3C6-428A-BE70-5C994C0DC90E@yahoo-inc.com%3E


As per this mailing list thread, if security is enabled on the Hive Thrift
metastore server, the server needs further modification, because it maintains
the user's token for that session (user connection).

Please help me move forward on this problem, and please verify whether my
understanding of the blog post and the mailing list thread above is correct.

Is there any initial work done under HCatalog or Hive that I can look
into and extend / patch?


Regards,
Manish

Re: Hive Meta Server (Thrift Server) Failover / Redundancy / Load Balancing

Posted by Ashutosh Chauhan <as...@gmail.com>.
Hi Manish,

You can use an LB. The trouble you might have when deploying just an LB
without failover is that when a metastore server actually goes down, all of
its active connections will be dropped as well. But since most of the RPC
calls to the metastore are expected to complete fairly quickly, depending on
your workload you might be OK with this.
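
As a rough illustration of what the load balancer does for new connections,
here is a minimal Python sketch (not Hive or HAProxy code). It assumes a plain
TCP connect check is an acceptable health probe; the names is_reachable and
pick_backend are hypothetical, and any host names you pass in are your own.

```python
import socket
from typing import List, Optional, Tuple

Endpoint = Tuple[str, int]  # (host, port) of one metastore Thrift server


def is_reachable(endpoint: Endpoint, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, etc.: treat as down.
        return False


def pick_backend(endpoints: List[Endpoint], start: int = 0) -> Optional[Endpoint]:
    """Walk the list round-robin from `start`, skipping backends that are down.

    Returns None when no backend passes the TCP check.
    """
    for offset in range(len(endpoints)):
        candidate = endpoints[(start + offset) % len(endpoints)]
        if is_reachable(candidate):
            return candidate
    return None
```

This only covers routing of new connections. A connection that is already
open to a server that then goes down is still dropped, which is the caveat
described above.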

For the secure metastore, the patch is ready to use. You may want to try it out.

Hope it helps,
Ashutosh


On Sat, Nov 10, 2012 at 7:39 PM, Manish Malhotra <
manish.malhotra.work@gmail.com> wrote:

> Thanks, Ashutosh, for the quick reply.
>
> 1. For the non-secure metastore server: I'm wondering whether, if I add a
> load balancer like HAProxy in between, we no longer need to handle failover
> on the Thrift client side.
> So, if I put an LB between the Thrift client and the metastore server, is
> that good to use?
> Maybe I'm missing something, but I'll check out the code and see what the
> status of the patch is and what additional work is required.
>
> 2. For the secure metastore server: I need to dig further into the code, and
> will then ask more questions if required.
> I believe the patch for storing tokens in the DB is available for review but
> is not yet ready to use. Or can I try out that patch when using the secured
> setup?
>
> Again, thanks for your help!
>
> Regards,
> Manish

Re: Hive Meta Server (Thrift Server) Failover / Redundancy / Load Balancing

Posted by Manish Malhotra <ma...@gmail.com>.
Thanks, Ashutosh, for the quick reply.

1. For the non-secure metastore server: I'm wondering whether, if I add a
load balancer like HAProxy in between, we no longer need to handle failover
on the Thrift client side.
So, if I put an LB between the Thrift client and the metastore server, is
that good to use?
Maybe I'm missing something, but I'll check out the code and see what the
status of the patch is and what additional work is required.

2. For the secure metastore server: I need to dig further into the code, and
will then ask more questions if required.
I believe the patch for storing tokens in the DB is available for review but
is not yet ready to use. Or can I try out that patch when using the secured
setup?

Again, thanks for your help!

Regards,
Manish


On Thu, Nov 8, 2012 at 9:39 AM, Ashutosh Chauhan <ha...@apache.org> wrote:

> Hi Manish,
>
> Your understanding is mostly correct, though there is one additional bit.
> MetastoreClient in its current incarnation doesn't automatically reconnect
> if the connection gets dropped for a connected session. As a result, it
> won't fail over active connections. New connections would be fine.
> Fortunately, though, there is work in progress for this at:
> https://issues.apache.org/jira/browse/HIVE-3400 In case you want to help
> out, you should help there.
>
> For the secure case, as you pointed out, you additionally need ZooKeeper to
> store security tokens. So, you need to bring up a ZK cluster. But if you
> think dedicating 3 nodes to ZK for the metastore is an overhead, then you
> would need https://issues.apache.org/jira/browse/HIVE-3255 With that patch,
> tokens are stored in the same backend DB, so there would be no need to bring
> up a ZK cluster.
>
> Hopefully, both of these patches get in for the 0.10 release.
>
> Thanks,
> Ashutosh

Re: Hive Meta Server (Thrift Server) Failover / Redundancy / Load Balancing

Posted by Ashutosh Chauhan <ha...@apache.org>.
Hi Manish,

Your understanding is mostly correct, though there is one additional bit.
MetastoreClient in its current incarnation doesn't automatically reconnect
if the connection gets dropped for a connected session. As a result, it
won't fail over active connections. New connections would be fine.
Fortunately, though, there is work in progress for this at:
https://issues.apache.org/jira/browse/HIVE-3400 In case you want to help
out, you should help there.
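
To make the missing behaviour concrete, here is a minimal Python sketch of a
client wrapper that reconnects and retries when a connection is dropped. It is
not the HIVE-3400 patch and not the real MetastoreClient API: RetryingClient,
the connect factory, and max_attempts are hypothetical names, and the sketch
assumes a dropped connection surfaces as Python's ConnectionError.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryingClient:
    """Reconnects through a caller-supplied factory when a call fails."""

    def __init__(self, connect: Callable[[], object], max_attempts: int = 3):
        self._connect = connect          # hypothetical: opens a new connection
        self._max_attempts = max_attempts
        self._conn: Optional[object] = None

    def call(self, rpc: Callable[[object], T]) -> T:
        """Run `rpc(connection)`, reconnecting after each connection failure."""
        last_error: Optional[Exception] = None
        for _ in range(self._max_attempts):
            try:
                if self._conn is None:
                    # Behind an LB, a fresh connect should reach a live server.
                    self._conn = self._connect()
                return rpc(self._conn)
            except ConnectionError as exc:
                # Discard the broken connection so the next attempt reconnects.
                self._conn = None
                last_error = exc
        raise ConnectionError("all attempts failed") from last_error
```

Blind retries like this are only safe for calls that are idempotent, such as
reads; a retried write may be applied twice if the first attempt reached the
server before the connection dropped.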

For the secure case, as you pointed out, you additionally need ZooKeeper to
store security tokens. So, you need to bring up a ZK cluster. But if you
think dedicating 3 nodes to ZK for the metastore is an overhead, then you
would need https://issues.apache.org/jira/browse/HIVE-3255 With that patch,
tokens are stored in the same backend DB, so there would be no need to bring
up a ZK cluster.
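
The reason the token store matters behind a load balancer can be shown with a
small Python sketch. This is an illustration only, not Hive's token code: the
class names are hypothetical, and SQLite stands in for the shared backend DB.

```python
import secrets
import sqlite3
from typing import Dict, Optional


class InMemoryTokenStore:
    """Tokens live in one server process and are invisible to other servers."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def put(self, token: str, user: str) -> None:
        self._tokens[token] = user

    def get(self, token: str) -> Optional[str]:
        return self._tokens.get(token)


class DbTokenStore:
    """Tokens live in a shared database that every server can read."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tokens "
            "(token TEXT PRIMARY KEY, user TEXT NOT NULL)"
        )

    def put(self, token: str, user: str) -> None:
        with self._conn:  # commits on success
            self._conn.execute(
                "INSERT OR REPLACE INTO tokens VALUES (?, ?)", (token, user)
            )

    def get(self, token: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT user FROM tokens WHERE token = ?", (token,)
        ).fetchone()
        return row[0] if row else None


class MetastoreServer:
    """Hypothetical stand-in for one metastore instance behind the LB."""

    def __init__(self, store) -> None:
        self._store = store

    def issue_token(self, user: str) -> str:
        token = secrets.token_hex(16)
        self._store.put(token, user)
        return token

    def authenticate(self, token: str) -> Optional[str]:
        return self._store.get(token)
```

With per-server in-memory stores, a token issued by one server is unknown to
the other, so a client that the LB routes to a different server fails to
authenticate. With a shared store, whether ZooKeeper or the backend DB, either
server can validate it.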

Hopefully, both of these patches get in for the 0.10 release.

Thanks,
Ashutosh
