Posted to user@hbase.apache.org by Rong-en Fan <gr...@gmail.com> on 2009/07/16 18:40:19 UTC

multi masters in 0.20

A few days ago, I played with the latest trunk to see how fault tolerance
works in 0.20. While running PerformanceEvaluation to generate
workloads, killing HRS and HMaster is not a big deal. The client
recovers after tens of seconds to a few minutes. This is good.

For multiple masters, it seems that I have to start a backup master manually with

bin/hbase-daemon.sh start master

This is OK, though it would be better if we could specify this as part of
hbase-site.xml or a new conf/masters.

But stopping a backup master is messy... if I just do

bin/hbase-daemon.sh stop master

It will bring the whole cluster down. That's bad.

Not sure if we can do something like this:

1. if there is another active master, stop master will just make this
HMaster die without shutting down the whole cluster
2. otherwise, shut down the whole cluster as before
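
A minimal sketch of the two-case rule above, assuming "an active master"
means a master other than the one being stopped. The function name and the
action strings are hypothetical, not part of bin/hbase-daemon.sh, and how to
detect another active master is deliberately left out:

```shell
#!/bin/sh
# Hypothetical helper: decide what "stop master" should do.
# $1 = "yes" if another master is currently active, anything else otherwise.
# Detecting that state (for example by asking ZooKeeper) is not shown here.
stop_master_action() {
  if [ "$1" = "yes" ]; then
    # Case 1: another master is active, so only this process should exit.
    echo "stop-this-master-only"
  else
    # Case 2: no other active master, keep the current behaviour.
    echo "shutdown-cluster"
  fi
}

stop_master_action yes
stop_master_action no
```

The point of the sketch is only that the decision hinges on one check made
before the shutdown request is sent.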

Any ideas?

Thanks,
Rong-En Fan

Re: multi masters in 0.20

Posted by stack <st...@duboce.net>.

In this doc, http://wiki.apache.org/hadoop/Hbase/RollingRestart, I say to
kill -9 the master for now.
St.Ack

Re: multi masters in 0.20

Posted by Jonathan Gray <jl...@streamy.com>.

I think we should add a conf file for "backupmasters", or just use
"masters", with the first host in the list being the one that always gets
to be master first (introducing a start delay for the others should ensure
it gets the ephemeral node first?).

It should not be too bad? If it's hard we could wait, but it seems like it
would be fairly simple.
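
A rough sketch of the start ordering suggested above, assuming a
hypothetical plain-text file with one master host per line. The file
format, the function name, and the delay value are illustrative
assumptions, not an existing HBase feature. The function only prints a
plan (host and start delay in seconds) and does not start anything:

```shell
#!/bin/sh
# Hypothetical helper: print "<host> <delay>" for each host in a masters file.
# $1 = path to the file (one host per line, blank lines and "#" comments skipped)
# $2 = delay in seconds applied to every host after the first
plan_master_starts() {
  file=$1
  delay=$2
  first=1
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    if [ "$first" -eq 1 ]; then
      echo "$host 0"         # first host starts immediately
      first=0
    else
      echo "$host $delay"    # later hosts wait, giving the first a head start
    fi
  done < "$file"
}
```

A delay only makes it likely, not certain, that the first host registers
its ephemeral node before the others, which matches the question mark in
the suggestion.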

JG

Re: multi masters in 0.20

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Rong-En Fan,

I agree multi-master requires manual tasks and the current lack of docs
does not help (it's on my list though).

I also agree that stop on a backup master shouldn't stop the cluster.
Can you file a Jira? (kill -9 works well btw)

wrt multi-master conf, I personally ruled it out of 0.20.0 but do you
think we should still include it for usability? Is it currently too
rough?

Thx,

J-D
