Posted to dev@hbase.apache.org by Julian Zhou <ju...@me.com> on 2013/08/06 09:46:41 UTC

HBase master failover

Hi Community,
Could you help confirm whether this case makes sense for 0.94 or trunk?
The default of "hbase.rpc.timeout" is 60000 ms (1 min). Users sometimes
increase it to a larger value, such as 600000 ms (10 mins), for client
applications that run many concurrent loads. Some users share the same
hbase-site.xml between client and server. HRegionServer
#tryRegionServerReport reports to the live master over the rpc channel,
but there is a window during master failover. The region server may be
attempting to connect to a master that was just killed. The backup master
takes the active role immediately and writes itself to /hbase/master, but
the region server is still waiting for the rpc timeout on its connection
to the dead master. If "hbase.rpc.timeout" is long, the master failover
takes correspondingly long, because the region server waits out the full
rpc timeout against the dead master.

If so, could we separate this into 2 options: "hbase.rpc.timeout" stays
for the hbase client, while "hbase.rpc.internal.timeout" is for the
regionserver/master rpc channel and could be set to a shorter value
without affecting the real client rpc timeout?

-- 
Best Regards, Julian
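
For illustration, here is a minimal Java sketch of how the two settings proposed above could be resolved. It is not HBase code: java.util.Properties stands in for the HBase configuration object, the class and method names are hypothetical, and falling back to "hbase.rpc.timeout" when the internal key is unset is an assumption made here, not something decided in this thread. The 10000 ms value is only an example.

```java
import java.util.Properties;

public class RpcTimeoutSketch {
    // Existing key and default, as described in the message above.
    static final String CLIENT_TIMEOUT_KEY = "hbase.rpc.timeout";
    static final int DEFAULT_RPC_TIMEOUT_MS = 60000;
    // Proposed key from this thread; it may not exist in any HBase release.
    static final String INTERNAL_TIMEOUT_KEY = "hbase.rpc.internal.timeout";

    /** Timeout used for client rpc calls. */
    public static int clientTimeoutMs(Properties conf) {
        return Integer.parseInt(
            conf.getProperty(CLIENT_TIMEOUT_KEY, String.valueOf(DEFAULT_RPC_TIMEOUT_MS)));
    }

    /**
     * Timeout used for the regionserver/master rpc channel.
     * Assumption: when the internal key is unset, reuse the client timeout
     * so that existing deployments keep their current behaviour.
     */
    public static int internalTimeoutMs(Properties conf) {
        String value = conf.getProperty(INTERNAL_TIMEOUT_KEY);
        return value != null ? Integer.parseInt(value) : clientTimeoutMs(conf);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(CLIENT_TIMEOUT_KEY, "600000");  // long timeout for bulk-loading clients
        conf.setProperty(INTERNAL_TIMEOUT_KEY, "10000"); // illustrative short internal timeout
        System.out.println("client rpc timeout (ms):   " + clientTimeoutMs(conf));
        System.out.println("internal rpc timeout (ms): " + internalTimeoutMs(conf));
    }
}
```

With a shared hbase-site.xml, a client could then keep its 10 minute timeout while region server reports to the master use the shorter value.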

Re: HBase master failover

Posted by Nicolas Liochon <nk...@gmail.com>.

Thanks Julian. I've added a comment in the jira, let's continue there, it
will be easier to track later.


On Tue, Aug 6, 2013 at 5:44 PM, Julian Zhou <ju...@me.com> wrote:

> Thanks Nicolas. HBASE-9139: "Independent timeout configuration for rpc
> channel between cluster nodes" has been opened to track it. So do you think
> "hbase.rpc.internal.timeout" is a suitable name for this configuration?

Re: HBase master failover

Posted by Julian Zhou <ju...@me.com>.

Thanks Nicolas. HBASE-9139: "Independent timeout configuration for rpc 
channel between cluster nodes" has been opened to track it. So do you 
think "hbase.rpc.internal.timeout" is a suitable name for this 
configuration?

On 8/6/2013 4:19 PM, Nicolas Liochon wrote:
> Yes, it makes sense. Even a 1 minute timeout is not ideal in this case: we
> know that the work to do server side is trivial, and we know it's
> idempotent so we can retry. So I would tend to use a specific setting
> for such operations.
>
> Could you please create a jira for this?
>
> Thanks,
>
> Nicolas

-- 
Best Regards, Julian

Re: HBase master failover

Posted by Nicolas Liochon <nk...@gmail.com>.

Yes, it makes sense. Even a 1 minute timeout is not ideal in this case: we
know that the work to do server side is trivial, and we know it's
idempotent so we can retry. So I would tend to use a specific setting
for such operations.

Could you please create a jira for this?

Thanks,

Nicolas
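
The point about a trivial, idempotent call can be sketched in plain Java as a short per-attempt timeout combined with retries. This is only an illustration using java.util.concurrent, not the HBase rpc client; the class name, method name, and parameters are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ShortTimeoutRetry {
    /**
     * Runs an idempotent call with a short per-attempt timeout and retries
     * when an attempt times out or fails. Only safe for idempotent calls,
     * because a timed-out attempt may still have reached the server.
     */
    public static <T> T callWithRetry(Callable<T> call, long timeoutMs, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        // Cached pool of daemon threads, so a hung attempt never blocks the next one.
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = executor.submit(call);
                try {
                    return future.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupt the stuck attempt, then retry
                    last = e;
                } catch (ExecutionException e) {
                    last = e;            // the call itself failed, retry
                }
            }
            throw last;                  // every attempt timed out or failed
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In the failover case, the caller would look up the current master again between attempts, so a retry goes to the new active master instead of waiting on the dead one.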


