Posted to dev@hbase.apache.org by Devaraj Das <dd...@hortonworks.com> on 2013/01/22 00:48:25 UTC

Multiple RS for serving one region

In the 0.89-fb branch I stumbled upon code indicating that there is a
concept of secondary and tertiary regionservers. Could someone with
more insight please shed some light on this?
It might be useful to analyze whether this makes sense for trunk.
Thanks
Devaraj

RE: Multiple RS for serving one region

Posted by "Liu, Raymond" <ra...@intel.com>.
Is it also possible to control which disk the blocks are assigned to?

Say there are multiple disks on one node: I would like the blocks belonging to the local regions to be distributed evenly across those disks.

At present, it seems that they are not. If you count the replica blocks of non-local regions, blocks are distributed evenly overall; but a table scan never touches those replica blocks, so an uneven distribution of a local region's own blocks can still make one disk a hotspot.
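
To make concrete what I mean by even distribution, here is a minimal standalone sketch of a round-robin volume chooser; the Volume class and the method names are made up for illustration, not the actual DataNode API:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a DataNode data directory (one per disk).
class Volume {
  final String mountPoint;
  final long availableBytes;

  Volume(String mountPoint, long availableBytes) {
    this.mountPoint = mountPoint;
    this.availableBytes = availableBytes;
  }
}

class RoundRobinVolumeChooser {
  private final AtomicInteger cursor = new AtomicInteger();

  // Pick the next volume in rotation that has room for the block.
  Volume choose(List<Volume> volumes, long blockSize) {
    int start = cursor.getAndIncrement();
    for (int i = 0; i < volumes.size(); i++) {
      Volume v = volumes.get(Math.floorMod(start + i, volumes.size()));
      if (v.availableBytes >= blockSize) {
        return v;
      }
    }
    throw new IllegalStateException("no volume can fit " + blockSize + " bytes");
  }
}

Rotation like this evens out the total block count per disk, but because it is blind to which region a block belongs to, the blocks of one local region can still pile up on a single disk.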


Re: Multiple RS for serving one region

Posted by Devaraj Das <dd...@hortonworks.com>.
There is a jira after all. It is HBASE-4755.


Re: Multiple RS for serving one region

Posted by Devaraj Das <dd...@hortonworks.com>.
I'll raise a jira shortly (I couldn't locate any existing jiras about this)
and update here.

But as it stands, I take it that people here find this feature
beneficial (although not many people have chimed in yet). Yes, we'd
probably need to work with Hadoop core to see this feature go through.

It'd be great to hear from some Facebook devs on this topic.


Re: Multiple RS for serving one region

Posted by Ted Yu <yu...@gmail.com>.
The feature depends on hdfs support.
Once we have that, we can implement this feature in HBase.

Cheers


Re: Multiple RS for serving one region

Posted by Otis Gospodnetic <ot...@gmail.com>.
This sounds hugely useful to me and is one of those "why doesn't HBase have
that" things that bugged me.

Is there an issue to watch?

http://search-hadoop.com/?q=region+failover+secondary&fc_project=HBase&fc_type=issue
doesn't find any.

Thanks,
Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html




Re: Multiple RS for serving one region

Posted by Jonathan Hsieh <jo...@cloudera.com>.
The main motivation is to maintain good performance across RS failovers.
This is also tied to hdfs and its block placement policy.  Let me
explain as I understand it.  If we control the hdfs block placement
strategy, we can write all blocks for an hfile (or for all hfiles
related to a region) to the same set of data nodes.  If the RS fails,
they favor failing the region over to a node that has a local copy of all the blocks.
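
As a sketch of what the client side might look like if hdfs exposed such a hint (the create() overload taking favored nodes is the shape proposed in HDFS-2576; treat the exact signature as an assumption, not a shipping API):

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class FavoredNodesWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem fs =
        (DistributedFileSystem) new Path("hdfs:///").getFileSystem(conf);

    // Same three datanodes for every block of every hfile in the region,
    // so any one of them can serve the whole region locally after a failover.
    InetSocketAddress[] favoredNodes = {
        new InetSocketAddress("rs1.example.com", 50010),  // primary (local)
        new InetSocketAddress("rs2.example.com", 50010),  // secondary
        new InetSocketAddress("rs3.example.com", 50010),  // tertiary
    };

    // Assumed overload (per HDFS-2576): the namenode takes the favored
    // nodes as a best-effort placement hint for each allocated block.
    FSDataOutputStream out = fs.create(
        new Path("/hbase/mytable/1234abcd/cf/hfile-0001"),
        FsPermission.getFileDefault(),
        true,                    // overwrite
        4096,                    // buffer size
        (short) 3,               // replication
        128L * 1024 * 1024,      // block size
        null,                    // progress
        favoredNodes);
    out.close();
  }
}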

Today, when you write an hfile to hdfs, for each block the first
replica goes to the local data node but the others get dispersed
around the cluster randomly, at per-block granularity.  The problem
here is that if the rs fails, the new rs that takes responsibility
for the region has to read files that are spread all over the cluster,
with roughly 1/nth of the data local.  This means that the
recovered region is slower until a compaction localizes the data again.
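
To put a rough number on that, here is a toy simulation (my own illustration, not code from either project) of how much of a region's data the recovering rs finds locally under random per-block placement:

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class FailoverLocalitySketch {
  public static void main(String[] args) {
    int nodes = 100, blocks = 100_000, replication = 3;
    Random rnd = new Random(42);
    int localBlocks = 0;
    for (int b = 0; b < blocks; b++) {
      // Default-ish policy: replicas land on distinct, effectively random nodes.
      Set<Integer> holders = new HashSet<>();
      while (holders.size() < replication) {
        holders.add(rnd.nextInt(nodes));
      }
      if (holders.contains(0)) {  // node 0 = the rs that picked up the region
        localBlocks++;
      }
    }
    // Prints ~0.030 for 100 nodes: only about replication/n of the data is
    // local, so the recovering server reads almost everything over the network.
    System.out.printf("local fraction after failover: %.3f%n",
        localBlocks / (double) blocks);
  }
}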

They've gone in and modified hdfs and their hbase to take advantage of
this idea.  I believe the randomization policy is enforced per region
-- if an rs serves 25 regions, all the files within each region are
sent to the same set of secondary/tertiary nodes, but each region
sends to a different set of secondary/tertiary nodes.
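
In other words, the secondary/tertiary pair is a deterministic function of the region. A sketch of the shape I imagine (my guess, not the actual 0.89-fb code):

import java.util.ArrayList;
import java.util.List;

public class FavoredPairSketch {
  // Hash the region name over the live server list so that every hfile of
  // a region reuses one stable secondary/tertiary pair, while different
  // regions usually land on different pairs.
  static List<String> secondaryAndTertiary(
      String regionName, List<String> servers, String primary) {
    List<String> candidates = new ArrayList<>(servers);
    candidates.remove(primary);  // replicas should not collapse onto the primary
    int h = regionName.hashCode();
    String secondary = candidates.get(Math.floorMod(h, candidates.size()));
    candidates.remove(secondary);
    String tertiary = candidates.get(Math.floorMod(h * 31 + 17, candidates.size()));
    return List.of(secondary, tertiary);
  }

  public static void main(String[] args) {
    List<String> servers = List.of("rs1", "rs2", "rs3", "rs4", "rs5");
    // The same region name always yields the same pair:
    System.out.println(secondaryAndTertiary("usertable,aaa,1358812345", servers, "rs1"));
    System.out.println(secondaryAndTertiary("usertable,mmm,1358812399", servers, "rs2"));
  }
}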

Jon.


-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com