Posted to user@hbase.apache.org by Bradford Stephens <br...@gmail.com> on 2009/06/12 03:10:29 UTC

HBase Write to Regionservers behavior

Hey there,

So, I wiped my HDFS and reinstalled everything, and am running smaller
loads... so far, so good. I've got 7 regionservers.

My job basically takes a lot of documents and metadata with unique
binary keys (like "055E51294F9D9CA331D968D04B72A11C"), combines them
all in a reducer, then writes it to HBase.

What I'm noticing is that it's writing to mostly one or two regions on
one box at a time, even though I have 7 reducers running. Monitoring
everything with dstat -v, I notice that only 2 of my servers are doing
much. These boxes have very low CPU idling, and high disk output (a
few GB a minute).

Everything else has a little bit of disk activity (maybe 500
MB/minute), but very idle CPUs.

Is this normal behavior? I guess as more data is loaded, more
regions are split, so over time, more boxen will be loading
data?

Cheers,
Bradford

RE: HBase Write to Regionservers behavior

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
You should write to HBase from the mapper and not use a reducer.
By the time data gets to the reducer it is sorted, and sorted
inserts into HBase cause one or two regions to be hot spots.

By inserting rows in random key order, regions split faster and
the load then gets distributed over more region servers.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
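
To make the hot-spot effect concrete, here is a minimal sketch that
counts how many regions a stream of inserts touches per 1000 writes,
once in sorted key order and once in random order. The 16-region table
split on the first hex digit, the key count, and the helper names are
all assumptions made for illustration; nothing here talks to HBase.

```python
import bisect
import random


def region_index(start_keys, key):
    """Index of the region whose [start, next start) range holds `key`."""
    return bisect.bisect_right(start_keys, key) - 1


def regions_per_window(keys, start_keys, window):
    """Number of distinct regions written in each run of `window` inserts."""
    return [
        len({region_index(start_keys, k) for k in keys[i:i + window]})
        for i in range(0, len(keys), window)
    ]


# Hypothetical table: 16 regions, split on the first hex digit of the key.
START_KEYS = [""] + ["%X" % d for d in range(1, 16)]

# Random 128-bit keys rendered as 32 hex characters, like the keys in the post.
rng = random.Random(0)
KEYS = ["%032X" % rng.getrandbits(128) for _ in range(16000)]

sorted_counts = regions_per_window(sorted(KEYS), START_KEYS, 1000)
random_counts = regions_per_window(KEYS, START_KEYS, 1000)

if __name__ == "__main__":
    print("sorted inserts, regions per 1000 writes:", sorted_counts)
    print("random inserts, regions per 1000 writes:", random_counts)
```

In sorted order each run of writes lands on a handful of adjacent
regions, while in random order every run is spread over the whole
table.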



Re: HBase Write to Regionservers behavior

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Once the table has split more, you might look into using
org.apache.hadoop.hbase.mapred.HRegionPartitioner.

It partitions the data so that only one reduce runs per region, and all of
that region's rows are sent to just that one reducer. It does not help much
while the table is still small and you have a lot of reduce tasks.
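
The idea of partitioning by region can be sketched as follows. This is
not the HRegionPartitioner source, only an illustration of the
technique: the function name, the sample start keys, and the modulo
wrap-around for tables with more regions than reducers are assumptions.

```python
import bisect


def region_partition(start_keys, row_key, num_reducers):
    """Pick a reducer from the region that would hold `row_key`.

    `start_keys` is the sorted list of region start keys, with "" for the
    first region. Wrapping with modulo when there are more regions than
    reducers is an assumption of this sketch.
    """
    region = bisect.bisect_right(start_keys, row_key) - 1
    return region % num_reducers


SAMPLE_KEYS = ("055E51294F9D9CA331D968D04B72A11C", "4F00", "9A10", "D3C2")

# A table that has already split into four regions: rows fan out by region.
four_regions = ["", "4", "8", "C"]
spread = [region_partition(four_regions, k, 7) for k in SAMPLE_KEYS]

# A freshly created table has a single region, so every row maps to reducer 0.
one_region = [""]
packed = [region_partition(one_region, k, 7) for k in SAMPLE_KEYS]

if __name__ == "__main__":
    print("four regions:", spread)
    print("one region:  ", packed)
```

The single-region case shows why this does not help while the table is
small: with only one region there is only one partition to send rows
to, no matter how many reduce tasks are configured.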

It has a benefit: once a reducer is done with a region, that region will
likely be flushed as its memcache fills up, so it can start compactions and
splits without having to worry about more data coming.
Right now every reduce sorts its data by key, so all the reduce tasks start
writing to the same regions as they go: because the data is sorted, they all
work from the first row of the table to the last.
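
A small simulation of that last point, as a sketch under stated
assumptions: 7 reducers fed by hash partitioning, all writing at the
same rate, against a hypothetical table of 16 regions split on the
first hex digit. It compares how many distinct regions are being
written per round when each reducer emits its keys sorted versus in
arrival order.

```python
import bisect
import random
import zlib

NUM_REDUCERS = 7
# Hypothetical table: 16 regions, split on the first hex digit of the key.
START_KEYS = [""] + ["%X" % d for d in range(1, 16)]


def region_index(key):
    """Index of the region whose key range holds `key`."""
    return bisect.bisect_right(START_KEYS, key) - 1


def mean_regions_in_flight(per_reducer_keys):
    """Average number of distinct regions written per lockstep round."""
    rounds = max(len(keys) for keys in per_reducer_keys)
    total = 0
    for t in range(rounds):
        # Round t: every reducer that still has data writes its t-th key.
        total += len(
            {region_index(keys[t]) for keys in per_reducer_keys if t < len(keys)}
        )
    return total / rounds


rng = random.Random(1)
all_keys = ["%032X" % rng.getrandbits(128) for _ in range(7000)]

# Hash partitioning: each reducer gets a uniform sample of the key space.
buckets = [[] for _ in range(NUM_REDUCERS)]
for key in all_keys:
    buckets[zlib.crc32(key.encode("ascii")) % NUM_REDUCERS].append(key)

sorted_mean = mean_regions_in_flight([sorted(b) for b in buckets])
arrival_mean = mean_regions_in_flight(buckets)  # keys in unsorted arrival order

if __name__ == "__main__":
    print("regions in flight, sorted reducers:  %.2f" % sorted_mean)
    print("regions in flight, unsorted writers: %.2f" % arrival_mean)
```

Because every reducer holds a uniform sample of the same key space, the
sorted reducers all sit at roughly the same position in the table at any
moment, which is the "one or two regions at a time" pattern from the
original post.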

Billy


"Bradford Stephens" 
<br...@gmail.com> wrote in message 
news:860544ed0906111810l2f80be29x8bc08a7463fc2b4b@mail.gmail.com...
> Hey there,
>
> So, I wiped my HDFS and reinstalled everything, and am running smaller
> loads... so far, so good. I've got 7 regionservers.
>
> My job basically takes a lot of documents and metadata with unique
> binary keys (like "055E51294F9D9CA331D968D04B72A11C"), combines them
> all in a reducer, then writes it to HBase.
>
> What I'm noticing is that it's writing to mostly one or two regions on
> one box at a time, even though I have 7 reducers running. Monitoring
> everything with dstat -v, I notice that only 2 of my servers are doing
> much. These boxes have very low CPU idling, and high disk output (a
> few GB a minute).
>
> Everything else has a a little bit of disk activity (maybe 500
> MB/minute), but very idle CPUs.
>
> Is this normal behavior? I guess as more data is loaded, more
> regionservers are split, so over time, more boxen will be loading
> data?
>
> Cheers,
> Bradford
> 



Re: HBase Write to Regionservers behavior

Posted by zsongbo <zs...@gmail.com>.
Thanks Bradford.
On Tue, Jun 16, 2009 at 2:17 AM, Bradford Stephens <
bradfordstephens@gmail.com> wrote:

> Right now, we're storing the documents in HBase. The indices are
> stored in HDFS and then 'sharded' to each node using Katta. Not sure
> if there's much of an advantage to storing the index itself in HBase,
> though I'd be interested to see some use cases for it.
>
> On Sat, Jun 13, 2009 at 11:27 AM, zsongbo<zs...@gmail.com> wrote:
> > Hi Bradford Stephens,
> > Could you please share something about your practices on "Katta+HBase"?
> > Do you store the documents or indexes in HBase?
> >
> > Schubert
> >
> > On Fri, Jun 12, 2009 at 1:19 PM, Bradford Stephens <
> > bradfordstephens@gmail.com> wrote:
> >
> >> That actually make a lot of sense. Thanks, awesome people! Me and the
> >> dev team are here to get Katta + HBase to play together, and it's
> >> looking pretty nice.
> >>
> >> On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
> >> > On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
> >> > bradfordstephens@gmail.com> wrote:
> >> >
> >> >>
> >> >> What I'm noticing is that it's writing to mostly one or two regions
> on
> >> >> one box at a time, even though I have 7 reducers running. Monitoring
> >> >> everything with dstat -v, I notice that only 2 of my servers are
> doing
> >> >> much. These boxes have very low CPU idling, and high disk output (a
> >> >> few GB a minute).
> >> >>
> >> >
> >> >
> >> > How many regions in your table?
> >> >
> >> > At first, there is one.  All reducers will go against it.   When it
> >> splits,
> >> > then two regions field the 7 reducers and so on.
> >> >
> >> > You can manually split regions from the command-line.  See if that
> helps:
> >> >
> >> > hbase> split_region 'REGIONNAME'
> >> >
> >> > (IIRC -- type 'tools' in shell for help on the admin facilities).
> >> >
> >> > St.Ack
> >> >
> >>
> >
>

Re: HBase Write to Regionservers behavior

Posted by Bradford Stephens <br...@gmail.com>.
Right now, we're storing the documents in HBase. The indices are
stored in HDFS and then 'sharded' to each node using Katta. Not sure
if there's much of an advantage to storing the index itself in HBase,
though I'd be interested to see some use cases for it.

On Sat, Jun 13, 2009 at 11:27 AM, zsongbo<zs...@gmail.com> wrote:
> Hi Bradford Stephens,
> Could you please share something about your practices on "Katta+HBase"?
> Do you store the documents or indexes in HBase?
>
> Schubert
>
> On Fri, Jun 12, 2009 at 1:19 PM, Bradford Stephens <
> bradfordstephens@gmail.com> wrote:
>
>> That actually make a lot of sense. Thanks, awesome people! Me and the
>> dev team are here to get Katta + HBase to play together, and it's
>> looking pretty nice.
>>
>> On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
>> > On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
>> > bradfordstephens@gmail.com> wrote:
>> >
>> >>
>> >> What I'm noticing is that it's writing to mostly one or two regions on
>> >> one box at a time, even though I have 7 reducers running. Monitoring
>> >> everything with dstat -v, I notice that only 2 of my servers are doing
>> >> much. These boxes have very low CPU idling, and high disk output (a
>> >> few GB a minute).
>> >>
>> >
>> >
>> > How many regions in your table?
>> >
>> > At first, there is one.  All reducers will go against it.   When it
>> splits,
>> > then two regions field the 7 reducers and so on.
>> >
>> > You can manually split regions from the command-line.  See if that helps:
>> >
>> > hbase> split_region 'REGIONNAME'
>> >
>> > (IIRC -- type 'tools' in shell for help on the admin facilities).
>> >
>> > St.Ack
>> >
>>
>

Re: HBase Write to Regionservers behavior

Posted by zsongbo <zs...@gmail.com>.
Hi Bradford Stephens,
Could you please share something about your practices on "Katta+HBase"?
Do you store the documents or indexes in HBase?

Schubert

On Fri, Jun 12, 2009 at 1:19 PM, Bradford Stephens <
bradfordstephens@gmail.com> wrote:

> That actually make a lot of sense. Thanks, awesome people! Me and the
> dev team are here to get Katta + HBase to play together, and it's
> looking pretty nice.
>
> On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
> > On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
> > bradfordstephens@gmail.com> wrote:
> >
> >>
> >> What I'm noticing is that it's writing to mostly one or two regions on
> >> one box at a time, even though I have 7 reducers running. Monitoring
> >> everything with dstat -v, I notice that only 2 of my servers are doing
> >> much. These boxes have very low CPU idling, and high disk output (a
> >> few GB a minute).
> >>
> >
> >
> > How many regions in your table?
> >
> > At first, there is one.  All reducers will go against it.   When it
> splits,
> > then two regions field the 7 reducers and so on.
> >
> > You can manually split regions from the command-line.  See if that helps:
> >
> > hbase> split_region 'REGIONNAME'
> >
> > (IIRC -- type 'tools' in shell for help on the admin facilities).
> >
> > St.Ack
> >
>

Re: HBase Write to Regionservers behavior

Posted by Bradford Stephens <br...@gmail.com>.
Oh, I misspoke. The MR job is over tab-delimited text files. I have 14
mappers and 7 reducers -- loading into an empty table. The total
number of regions generated after the job is done and some splits
happen is 70.

On Thu, Jun 11, 2009 at 10:47 PM, stack<st...@duboce.net> wrote:
> Is your MR job over the whole table or a subset?  If whole table then its
> odd that the 7 reducers are hitting only 2 regions.  What happens if 70
> reducers?
> St.Ack
>
> On Thu, Jun 11, 2009 at 10:42 PM, Bradford Stephens <
> bradfordstephens@gmail.com> wrote:
>
>> About 70.
>>
>> On Thu, Jun 11, 2009 at 10:24 PM, stack<st...@duboce.net> wrote:
>> > Hey, how many regions?  (smile)
>> > St.Ack
>> >
>> > On Thu, Jun 11, 2009 at 10:19 PM, Bradford Stephens <
>> > bradfordstephens@gmail.com> wrote:
>> >
>> >> I meant, here 'till Midnight :) thanks!
>> >>
>> >> On Thu, Jun 11, 2009 at 10:19 PM, Bradford
>> >> Stephens<br...@gmail.com> wrote:
>> >> > That actually make a lot of sense. Thanks, awesome people! Me and the
>> >> > dev team are here to get Katta + HBase to play together, and it's
>> >> > looking pretty nice.
>> >> >
>> >> > On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
>> >> >> On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
>> >> >> bradfordstephens@gmail.com> wrote:
>> >> >>
>> >> >>>
>> >> >>> What I'm noticing is that it's writing to mostly one or two regions
>> on
>> >> >>> one box at a time, even though I have 7 reducers running. Monitoring
>> >> >>> everything with dstat -v, I notice that only 2 of my servers are
>> doing
>> >> >>> much. These boxes have very low CPU idling, and high disk output (a
>> >> >>> few GB a minute).
>> >> >>>
>> >> >>
>> >> >>
>> >> >> How many regions in your table?
>> >> >>
>> >> >> At first, there is one.  All reducers will go against it.   When it
>> >> splits,
>> >> >> then two regions field the 7 reducers and so on.
>> >> >>
>> >> >> You can manually split regions from the command-line.  See if that
>> >> helps:
>> >> >>
>> >> >> hbase> split_region 'REGIONNAME'
>> >> >>
>> >> >> (IIRC -- type 'tools' in shell for help on the admin facilities).
>> >> >>
>> >> >> St.Ack
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: HBase Write to Regionservers behavior

Posted by stack <st...@duboce.net>.
Is your MR job over the whole table or a subset?  If whole table then it's
odd that the 7 reducers are hitting only 2 regions.  What happens with 70
reducers?
St.Ack

On Thu, Jun 11, 2009 at 10:42 PM, Bradford Stephens <
bradfordstephens@gmail.com> wrote:

> About 70.
>
> On Thu, Jun 11, 2009 at 10:24 PM, stack<st...@duboce.net> wrote:
> > Hey, how many regions?  (smile)
> > St.Ack
> >
> > On Thu, Jun 11, 2009 at 10:19 PM, Bradford Stephens <
> > bradfordstephens@gmail.com> wrote:
> >
> >> I meant, here 'till Midnight :) thanks!
> >>
> >> On Thu, Jun 11, 2009 at 10:19 PM, Bradford
> >> Stephens<br...@gmail.com> wrote:
> >> > That actually make a lot of sense. Thanks, awesome people! Me and the
> >> > dev team are here to get Katta + HBase to play together, and it's
> >> > looking pretty nice.
> >> >
> >> > On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
> >> >> On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
> >> >> bradfordstephens@gmail.com> wrote:
> >> >>
> >> >>>
> >> >>> What I'm noticing is that it's writing to mostly one or two regions
> on
> >> >>> one box at a time, even though I have 7 reducers running. Monitoring
> >> >>> everything with dstat -v, I notice that only 2 of my servers are
> doing
> >> >>> much. These boxes have very low CPU idling, and high disk output (a
> >> >>> few GB a minute).
> >> >>>
> >> >>
> >> >>
> >> >> How many regions in your table?
> >> >>
> >> >> At first, there is one.  All reducers will go against it.   When it
> >> splits,
> >> >> then two regions field the 7 reducers and so on.
> >> >>
> >> >> You can manually split regions from the command-line.  See if that
> >> helps:
> >> >>
> >> >> hbase> split_region 'REGIONNAME'
> >> >>
> >> >> (IIRC -- type 'tools' in shell for help on the admin facilities).
> >> >>
> >> >> St.Ack
> >> >>
> >> >
> >>
> >
>

Re: HBase Write to Regionservers behavior

Posted by Bradford Stephens <br...@gmail.com>.
About 70.

On Thu, Jun 11, 2009 at 10:24 PM, stack<st...@duboce.net> wrote:
> Hey, how many regions?  (smile)
> St.Ack
>
> On Thu, Jun 11, 2009 at 10:19 PM, Bradford Stephens <
> bradfordstephens@gmail.com> wrote:
>
>> I meant, here 'till Midnight :) thanks!
>>
>> On Thu, Jun 11, 2009 at 10:19 PM, Bradford
>> Stephens<br...@gmail.com> wrote:
>> > That actually make a lot of sense. Thanks, awesome people! Me and the
>> > dev team are here to get Katta + HBase to play together, and it's
>> > looking pretty nice.
>> >
>> > On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
>> >> On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
>> >> bradfordstephens@gmail.com> wrote:
>> >>
>> >>>
>> >>> What I'm noticing is that it's writing to mostly one or two regions on
>> >>> one box at a time, even though I have 7 reducers running. Monitoring
>> >>> everything with dstat -v, I notice that only 2 of my servers are doing
>> >>> much. These boxes have very low CPU idling, and high disk output (a
>> >>> few GB a minute).
>> >>>
>> >>
>> >>
>> >> How many regions in your table?
>> >>
>> >> At first, there is one.  All reducers will go against it.   When it
>> splits,
>> >> then two regions field the 7 reducers and so on.
>> >>
>> >> You can manually split regions from the command-line.  See if that
>> helps:
>> >>
>> >> hbase> split_region 'REGIONNAME'
>> >>
>> >> (IIRC -- type 'tools' in shell for help on the admin facilities).
>> >>
>> >> St.Ack
>> >>
>> >
>>
>

Re: HBase Write to Regionservers behavior

Posted by stack <st...@duboce.net>.
Hey, how many regions?  (smile)
St.Ack

On Thu, Jun 11, 2009 at 10:19 PM, Bradford Stephens <
bradfordstephens@gmail.com> wrote:

> I meant, here 'till Midnight :) thanks!
>
> On Thu, Jun 11, 2009 at 10:19 PM, Bradford
> Stephens<br...@gmail.com> wrote:
> > That actually make a lot of sense. Thanks, awesome people! Me and the
> > dev team are here to get Katta + HBase to play together, and it's
> > looking pretty nice.
> >
> > On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
> >> On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
> >> bradfordstephens@gmail.com> wrote:
> >>
> >>>
> >>> What I'm noticing is that it's writing to mostly one or two regions on
> >>> one box at a time, even though I have 7 reducers running. Monitoring
> >>> everything with dstat -v, I notice that only 2 of my servers are doing
> >>> much. These boxes have very low CPU idling, and high disk output (a
> >>> few GB a minute).
> >>>
> >>
> >>
> >> How many regions in your table?
> >>
> >> At first, there is one.  All reducers will go against it.   When it
> splits,
> >> then two regions field the 7 reducers and so on.
> >>
> >> You can manually split regions from the command-line.  See if that
> helps:
> >>
> >> hbase> split_region 'REGIONNAME'
> >>
> >> (IIRC -- type 'tools' in shell for help on the admin facilities).
> >>
> >> St.Ack
> >>
> >
>

Re: HBase Write to Regionservers behavior

Posted by Bradford Stephens <br...@gmail.com>.
I meant, here 'til midnight :) thanks!

On Thu, Jun 11, 2009 at 10:19 PM, Bradford
Stephens<br...@gmail.com> wrote:
> That actually make a lot of sense. Thanks, awesome people! Me and the
> dev team are here to get Katta + HBase to play together, and it's
> looking pretty nice.
>
> On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
>> On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
>> bradfordstephens@gmail.com> wrote:
>>
>>>
>>> What I'm noticing is that it's writing to mostly one or two regions on
>>> one box at a time, even though I have 7 reducers running. Monitoring
>>> everything with dstat -v, I notice that only 2 of my servers are doing
>>> much. These boxes have very low CPU idling, and high disk output (a
>>> few GB a minute).
>>>
>>
>>
>> How many regions in your table?
>>
>> At first, there is one.  All reducers will go against it.   When it splits,
>> then two regions field the 7 reducers and so on.
>>
>> You can manually split regions from the command-line.  See if that helps:
>>
>> hbase> split_region 'REGIONNAME'
>>
>> (IIRC -- type 'tools' in shell for help on the admin facilities).
>>
>> St.Ack
>>
>

Re: HBase Write to Regionservers behavior

Posted by Bradford Stephens <br...@gmail.com>.
That actually makes a lot of sense. Thanks, awesome people! Me and the
dev team are here to get Katta + HBase to play together, and it's
looking pretty nice.

On Thu, Jun 11, 2009 at 9:47 PM, stack<st...@duboce.net> wrote:
> On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
> bradfordstephens@gmail.com> wrote:
>
>>
>> What I'm noticing is that it's writing to mostly one or two regions on
>> one box at a time, even though I have 7 reducers running. Monitoring
>> everything with dstat -v, I notice that only 2 of my servers are doing
>> much. These boxes have very low CPU idling, and high disk output (a
>> few GB a minute).
>>
>
>
> How many regions in your table?
>
> At first, there is one.  All reducers will go against it.   When it splits,
> then two regions field the 7 reducers and so on.
>
> You can manually split regions from the command-line.  See if that helps:
>
> hbase> split_region 'REGIONNAME'
>
> (IIRC -- type 'tools' in shell for help on the admin facilities).
>
> St.Ack
>

Re: HBase Write to Regionservers behavior

Posted by stack <st...@duboce.net>.
On Thu, Jun 11, 2009 at 6:10 PM, Bradford Stephens <
bradfordstephens@gmail.com> wrote:

>
> What I'm noticing is that it's writing to mostly one or two regions on
> one box at a time, even though I have 7 reducers running. Monitoring
> everything with dstat -v, I notice that only 2 of my servers are doing
> much. These boxes have very low CPU idling, and high disk output (a
> few GB a minute).
>


How many regions in your table?

At first, there is one.  All reducers will go against it.   When it splits,
then two regions field the 7 reducers and so on.
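
As a rough sketch of that progression, the toy model below loads keys
into a table that starts with one region and splits a region at its
median whenever it grows past a threshold. The row-count threshold
stands in for HBase's size-based split trigger, and the function name
and numbers are assumptions chosen for illustration.

```python
import bisect
import random


def load(keys, split_threshold):
    """Insert keys one by one, splitting any region that outgrows the threshold.

    Returns a list of (rows_loaded, region_count) pairs, one per split.
    Assumes row keys are unique.
    """
    regions = [[]]      # each region is a sorted list of row keys
    start_keys = [""]   # start key of each region, kept in sorted order
    history = []
    for n, key in enumerate(keys, 1):
        i = bisect.bisect_right(start_keys, key) - 1
        bisect.insort(regions[i], key)
        if len(regions[i]) > split_threshold:
            # Split at the median: the upper half becomes a new region.
            mid = len(regions[i]) // 2
            upper = regions[i][mid:]
            regions[i] = regions[i][:mid]
            regions.insert(i + 1, upper)
            start_keys.insert(i + 1, upper[0])
            history.append((n, len(regions)))
    return history


rng = random.Random(2)
keys = ["%032X" % rng.getrandbits(128) for _ in range(8000)]
history = load(keys, 1000)

if __name__ == "__main__":
    for rows_loaded, region_count in history:
        print("after %5d rows: %d regions" % (rows_loaded, region_count))
```

Until the first split, every writer is necessarily hitting the same
single region, whatever the number of reducers.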

You can manually split regions from the command-line.  See if that helps:

hbase> split_region 'REGIONNAME'

(IIRC -- type 'tools' in shell for help on the admin facilities).

St.Ack