Posted to user@hbase.apache.org by Himanish Kushary <hi...@gmail.com> on 2011/04/14 19:30:40 UTC

Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Hi,

We are executing a small-scale implementation using HBase. We receive around
200-300 MB of data each day for processing. Some of our number crunching
and processing is based on this single day's data.

The problem we are facing is that, because of the small size of this data, a
single day's data resides in at most 2-5 regions (our HBase split size is set
to 64 MB). Due to this, when our MapReduce job runs there are only about 2-5
tasks doing the effective work, so performance does not meet our expectations.
It would be ideal to have more tasks working on this 200-300 MB of data.

One way we could increase the number of tasks is by further lowering the
split sizes, but in that case other jobs, which process 30 or 60 days of
data, will be split into lots of tasks.

Is further lowering the HBase file split size recommended?

Could anyone please suggest any other option to handle this scenario?

Is it possible, through some configuration or code, to split the 200-300 MB of
daily data (maybe while it gets inserted into HBase) into multiple regions
while still sticking with the HBase split size (64/128/256 MB, whatever)?

----------
Thanks
Himanish
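
A minimal sketch of one option that comes up later in the thread: pre-splitting the daily table at creation time. This assumes a 0.90-era client, a hypothetical table name "daily_data", and the HBaseAdmin#createTable overload that accepts split keys. TableInputFormat produces roughly one map task per region, so spreading a day's data over more regions yields more map tasks for the same 200-300 MB:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PreSplitDailyTable {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

            HTableDescriptor desc = new HTableDescriptor("daily_data"); // hypothetical name
            desc.addFamily(new HColumnDescriptor("d"));                 // hypothetical family

            // Pre-split into 16 regions on a one-byte bucket prefix (0x00..0x0f).
            // Row keys must carry the same prefix at insert time for this to help.
            byte[][] splitKeys = new byte[15][];
            for (int i = 1; i <= 15; i++) {
                splitKeys[i - 1] = new byte[] { (byte) i };
            }
            admin.createTable(desc, splitKeys);
        }
    }

A sketch of a matching bucketed row key appears further down the thread.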

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Jason Rutherglen <ja...@gmail.com>.
I think you meant HBASE-1295 is broader?  HBASE-2357 sounds like MySQL
replication, and I'm guessing it is a bit easier to implement than
HBASE-1295.  Also, in terms of HBase use cases, I think it would allow
more 'online' production installations that are read heavy?

>  3) Highest availability with lowest latency over other concerns with read replicas updated best effort from the write path

This one sounds like it's the simplest to implement and would cover
the most common use case (e.g., scaling reads)?

On Mon, Apr 18, 2011 at 5:10 PM, Andrew Purtell <ap...@apache.org> wrote:
>> From: Jason Rutherglen <ja...@gmail.com>
>
>> Andrew, thanks for the information.  On the surface it looks like
>> HBASE-2357 would be using the same mechanism for streaming the WAL
>> (except for the master-slave failover) as HBASE-1295; however, HBASE-2357
>> seems to imply that's not the case?
>
> It could be done that way, but 2357 has a broader focus right? -- not master-slave only (though that certainly is an option) but also master-master, maintaining the same write/read ordering as achievable with single-region-location deployment. Not that such ordering is necessarily required, but the notion is to consider a three-way trade-off:
>
>  1) Strong consistency over other concerns with single region location deployment
>
>  2) Higher availability with consistency over write latency with read replicas and a ZAB- or Paxos-serialized write path
>
>  3) Highest availability with lowest latency over other concerns with read replicas updated best effort from the write path
>
>   - Andy
>
>

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Andrew Purtell <ap...@apache.org>.
> From: Jason Rutherglen <ja...@gmail.com>

> Andrew, thanks for the information.  On the surface it looks like
> HBASE-2357 would be using the same mechanism for streaming the WAL
> (except for the master-slave failover) as HBASE-1295; however, HBASE-2357
> seems to imply that's not the case?

It could be done that way, but 2357 has a broader focus right? -- not master-slave only (though that certainly is an option) but also master-master, maintaining the same write/read ordering as achievable with single-region-location deployment. Not that such ordering is necessarily required, but the notion is to consider a three-way trade-off:

  1) Strong consistency over other concerns with single region location deployment

  2) Higher availability with consistency over write latency with read replicas and a ZAB- or Paxos-serialized write path

  3) Highest availability with lowest latency over other concerns with read replicas updated best effort from the write path

   - Andy


Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Jason Rutherglen <ja...@gmail.com>.
Andrew, thanks for the information.  On the surface it looks like
HBASE-2357 would be using the same mechanism for streaming the WAL
(except for the master-slave failover) as HBASE-1295; however, HBASE-2357
seems to imply that's not the case?

On Mon, Apr 18, 2011 at 12:10 AM, Andrew Purtell <ap...@apache.org> wrote:
>> From: Jason Rutherglen <ja...@gmail.com>
>> > With the new replication feature
>> > of 0.92, edits are streamed from one cluster
>> > to another
>>
>> Interesting, what does 'cluster' mean in this context?
>
> Cluster in this context is a typical data center deployment: HDFS + ZK + HBase master(s) + HBase regionservers.
>
>> Typically with MySQL one would have 1 master (for writes)
>> and N slave servers (for reads).  Is this a similar use case for
>> HBase replication?
>
> I think you are thinking more along the lines of HBASE-2357: https://issues.apache.org/jira/browse/HBASE-2357 (Hmm... I forgot I took on this issue...)
>
> What I'm talking about is HBASE-1295, or http://hbase.apache.org/replication.html .
>
> What I personally would use 0.92 replication for is:
>  - To deploy a service in multiple geographies and sync global state (eventually); but within the geography take advantage of HBase's consistency properties
>  - To stream a subset of important data to a small reserve cluster for disaster recovery
>
>   - Andy
>
>

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Andrew Purtell <ap...@apache.org>.
> From: Jason Rutherglen <ja...@gmail.com>
> > With the new replication feature
> > of 0.92, edits are streamed from one cluster
> > to another
> 
> Interesting, what does 'cluster' mean in this context?

Cluster in this context is a typical data center deployment: HDFS + ZK + HBase master(s) + HBase regionservers.

> Typically with MySQL one would have 1 master (for writes)
> and N slave servers (for reads).  Is this a similar use case for
> HBase replication?

I think you are thinking more along the lines of HBASE-2357: https://issues.apache.org/jira/browse/HBASE-2357 (Hmm... I forgot I took on this issue...)

What I'm talking about is HBASE-1295, or http://hbase.apache.org/replication.html .

What I personally would use 0.92 replication for is:
  - To deploy a service in multiple geographies and sync global state (eventually); but within the geography take advantage of HBase's consistency properties
  - To stream a subset of important data to a small reserve cluster for disaster recovery

   - Andy


Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Jason Rutherglen <ja...@gmail.com>.
> With the new replication feature of 0.92, edits are streamed from one cluster to another

Interesting, what does 'cluster' mean in this context?

Typically with MySQL one would have 1 master (for writes) and N slave
servers (for reads).  Is this a similar use case for HBase
replication?

On Sun, Apr 17, 2011 at 12:46 PM, Andrew Purtell <ap...@apache.org> wrote:
> Jason,
>
>> Andrew, when you say this:
>>
>> > Because HBase is a DOT it can provide strongly consistent
>> > and atomic operations on rows, because rows exist in only
>> > one place at a time.
>>
>> This excludes the use of HBase replication?
>
> Yes.
>
> With the new replication feature of 0.92, edits are streamed from one cluster to another. Row mutations will be consistent/atomic as they are applied at the target, but of course the replication stream may lag for a number of reasons. Therefore the row data according to the view of each cluster may be different.
>
>> I'm curious as to where HBase replication places the duplicate(?)
>> region blocks in HDFS?
>
> The edits are streamed from the WAL. WALs are rolled per usual but are kept, perhaps for a longer period of time, until all of their replication-scoped edits have been streamed to the target cluster.
>
>> Also, currently is there pass-the-baton failover when a replicated
>> region master fails?
>
> Yes. Via mechanisms mediated by ZooKeeper. But J-D could say more here.
>
>   - Andy
>
>

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Andrew Purtell <ap...@apache.org>.
Jason,

> Andrew, when you say this:
> 
> > Because HBase is a DOT it can provide strongly consistent
> > and atomic operations on rows, because rows exist in only
> > one place at a time.
> 
> This excludes the use of HBase replication?

Yes.

With the new replication feature of 0.92, edits are streamed from one cluster to another. Row mutations will be consistent/atomic as they are applied at the target, but of course the replication stream may lag for a number of reasons. Therefore the row data according to the view of each cluster may be different.

> I'm curious as to where HBase replication places the duplicate(?)
> region blocks in HDFS?

The edits are streamed from the WAL. WALs are rolled per usual but are kept, perhaps for a longer period of time, until all of their replication-scoped edits have been streamed to the target cluster.

> Also, currently is there pass-the-baton failover when a replicated
> region master fails?

Yes. Via mechanisms mediated by ZooKeeper. But J-D could say more here. 

   - Andy
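
A minimal sketch of how a column family opts into this replication, assuming a hypothetical table name; only families with a non-zero replication scope have their WAL edits shipped to the peer:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class EnableFamilyReplication {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

            HTableDescriptor desc = new HTableDescriptor("important_data"); // hypothetical
            HColumnDescriptor family = new HColumnDescriptor("f");          // hypothetical
            family.setScope(1); // 1 = ship this family's edits; 0 (default) = local only
            desc.addFamily(family);

            admin.createTable(desc);
        }
    }

The cluster must also have replication enabled and a peer cluster registered; see http://hbase.apache.org/replication.html for the server-side setup.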


Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Jason Rutherglen <ja...@gmail.com>.
Andrew, when you say this:

> Because HBase is a DOT it can provide strongly consistent and atomic operations on rows, because rows exist in only one place at a time.

This excludes the use of HBase replication?  I'm curious as to where
HBase replication places the duplicate(?) region blocks in HDFS?  Also,
currently is there pass-the-baton failover when a replicated region
master fails?  Is a replicated RS a duplicate of another RS, or can
regions be replicated to RSs holding different regions?  Sorry, this
wasn't quite on topic, but perhaps it's useful.

On Fri, Apr 15, 2011 at 3:48 PM, Andrew Purtell <ap...@yahoo.com> wrote:
>> From: Joe Pallas <pa...@cs.stanford.edu>
>
>> > Could it be that your row key is not distributing the
>> > data well enough?
>> > That is, if your key is primarily based on the current
>> > date, it will only put the data into a small number of
>> > regions.
>>
>> This, I have come to realize, is an essential difference
>> between the Cassandra approach and the HBase approach.
>> With HBase, your keys can be randomly distributed over the
>> entire keyspace, but if all your data fits in a single
>> region, then all your requests are going to a single
>> regionserver.
>
> Yes, BigTable == distributed ordered table; Cassandra == hash partitioned ring typically. (With great simplification.) Because HBase is a DOT it can provide strongly consistent and atomic operations on rows, because rows exist in only one place at a time. This is a feature, or a problem, or both, depending on your use case.
>
>> The only ways I know around this are to make the split
>> threshold low or to pre-split the table.  If you make
>> the split threshold low, you get distribution for smaller
>> tables, but if the tables get big, you have the overhead of
>> more regions to deal with.
>
> The split point is adjustable. It can be set as a table attribute on a per-table basis. Start small and revise upward after enough regions are split so the table itself is well distributed. This assumes the keys used while inserting were consistent with the expected distribution of the application.
>
> With HBase 0.90 changing the schema requires disabling the table, making the schema change, then enabling the table again.
>
> With HBase 0.92, attribute changes like changing the split point won't require a disable/enable.
>
>> If you pre-split the table,
>> you're in good shape provided you know the key distribution
>> in advance (although I am concerned about possible bugs
>> involving empty regions, based on one recent experience).
>
> Empty regions or underutilized regions can be merged (offline). Disable the table, use the Merge utility, then enable the table. Online merge is on the roadmap. It might be in 0.92; if not, then the next.
>
>   - Andy
>
>

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Ted Yu <yu...@gmail.com>.
Joe:
Take a look at HBASE-3779.

I linked my blog to HBASE-3609, which reflects the most recent development.

Cheers

On Friday, April 15, 2011, Joe Pallas <pa...@cs.stanford.edu> wrote:
>
> On Apr 15, 2011, at 3:22 PM, Stack wrote:
>
>>> The HBase rebalancer, as I understand it, adjusts region assignments, but doesn't adjust split points (hence, the number of regions).  Maybe that would be a useful feature for some cases.
>>>
>>
>> What would you suggest, Joe?  It currently splits regions down the
>> middle.  You'd instead have a split point based on the requests
>> happening on a region over, say, the last five or ten minutes?
>
> I thought the rebalancer was only moving regions among servers, not actively doing splits.  I guess I was mistaken.
>
> In any case, there could be either a heuristic or a hint from the table description to handle cases where distribution should be favored (split regions to distribute them evenly across region servers) because the keyspace is sparsely occupied, updates are uniformly distributed, and it is desirable to spread the update load.
>
> Bear in mind, I'm just speculating—I don't have experience yet with a reasonably sized workload.
>
> joe
>
>

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Joe Pallas <pa...@cs.stanford.edu>.
On Apr 15, 2011, at 3:22 PM, Stack wrote:

>> The HBase rebalancer, as I understand it, adjusts region assignments, but doesn't adjust split points (hence, the number of regions).  Maybe that would be a useful feature for some cases.
>> 
> 
> What would you suggest Joe?  It currently splits regions down the
> middle.  You'd instead have a split point that split the requests
> happening on a region over say, the last five or ten minutes?

I thought the rebalancer was only moving regions among servers, not actively doing splits.  I guess I was mistaken.

In any case, there could be either a heuristic or a hint from the table description to handle cases where distribution should be favored (split regions to distribute them evenly across region servers) because the keyspace is sparsely occupied, updates are uniformly distributed, and it is desirable to spread the update load.

Bear in mind, I'm just speculating—I don't have experience yet with a reasonably sized workload.

joe


Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Stack <st...@duboce.net>.
On Fri, Apr 15, 2011 at 12:48 PM, Joe Pallas <pa...@cs.stanford.edu> wrote:
> It seems that, until you have enough data relative to your cluster size, you must choose between locality and distribution.  (When you have enough data, you get a better balance between the two.)
>

Yes.

> The HBase rebalancer, as I understand it, adjusts region assignments, but doesn't adjust split points (hence, the number of regions).  Maybe that would be a useful feature for some cases.
>

What would you suggest, Joe?  It currently splits regions down the
middle.  You'd instead have a split point based on the requests
happening on a region over, say, the last five or ten minutes?

St.Ack

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Andrew Purtell <ap...@yahoo.com>.
> From: Joe Pallas <pa...@cs.stanford.edu>

> > Could it be that your row key is not distributing the
> > data well enough?
> > That is, if your key is primarily based on the current
> > date, it will only put the data into a small number of
> > regions.
> 
> This, I have come to realize, is an essential difference
> between the Cassandra approach and the HBase approach. 
> With HBase, your keys can be randomly distributed over the
> entire keyspace, but if all your data fits in a single
> region, then all your requests are going to a single
> regionserver.  

Yes, BigTable == distributed ordered table; Cassandra == hash partitioned ring typically. (With great simplification.) Because HBase is a DOT it can provide strongly consistent and atomic operations on rows, because rows exist in only one place at a time. This is a feature, or a problem, or both, depending on your use case.

> The only ways I know around this are to make the split
> threshold low or to pre-split the table.  If you make
> the split threshold low, you get distribution for smaller
> tables, but if the tables get big, you have the overhead of
> more regions to deal with.

The split point is adjustable. It can be set as a table attribute on a per-table basis. Start small and revise upward after enough regions are split so the table itself is well distributed. This assumes the keys used while inserting were consistent with the expected distribution of the application.

With HBase 0.90 changing the schema requires disabling the table, making the schema change, then enabling the table again.

With HBase 0.92, attribute changes like changing the split point won't require a disable/enable.

> If you pre-split the table,
> you're in good shape provided you know the key distribution
> in advance (although I am concerned about possible bugs
> involving empty regions, based on one recent experience).

Empty regions or underutilized regions can be merged (offline). Disable the table, use the Merge utility, then enable the table. Online merge is on the roadmap. It might be in 0.92; if not, then the next.

   - Andy
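
A minimal sketch of the per-table attribute described above, assuming HTableDescriptor#setMaxFileSize and a hypothetical table name; the small daily table splits at 32 MB while every other table keeps the cluster-wide hbase.hregion.max.filesize:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PerTableSplitSize {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

            HTableDescriptor desc = new HTableDescriptor("daily_data"); // hypothetical
            desc.addFamily(new HColumnDescriptor("d"));                 // hypothetical

            // Regions of this table split at 32 MB; other tables are unaffected.
            desc.setMaxFileSize(32 * 1024 * 1024L);

            admin.createTable(desc);
        }
    }

On 0.90, changing this attribute on an existing table means disable, HBaseAdmin#modifyTable, then enable, per the note above.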


Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Joe Pallas <pa...@cs.stanford.edu>.
On Apr 14, 2011, at 12:18 PM, David Schnepper wrote:

> Could it be that your row key is not distributing the data well enough?
> That is, if your key is primarily based on the current date, it will only put the
> data into a small number of regions.

This, I have come to realize, is an essential difference between the Cassandra approach and the HBase approach.  With HBase, your keys can be randomly distributed over the entire keyspace, but if all your data fits in a single region, then all your requests are going to a single regionserver.  

The only ways I know around this are to make the split threshold low or to pre-split the table.  If you make the split threshold low, you get distribution for smaller tables, but if the tables get big, you have the overhead of more regions to deal with.  If you pre-split the table, you're in good shape provided you know the key distribution in advance (although I am concerned about possible bugs involving empty regions, based on one recent experience).

It seems that, until you have enough data relative to your cluster size, you must choose between locality and distribution.  (When you have enough data, you get a better balance between the two.)

The HBase rebalancer, as I understand it, adjusts region assignments, but doesn't adjust split points (hence, the number of regions).  Maybe that would be a useful feature for some cases.

joe


Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by David Schnepper <da...@yahoo-inc.com>.
Could it be that your row key is not distributing the data well enough?
That is, if your key is primarily based on the current date, it will only
put the data into a small number of regions.

Dave Schnepper

On 14/Apr/2011 11:19, Jean-Daniel Cryans wrote:
> Trying to tune for small data in MapReduce isn't really the situation
> you want to be in, because that's not what MR is meant for.
>
> I would suggest instead that you use a single process with good
> scanner caching to process that data. Since there's no overhead from
> the MR framework it might be even faster.
>
> J-D
>
> On Thu, Apr 14, 2011 at 10:30 AM, Himanish Kushary<hi...@gmail.com>  wrote:
>> Hi,
>>
>> We are executing a small-scale implementation using HBase. We receive around
>> 200-300 MB of data each day for processing. Some of our number crunching
>> and processing is based on this single day's data.
>>
>> The problem we are facing is that, because of the small size of this data, a
>> single day's data resides in at most 2-5 regions (our HBase split size is set
>> to 64 MB). Due to this, when our MapReduce job runs there are only about 2-5
>> tasks doing the effective work, so performance does not meet our expectations.
>> It would be ideal to have more tasks working on this 200-300 MB of data.
>>
>> One way we could increase the number of tasks is by further lowering the
>> split sizes, but in that case other jobs, which process 30 or 60 days of
>> data, will be split into lots of tasks.
>>
>> Is further lowering the HBase file split size recommended?
>>
>> Could anyone please suggest any other option to handle this scenario?
>>
>> Is it possible, through some configuration or code, to split the 200-300 MB of
>> daily data (maybe while it gets inserted into HBase) into multiple regions
>> while still sticking with the HBase split size (64/128/256 MB, whatever)?
>>
>> ----------
>> Thanks
>> Himanish
>>
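
A minimal sketch of the kind of bucketed row key this implies, with hypothetical helper and constant names; the one-byte prefix must line up with the table's pre-split points so a date-heavy key fans out across regions:

    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKeys {
        private static final int BUCKETS = 16; // must match the table's pre-split points

        // Prefix a date-based key with a bucket derived from a stable hash of the
        // entity id, so one day's writes spread over all pre-split regions.
        public static byte[] saltedKey(String date, String entityId) {
            byte bucket = (byte) ((entityId.hashCode() & 0x7fffffff) % BUCKETS);
            return Bytes.add(new byte[] { bucket }, Bytes.toBytes(date + "/" + entityId));
        }
    }

The trade-off is on the read side: fetching one day back out now takes one scan per bucket.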


Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Trying to tune for small data in MapReduce isn't really the situation
you want to be in, because that's not what MR is meant for.

I would suggest instead that you use a single process with good
scanner caching to process that data. Since there's no overhead from
the MR framework it might be even faster.

J-D

On Thu, Apr 14, 2011 at 10:30 AM, Himanish Kushary <hi...@gmail.com> wrote:
> Hi,
>
> We are executing a small-scale implementation using HBase. We receive around
> 200-300 MB of data each day for processing. Some of our number crunching
> and processing is based on this single day's data.
>
> The problem we are facing is that, because of the small size of this data, a
> single day's data resides in at most 2-5 regions (our HBase split size is set
> to 64 MB). Due to this, when our MapReduce job runs there are only about 2-5
> tasks doing the effective work, so performance does not meet our expectations.
> It would be ideal to have more tasks working on this 200-300 MB of data.
>
> One way we could increase the number of tasks is by further lowering the
> split sizes, but in that case other jobs, which process 30 or 60 days of
> data, will be split into lots of tasks.
>
> Is further lowering the HBase file split size recommended?
>
> Could anyone please suggest any other option to handle this scenario?
>
> Is it possible, through some configuration or code, to split the 200-300 MB of
> daily data (maybe while it gets inserted into HBase) into multiple regions
> while still sticking with the HBase split size (64/128/256 MB, whatever)?
>
> ----------
> Thanks
> Himanish
>
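
A minimal sketch of this single-process alternative, assuming hypothetical table, family, and key names; Scan#setCaching fetches many rows per RPC instead of the default of 1, which is most of the win here:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleProcessScan {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "daily_data"); // hypothetical

            // One day's key range, assuming date-prefixed row keys.
            Scan scan = new Scan(Bytes.toBytes("2011-04-14"), Bytes.toBytes("2011-04-15"));
            scan.setCaching(1000); // fetch 1000 rows per RPC instead of one

            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result row : scanner) {
                    // ... number crunching on the day's rows ...
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }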