Posted to user@hbase.apache.org by Damien Hardy <dh...@viadeoteam.com> on 2014/06/04 14:39:58 UTC

HBase export limit bandwidth

Hello,

We are trying to export an HBase table to S3 for backup purposes.
By default the export tool runs one map per region, and we want to limit output
bandwidth over the internet (to Amazon S3).

We were thinking of adding some reducers to limit the number of writers,
but this is explicitly hardcoded to 0 in the Export class:
```
    // No reducers. Just write straight to output files.
    job.setNumReduceTasks(0);
```

Is there another way (a property?) in Hadoop to limit output bandwidth?

-- 
Damien


Re: HBase export limit bandwidth

Posted by Ted Yu <yu...@gmail.com>.
There have been 20 releases of 0.94, the major release line after 0.92.x.

Please consider upgrading your cluster.

Cheers


On Wed, Jun 4, 2014 at 5:59 AM, Damien Hardy <dh...@viadeoteam.com> wrote:

> Not yet, as we are stuck with CDH 4.1.2 (HBase 0.92.1) for now.
>
> On 04/06/2014 14:51, Ted Yu wrote:
> > Can you use the ExportSnapshot tool?
> >
> > ExportSnapshot has an option to limit bandwidth; see HBASE-11090.
> > This feature is in the soon-to-be-released 0.98.3.
> >
> > See HBASE-11204 as well.
> >
> > Cheers
> >
> > On Jun 4, 2014, at 5:39 AM, Damien Hardy <dh...@viadeoteam.com> wrote:
> >
> >> Hello,
> >>
> >> We are trying to export an HBase table to S3 for backup purposes.
> >> By default the export tool runs one map per region, and we want to limit output
> >> bandwidth over the internet (to Amazon S3).
> >>
> >> We were thinking of adding some reducers to limit the number of writers,
> >> but this is explicitly hardcoded to 0 in the Export class:
> >> ```
> >>    // No reducers. Just write straight to output files.
> >>    job.setNumReduceTasks(0);
> >> ```
> >>
> >> Is there another way (a property?) in Hadoop to limit output bandwidth?
> >>
> >> --
> >> Damien
> >>
>
> --
> Damien HARDY
> IT Infrastructure Architect
> Viadeo - 30 rue de la Victoire - 75009 Paris - France
> PGP : 45D7F89A
>
>

Re: HBase export limit bandwidth

Posted by Damien Hardy <dh...@viadeoteam.com>.
Not yet, as we are stuck with CDH 4.1.2 (HBase 0.92.1) for now.

On 04/06/2014 14:51, Ted Yu wrote:
> Can you use the ExportSnapshot tool?
>
> ExportSnapshot has an option to limit bandwidth; see HBASE-11090.
> This feature is in the soon-to-be-released 0.98.3.
>
> See HBASE-11204 as well.
> 
> Cheers
> 
> On Jun 4, 2014, at 5:39 AM, Damien Hardy <dh...@viadeoteam.com> wrote:
> 
>> Hello,
>>
>> We are trying to export an HBase table to S3 for backup purposes.
>> By default the export tool runs one map per region, and we want to limit output
>> bandwidth over the internet (to Amazon S3).
>>
>> We were thinking of adding some reducers to limit the number of writers,
>> but this is explicitly hardcoded to 0 in the Export class:
>> ```
>>    // No reducers. Just write straight to output files.
>>    job.setNumReduceTasks(0);
>> ```
>>
>> Is there another way (a property?) in Hadoop to limit output bandwidth?
>>
>> -- 
>> Damien
>>

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A


Re: HBase export limit bandwidth

Posted by Ted Yu <yu...@gmail.com>.
Can you use the ExportSnapshot tool?

ExportSnapshot has an option to limit bandwidth; see HBASE-11090.
This feature is in the soon-to-be-released 0.98.3.

See HBASE-11204 as well.
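
For reference, once you are on a release with HBASE-11090, a typical invocation might look like this (the snapshot name and bucket are made up; double-check the option names against your version's usage output):

```
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snapshot \
  -copy-to s3n://my-backup-bucket/hbase \
  -mappers 4 \
  -bandwidth 50
```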

Cheers

On Jun 4, 2014, at 5:39 AM, Damien Hardy <dh...@viadeoteam.com> wrote:

> Hello,
> 
> We are trying to export an HBase table to S3 for backup purposes.
> By default the export tool runs one map per region, and we want to limit output
> bandwidth over the internet (to Amazon S3).
> 
> We were thinking of adding some reducers to limit the number of writers,
> but this is explicitly hardcoded to 0 in the Export class:
> ```
>    // No reducers. Just write straight to output files.
>    job.setNumReduceTasks(0);
> ```
> 
> Is there another way (a property?) in Hadoop to limit output bandwidth?
> 
> -- 
> Damien
> 

Re: HBase export limit bandwidth

Posted by Michael Segel <mi...@hotmail.com>.
I guess you could say "snapshot" as in a point-in-time M/R job that exports all of the rows in the table written before a specified time X, which could default to the start time of the job.

Since you're running your export to the same cluster (but to a different directory from /hbase), you don't really have to worry about the number of mappers.
However, since it's a backup, you may want to reduce the number of region files: reduce the data set to 10, 100, etc. files, depending on the size of the underlying table, and write to S3 directly from the reducers. If you want more control, reduce to the local HDFS instead; then, in a separate job or a single-threaded program, open up one file at a time and trickle it in (or write a map-only job with a set number of mappers defined to run in parallel; see the sketch below).

The only caveat is that you need to make sure you have enough disk space to store the local copy until you complete the S3 write. 
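
A minimal, untested sketch of the single-threaded "trickle" approach mentioned above, assuming the export already sits on HDFS (the paths and the rate cap below are made-up placeholders):

```
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Copies one file from HDFS to S3, sleeping as needed to stay under a byte-rate cap.
public class TrickleCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path src = new Path(args[0]);         // e.g. hdfs:///backup/export/part-m-00000
    Path dst = new Path(args[1]);         // e.g. s3n://my-bucket/backup/part-m-00000
    long bytesPerSec = 10L * 1024 * 1024; // 10 MB/s cap; tune to taste
    byte[] buf = new byte[64 * 1024];
    InputStream in = src.getFileSystem(conf).open(src);
    OutputStream out = dst.getFileSystem(conf).create(dst);
    try {
      long windowStart = System.currentTimeMillis();
      long windowBytes = 0;
      int n;
      while ((n = in.read(buf)) > 0) {
        out.write(buf, 0, n);
        windowBytes += n;
        if (windowBytes >= bytesPerSec) { // crude one-second token bucket
          long elapsed = System.currentTimeMillis() - windowStart;
          if (elapsed < 1000) Thread.sleep(1000 - elapsed);
          windowStart = System.currentTimeMillis();
          windowBytes = 0;
        }
      }
    } finally {
      in.close();
      out.close();
    }
  }
}
```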

Of course there are other permutations... for example, if you have a NAS/SAN you could move the export there.
(Hot == HBase table, warm == HDFS outside of HBase, lukewarm == locally attached disks, cold == S3...)

Again, it depends on the resources available to you and your enterprise. YMMV.
On Jun 5, 2014, at 9:15 AM, Ted Yu <yu...@gmail.com> wrote:

> bq. take a snapshot and write the file(s)
> 
> Is the above referring to an HBase snapshot?
> HBase 0.92.x doesn't support snapshots.
> 
> FYI
> 
> 
> On Thu, Jun 5, 2014 at 5:11 AM, Michael Segel <mi...@hotmail.com>
> wrote:
> 
>> Ok...
>> 
>> So when the basic tools don't work...
>> How about rolling your own?
>> 
>> Step 1: take a snapshot and write the file(s) to a different location
>> outside of /hbase.
>> (Export to local disk on the cluster.)
>> 
>> Step 2: write your own M/R job and control the number of mappers that read
>> from HDFS and write to S3.
>> That assumes you want a block-for-block match. If you want to change the
>> number of files, since each region would be a separate file, you could do
>> the write to S3 in the reduce phase.
>> (Which is what you want.)
>> 
>> 
>> On Jun 4, 2014, at 7:39 AM, Damien Hardy <dh...@viadeoteam.com> wrote:
>> 
>>> Hello,
>>> 
>>> We are trying to export an HBase table to S3 for backup purposes.
>>> By default the export tool runs one map per region, and we want to limit output
>>> bandwidth over the internet (to Amazon S3).
>>> 
>>> We were thinking of adding some reducers to limit the number of writers,
>>> but this is explicitly hardcoded to 0 in the Export class:
>>> ```
>>>   // No reducers. Just write straight to output files.
>>>   job.setNumReduceTasks(0);
>>> ```
>>> 
>>> Is there another way (a property?) in Hadoop to limit output bandwidth?
>>> 
>>> --
>>> Damien
>>> 
>> 
>> The opinions expressed here are mine, while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>> 
>> 
>> 
>> 
>> 
>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: HBase export limit bandwidth

Posted by Ted Yu <yu...@gmail.com>.
bq. take a snapshot and write the file(s)

Is the above referring to an HBase snapshot?
HBase 0.92.x doesn't support snapshots.

FYI


On Thu, Jun 5, 2014 at 5:11 AM, Michael Segel <mi...@hotmail.com>
wrote:

> Ok...
>
> So when the basic tools don't work...
> How about rolling your own?
>
> Step 1: take a snapshot and write the file(s) to a different location
> outside of /hbase.
> (Export to local disk on the cluster.)
>
> Step 2: write your own M/R job and control the number of mappers that read
> from HDFS and write to S3.
> That assumes you want a block-for-block match. If you want to change the
> number of files, since each region would be a separate file, you could do
> the write to S3 in the reduce phase.
> (Which is what you want.)
>
>
> On Jun 4, 2014, at 7:39 AM, Damien Hardy <dh...@viadeoteam.com> wrote:
>
> > Hello,
> >
> > We are trying to export an HBase table to S3 for backup purposes.
> > By default the export tool runs one map per region, and we want to limit output
> > bandwidth over the internet (to Amazon S3).
> >
> > We were thinking of adding some reducers to limit the number of writers,
> > but this is explicitly hardcoded to 0 in the Export class:
> > ```
> >    // No reducers. Just write straight to output files.
> >    job.setNumReduceTasks(0);
> > ```
> >
> > Is there another way (a property?) in Hadoop to limit output bandwidth?
> >
> > --
> > Damien
> >
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>

Re: HBase export limit bandwidth

Posted by Stack <st...@duboce.net>.
>
>
> On Jun 4, 2014, at 7:39 AM, Damien Hardy <dh...@viadeoteam.com> wrote:
>
> > Hello,
> >
> > We are trying to export an HBase table to S3 for backup purposes.
> > By default the export tool runs one map per region, and we want to limit output
> > bandwidth over the internet (to Amazon S3).
> >
> > We were thinking of adding some reducers to limit the number of writers,
> > but this is explicitly hardcoded to 0 in the Export class:
> > ```
> >    // No reducers. Just write straight to output files.
> >    job.setNumReduceTasks(0);
> > ```
> >


Echoing Michael Segel, why not subclass and set reducers to whatever you
want in your subclass?

But you probably don't want reducers anyway. The output from your
mappers would have to be sorted and fed to the reducers, which puts a
load on your cluster, load that could be better spent moving the
data to S3.

Or limit the number of mappers you have running at any one time via
configuration, or, in a subclass, limit the rate at which they write.
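
A rough, untested sketch of the reducer variant, reusing Export's own job setup (this assumes Export.createSubmittableJob is accessible in your version, and that the shuffle can serialize Result, which holds for the Writable-based 0.92/0.94 APIs):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.Export;
import org.apache.hadoop.mapreduce.Job;

// Reuses Export's mapper and output format, but funnels the output through a
// small number of identity reducers to cap the number of concurrent writers.
public class ThrottledExport {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Export.createSubmittableJob(conf, args);
    job.setNumReduceTasks(2); // two writers instead of one per region
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```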

St.Ack

Re: HBase export limit bandwidth

Posted by Michael Segel <mi...@hotmail.com>.
Ok... 

So when the basic tools don't work... 
How about rolling your own?

Step 1: take a snapshot and write the file(s) to a different location outside of /hbase.
(Export to local disk on the cluster.)

Step 2: write your own M/R job and control the number of mappers that read from HDFS and write to S3.
That assumes you want a block-for-block match. If you want to change the number of files, since each region would be a separate file, you could do the write to S3 in the reduce phase.
(Which is what you want.)
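
A hypothetical sketch of Step 2 as a map-only copy job: the input is a text file listing the exported HDFS paths, one per line, and NLineInputFormat bounds the total number of map tasks, and hence the maximum number of parallel S3 streams. The class name, paths, and the "copy.dest.root" property are made up for illustration:

```
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class S3CopyJob {
  // Each input record is one HDFS path; the mapper streams that file to S3.
  static class CopyMapper extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      Configuration conf = ctx.getConfiguration();
      Path src = new Path(value.toString());
      Path dst = new Path(conf.get("copy.dest.root"), src.getName());
      IOUtils.copyBytes(src.getFileSystem(conf).open(src),
                        dst.getFileSystem(conf).create(dst), conf, true);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("copy.dest.root", args[1]);                   // e.g. s3n://my-bucket/backup
    Job job = new Job(conf, "bounded-s3-copy");
    job.setJarByClass(S3CopyJob.class);
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.addInputPath(job, new Path(args[0])); // file listing HDFS paths
    NLineInputFormat.setNumLinesPerSplit(job, 10);         // 10 files per map task
    job.setMapperClass(CopyMapper.class);
    job.setNumReduceTasks(0);                              // map-only, as discussed
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```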


On Jun 4, 2014, at 7:39 AM, Damien Hardy <dh...@viadeoteam.com> wrote:

> Hello,
> 
> We are trying to export an HBase table to S3 for backup purposes.
> By default the export tool runs one map per region, and we want to limit output
> bandwidth over the internet (to Amazon S3).
> 
> We were thinking of adding some reducers to limit the number of writers,
> but this is explicitly hardcoded to 0 in the Export class:
> ```
>    // No reducers. Just write straight to output files.
>    job.setNumReduceTasks(0);
> ```
> 
> Is there another way (a property?) in Hadoop to limit output bandwidth?
> 
> -- 
> Damien
> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com