Posted to user@hbase.apache.org by "leiwangouc@gmail.com" <le...@gmail.com> on 2014/05/21 04:16:11 UTC

How to set number of mappers when using HBaseStorage

When HBaseStorage reads data from an HBase table, it launches one mapper per region.
However, my HBase table has more than 1000 regions, and my cluster has capacity for only 80 mappers.
Is there a way to set the number of mappers when using HBaseStorage?

Thanks,
Lei



leiwangouc@gmail.com

Re: Re: How to set number of mappers when using HBaseStorage

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Let me give the fair scheduler a try.

Thanks,
Lei



leiwangouc@gmail.com
 

Re: How to set number of mappers when using HBaseStorage

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Hansi's scheduler configuration is the real solution here, but combining
more regions into a single split is useful for other reasons.  Specifically
it helps control load against an HBase cluster from the job; you don't
always want 50 mappers running against a single regionserver.

We run into this a lot at HubSpot, so I've created my own extension of
TableInputFormat and corresponding RecordReader so that you can partition
the mappers by regionserver.  This allows you to split all the regions for
a regionserver into a configurable number of mappers (1-N).  I haven't
contributed this yet, but you can get the code at
https://gist.github.com/bbeaudreault/9788499
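
The following is a minimal sketch of the partitioning idea described above, not the code from the gist. It assumes region locations are already available as (region, regionserver) pairs; the function name splits_by_regionserver and the sample layout are hypothetical. Each regionserver's regions are dealt round-robin into at most N groups, and each group would become one mapper's input.

```python
from collections import defaultdict


def splits_by_regionserver(region_locations, mappers_per_server):
    """Group regions by hosting regionserver, then deal each server's
    regions round-robin into at most `mappers_per_server` groups."""
    if mappers_per_server < 1:
        raise ValueError("mappers_per_server must be at least 1")

    by_server = defaultdict(list)
    for region, server in region_locations:
        by_server[server].append(region)

    splits = []
    for server in sorted(by_server):
        regions = by_server[server]
        # Never create more groups than the server has regions.
        n = min(mappers_per_server, len(regions))
        for i in range(n):
            splits.append((server, regions[i::n]))
    return splits


# Hypothetical layout: six regions spread over two regionservers.
locations = [
    ("r1", "rs-a"), ("r2", "rs-a"), ("r3", "rs-a"), ("r4", "rs-a"),
    ("r5", "rs-b"), ("r6", "rs-b"),
]

for server, regions in splits_by_regionserver(locations, 2):
    print(server, regions)
```

With N set to 1, every regionserver is read by a single mapper, which is the most conservative setting for load on the cluster.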





Re: How to set number of mappers when using HBaseStorage

Posted by Pradeep Gollakota <pr...@gmail.com>.
I just looked at the source code for HBaseStorage. It uses a modified
version of TableInputFormat under the hood. TableInputFormat, AFAIK, does
not support controlling the number of launched Map tasks. It might be a
worthwhile contribution to HBase to write an analogous version of a
CombineInputFormat, so a single Map task can read multiple regions.
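
To illustrate what such a combining input format would have to do, here is a rough sketch under stated assumptions. It is not how HBaseStorage or TableInputFormat behave today; the helper combine_regions and the region names are made up for the example. It packs an ordered list of regions into at most a given number of contiguous, near-equal groups, so that each group could be read by one map task.

```python
def combine_regions(regions, max_splits):
    """Pack an ordered list of regions into at most `max_splits`
    contiguous groups whose sizes differ by no more than one."""
    if max_splits < 1:
        raise ValueError("max_splits must be at least 1")

    n = min(max_splits, len(regions))
    if n == 0:
        return []

    base, extra = divmod(len(regions), n)
    groups, start = [], 0
    for i in range(n):
        # The first `extra` groups take one additional region.
        size = base + (1 if i < extra else 0)
        groups.append(regions[start:start + size])
        start += size
    return groups


# The numbers from the original question: 1000 regions, 80 mappers.
regions = [f"region-{i:04d}" for i in range(1000)]
groups = combine_regions(regions, 80)
print(len(groups), min(map(len, groups)), max(map(len, groups)))
# prints "80 12 13"
```

Keeping the groups contiguous preserves row-key order within each map task; it does not account for which regionserver hosts each region.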



Re: How to set number of mappers when using HBaseStorage

Posted by Hansi Klose <ha...@web.de>.
Hi Lei,

I don't know whether this helps you, but I had the same problem with the replication verify jobs I
run in our environment.

I created a fairscheduler pool called "admin" on the jobtracker and configured
this pool with the maximum number of mappers the job should take.

I added this section to my hbase-site.xml:

  <property>
    <name>mapred.queue.name</name>
    <value>admin</value>
  </property>

You need to add this only on the node from which you start the job.

Then I log in as user "hbase" on the machine that has this configuration.

When I run my verify jobs as user "hbase", the job goes to the fairscheduler pool
"admin" and takes only the allowed number of mappers.

Before that, the jobs took every mapper they could get.
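
The pool definition itself is not shown in this message. As a rough sketch only, and assuming the Hadoop 1.x fair scheduler allocation file format in which a pool can carry a maxMaps element, a pool capped at 80 mappers (the figure from the original question) could be generated as below. The helper name build_allocations is hypothetical; please check the element names against the fair scheduler documentation for your Hadoop version.

```python
import xml.etree.ElementTree as ET


def build_allocations(pool_name, max_maps):
    """Render a fair scheduler allocation document containing one pool
    with a cap on concurrent map tasks (assumed element: maxMaps)."""
    root = ET.Element("allocations")
    pool = ET.SubElement(root, "pool", name=pool_name)
    ET.SubElement(pool, "maxMaps").text = str(max_maps)
    return ET.tostring(root, encoding="unicode")


allocations_xml = build_allocations("admin", 80)
print(allocations_xml)
```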

Regards Hansi


Re: How to set number of mappers when using HBaseStorage

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
SET pig.maxCombinedSplitSize can control the number of mappers, but it doesn't work with HBaseStorage.

Thanks,
Lei



leiwangouc@gmail.com
 
