Posted to user@hbase.apache.org by john smith <js...@gmail.com> on 2009/08/20 18:42:36 UTC
Doubt in HBase
Hi all,
I have one small doubt. Kindly answer it even if it sounds silly.
I am using MapReduce with HBase in distributed mode. I have a table which
spans 5 region servers. I am using TableInputFormat to read the data
from the table in the map. When I run the program, how many map tasks
are created by default? Is it one per region server, or more?
Also, after the map tasks are over, the reduce task takes a bit more time.
Is it due to moving the map output across the regionservers, i.e. moving
the values of the same key to a particular reducer before it can start? Is
there any way I can optimize the code (e.g. by storing the data for the same
reducer nearby)?
Thanks :)
Re: Doubt in HBase
Posted by Jonathan Gray <jl...@streamy.com>.
What Amandeep said.
Also, one clarification for you. You mentioned the reduce task moving
map output across regionservers. Remember, HBase is just a MapReduce
input source or output sink. The sort/shuffle/reduce is a part of
Hadoop MapReduce and has nothing to do with HBase directly. It is
utilizing the JobTracker/TaskTrackers, not the RegionServers.
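
To make the point above concrete, here is a minimal sketch of how the shuffle decides which reducer receives a key. The arithmetic is the one used by Hadoop's default HashPartitioner; the class name, the sample row keys, and the reducer count are made up for illustration, and String.hashCode() stands in for the hash of whatever key type the job really uses. Note that nothing in it refers to regions or RegionServers.

```java
import java.util.Arrays;
import java.util.List;

public class ShufflePartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign
    // bit of the key's hash, then take the remainder by the reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Hypothetical map output keys; "row-a" appears twice on purpose.
        List<String> keys = Arrays.asList("row-a", "row-b", "row-c", "row-a");
        int numReducers = 3; // hypothetical value

        for (String key : keys) {
            // Every occurrence of the same key lands on the same reducer,
            // no matter which map task (or which region) produced it.
            System.out.println(key + " -> reducer " + partitionFor(key, numReducers));
        }
    }
}
```

Because the target reducer depends only on the key and the reducer count, where the table's regions live has no influence on where the map output is sent.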
Like AK said, you can increase the number of reducers, or reduce the
amount of data you output from the maps.
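
One common way to reduce the map output is to pre-aggregate it on the map side, which is what a Hadoop combiner does. The sketch below only models the idea in plain Java with made-up keys and a hypothetical class name; it is not wired into a real job, and it assumes the aggregation is a sum, which is safe to apply in partial steps.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapSideCombineSketch {

    // Sums the occurrences of each key locally, so that one record per
    // distinct key leaves the map side instead of one record per value.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (String key : mapOutputKeys) {
            combined.merge(key, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical keys emitted by a single map task.
        List<String> emitted = Arrays.asList("a", "b", "a", "a", "b", "c");
        Map<String, Integer> combined = combine(emitted);

        System.out.println("records before combining: " + emitted.size());
        System.out.println("records after combining:  " + combined.size());
        System.out.println(combined);
    }
}
```

This only helps when the reduce logic can be applied to partial results, as with sums or counts; for other computations the map output has to be trimmed some other way.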
JG
Re: Doubt in HBase
Posted by Amandeep Khurana <am...@gmail.com>.
On Thu, Aug 20, 2009 at 9:42 AM, john smith <js...@gmail.com> wrote:
> Hi all,
>
> I have one small doubt. Kindly answer it even if it sounds silly.
>
No questions are silly. Don't worry.
>
> I am using MapReduce with HBase in distributed mode. I have a table which
> spans 5 region servers. I am using TableInputFormat to read the data
> from the table in the map. When I run the program, how many map tasks
> are created by default? Is it one per region server, or more?
>
If you set the number of map tasks to a high number (at least the number of
regions in the table), TableInputFormat spawns one map task for each region
(not each region server). Otherwise, it spawns the number you have explicitly
specified in the job.
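
The rule in this answer can be modelled as a simple minimum. This is only a sketch of the behaviour described here, not HBase's actual split code; the class name, the method, and the region count of 12 are hypothetical.

```java
public class MapTaskCountSketch {

    // Models the rule from the answer: the job never gets more map tasks
    // than the table has regions, and never more than were requested.
    static int mapTasks(int requestedMapTasks, int regionCount) {
        return Math.min(requestedMapTasks, regionCount);
    }

    public static void main(String[] args) {
        int regionCount = 12; // hypothetical: regions in the table, spread over 5 region servers

        // A high request is capped at one map task per region.
        System.out.println("requested 100 -> " + mapTasks(100, regionCount));

        // A low request is honoured as given.
        System.out.println("requested 4   -> " + mapTasks(4, regionCount));
    }
}
```

The number of region servers does not appear anywhere in the calculation; only the number of regions matters.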
>
> Also, after the map tasks are over, the reduce task takes a bit more time.
> Is it due to moving the map output across the regionservers, i.e. moving
> the values of the same key to a particular reducer before it can start? Is
> there any way I can optimize the code (e.g. by storing the data for the same
> reducer nearby)?
>
Increase the number of reducers. Each reducer will then have less data to move.
>
> Thanks :)
>