Posted to user@pig.apache.org by Vincent Barat <vi...@gmail.com> on 2013/01/21 14:27:13 UTC

Is there a way to limit the number of maps produced by HBaseStorage?

Hi,

We are using HBaseStorage intensively to load data from tables
having more than 100 regions.

HBaseStorage generates one map per region, and since our cluster
has 50 map slots, our Pig scripts end up starting 50 maps reading
data from HBase concurrently.

The problem is that our HBase cluster has only 10 nodes, so the
maps overload it (5 intensive readers per node is too much to
bear).

So my question: is there a way to tell Pig to limit the number of
maps to a given maximum (e.g. 10)?
If not, how can I patch the code to do this?

Thanks a lot for your help

Re: Is there a way to limit the number of maps produced by HBaseStorage?

Posted by inelu nagamallikarjuna <ma...@gmail.com>.
Hi Vincent,

You can restrict the number of maps that run concurrently on each node
by setting the parameter mapred.tasktracker.map.tasks.maximum = 1 or 2.
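
For reference, a minimal sketch of that setting (assuming a Hadoop 1.x
style configuration; note this is a per-TaskTracker, cluster-wide cap
that affects every job on the node, not just this one, and it only takes
effect after restarting the TaskTracker):

```xml
<!-- mapred-site.xml on each worker node -->
<!-- Caps how many map tasks may run concurrently on this TaskTracker.
     Applies to all jobs, not per-job, and requires a TaskTracker
     restart to take effect. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```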



Thanks
Nagamallikarjuna

On Mon, Jan 21, 2013 at 7:13 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Vincent,
>
>          The number of map tasks for a job is primarily governed by the
> InputSplits and the InputFormat you are using, so setting it through a
> config parameter doesn't guarantee that your job will have the specified
> number of map tasks. However, you can give it a try by using "set
> mapred.map.tasks=n" in your Pig Latin script.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Mon, Jan 21, 2013 at 6:57 PM, Vincent Barat <vincent.barat@gmail.com
> >wrote:
>
> > Hi,
> >
> > We are using HBaseStorage intensively to load data from tables
> > having more than 100 regions.
> >
> > HBaseStorage generates one map per region, and since our cluster
> > has 50 map slots, our Pig scripts end up starting 50 maps reading
> > data from HBase concurrently.
> >
> > The problem is that our HBase cluster has only 10 nodes, so the
> > maps overload it (5 intensive readers per node is too much to bear).
> >
> > So my question: is there a way to tell Pig to limit the number of
> > maps to a given maximum (e.g. 10)?
> > If not, how can I patch the code to do this?
> >
> > Thanks a lot for your help
> >
>



-- 
Thanks and Regards
Nagamallikarjuna

Re: Is there a way to limit the number of maps produced by HBaseStorage?

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Vincent,

         The number of map tasks for a job is primarily governed by the
InputSplits and the InputFormat you are using, so setting it through a
config parameter doesn't guarantee that your job will have the specified
number of map tasks. However, you can give it a try by using "set
mapred.map.tasks=n" in your Pig Latin script.
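
A minimal sketch of that in Pig Latin (the table and column names here
are hypothetical, for illustration only; the hint may be ignored because
the number of input splits, one per region, ultimately drives the map
count):

```pig
-- Hint the desired number of maps; not guaranteed, since the splits
-- produced by the HBase InputFormat take precedence.
set mapred.map.tasks 10;

-- Hypothetical table and column family, for illustration.
raw = LOAD 'hbase://mytable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'cf:col1 cf:col2', '-loadKey true')
      AS (rowkey:chararray, col1:chararray, col2:chararray);
```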

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Mon, Jan 21, 2013 at 6:57 PM, Vincent Barat <vi...@gmail.com>wrote:

> Hi,
>
> We are using HBaseStorage intensively to load data from tables having
> more than 100 regions.
>
> HBaseStorage generates one map per region, and since our cluster has
> 50 map slots, our Pig scripts end up starting 50 maps reading data
> from HBase concurrently.
>
> The problem is that our HBase cluster has only 10 nodes, so the maps
> overload it (5 intensive readers per node is too much to bear).
>
> So my question: is there a way to tell Pig to limit the number of
> maps to a given maximum (e.g. 10)?
> If not, how can I patch the code to do this?
>
> Thanks a lot for your help
>

