Posted to mapreduce-user@hadoop.apache.org by Sunil S Nandihalli <su...@gmail.com> on 2014/10/21 07:31:20 UTC

How to limit the number of containers requested by a pig script?

Hi Everybody,
I would like to know how I can limit the number of concurrent containers
requested (and, of course, used) by my pig script. I don't want to do this
through a YARN queue configuration or anything similar; I want to limit it
from outside on a per-job basis, and would ideally like to set the number
in my pig script itself. Can I do this?
Thanks,
Sunil.

Re: How to limit the number of containers requested by a pig script?

Posted by Jakub Stransky <st...@gmail.com>.
What I understand so far is that in Pig you cannot decide how many mappers
will run. That is determined by an optimization based on the number of
input files, the block size, and so on. What you can control is the number
of reducers, via the PARALLEL directive. You can certainly SET
mapreduce.job.maps, but I am not sure what effect it will have; that is
what I remember from the docs.
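
For example, a minimal sketch of what you could put in the script (the
load/store paths and schema are just placeholders; mapreduce.job.maps is
only a hint that the InputFormat may ignore, whereas PARALLEL reliably
sets the reduce-task count):

-- Hint at the desired number of map tasks; most InputFormats compute
-- splits from the input size and block size, so this may be ignored.
SET mapreduce.job.maps 10;

users = LOAD '/data/users' USING PigStorage('\t') AS (id:int, name:chararray);

-- PARALLEL sets the number of reduce tasks for this operator.
grouped = GROUP users BY id PARALLEL 5;

STORE grouped INTO '/data/users_grouped';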

Hope this helps

On 21 October 2014 13:30, Shahab Yunus <sh...@gmail.com> wrote:

> Jakub, you are saying that we can't change the number of mappers per job
> through the script, right? Because otherwise, when invoking through the
> command line or code, I think we can. We do have the property
> mapreduce.job.maps.
>
> Regards,
> Shahab
>
> On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky <st...@gmail.com>
> wrote:
>
>> Hello,
>>
>> As far as I understand, you cannot directly control the number of
>> mappers. The number of reducers you can control via the PARALLEL
>> keyword. The number of containers on a node is determined by a
>> combination of settings: yarn.nodemanager.resource.memory-mb, which is
>> set on the cluster, together with mapreduce.map.memory.mb and
>> mapreduce.reduce.memory.mb, which can be overridden from your script.
>>
>> Hope this helps
>>
>> On 21 October 2014 07:31, Sunil S Nandihalli <su...@gmail.com>
>> wrote:
>>
>>> Hi Everybody,
>>> I would like to know how I can limit the number of concurrent
>>> containers requested (and, of course, used) by my pig script. I don't
>>> want to do this through a YARN queue configuration or anything similar;
>>> I want to limit it from outside on a per-job basis, and would ideally
>>> like to set the number in my pig script itself. Can I do this?
>>> Thanks,
>>> Sunil.
>>>
>>
>>
>>
>> --
>> Jakub Stransky
>> cz.linkedin.com/in/jakubstransky
>>
>>
>


-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky

Re: How to limit the number of containers requested by a pig script?

Posted by Shahab Yunus <sh...@gmail.com>.
Jakub, you are saying that we can't change the number of mappers per job
through the script, right? Because otherwise, when invoking through the
command line or code, I think we can. We do have the property
mapreduce.job.maps.
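
For example (a sketch; the script name is a placeholder), you can pass the
property when launching Pig, or set it inside the script:

  pig -Dmapreduce.job.maps=10 myscript.pig

or, in the script itself:

  SET mapreduce.job.maps 10;

Whether it takes effect still depends on the InputFormat, since splits are
normally computed from the input data.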

Regards,
Shahab

On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky <st...@gmail.com>
wrote:

> Hello,
>
> As far as I understand, you cannot directly control the number of
> mappers. The number of reducers you can control via the PARALLEL keyword.
> The number of containers on a node is determined by a combination of
> settings: yarn.nodemanager.resource.memory-mb, which is set on the
> cluster, together with mapreduce.map.memory.mb and
> mapreduce.reduce.memory.mb, which can be overridden from your script.
>
> Hope this helps
>
> On 21 October 2014 07:31, Sunil S Nandihalli <su...@gmail.com>
> wrote:
>
>> Hi Everybody,
>> I would like to know how I can limit the number of concurrent
>> containers requested (and, of course, used) by my pig script. I don't
>> want to do this through a YARN queue configuration or anything similar;
>> I want to limit it from outside on a per-job basis, and would ideally
>> like to set the number in my pig script itself. Can I do this?
>> Thanks,
>> Sunil.
>>
>
>
>
> --
> Jakub Stransky
> cz.linkedin.com/in/jakubstransky
>
>

Re: How to limit the number of containers requested by a pig script?

Posted by Jakub Stransky <st...@gmail.com>.
Hello,

As far as I understand, you cannot directly control the number of mappers.
The number of reducers you can control via the PARALLEL keyword. The number
of containers on a node is determined by a combination of settings:
yarn.nodemanager.resource.memory-mb, which is set on the cluster, together
with mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, which can be
overridden from your script.
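
As a rough illustration (the numbers and paths below are made up): with
yarn.nodemanager.resource.memory-mb set to 8192 on each node, asking for
2048 MB per map task means at most 8192 / 2048 = 4 map containers can run
on a node at once, ignoring the application master and other overhead.
A sketch:

-- Bigger containers per task mean fewer concurrent containers per node:
-- 8192 MB per node / 2048 MB per mapper = at most 4 map containers.
SET mapreduce.map.memory.mb 2048;
SET mapreduce.reduce.memory.mb 4096;

events = LOAD '/data/events' AS (user:chararray, ts:long);

-- The reducer count can be capped directly:
by_user = GROUP events BY user PARALLEL 8;

STORE by_user INTO '/data/events_by_user';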

Hope this helps

On 21 October 2014 07:31, Sunil S Nandihalli <su...@gmail.com>
wrote:

> Hi Everybody,
> I would like to know how I can limit the number of concurrent containers
> requested (and, of course, used) by my pig script. I don't want to do this
> through a YARN queue configuration or anything similar; I want to limit it
> from outside on a per-job basis, and would ideally like to set the number
> in my pig script itself. Can I do this?
> Thanks,
> Sunil.
>



-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky
