Posted to user@pig.apache.org by Prabhu Dhakshina Murthy <pr...@yahoo-inc.com> on 2011/07/04 05:00:51 UTC

Few PIG questions

I have a few questions about running Pig scripts and their map-reduce jobs.

1. I know that Pig creates *logical, physical and then execution plans*
before it starts executing the map/reduce jobs. I am able to look at the
logical and physical plans using the command *explain <alias_name>*, but
how do I view the execution plan (which I suppose lists the planned
map/reduce jobs)? In the course of Pig execution, I see that many jobs
(map/reduce pairs) are created, and I want to understand what each of
these jobs does.

2. Is there a definitive guide I can use to understand the plans that are
created? The output is difficult to understand.

3. I am able to change the number of map tasks by changing the number of
input file blocks. Do I have control over the number of reduce tasks as
well? How do I set the number of reducers?

4. What is the default heap memory size on mapper/reducer nodes? Which
job parameters reflect it? Can I change the heap memory with the
-Xmx 1024m option? My jobs used to fail when I set the heap memory
this way. Maybe there are restrictions on what values can be
supplied?

Thanks much!

Re: Few PIG questions

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
In addition:

"In the course of Pig execution, I see that many jobs (map/reduce pairs)
are created, and I want to understand what each of these jobs does."

For each MR job, there is a property in its XML config called
"pig.alias" that tells you which of the relations in your Pig script
are involved in that particular job.

D

Re: Few PIG questions

Posted by Daniel Dai <da...@hortonworks.com>.
On Sun, Jul 3, 2011 at 10:00 PM, Prabhu Dhakshina Murthy <
prabhudm@yahoo-inc.com> wrote:

> I have a few questions about running Pig scripts and their map-reduce jobs.
>
> 1. I know that Pig creates *logical, physical and then execution plans*
> before it starts executing the map/reduce jobs. I am able to look at the
> logical and physical plans using the command *explain <alias_name>*, but
> how do I view the execution plan (which I suppose lists the planned
> map/reduce jobs)? In the course of Pig execution, I see that many jobs
> (map/reduce pairs) are created, and I want to understand what each of
> these jobs does.
>
"explain alias" will show the logical plan, the physical plan and the MR
plan. Check carefully.
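
As a rough illustration of the point above, here is a minimal sketch of
running explain on a small script. The input path, schema and alias names
are hypothetical and only there to make the example complete. The
map-reduce plan is the last of the three plans printed, so it is easy to
miss when the output scrolls past.

```pig
-- Hypothetical input path and schema, for illustration only.
logs    = LOAD 'input/logs.txt' AS (user:chararray, bytes:long);
by_user = GROUP logs BY user;
totals  = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

-- Prints the logical plan, the physical plan and the map-reduce plan
-- that Pig would use to compute 'totals'. Nothing is executed.
EXPLAIN totals;
```

In the map-reduce plan, each planned job is listed with the operators that
run in its map and reduce phases, which helps to match jobs back to the
statements in the script.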

> 2. Is there a definitive guide I can use to understand the plans that are
> created? The output is difficult to understand.
>
Check
http://ofps.oreilly.com/titles/9781449302641/developing_and_testing.html#dev_tools

> 3. I am able to change the number of map tasks by changing the number of
> input file blocks. Do I have control over the number of reduce tasks as
> well? How do I set the number of reducers?
>
Check
http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features
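
For a quick idea of what that cookbook section covers, here is a minimal
sketch of the two usual ways to set the reducer count. The path, schema,
alias names and the numbers 20 and 10 are illustrative assumptions, not
recommendations.

```pig
-- Script-wide default number of reducers (illustrative value).
SET default_parallel 20;

-- Hypothetical input path and schema, for illustration only.
logs    = LOAD 'input/logs.txt' AS (user:chararray, bytes:long);

-- PARALLEL overrides the default for the reduce phase started by this operator.
by_user = GROUP logs BY user PARALLEL 10;
totals  = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

STORE totals INTO 'output/totals';
```

PARALLEL only affects operators that trigger a reduce phase (GROUP, JOIN,
ORDER and so on). The number of map tasks is still determined by the
input splits.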


> 4. What is the default heap memory size on mapper/reducer nodes? Which
> job parameters reflect it? Can I change the heap memory with the
> -Xmx 1024m option? My jobs used to fail when I set the heap memory
> this way. Maybe there are restrictions on what values can be
> supplied?
>
It is controlled by the "mapred.child.java.opts" property.
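
A minimal sketch of setting that property from a script follows. It
assumes a Pig version whose SET command passes arbitrary Hadoop properties
through to the job; if yours does not, the same property can be supplied
through the Hadoop configuration instead. In stock Hadoop of this era the
default value is -Xmx200m, though clusters often override it. Note that
the JVM expects the size directly after -Xmx with no space, so a value
written as "-Xmx 1024m" would be rejected when the task JVM starts. That
is one possible cause of the failures described, not a confirmed
diagnosis.

```pig
-- Assumption: this Pig version forwards unknown SET keys to the Hadoop job config.
-- The heap size is illustrative. There must be no space between -Xmx and the size.
SET mapred.child.java.opts '-Xmx1024m';
```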

Daniel


>
> Thanks much!
>