
Posted to dev@pig.apache.org by Aniket Mokashi <an...@gmail.com> on 2014/01/16 01:08:37 UTC

Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

Review request for pig, Cheolsoo Park, Dmitriy Ryaboy, and Julien Le Dem.


Bugs: PIG-3463
    https://issues.apache.org/jira/browse/PIG-3463


Repository: pig


Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size below pig.auto.local.input.maxbytes, forcing those jobs to run in local mode. The output of a local run is still written to HDFS.
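
A minimal sketch of the rule described above, using a plain key/value map in place of Hadoop's Configuration. The class name AutoLocalSketch, the forceLocal helper, and the method signatures are hypothetical and are not taken from the patch. Reading "is set" as "set to true", treating a job with zero reducers as eligible, the fallback byte limit, and the use of the standard Hadoop keys mapreduce.framework.name and mapred.job.tracker are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; not the code under review. */
final class AutoLocalSketch {
    // Property names as given in the review description.
    static final String ENABLED = "pig.auto.local.enabled";
    static final String MAX_BYTES = "pig.auto.local.input.maxbytes";
    // Assumed fallback limit; the patch's real default is not stated in the description.
    static final long ASSUMED_MAX_BYTES = 100_000_000L;

    /** Hypothetical check: feature enabled, at most one reducer, input below the limit. */
    static boolean okToRunLocal(Map<String, String> conf, long inputBytes, int reducers) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED, "false"));
        long maxBytes = Long.parseLong(
                conf.getOrDefault(MAX_BYTES, Long.toString(ASSUMED_MAX_BYTES)));
        return enabled && reducers <= 1 && inputBytes < maxBytes;
    }

    /** Returns a copy of the job settings that asks Hadoop for its local job runner. */
    static Map<String, String> forceLocal(Map<String, String> conf) {
        Map<String, String> local = new HashMap<>(conf);
        local.put("mapreduce.framework.name", "local"); // Hadoop 2 key
        local.put("mapred.job.tracker", "local");       // Hadoop 1 key
        // File system settings are left untouched, so output still goes to HDFS.
        return local;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENABLED, "true");
        System.out.println(okToRunLocal(conf, 8L, 1));   // true
        System.out.println(okToRunLocal(conf, 8L, 21));  // false
        System.out.println(forceLocal(conf).get("mapreduce.framework.name")); // local
    }
}
```

Copying the settings instead of mutating them keeps the decision per job, which matches the description's point that only the qualifying jobs are redirected.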


Diffs
-----

  trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
  trunk/src/org/apache/pig/PigConfiguration.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
  trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/PigStats.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java 1558572 

Diff: https://reviews.apache.org/r/16928/diff/


Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - runs in local mode.
Load small data and another small data set, replicated join - runs in local mode.
Load small data and order by key - all 3 jobs run in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode, last stage runs in MR mode.


Thanks,

Aniket Mokashi

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32245
-----------------------------------------------------------



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/16928/#comment60975>

    Will remove this before commit.


- Aniket Mokashi


On Jan. 16, 2014, 10:04 p.m., Aniket Mokashi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 10:04 p.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
> 
> 
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size below pig.auto.local.input.maxbytes, forcing those jobs to run in local mode. The output of a local run is still written to HDFS.
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
>   trunk/src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 1558572 
>   trunk/src/org/apache/pig/tools/pigstats/PigStats.java 1558572 
>   trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java 1558572 
>   trunk/src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java 1558572 
>   trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/16928/diff/
> 
> 
> Testing
> -------
> 
> Tried a few scenarios with the patch:
> Load small data, group all, count - runs in local mode.
> Load small data and another small data set, replicated join - runs in local mode.
> Load small data and order by key - all 3 jobs run in local mode.
> Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
> Load large data and order by key - first stages run in local mode, last stage runs in MR mode.
> 
> 
> Thanks,
> 
> Aniket Mokashi
> 
>

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.


> On Jan. 20, 2014, 10:25 a.m., Cheolsoo Park wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java, lines 138-140
> > <https://reviews.apache.org/r/16928/diff/2/?file=425055#file425055line138>
> >
> >     Do you mind replacing these with static variables too?

There are more instances of these throughout the code. Let's open another JIRA documenting the pattern we should follow for constants and configuration keys in Pig code, and start making small patches to fix them.


- Aniket


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32286
-----------------------------------------------------------


On Jan. 21, 2014, 2:24 a.m., Aniket Mokashi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2014, 2:24 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
> 
> 
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size below pig.auto.local.input.maxbytes, forcing those jobs to run in local mode. The output of a local run is still written to HDFS.
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
>   trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/16928/diff/
> 
> 
> Testing
> -------
> 
> Tried a few scenarios with the patch:
> Load small data, group all, count - runs in local mode.
> Load small data and another small data set, replicated join - runs in local mode.
> Load small data and order by key - all 3 jobs run in local mode.
> Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
> Load large data and order by key - first stages run in local mode, last stage runs in MR mode.
> 
> 
> Thanks,
> 
> Aniket Mokashi
> 
>

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.


> On Jan. 20, 2014, 10:25 a.m., Cheolsoo Park wrote:
> > This is great work. Thank you so much!
> > 
> > I have two comments:
> > 
> > 1) It doesn't seem to work for a map-only job. For example, I tried to run load and dump in grunt as follows:
> > 
> > x = load '/user/cheolsoop/foo';
> > dump x;
> > 
> > This job doesn't get converted to local mode because the number of reducers is 21, which doesn't make sense. See the log output below:
> > 
> > 2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Size of input: 8 bytes.
> > 2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - No of reducers: 21
> > 2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
> > 
> > 2) The changes in PigStats and PigStatsUtil might break backward compatibility. Perhaps we could avoid them if they're not necessary. Thoughts?
> >

1) I tested load-dump on my side, and it was automatically converted to local mode. Digging deeper, I found that reducer estimation happens before the okToRunLocal call, but for a map-only job we do not set the number of reducers to zero until later. So I moved that code up, which should take care of map-only jobs.

2) Makes sense. Reverted.
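
The ordering problem in point 1 can be illustrated with a small sketch. Only the name okToRunLocal comes from this thread; the class, the effectiveReducers helper, the signatures, and the 100000000-byte limit are hypothetical. The numbers (8 bytes of input, an estimate of 21 reducers) are the ones from the log quoted above.

```java
/** Illustrative only; not the code under review. */
final class ReducerOrderingSketch {
    /** Hypothetical helper: a map-only job has no reducers, whatever the estimator returned. */
    static int effectiveReducers(boolean mapOnly, int estimatedReducers) {
        return mapOnly ? 0 : estimatedReducers;
    }

    /** Hypothetical eligibility check: at most one reducer and input below the limit. */
    static boolean okToRunLocal(long inputBytes, long maxBytes, int reducers) {
        return reducers <= 1 && inputBytes < maxBytes;
    }

    public static void main(String[] args) {
        long inputBytes = 8L;          // "Size of input: 8 bytes."
        long maxBytes = 100_000_000L;  // assumed limit
        int estimated = 21;            // "No of reducers: 21"

        // Before the fix: the check runs on the raw estimate, so the job is rejected.
        System.out.println(okToRunLocal(inputBytes, maxBytes, estimated)); // false

        // After the fix: the map-only job's reducer count is zeroed first.
        int reducers = effectiveReducers(true, estimated);
        System.out.println(okToRunLocal(inputBytes, maxBytes, reducers));  // true
    }
}
```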


- Aniket


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32286
-----------------------------------------------------------



Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Cheolsoo Park <pi...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32286
-----------------------------------------------------------


This is great work. Thank you so much!

I have two comments:

1) It doesn't seem to work for a map-only job. For example, I tried to run load and dump in grunt as follows:

x = load '/user/cheolsoop/foo';
dump x;

This job doesn't get converted to local mode because the number of reducers is 21, which doesn't make sense. See the log output below:

2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Size of input: 8 bytes.
2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - No of reducers: 21
2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process

2) The changes in PigStats and PigStatsUtil might break backward compatibility. Perhaps we could avoid them if they're not necessary. Thoughts?



trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
<https://reviews.apache.org/r/16928/#comment61021>

    Do you mind replacing these with static variables too?



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/16928/#comment61022>

    I think pseudo-distributed mode means a single node with multiple processes. But you mean local mode (multiple threads) here, don't you?



trunk/src/org/apache/pig/tools/pigstats/PigStats.java
<https://reviews.apache.org/r/16928/#comment61027>

    I like removing this from PigStats.
    
    But I am a bit worried that this might break backward compatibility with downstream applications since it is public.



trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java
<https://reviews.apache.org/r/16928/#comment61023>

    Update the comment to reflect the change.



trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java
<https://reviews.apache.org/r/16928/#comment61024>

    Update the comment to reflect the change.


- Cheolsoo Park



Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Cheolsoo Park <pi...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32458
-----------------------------------------------------------

Ship it!

- Cheolsoo Park


On Jan. 21, 2014, 2:52 a.m., Aniket Mokashi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2014, 2:52 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
> 
> 
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size below pig.auto.local.input.maxbytes, forcing those jobs to run in local mode. The output of a local run is still written to HDFS.
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
>   trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/16928/diff/
> 
> 
> Testing
> -------
> 
> Tried a few scenarios with the patch:
> Load small data, group all, count - runs in local mode.
> Load small data and another small data set, replicated join - runs in local mode.
> Load small data and order by key - all 3 jobs run in local mode.
> Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
> Load large data and order by key - first stages run in local mode, last stage runs in MR mode.
> 
> 
> Thanks,
> 
> Aniket Mokashi
> 
>

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

(Updated Jan. 22, 2014, 8:49 a.m.)


Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.


Changes
-------

+ pig.properties


Bugs: PIG-3463
    https://issues.apache.org/jira/browse/PIG-3463


Repository: pig


Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size below pig.auto.local.input.maxbytes, forcing those jobs to run in local mode. The output of a local run is still written to HDFS.


Diffs (updated)
-----

  trunk/conf/pig.properties 1558572 
  trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
  trunk/src/org/apache/pig/PigConfiguration.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
  trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
  trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16928/diff/


Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - runs in local mode.
Load small data and another small data set, replicated join - runs in local mode.
Load small data and order by key - all 3 jobs run in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode, last stage runs in MR mode.


Thanks,

Aniket Mokashi

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.


> On Jan. 21, 2014, 5:05 a.m., Daniel Dai wrote:
> > Looks good. We also need to add the configuration to the conf/pig.properties comments (#pig.auto.local.enabled=true, #pig.auto.local.input.maxbytes=100000000), so users know about this configuration.
> > 
> > This also reminds me we should read/write hdfs files in local mode, but that's a different issue.

Thanks for the review, Daniel and Cheolsoo. I will add the properties to pig.properties and commit tomorrow morning.


- Aniket


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32339
-----------------------------------------------------------



Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Cheolsoo Park <pi...@gmail.com>.


> On Jan. 21, 2014, 5:05 a.m., Daniel Dai wrote:
> > Looks good. We also need to add the configuration to the conf/pig.properties comments (#pig.auto.local.enabled=true, #pig.auto.local.input.maxbytes=100000000), so users know about this configuration.
> > 
> > This also reminds me we should read/write hdfs files in local mode, but that's a different issue.
> 
> Aniket Mokashi wrote:
>     Thanks for the review, Daniel and Cheolsoo. I will add the properties to pig.properties and commit tomorrow morning.

Aniket, can we run the unit tests before committing? It's not a small patch, so I'd suggest running them. I can run them if that's not convenient for you. Give me one day.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32339
-----------------------------------------------------------



Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.


> On Jan. 21, 2014, 5:05 a.m., Daniel Dai wrote:
> > Looks good. We also need to add the configuration to the conf/pig.properties comments (#pig.auto.local.enabled=true, #pig.auto.local.input.maxbytes=100000000), so users know about this configuration.
> > 
> > This also reminds me we should read/write hdfs files in local mode, but that's a different issue.
> 
> Aniket Mokashi wrote:
>     Thanks for the review, Daniel and Cheolsoo. I will add the properties to pig.properties and commit tomorrow morning.
> 
> Cheolsoo Park wrote:
>     Aniket, can we run the unit tests before committing? It's not a small patch, so I'd suggest running them. I can run them if that's not convenient for you. Give me one day.

Sounds good. I will run the tests on my side too. Please take your time, I will wait for your +1.


- Aniket


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32339
-----------------------------------------------------------



Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Daniel Dai <da...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32339
-----------------------------------------------------------


Looks good. We also need to add the new settings as comments in conf/pig.properties (#pig.auto.local.enabled=true, #pig.auto.local.input.maxbytes=100000000), so that users know about them.

This also reminds me we should read/write hdfs files in local mode, but that's a different issue.
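
As an illustration of the pig.properties suggestion above, here is a minimal sketch of how the two entries behave while commented out and after a user uncomments them. It uses only java.util.Properties; the class name PigPropertiesSketch and the parse helper are hypothetical and not part of the patch, and the values are the sample values from the comment above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the patch under review.
final class PigPropertiesSketch {

    // Parses properties-file text the same way a .properties file would be read.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // As shipped: lines starting with '#' are comments, so nothing is set.
        String shipped =
            "#pig.auto.local.enabled=true\n" +
            "#pig.auto.local.input.maxbytes=100000000\n";

        // After a user removes the leading '#': both settings become active.
        String uncommented =
            "pig.auto.local.enabled=true\n" +
            "pig.auto.local.input.maxbytes=100000000\n";

        System.out.println(parse(shipped).getProperty("pig.auto.local.enabled"));
        System.out.println(parse(uncommented).getProperty("pig.auto.local.enabled"));
        System.out.println(parse(uncommented).getProperty("pig.auto.local.input.maxbytes"));
    }
}
```

Keeping the entries commented out documents the settings without changing default behaviour; the feature stays off until a user opts in.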

- Daniel Dai



Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

(Updated Jan. 21, 2014, 2:52 a.m.)


Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.


Bugs: PIG-3463
    https://issues.apache.org/jira/browse/PIG-3463


Repository: pig


Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, so that the job is forced to run in local mode. The output of the local run is also written to HDFS.
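
The eligibility rule in the description can be sketched roughly as follows. This is not the code from the patch: the class AutoLocalSketch, the method shouldRunLocally, and the fallback threshold are assumptions made for illustration (the fallback simply reuses the sample value 100000000 mentioned in the review comments), and a plain java.util.Properties stands in for the Hadoop Configuration. Only the two property names come from the review request.

```java
import java.util.Properties;

// Hypothetical sketch of the rule in the description; not the patch itself.
final class AutoLocalSketch {
    static final String ENABLED_KEY = "pig.auto.local.enabled";
    static final String MAX_BYTES_KEY = "pig.auto.local.input.maxbytes";

    // Assumed fallback, taken from the sample value in the review comments.
    static final long ASSUMED_MAX_BYTES = 100_000_000L;

    // Literal reading of the description: the feature must be enabled, the job
    // must have exactly one reducer, and its input must be below the threshold.
    static boolean shouldRunLocally(Properties conf, int reducers, long inputBytes) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLED_KEY, "false"));
        if (!enabled) {
            return false;
        }
        long maxBytes = Long.parseLong(
            conf.getProperty(MAX_BYTES_KEY, Long.toString(ASSUMED_MAX_BYTES)));
        return reducers == 1 && inputBytes < maxBytes;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(ENABLED_KEY, "true");
        conf.setProperty(MAX_BYTES_KEY, "1000");

        System.out.println(shouldRunLocally(conf, 1, 500L));   // small input, one reducer
        System.out.println(shouldRunLocally(conf, 1, 5000L));  // input above the threshold
        System.out.println(shouldRunLocally(conf, 4, 500L));   // more than one reducer
    }
}
```

Because the check is made per job, a multi-job script can mix modes, which matches the testing notes: small stages qualify for local mode while a stage with large input stays in MR mode.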


Diffs (updated)
-----

  trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
  trunk/src/org/apache/pig/PigConfiguration.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
  trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
  trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16928/diff/


Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - works in local mode.
Load small data, another small data and replicated join - works in local mode.
Load small data and order by key - all 3 jobs work in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode and the last stage runs in MR mode.


Thanks,

Aniket Mokashi

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

(Updated Jan. 21, 2014, 2:50 a.m.)

Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.

Changes
-------

Removed whitespace.

Bugs: PIG-3463
https://issues.apache.org/jira/browse/PIG-3463

Repository: pig

Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, so that the job is forced to run in local mode. The output of the local run is also written to HDFS.

Diffs (updated)
-----

trunk/src/org/apache/pig/ExecTypeProvider.java 1558572
trunk/src/org/apache/pig/PigConfiguration.java 1558572
trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572
trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572
trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572
trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION

Diff: https://reviews.apache.org/r/16928/diff/

Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - works in local mode.
Load small data, another small data and replicated join - works in local mode.
Load small data and order by key - all 3 jobs work in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode and the last stage runs in MR mode.

Thanks,

Aniket Mokashi

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

(Updated Jan. 21, 2014, 2:47 a.m.)


Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.


Bugs: PIG-3463
    https://issues.apache.org/jira/browse/PIG-3463


Repository: pig


Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, so that the job is forced to run in local mode. The output of the local run is also written to HDFS.


Diffs (updated)
-----

  trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
  trunk/src/org/apache/pig/PigConfiguration.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
  trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
  trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16928/diff/


Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - works in local mode.
Load small data, another small data and replicated join - works in local mode.
Load small data and order by key - all 3 jobs work in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode and the last stage runs in MR mode.


Thanks,

Aniket Mokashi

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

(Updated Jan. 21, 2014, 2:24 a.m.)


Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.


Changes
-------

Code review changes


Bugs: PIG-3463
    https://issues.apache.org/jira/browse/PIG-3463


Repository: pig


Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, so that the job is forced to run in local mode. The output of the local run is also written to HDFS.


Diffs (updated)
-----

  trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
  trunk/src/org/apache/pig/PigConfiguration.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
  trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
  trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16928/diff/


Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - works in local mode.
Load small data, another small data and replicated join - works in local mode.
Load small data and order by key - all 3 jobs work in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode and the last stage runs in MR mode.


Thanks,

Aniket Mokashi

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

(Updated Jan. 16, 2014, 10:04 p.m.)


Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.


Bugs: PIG-3463
    https://issues.apache.org/jira/browse/PIG-3463


Repository: pig


Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, so that the job is forced to run in local mode. The output of the local run is also written to HDFS.


Diffs
-----

  trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
  trunk/src/org/apache/pig/PigConfiguration.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
  trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/PigStats.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java 1558572 
  trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16928/diff/


Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - works in local mode.
Load small data, another small data and replicated join - works in local mode.
Load small data and order by key - all 3 jobs work in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode and the last stage runs in MR mode.


Thanks,

Aniket Mokashi

Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

Posted by Aniket Mokashi <an...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/
-----------------------------------------------------------

(Updated Jan. 16, 2014, 10:01 p.m.)


Review request for pig, Cheolsoo Park, Dmitriy Ryaboy, and Julien Le Dem.


Changes
-------

+tests


Bugs: PIG-3463
    https://issues.apache.org/jira/browse/PIG-3463


Repository: pig


Description
-------

If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, so that the job is forced to run in local mode. The output of the local run is also written to HDFS.


Diffs (updated)
-----

  trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
  trunk/src/org/apache/pig/PigConfiguration.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
  trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/PigStats.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java 1558572 
  trunk/src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java 1558572 
  trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16928/diff/


Testing
-------

Tried a few scenarios with the patch:
Load small data, group all, count - works in local mode.
Load small data, another small data and replicated join - works in local mode.
Load small data and order by key - all 3 jobs work in local mode.
Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
Load large data and order by key - first stages run in local mode and the last stage runs in MR mode.


Thanks,

Aniket Mokashi