You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Rohini Palaniswamy <ro...@gmail.com> on 2013/08/13 16:34:08 UTC

Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/
-----------------------------------------------------------

Review request for pig.


Bugs: PIG-3204
    https://issues.apache.org/jira/browse/PIG-3204


Repository: pig


Description
-------

Change parsing from line by line to whole script at once.


Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1510960 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1510960 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java 1510960 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1510960 

Diff: https://reviews.apache.org/r/13535/diff/


Testing
-------

New unit tests added to track the number of times a line is parsed. TestGrunt and TestShortcuts test failures fixed.


Thanks,

Rohini Palaniswamy


Re: Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

Posted by Rohini Palaniswamy <ro...@gmail.com>.

> On Aug. 14, 2013, 7:35 p.m., Cheolsoo Park wrote:
> > Can you run the unit tests when you commit it? Thanks!

Have run the full suite of unit tests and e2e tests as it makes change to basic parsing. They pass


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/#review25173
-----------------------------------------------------------


On Aug. 13, 2013, 2:34 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13535/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2013, 2:34 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3204
>     https://issues.apache.org/jira/browse/PIG-3204
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Change parsing from line by line to whole script at once.
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1510960 
> 
> Diff: https://reviews.apache.org/r/13535/diff/
> 
> 
> Testing
> -------
> 
> New unit tests added to track the number of times a line is parsed. TestGrunt and TestShortcuts test failures fixed.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>


Re: Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/#review25173
-----------------------------------------------------------

Ship it!


Can you run the unit tests when you commit it? Thanks!

- Cheolsoo Park


On Aug. 13, 2013, 2:34 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13535/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2013, 2:34 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3204
>     https://issues.apache.org/jira/browse/PIG-3204
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Change parsing from line by line to whole script at once.
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1510960 
> 
> Diff: https://reviews.apache.org/r/13535/diff/
> 
> 
> Testing
> -------
> 
> New unit tests added to track the number of times a line is parsed. TestGrunt and TestShortcuts test failures fixed.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>


Re: Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

Posted by Rohini Palaniswamy <ro...@gmail.com>.

> On Aug. 14, 2013, 2:13 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java, line 1083
> > <https://reviews.apache.org/r/13535/diff/2/?file=340613#file340613line1083>
> >
> >     I believe we shouldn't remove xargs. It was added by PIG-3099 to avoid some race condition.

I think it was a mistake. The test fails even before this patch with xargs. The touch command will not produce any output. So it does not make sense to have xargs after a touch.


> On Aug. 14, 2013, 2:13 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java, line 776
> > <https://reviews.apache.org/r/13535/diff/2/?file=340614#file340614line776>
> >
> >     This is a bit confusing to me. 10 - 4 + 6 = 12, but numTimesInitiated is set to 10.
> >     
> >     _testSkipParseInRegisterForBatch(false, 10, 4);

It is 10 (4 + 6). Will change the hyphen to full stop so that it is not mistaken as minus.


> On Aug. 14, 2013, 2:13 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java, lines 804-805
> > <https://reviews.apache.org/r/13535/diff/2/?file=340614#file340614line804>
> >
> >     If I understand correctly, this is equivalent to calling the followings:
> >     
> >     GruntParser grunt = new GruntParser(in);
> >     grunt.setInteractive(false);
> >     grunt.setParams(pigServer);
> >     grunt.parseStopOnError(false); //batch
> >     
> >     Can you explicitly call them, so it will be easier to identify the difference when skipParseInRegisterForBatch is on and off?

I added a comment. Left it at grunt.exec() as script execution from Main.java is through that call and if anything changes in that method would reflect in this test. 


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/#review25098
-----------------------------------------------------------


On Aug. 13, 2013, 2:34 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13535/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2013, 2:34 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3204
>     https://issues.apache.org/jira/browse/PIG-3204
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Change parsing from line by line to whole script at once.
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1510960 
> 
> Diff: https://reviews.apache.org/r/13535/diff/
> 
> 
> Testing
> -------
> 
> New unit tests added to track the number of times a line is parsed. TestGrunt and TestShortcuts test failures fixed.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>


Re: Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

Posted by Rohini Palaniswamy <ro...@gmail.com>.

> On Aug. 14, 2013, 2:13 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java, line 776
> > <https://reviews.apache.org/r/13535/diff/2/?file=340614#file340614line776>
> >
> >     This is a bit confusing to me. 10 - 4 + 6 = 12, but numTimesInitiated is set to 10.
> >     
> >     _testSkipParseInRegisterForBatch(false, 10, 4);
> 
> Rohini Palaniswamy wrote:
>     It is 10 (4 + 6). Will change the hyphen to full stop so that it is not mistaken as minus.
> 
> Cheolsoo Park wrote:
>     I see. Thanks!
>     
>     Btw, why are there 7 function calls instead of 6? Is this a typo?
>     
>     6 (parseAndBuild, launchPlan->RandomSampleLoader, InputSizeReducerEstimator, getSplits->RandomSampleLoader, createRecordReader->RandomSampleLoader, getSplits, createRecordReader)

I guess it is. Did this long ago that I have forgotten. Will step through and remove the one from comment that is not applicable. 


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/#review25098
-----------------------------------------------------------


On Aug. 13, 2013, 2:34 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13535/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2013, 2:34 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3204
>     https://issues.apache.org/jira/browse/PIG-3204
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Change parsing from line by line to whole script at once.
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1510960 
> 
> Diff: https://reviews.apache.org/r/13535/diff/
> 
> 
> Testing
> -------
> 
> New unit tests added to track the number of times a line is parsed. TestGrunt and TestShortcuts test failures fixed.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>


Re: Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

Posted by Cheolsoo Park <pi...@gmail.com>.

> On Aug. 14, 2013, 2:13 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java, line 1083
> > <https://reviews.apache.org/r/13535/diff/2/?file=340613#file340613line1083>
> >
> >     I believe we shouldn't remove xargs. It was added by PIG-3099 to avoid some race condition.
> 
> Rohini Palaniswamy wrote:
>     I think it was a mistake. The test fails even before this patch with xargs. The touch command will not produce any output. So it does not make sense to have xargs after a touch.

You're right. Thanks for the clarification!


> On Aug. 14, 2013, 2:13 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java, line 776
> > <https://reviews.apache.org/r/13535/diff/2/?file=340614#file340614line776>
> >
> >     This is a bit confusing to me. 10 - 4 + 6 = 12, but numTimesInitiated is set to 10.
> >     
> >     _testSkipParseInRegisterForBatch(false, 10, 4);
> 
> Rohini Palaniswamy wrote:
>     It is 10 (4 + 6). Will change the hyphen to full stop so that it is not mistaken as minus.

I see. Thanks!

Btw, why are there 7 function calls instead of 6? Is this a typo?

6 (parseAndBuild, launchPlan->RandomSampleLoader, InputSizeReducerEstimator, getSplits->RandomSampleLoader, createRecordReader->RandomSampleLoader, getSplits, createRecordReader)


> On Aug. 14, 2013, 2:13 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java, lines 804-805
> > <https://reviews.apache.org/r/13535/diff/2/?file=340614#file340614line804>
> >
> >     If I understand correctly, this is equivalent to calling the followings:
> >     
> >     GruntParser grunt = new GruntParser(in);
> >     grunt.setInteractive(false);
> >     grunt.setParams(pigServer);
> >     grunt.parseStopOnError(false); //batch
> >     
> >     Can you explicitly call them, so it will be easier to identify the difference when skipParseInRegisterForBatch is on and off?
> 
> Rohini Palaniswamy wrote:
>     I added a comment. Left it at grunt.exec() as script execution from Main.java is through that call and if anything changes in that method would reflect in this test.

Sure, sounds good.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/#review25098
-----------------------------------------------------------


On Aug. 13, 2013, 2:34 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13535/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2013, 2:34 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3204
>     https://issues.apache.org/jira/browse/PIG-3204
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Change parsing from line by line to whole script at once.
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1510960 
> 
> Diff: https://reviews.apache.org/r/13535/diff/
> 
> 
> Testing
> -------
> 
> New unit tests added to track the number of times a line is parsed. TestGrunt and TestShortcuts test failures fixed.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>


Re: Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/#review25098
-----------------------------------------------------------


Overall looks good. I have few minor comments.


http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java
<https://reviews.apache.org/r/13535/#comment49348>

    I believe we shouldn't remove xargs. It was added by PIG-3099 to avoid some race condition.



http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java
<https://reviews.apache.org/r/13535/#comment49349>

    This is a bit confusing to me. 10 - 4 + 6 = 12, but numTimesInitiated is set to 10.
    
    _testSkipParseInRegisterForBatch(false, 10, 4);



http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java
<https://reviews.apache.org/r/13535/#comment49351>

    If I understand correctly, this is equivalent to calling the followings:
    
    GruntParser grunt = new GruntParser(in);
    grunt.setInteractive(false);
    grunt.setParams(pigServer);
    grunt.parseStopOnError(false); //batch
    
    Can you explicitly call them, so it will be easier to identify the difference when skipParseInRegisterForBatch is on and off?


- Cheolsoo Park


On Aug. 13, 2013, 2:34 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13535/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2013, 2:34 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3204
>     https://issues.apache.org/jira/browse/PIG-3204
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Change parsing from line by line to whole script at once.
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java 1510960 
>   http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1510960 
> 
> Diff: https://reviews.apache.org/r/13535/diff/
> 
> 
> Testing
> -------
> 
> New unit tests added to track the number of times a line is parsed. TestGrunt and TestShortcuts test failures fixed.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>