Posted to dev@pig.apache.org by "Jonathan Coveney (Created) (JIRA)" <ji...@apache.org> on 2012/03/16 05:13:48 UTC

[jira] [Created] (PIG-2597) Move grunt from javacc to ANTLR

Move grunt from javacc to ANTLR
-------------------------------

                 Key: PIG-2597
                 URL: https://issues.apache.org/jira/browse/PIG-2597
             Project: Pig
          Issue Type: Improvement
            Reporter: Jonathan Coveney


Currently, the parser for queries is written in ANTLR, but Grunt still uses javacc. The Grunt parser is very difficult to work with, and next to impossible to understand or modify. ANTLR provides a much cleaner, more standard way to generate parsers, lexers, ASTs, etc., and moving Grunt from javacc to ANTLR would be a big help as we continue to add features to Pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2597) Move grunt from javacc to ANTLR

Posted by "Jonathan Coveney (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2597:
----------------------------------

    Description: 
Currently, the parser for queries is written in ANTLR, but Grunt still uses javacc. The Grunt parser is very difficult to work with, and next to impossible to understand or modify. ANTLR provides a much cleaner, more standard way to generate parsers, lexers, ASTs, etc., and moving Grunt from javacc to ANTLR would be a big help as we continue to add features to Pig.

This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

  was: the same description without the Google Summer of Code paragraph.

    
> Move grunt from javacc to ANTLR
> -------------------------------
>
>                 Key: PIG-2597
>                 URL: https://issues.apache.org/jira/browse/PIG-2597
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>              Labels: GSoC2012
>
> Currently, the parser for queries is written in ANTLR, but Grunt still uses javacc. The Grunt parser is very difficult to work with, and next to impossible to understand or modify. ANTLR provides a much cleaner, more standard way to generate parsers, lexers, ASTs, etc., and moving Grunt from javacc to ANTLR would be a big help as we continue to add features to Pig.
> This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012


        

[jira] [Updated] (PIG-2597) Move grunt from javacc to ANTLR

Posted by "Boski Shah (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boski Shah updated PIG-2597:
----------------------------

    Attachment: pig02.diff

I have modified the Pig Grunt code to use ANTLR for some Grunt commands (CAT, HELP and QUIT). I have attached the diff file for your review. Please find more details about the changes below.

I have the basic code working, but I still consider it a first draft. I will refine and clean up the code as I proceed. Before I do that, I want to make sure that I am heading in the right direction. Can you please take a look at the code and let me know if you see any issues with my approach?


Approach:

    Enhanced existing grammar: Instead of creating a new grammar as I suggested earlier, I ended up modifying the existing grammars to add Grunt commands, i.e. I have modified Query{Lexer, Parser}.g, ASTValidator.g and LogicalPlanGenerator.g to support these commands. I tried various approaches, including a new grammar and an enhanced existing grammar with changes in PigServer to support Grunt commands, and I think this is the cleanest one. You had also suggested this as the preferred option.

    Deprecated GruntParser: I have deprecated GruntParser. To replace it, I have created a new class, 'GruntDriver'. Grunt.java now uses this new class instead.
        GruntDriver works in interactive as well as batch mode.
        The GruntDriver.process method is similar to what GruntParser.parseStopOnError() does.
        The process method first uses the grammar to parse the input stream (the parsing code is identical to QueryParserDriver) and creates the tree.
        The process method then traverses the tree: every time it comes across a Grunt command's node, it executes it immediately. For all Pig query nodes, GruntDriver delegates the work to PigServer by calling its registerQuery method.

    Retain the original input text:
        One caveat I encountered is that PigServer.registerQuery expects the raw Pig query string as input, whereas after AST generation GruntDriver no longer has the raw input. I considered modifying PigServer so that it could take the tree as input, but that change seemed way too intrusive. Also, since PigServer is a public interface, I am not comfortable with it having an API that takes an AST node.
        Instead, I modified the grammar so that it retains the original input string as one of the children of every statement. For example, general_statement in QueryParser.g now has an additional child TEXT[$general_statement.text]. GruntDriver then uses this child's value to pass the original input to PigServer.registerQuery.
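
To make the traversal described above concrete, here is a minimal sketch in plain Java. It is not the code from pig02.diff: Node, QuerySink and GruntDriverSketch are hypothetical stand-ins for the ANTLR tree, PigServer and GruntDriver, and the node type names ("CAT", "STATEMENT", "TEXT", ...) are assumed for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for an AST node (not ANTLR's tree type). */
final class Node {
    final String type;   // assumed type names, e.g. "CAT", "QUIT", "STATEMENT", "TEXT"
    final String text;   // payload: a path for CAT, the raw statement for TEXT
    final List<Node> children = new ArrayList<>();

    Node(String type, String text) {
        this.type = type;
        this.text = text;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical stand-in for the single PigServer call the driver needs. */
interface QuerySink {
    void registerQuery(String query);
}

final class GruntDriverSketch {
    private final QuerySink server;
    final List<String> log = new ArrayList<>();  // records executed Grunt commands

    GruntDriverSketch(QuerySink server) {
        this.server = server;
    }

    /** Walks the top-level nodes in order; returns false once QUIT is seen. */
    boolean process(Node root) {
        for (Node stmt : root.children) {
            switch (stmt.type) {
                case "CAT":
                    log.add("cat " + stmt.text);   // Grunt command: run immediately
                    break;
                case "HELP":
                    log.add("help");
                    break;
                case "QUIT":
                    log.add("quit");
                    return false;                  // stop processing further nodes
                default:
                    // Pig query: hand the retained original text to the server.
                    server.registerQuery(originalText(stmt));
            }
        }
        return true;
    }

    /** Finds the TEXT child that carries the statement's original input. */
    private static String originalText(Node stmt) {
        for (Node child : stmt.children) {
            if ("TEXT".equals(child.type)) {
                return child.text;
            }
        }
        throw new IllegalStateException("statement node has no TEXT child");
    }

    public static void main(String[] args) {
        GruntDriverSketch driver =
            new GruntDriverSketch(q -> System.out.println("registerQuery: " + q));
        Node root = new Node("ROOT", null)
            .add(new Node("STATEMENT", null).add(new Node("TEXT", "A = load 'x';")))
            .add(new Node("QUIT", null));
        driver.process(root);
    }
}
```

The point of the TEXT child in this sketch is that the driver never has to rebuild a query string from the tree; it forwards exactly what the user typed.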

Open Items:

    Add all commands: I have added only some commands in GruntDriver. I am working on adding many more at this time. I expect many of them, such as cd and cp, to be trivial to add. Others, such as explain, run and exec, will require more work.

    Secondary prompt: With this new implementation, the secondary prompt in interactive mode does not work. Existing Pig gives a different kind of prompt (">>") if the statement provided through the Grunt shell is incomplete. With my changes, it gives an error saying that the input was invalid. I am not sure how critical it is to support such secondary prompts. I have a few ideas about how to support it, but I believe it requires a lot of effort and code changes in the grammar. So, before I start on that, I just want to understand how critical it is to retain that feature.
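
For readers unfamiliar with the secondary prompt, the sketch below shows the behavior in isolation. It is only one possible approach and not necessarily any of the ideas mentioned above: it buffers lines outside the grammar and assumes, as a deliberate simplification, that a statement is complete once a line ends with ';' (real Grunt commands such as quit do not need one). The class name and the "grunt> " constant are assumptions for illustration.

```java
/** Hypothetical line buffer that decides which prompt to show. */
final class PromptBufferSketch {
    static final String PRIMARY = "grunt> ";   // assumed primary prompt text
    static final String SECONDARY = ">> ";     // prompt for an incomplete statement

    private final StringBuilder pending = new StringBuilder();

    /** Prompt to print before reading the next line. */
    String prompt() {
        return pending.length() == 0 ? PRIMARY : SECONDARY;
    }

    /**
     * Feeds one input line. Returns the complete statement, or null when
     * more input is needed (simplification: complete means "ends with ';'").
     */
    String feed(String line) {
        if (pending.length() > 0) {
            pending.append('\n');
        }
        pending.append(line);
        if (!line.trim().endsWith(";")) {
            return null;
        }
        String statement = pending.toString();
        pending.setLength(0);                  // reset for the next statement
        return statement;
    }

    public static void main(String[] args) {
        PromptBufferSketch buffer = new PromptBufferSketch();
        System.out.println(buffer.prompt() + "A = load 'x'");
        buffer.feed("A = load 'x'");
        System.out.println(buffer.prompt() + "as (a:int);");
        System.out.println("statement: " + buffer.feed("as (a:int);"));
    }
}
```

A buffer like this would sit in front of the parser, so the grammar would only ever see input that the heuristic considers complete; the trade-off is that the heuristic can disagree with the grammar about what "complete" means.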


                


        

[jira] [Commented] (PIG-2597) Move grunt from javacc to ANTLR

Posted by "Russell Jurney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471948#comment-13471948 ] 

Russell Jurney commented on PIG-2597:
-------------------------------------

Jonathan, any update on this?
                
