You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Rohan Garg (Jira)" <ji...@apache.org> on 2022/04/04 21:05:00 UTC

[jira] [Comment Edited] (CALCITE-5074) Allowing parser extensions

    [ https://issues.apache.org/jira/browse/CALCITE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517123#comment-17517123 ] 

Rohan Garg edited comment on CALCITE-5074 at 4/4/22 9:04 PM:
-------------------------------------------------------------

[~julianhyde] : Thanks a lot for the response and the suggestions!

As suggested, I've created https://issues.apache.org/jira/browse/CALCITE-5084 for tracking the ROWS with TABLESAMPLE support. Please let us know whether it should be a part of core or babel parser. I think a case can be made for both since the augmentation is small but is not a part of the SQL standard. Once decided, I can pick it up if it's ok.

 

> It seems to me that the {{TableSampleOpt}} rule would remain in the generated derived parser (e.g. the Babel parser). Is that correct? The rule would be unused, and might be invalid if incompatible changes have been made to other rules or Java code. That might be a problem. But it might be a problem we could live with.

Yes, that is correct. The extended rule would live in the template files of the extender (ie babel parser in calcite or in a template file in the custom config file of a downstream project). Regarding the incompatibility, do you think that it can be the responsibility of the extender to write integration tests with the extension rule to make sure that it works?

 

> My ideal would be to "override" rules in the same way that Java methods override base methods in a derived class. In order to override a rule, you would not need to modify the base parser. I don't think that can be accomplished using Fmpp or any other template system. But maybe JavaCC provides a toolkit so that it would be easy to write a program that parses .jj files, merges grammars, and spit out a new grammar.
 
Yes, I completely agree with the idea of having a Java like 'override'. That was something that [~cheddar] had also suggested. I took a cursory look at the JavaCC documentation and wasn't able to find anything out-of-the box for merging .jj files. Although, I found that the .jj file parsing happens via another JavaCC parser ([https://github.com/javacc/javacc/blob/master/src/main/javacc/JavaCC.jj)] . The state of the tokens and production rules in the .jj file are stored in [JavaCCGlobals.java|https://github.com/javacc/javacc/blob/e6620f210ce182af53d09ca1a556aba82f4cc244/src/main/java/org/javacc/parser/JavaCCGlobals.java]. Probably, that could be used to rewrite the .jj file with an override logic in the production rules. Since it doesn't seem too straightforward, do you think we can start with the if/else approach first and then move to an automatic rewrite later?


was (Author: rohangarg):
[~julianhyde] : Thanks a lot for the response and the suggestions!

As suggested, I've created https://issues.apache.org/jira/browse/CALCITE-5084 for tracking the ROWS with TABLESAMPLE support. Please let us know whether it should be a part of core or babel parser. I think a case can be made for both since the augmentation is small but is not a part of the SQL standard. Once decided, I can pick it up if it's ok.

 

> It seems to me that the {{TableSampleOpt}} rule would remain in the generated derived parser (e.g. the Babel parser). Is that correct? The rule would be unused, and might be invalid if incompatible changes have been made to other rules or Java code. That might be a problem. But it might be a problem we could live with.

Yes, that is correct. The extended rule would live in the template files of the extender (ie babel parser in calcite or in a template file in the custom config file of a downstream project). Regarding the incompatibility, do you think that it can be the responsibility of the extender to write integration tests with the extension rule to make sure that it works?

 

> My ideal would be to "override" rules in the same way that Java methods override base methods in a derived class. In order to override a rule, you would not need to modify the base parser. I don't think that can be accomplished using Fmpp or any other template system. But maybe JavaCC provides a toolkit so that it would be easy to write a program that parses .jj files, merges grammars, and spit out a new grammar.
 
Yes, I completely agree with the idea of having a Java like 'override'. That was something that [~cheddar] had also suggested. I took a cursory look at the JavaCC documentation and wasn't able to find anything out-of-the box for merging .jj files. Although, I found that the .jj file parsing happens via another JavaCC parser ([https://github.com/javacc/javacc/blob/master/src/main/javacc/JavaCC.jj)] . The state of the tokens and production rules in the .jj file are stored in [JavaCCGlobals.java|[https://github.com/javacc/javacc/blob/e6620f210ce182af53d09ca1a556aba82f4cc244/src/main/java/org/javacc/parser/JavaCCGlobals.java]|https://github.com/javacc/javacc/blob/e6620f210ce182af53d09ca1a556aba82f4cc244/src/main/java/org/javacc/parser/JavaCCGlobals.java].] Probably, that could be used to rewrite the .jj file with an override logic in the production rules. Since it doesn't seem too straightforward, do you think we can start with the if/else approach first and then move to an automatic rewrite later?

> Allowing parser extensions
> --------------------------
>
>                 Key: CALCITE-5074
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5074
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Rohan Garg
>            Priority: Major
>
> We (I and [~cheddar]) recently had the need to extend various parts of Calcite in a non-standard SQL manner in our project which uses Calcite.  For example, Calcite's TABLESAMPLE keyword always takes a percentage to sample and we wanted to adjust it to also allow for an explicit number of rows.  Given that this is non-standard, we felt that it wouldn't make sense to do a PR that actually impacts Calcite's normal behavior, so we sought a method of extending Calcite to add support for our capabilities while still benefiting from all that Calcite offers.
> We came up with an idea for having an "override map" that can be provided in configuration and will cause the parser to override a specific production rule which can't be categorized in the current extension buckets.  We include a commit link to show the idea in action (https://github.com/rohangarg/calcite/commit/7f0c6ad8c8f6bb2a1d1cca025d35670a8c62b3c4).  It's a bit fiddly in that each override point needs to be extended with if/else template syntax, but that's a pattern that already seems to exist and maybe that's a feature rather than a bug?  In either case, if that's too fiddly, but this general pattern makes sense, a subsequent task could be taken on to try to see if there's another point that this could be added more generically without the need for adding the if/else in the template.  Does this seem like something that could be merged?
> We have included an override grammar that also relaxes the percentage constraints on TABLESAMPLE as an example of the type of customization that this is attempting to enable.
> Just for reference, this customization does exist in other systems as well, so it's non-standard but also not unheard of:
> 1. MS SQL Server : https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql?view=sql-server-ver15#tablesample-clause
> 2. Snowflake : https://docs.snowflake.com/en/sql-reference/constructs/sample.html
> 3. Google Spanner : https://cloud.google.com/spanner/docs/reference/standard-sql/query-syntax#tablesample_operator
> 4. Apache Spark : https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-sampling.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)