Posted to notifications@groovy.apache.org by "Isaac Dooley (JIRA)" <ji...@apache.org> on 2019/07/29 18:20:00 UTC

[jira] [Commented] (GROOVY-8625) Groovy Lexer does not accept UTF-8 characters like ° or § ... and a lot more

    [ https://issues.apache.org/jira/browse/GROOVY-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895500#comment-16895500 ] 

Isaac Dooley commented on GROOVY-8625:
--------------------------------------

I've run into a problem that matches the title of this issue but has nothing to do with the discussion of DSLs. Specifically, in 3.0.0-beta-2, the lexer rejects some characters that should be valid as part of an identifier.

The \u2040 character, for example, ought to be valid in an identifier: the Groovy syntax documentation [https://groovy-lang.org/syntax.html] says it should be, and the Java Character class agrees:
{code:java}
assert Character.isJavaIdentifierPart("\u2040" as char)
{code}

Does the new grammar intentionally exclude '\u0100' to '\uFFFE' for some reason?
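
For comparison, here is a minimal sketch in plain Java that prints how java.lang.Character classifies \u2040 alongside the characters named in the issue title. The class name IdentifierCharCheck and the describe helper are made up for illustration. The sketch only shows what the JDK reports and says nothing about what the Groovy lexer itself accepts.

```java
// Sketch: ask the JDK which characters it treats as identifier characters.
// IdentifierCharCheck and describe() are hypothetical names, not Groovy APIs.
final class IdentifierCharCheck {

    /** Summarizes how java.lang.Character classifies a single character. */
    static String describe(char c) {
        return String.format("U+%04X start=%b part=%b",
                (int) c,
                Character.isJavaIdentifierStart(c),
                Character.isJavaIdentifierPart(c));
    }

    public static void main(String[] args) {
        // CHARACTER TIE, DEGREE SIGN, SECTION SIGN, NUMBER SIGN
        char[] samples = {'\u2040', '\u00B0', '\u00A7', '#'};
        for (char c : samples) {
            System.out.println(describe(c));
        }
    }
}
```

Running the same check against the lexer's LETTER rule would show where the two definitions diverge.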

> Groovy Lexer does not accept UTF-8 characters like ° or § ... and a lot more
> ----------------------------------------------------------------------------
>
>                 Key: GROOVY-8625
>                 URL: https://issues.apache.org/jira/browse/GROOVY-8625
>             Project: Groovy
>          Issue Type: Bug
>          Components: Compiler
>    Affects Versions: 2.5.0
>            Reporter: Alexander Klein
>            Priority: Major
>              Labels: compiler, grammar, lexer
>
> The grammar uses a similar specification for LETTERs as the old Java grammar. By intention, it should be possible to use most UTF-8 characters in names, to enable localization in languages using non-Latin characters. This is especially important for DSLs.
> AST transformations take place after the Lexer. With the Lexer accepting these characters, AST transformations are now able to handle more things, like creating custom operators and so on.
> This is a problem only for ANTLR 2.
> ANTLR 4 is only missing the '#'-sign.
> This may introduce a breaking change, because GStrings like "$first#$second" worked in the past and will no longer work. Before this change, "$first#" was interpreted as the value of the variable first plus a '#' sign. Now it is interpreted as the value of the variable first#.
> This, of course, is a problem for all newly added letters.
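
The GString concern in the description can be illustrated with a deliberately simplified scanner. This is a sketch in Java, not Groovy's actual lexer: the class PlaceholderScan, the two rules, and the firstPlaceholderName helper are all hypothetical, and the "old" rule is assumed to accept only letters, digits, and underscores.

```java
import java.util.function.IntPredicate;

// Sketch only: a toy placeholder scanner, not the Groovy lexer.
final class PlaceholderScan {

    // Assumed "old" rule: letters, digits, and underscore.
    static final IntPredicate OLD_RULE =
            c -> Character.isLetterOrDigit(c) || c == '_';

    // Assumed "new" rule: the old rule plus '#'.
    static final IntPredicate NEW_RULE =
            c -> OLD_RULE.test(c) || c == '#';

    /** Returns the name following the first '$', consuming characters while the rule accepts them. */
    static String firstPlaceholderName(String text, IntPredicate isIdentifierPart) {
        int dollar = text.indexOf('$');
        if (dollar < 0) {
            return "";
        }
        int end = dollar + 1;
        while (end < text.length() && isIdentifierPart.test(text.charAt(end))) {
            end++;
        }
        return text.substring(dollar + 1, end);
    }

    public static void main(String[] args) {
        String template = "$first#$second";
        System.out.println(firstPlaceholderName(template, OLD_RULE));
        System.out.println(firstPlaceholderName(template, NEW_RULE));
    }
}
```

Under the old rule the scanner stops before '#', so the placeholder name is first. Once '#' counts as an identifier character, the same text yields the name first#, which is the breaking change described above.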


