Posted to notifications@groovy.apache.org by "Alexander Klein (JIRA)" <ji...@apache.org> on 2018/06/01 12:07:00 UTC

[jira] [Updated] (GROOVY-8625) Groovy Lexer does not accept UTF-8 characters like ° or § ... and a lot more

     [ https://issues.apache.org/jira/browse/GROOVY-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Klein updated GROOVY-8625:
------------------------------------
    Description: 
The grammar specifies LETTERs in much the same way as the old Java grammar did. The intent is that most Unicode characters should be usable in names, so that code can be localized in languages that use non-Latin scripts. This is especially important for DSLs.

AST transformations run after the lexer. If the lexer accepted these characters, AST transformations would be able to handle more cases, such as creating custom operators.

This is a problem only for ANTLR 2.

With ANTLR 4, only the '#' sign is missing.
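To illustrate the kind of restriction a Java-style LETTER rule imposes, the sketch below uses the JDK's own identifier checks (Character.isJavaIdentifierStart and Character.isJavaIdentifierPart) as a stand-in for the lexer rule. This is an assumption for illustration only: it is not the Groovy ANTLR 2 or ANTLR 4 grammar, whose exact character ranges may differ, and the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper: approximates a "Java-style LETTER" lexer rule with the
// JDK's own identifier checks. It is NOT the Groovy ANTLR 2/ANTLR 4 grammar.
final class IdentifierCheck {

    // True if `name` starts with a Java identifier-start code point and every
    // following code point is a Java identifier-part code point.
    static boolean isJavaStyleIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        if (!Character.isJavaIdentifierStart(name.codePointAt(0))) {
            return false;
        }
        // Skip the first code point (not the first char) so that
        // supplementary characters are handled correctly.
        return name.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the file encoding.
        List<String> candidates = List.of(
                "temperatur",          // plain ASCII
                "gr\u00f6\u00dfe",     // German word with o-umlaut and sharp s (letters)
                "\u03c0",              // Greek small letter pi (a letter)
                "\u00b0C",             // starts with U+00B0 DEGREE SIGN (a symbol)
                "\u00a712",            // starts with U+00A7 SECTION SIGN (punctuation)
                "a#b");                // contains '#' (punctuation)
        for (String c : candidates) {
            System.out.println(c + " -> " + isJavaStyleIdentifier(c));
        }
    }
}
```

Under these Java rules the first three candidates are accepted and the last three are rejected: the degree sign, the section sign and '#' are symbol or punctuation characters rather than letters, so a LETTER rule modelled on Java's would not admit them in names.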

  was:
The grammar specifies LETTERs in much the same way as the old Java grammar did. The intent is that most Unicode characters should be usable in names, so that code can be localized in languages that use non-Latin scripts. This is especially important for DSLs.

AST transformations run after the lexer. If the lexer accepted these characters, AST transformations would be able to handle more cases, such as creating custom operators.

This is a problem for ANTLR 2, and I do not know whether the same applies to ANTLR 4.


> Groovy Lexer does not accept UTF-8 characters like ° or § ... and a lot more
> ----------------------------------------------------------------------------
>
>                 Key: GROOVY-8625
>                 URL: https://issues.apache.org/jira/browse/GROOVY-8625
>             Project: Groovy
>          Issue Type: Bug
>          Components: Compiler
>    Affects Versions: 2.5.0
>            Reporter: Alexander Klein
>            Priority: Major
>              Labels: compiler, grammar, lexer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)