Posted to dev@sling.apache.org by "Felix Meschberger (JIRA)" <ji...@apache.org> on 2012/09/28 11:29:08 UTC

[jira] [Created] (SLING-2609) Support non-ASCII based languages for node name generation

Felix Meschberger created SLING-2609:
----------------------------------------

             Summary: Support non-ASCII based languages for node name generation
                 Key: SLING-2609
                 URL: https://issues.apache.org/jira/browse/SLING-2609
             Project: Sling
          Issue Type: Improvement
          Components: Servlets
    Affects Versions: Servlets Post 2.1.2
            Reporter: Felix Meschberger
            Assignee: Felix Meschberger


The Sling POST Servlet has built-in support for automatically generating names for newly created resources, based on a name hint or the value of selected properties.

Such name hints are filtered in a very crude way, though:
  * the string is converted to lower case
  * only ASCII letters and digits are accepted
  * all other characters are replaced by an underscore (_)

This leads to the following problems:
  * Non-BMP (surrogate) Unicode characters are converted to just underscore
  * Words separated by whitespace (e.g. in the title "My Brand new Page") are now separated by underscores instead of dashes (-), which may lead to indexing problems (see http://www.youtube.com/watch?v=AQcSFsQyct8)

This all happens in the NodeNameFilter class.
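
To make the problem concrete, here is a minimal sketch of the filtering as described in the list above. It is not the actual NodeNameFilter source; the class name CrudeNameFilterSketch and the method signature are assumptions made for illustration only.

```java
import java.util.Locale;

// Hypothetical stand-in for the filtering described above; NOT the Sling source.
final class CrudeNameFilterSketch {

    static String filter(String hint) {
        // Step 1: lower-case the whole string.
        String lower = hint.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(lower.length());
        // Steps 2 and 3: walk char by char, keep ASCII letters and digits,
        // replace everything else with an underscore.
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            boolean accepted = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            sb.append(accepted ? c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("My Brand new Page"));   // my_brand_new_page
        System.out.println(filter("Über uns"));            // _ber_uns
        // U+20000 is a non-BMP CJK character, stored as two surrogate chars.
        System.out.println(filter("\uD840\uDC00 page"));   // ___page
    }
}
```

Because the loop works on char values, each half of a surrogate pair is rejected on its own, so a single non-BMP character is reduced to underscores just like any other non-ASCII letter.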

I suggest we change this as follows:

* Operate on code points (int type) instead of just characters (char type)
* Accept all characters valid for JCR names. This is all Unicode characters except { ., /, :, [, ], *, ', ", | }. These characters are replaced by an underscore
* Convert all whitespace characters (Character.isWhitespace(int)) to a dash
* Convert all other characters to lower case (Character.toLowerCase(int))
* Fold consecutive dash and underscore characters into just one
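
The rules above could be sketched roughly as follows. This is only an illustration and not the attached code: the class name NameFilterSketch is hypothetical, and the folding rule is interpreted here as "any run of dashes and underscores, mixed or not, collapses to its first character", which is an assumption about the intended behaviour.

```java
// Hypothetical sketch of the proposed rules; NOT the attached NodeNameFilter.
final class NameFilterSketch {

    // Characters that the proposal replaces with an underscore.
    private static final String REPLACED = "./:[]*'\"|";

    static String filter(String hint) {
        StringBuilder sb = new StringBuilder(hint.length());
        int last = -1; // last code point appended, -1 while the output is empty
        for (int i = 0; i < hint.length(); ) {
            // Read a full code point so surrogate pairs stay together.
            int cp = hint.codePointAt(i);
            i += Character.charCount(cp);

            int out;
            if (REPLACED.indexOf(cp) >= 0) {
                out = '_';
            } else if (Character.isWhitespace(cp)) {
                out = '-';
            } else {
                out = Character.toLowerCase(cp);
            }

            // Assumed folding rule: skip a separator that directly follows
            // another separator (dash or underscore).
            boolean separator = out == '-' || out == '_';
            if (separator && (last == '-' || last == '_')) {
                continue;
            }
            sb.appendCodePoint(out);
            last = out;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("My Brand new Page"));  // my-brand-new-page
        System.out.println(filter("Über uns"));           // über-uns
        System.out.println(filter("a.b/c"));              // a_b_c
        System.out.println(filter("a - b"));              // a-b
    }
}
```

With this approach a non-BMP character such as U+20000 passes through unchanged, since it is read and written as one code point rather than as two rejected surrogate chars.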

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (SLING-2609) Support non-ASCII based languages for node name generation

Posted by "Felix Meschberger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SLING-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Meschberger updated SLING-2609:
-------------------------------------

    Attachment: NodeNameFilter.java

Proposed changes to the NodeNameFilter.

This change also includes handling the maximum length for the generated name in the filter itself to limit the actual work done.
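
As a rough illustration of what handling the maximum length inside the filter can look like (the attachment itself is not reproduced here), the loop can simply stop once the limit is reached instead of filtering the whole hint and truncating afterwards. The class name BoundedFilterSketch, the maxLength parameter, and measuring the limit in char units are all assumptions; only the whitespace and lower-case rules are included to keep the sketch short.

```java
// Hypothetical sketch of length-limited filtering; NOT the attached code.
final class BoundedFilterSketch {

    // maxLength is assumed to be counted in char (UTF-16) units.
    static String filter(String hint, int maxLength) {
        int limit = Math.max(0, maxLength);
        StringBuilder sb = new StringBuilder(Math.min(hint.length(), limit));
        for (int i = 0; i < hint.length(); ) {
            int cp = hint.codePointAt(i);
            i += Character.charCount(cp);

            int out = Character.isWhitespace(cp) ? '-' : Character.toLowerCase(cp);

            // Stop as soon as the next code point would exceed the limit,
            // so no work is spent on the rest of the hint.
            if (sb.length() + Character.charCount(out) > limit) {
                break;
            }
            sb.appendCodePoint(out);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("My Brand new Page", 8)); // my-brand
    }
}
```

Checking the width of each code point before appending also avoids cutting a surrogate pair in half at the limit.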

In the end, I also think the NodeNameFilter class should be integrated into the DefaultNodeNameGenerator instead of being a separate single-method class.
                


[jira] [Updated] (SLING-2609) Support non-ASCII based languages for node name generation

Posted by "Felix Meschberger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SLING-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Meschberger updated SLING-2609:
-------------------------------------

    Attachment: SLING-2609.patch

Proposed patch:
  * Implements new filtering
  * Folds NodeNameFilter into DefaultNodeNameGenerator
                
