Posted to oak-issues@jackrabbit.apache.org by "Alexander Klimetschek (JIRA)" <ji...@apache.org> on 2016/09/28 00:26:22 UTC

[jira] [Commented] (OAK-4857) Support spaces common in CJK inside node names

    [ https://issues.apache.org/jira/browse/OAK-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527901#comment-15527901 ] 

Alexander Klimetschek commented on OAK-4857:
--------------------------------------------

To be exact, the entire [space separator "Zs" category|http://www.fileformat.info/info/unicode/category/Zs/list.htm] is affected, with the exception of the "normal" {{u20}} space. For regular {{u20}} spaces the behavior is reversed: they are not allowed at the beginning or end of a name, but are allowed in the middle. Other whitespace such as tabs or newlines has not been allowed anywhere since OAK-3412.

Note that the {{oak.allowOtherWhitespaceChars}} setting introduced in OAK-3412 does not make a difference here. Setting it to true gives the pre-1.4 behavior, which actually allowed _whitespace_ such as newlines everywhere, but it still prevents all Zs spaces.

For reference, Jackrabbit 2 seems to have the same behavior as Oak post OAK-3412.

See also [this similar comment|https://issues.apache.org/jira/browse/OAK-3412?focusedCommentId=14991336&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14991336] by [~mevinay].

Technically, things can be slightly confusing in Java because of the difference between [Character.isSpaceChar()|https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isSpaceChar(char)] and [Character.isWhitespace()|https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isWhitespace(char)]. The latter includes most of the former, with a few exceptions (the non-breaking spaces {{u00A0}}, {{u2007}} and {{u202F}}), and adds control whitespace such as tabs and newlines. I would argue for treating all characters matched by {{Character.isSpaceChar()}} like the normal {{u20}} space.
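
To make the difference between the two JDK predicates concrete, here is a minimal sketch that classifies a few sample characters. The class name {{SpaceCharDemo}} and the helper {{classify()}} are made up for illustration and are not part of Oak or Jackrabbit; only {{Character.isSpaceChar()}} and {{Character.isWhitespace()}} are real JDK calls.

```java
public class SpaceCharDemo {

    // Hypothetical helper (not Oak code): reports which of the two
    // JDK predicates accept the given character.
    static String classify(char c) {
        boolean space = Character.isSpaceChar(c);
        boolean white = Character.isWhitespace(c);
        if (space && white) return "both";
        if (space) return "isSpaceChar only";
        if (white) return "isWhitespace only";
        return "neither";
    }

    public static void main(String[] args) {
        // Regular space, three non-breaking spaces, ideographic space,
        // then tab and newline as examples of control whitespace.
        char[] samples = { '\u0020', '\u00A0', '\u2007', '\u202F', '\u3000', '\t', '\n' };
        for (char c : samples) {
            System.out.printf("U+%04X -> %s%n", (int) c, classify(c));
        }
    }
}
```

The non-breaking spaces are accepted by {{isSpaceChar()}} only, tab and newline by {{isWhitespace()}} only, and the regular and ideographic spaces by both, so a check built on {{isSpaceChar()}} would cover the spaces discussed in this issue without admitting tabs or newlines.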

> Support spaces common in CJK inside node names
> ----------------------------------------------
>
>                 Key: OAK-4857
>                 URL: https://issues.apache.org/jira/browse/OAK-4857
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.4.7, 1.5.10
>            Reporter: Alexander Klimetschek
>
> Oak does not allow spaces commonly used in CJK like {{u3000}} (ideographic space) or {{u00A0}} (no-break space) _inside_ a node name, while allowing them at the _beginning or end_. They should be supported for better globalization readiness. Filesystems allow them, so the current restriction makes common filesystem-to-JCR mappings unnecessarily hard.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)