Posted to oak-issues@jackrabbit.apache.org by "Julian Reschke (JIRA)" <ji...@apache.org> on 2017/01/25 14:17:26 UTC

[jira] [Comment Edited] (OAK-5506) Segment store apparently doesn't round trip node names with unpaired surrogates

    [ https://issues.apache.org/jira/browse/OAK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837784#comment-15837784 ] 

Julian Reschke edited comment on OAK-5506 at 1/25/17 2:16 PM:
--------------------------------------------------------------

{{o.a.j.o.segment.SegmentWriter.SegmentWriteOperation#writeString}} doesn't seem to be involved (set a breakpoint, didn't get there).

FWIW, this (or something like it) is the code that would need to be added:
{noformat}
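    // requires: import java.io.IOException;
    // Rejects any surrogate that is not the high half of a well-formed high/low pair.
    private static void checkValidString(String s) throws IOException {
        for (int i = 0; i < s.length(); i++) {
            char c1 = s.charAt(i);
            if (!Character.isSurrogate(c1)) {
                continue; // ordinary BMP character
            }
            if (i + 1 >= s.length()) {
                throw new IOException("String ends in unpaired surrogate character: " + (int) c1);
            }
            char c2 = s.charAt(i + 1);
            if (!Character.isSurrogatePair(c1, c2)) {
                // lone low surrogate, or a high surrogate not followed by a low one
                throw new IOException("Invalid surrogate pair sequence: " + (int) c1 + " " + (int) c2);
            }
            i++; // skip the low surrogate that completes the pair
        }
    }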
{noformat}

So, in general, this is a single pass that checks every char in the string.
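
The following is a minimal stand-alone harness that exercises the check in isolation. It is a sketch, not Oak code. The class name {{SurrogateCheckDemo}} and the {{isValid}} wrapper are hypothetical. The method body restates {{checkValidString}} with an explicit bounds check instead of catching {{IndexOutOfBoundsException}}.

```java
import java.io.IOException;

// Hypothetical harness around a checkValidString() sketch; not part of Oak.
final class SurrogateCheckDemo {

    // Throws IOException if s contains a surrogate that is not part of a
    // well-formed high-surrogate + low-surrogate pair.
    static void checkValidString(String s) throws IOException {
        for (int i = 0; i < s.length(); i++) {
            char c1 = s.charAt(i);
            if (!Character.isSurrogate(c1)) {
                continue; // ordinary BMP character
            }
            if (i + 1 >= s.length()) {
                throw new IOException("String ends in unpaired surrogate character: " + (int) c1);
            }
            char c2 = s.charAt(i + 1);
            if (!Character.isSurrogatePair(c1, c2)) {
                throw new IOException("Invalid surrogate pair sequence: " + (int) c1 + " " + (int) c2);
            }
            i++; // skip the low surrogate that completes the pair
        }
    }

    // Convenience wrapper: true if checkValidString() accepts s.
    static boolean isValid(String s) {
        try {
            checkValidString(s);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "foo",              // plain ASCII
            "foo\ud83d\ude00",  // U+1F600 encoded as a proper surrogate pair
            "foo\ud800",        // unpaired high surrogate at the end (the OAK-5506 case)
            "\udc00foo",        // lone low surrogate at the start
            "foo\ud800bar"      // high surrogate followed by a non-surrogate
        };
        for (String name : names) {
            System.out.println(isValid(name));
        }
    }
}
```

With these sample names, the harness prints {{true}} twice and then {{false}} three times. The OAK-5506 name {{"foo\ud800"}} is the first one rejected.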

[~mduerig]: agreed, but if we want to reject these values, then we'll have to detect them, right? Thinking about it, the cost would be smaller if we did it in a place where we already have to parse the name (that is, in the JCR layer).
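
The cost argument works like this: a JCR-level name parser already visits every character, so folding the surrogate test into that loop adds one branch per character rather than a second pass over the string. The sketch below is hypothetical. {{SinglePassNameCheck}}, {{validate}}, and the reserved-character set are illustrative assumptions, not Oak's actual name parser or its validation rules.

```java
import java.util.Optional;

// Hypothetical single-pass validator: the surrogate check rides along in a loop
// that already visits every char to reject reserved characters.
final class SinglePassNameCheck {

    // Illustrative reserved characters only; not Oak's actual rule set.
    private static final String RESERVED = "/:[]|*";

    // Returns an error description, or Optional.empty() if the name passes.
    static Optional<String> validate(String name) {
        if (name.isEmpty()) {
            return Optional.of("empty name");
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                return Optional.of("reserved character '" + c + "' at index " + i);
            }
            if (Character.isHighSurrogate(c)
                    && i + 1 < name.length()
                    && Character.isLowSurrogate(name.charAt(i + 1))) {
                i++; // well-formed pair: skip the low half
            } else if (Character.isSurrogate(c)) {
                return Optional.of("unpaired surrogate at index " + i);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate("foo"));
        System.out.println(validate("foo\ud800"));
        System.out.println(validate("a/b"));
    }
}
```

Because the surrogate branch only runs for chars already being inspected, the extra cost is a couple of comparisons per character.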


> Segment store apparently doesn't round trip node names with unpaired surrogates
> -------------------------------------------------------------------------------
>
>                 Key: OAK-5506
>                 URL: https://issues.apache.org/jira/browse/OAK-5506
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>    Affects Versions: 1.5.18
>            Reporter: Julian Reschke
>            Assignee: Francesco Mari
>         Attachments: ValidNamesTest.java
>
>
> Apparently, the following node name is accepted:
>    {{"foo\ud800"}}
> but a subsequent {{getPath()}} call fails:
> {noformat}
> javax.jcr.InvalidItemStateException: This item [/test_node/foo?] does not exist anymore
>     at org.apache.jackrabbit.oak.jcr.delegate.ItemDelegate.checkAlive(ItemDelegate.java:86)
>     at org.apache.jackrabbit.oak.jcr.session.operation.ItemOperation.checkPreconditions(ItemOperation.java:34)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
>     at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
>     at org.apache.jackrabbit.oak.jcr.session.ItemImpl.getPath(ItemImpl.java:140)
>     at org.apache.jackrabbit.oak.jcr.session.NodeImpl.getPath(NodeImpl.java:106)
>     at org.apache.jackrabbit.oak.jcr.ValidNamesTest.nameTest(ValidNamesTest.java:271)
>     at org.apache.jackrabbit.oak.jcr.ValidNamesTest.testUnpairedSurrogate(ValidNamesTest.java:259)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source){noformat}
> (test case follows)
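
The {{foo?}} in the exception message may be consistent with the unpaired surrogate being replaced during a UTF-8 encode. The sketch below assumes the name passes through the standard JDK encoder ({{String.getBytes(StandardCharsets.UTF_8)}}), which substitutes {{'?'}} for a malformed surrogate instead of throwing. The issue does not say which code path the segment store actually uses, and the comment above notes that {{SegmentWriteOperation#writeString}} did not seem to be hit. {{Utf8RoundTripDemo}} and {{roundTrip}} are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: an unpaired surrogate does not survive a
// String -> UTF-8 bytes -> String round trip through the standard JDK codec.
final class Utf8RoundTripDemo {

    static String roundTrip(String s) {
        // getBytes() replaces malformed input with '?' rather than rejecting it
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "foo\ud800";
        String decoded = roundTrip(original);
        System.out.println(decoded);                  // prints foo?
        System.out.println(original.equals(decoded)); // prints false
    }
}
```

A well-formed pair such as {{"foo\ud83d\ude00"}} round-trips unchanged. The lone surrogate, however, comes back as a different string. That is one plausible way a stored name could stop matching the name the session still holds.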


