Posted to oak-issues@jackrabbit.apache.org by "Julian Reschke (JIRA)" <ji...@apache.org> on 2017/02/06 15:34:41 UTC

[jira] [Commented] (OAK-5506) Segment store apparently doesn't round trip node names with unpaired surrogates

    [ https://issues.apache.org/jira/browse/OAK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854212#comment-15854212 ] 

Julian Reschke commented on OAK-5506:
-------------------------------------

The segment store already converts strings to UTF-8, so I believe it's worthwhile to make that conversion detect malformed input (such as unpaired surrogates), independently of other considerations.
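To illustrate the difference, here is a minimal, self-contained sketch. The class and method names are hypothetical, and it uses {{StandardCharsets.UTF_8}} rather than Guava's {{Charsets}}. {{String.getBytes}} always replaces malformed input with the charset's replacement bytes ({{?}} for UTF-8), so an unpaired surrogate is silently lost. A {{CharsetEncoder}} configured with {{CodingErrorAction.REPORT}} throws instead:

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares lossy and strict UTF-8 encoding.
class SurrogateRoundTrip {

    // String.getBytes replaces malformed input with '?', so this round trip is lossy.
    static String roundTripViaGetBytes(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Returns false if the strict encoder rejects the input as malformed.
    static boolean encodesStrictly(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "foo\ud800"; // high surrogate with no low surrogate after it
        System.out.println(roundTripViaGetBytes(name));          // foo?
        System.out.println(encodesStrictly(name));               // false
        System.out.println(encodesStrictly("foo\ud83d\ude00"));  // true: a valid pair
    }
}
```

The strict variant turns a silent data change into an exception at write time, which is the kind of "detect garbage" behavior meant above.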

That said, <http://psy-lob-saw.blogspot.de/2012/12/encode-utf-8-string-to-bytebuffer-faster.html> has useful information on encoding strings to UTF-8 quickly. On my system, I was able to match the performance of {{String.getBytes}} by changing {{encode}} to the following:

{noformat}
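    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CodingErrorAction;

    import com.google.common.base.Charsets;

    // CharsetEncoder is stateful and not thread-safe, so keep one per thread.
    private static final ThreadLocal<CharsetEncoder> cse = new ThreadLocal<CharsetEncoder>() {
        @Override
        protected CharsetEncoder initialValue() {
            CharsetEncoder e = Charsets.UTF_8.newEncoder();
            // Throw on malformed input instead of substituting '?' as String.getBytes does.
            e.onUnmappableCharacter(CodingErrorAction.REPORT);
            e.onMalformedInput(CodingErrorAction.REPORT);
            return e;
        }
    };

    // Copies the remaining bytes of the buffer into an exactly sized array.
    private static byte[] bytes(ByteBuffer b) {
        byte[] a = new byte[b.remaining()];
        b.get(a);
        return a;
    }

    // Throws CharacterCodingException (a subclass of IOException) on malformed input.
    private static byte[] encode(String in) throws IOException {
        CharsetEncoder e = cse.get();
        e.reset(); // defensive: encode(CharBuffer) also resets the encoder
        return bytes(e.encode(CharBuffer.wrap(in.toCharArray())));
    }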
{noformat}

Note that the more important change might be passing a char array to {{CharBuffer.wrap()}} instead of the source string, counter-intuitive as that may sound.
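One plausible explanation, based on the OpenJDK sources rather than on any documented contract: the JDK's UTF-8 encoder takes a faster array-based loop only when both the source and the destination buffers expose a backing array. {{CharBuffer.wrap(CharSequence)}} returns a read-only buffer with no accessible array, whereas {{CharBuffer.wrap(char[])}} returns an array-backed one. The sketch below (hypothetical class and method names) only checks {{hasArray()}} for the two variants; it does not measure performance:

```java
import java.nio.CharBuffer;

// Hypothetical demo class: shows which wrapped buffer exposes a backing array.
class WrapBacking {

    // CharBuffer.wrap(CharSequence) yields a read-only view over the string.
    static boolean wrappedStringHasArray(String s) {
        return CharBuffer.wrap(s).hasArray();
    }

    // CharBuffer.wrap(char[]) yields a buffer backed by the (copied) char array.
    static boolean wrappedCharsHasArray(String s) {
        return CharBuffer.wrap(s.toCharArray()).hasArray();
    }

    public static void main(String[] args) {
        System.out.println(wrappedStringHasArray("foo")); // false
        System.out.println(wrappedCharsHasArray("foo"));  // true
    }
}
```

The extra copy made by {{toCharArray()}} is apparently cheaper than running the encoder's generic buffer loop, though that trade-off may vary across JDK versions.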

> Segment store apparently doesn't round trip node names with unpaired surrogates
> -------------------------------------------------------------------------------
>
>                 Key: OAK-5506
>                 URL: https://issues.apache.org/jira/browse/OAK-5506
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>    Affects Versions: 1.5.18
>            Reporter: Julian Reschke
>            Assignee: Francesco Mari
>             Fix For: 1.8
>
>         Attachments: OAK-5506-01.patch, OAK-5506-02.patch, ValidNamesTest.java
>
>
> Apparently, the following node name is accepted:
>    {{"foo\ud800"}}
> but a subsequent {{getPath()}} call fails:
> {noformat}
> javax.jcr.InvalidItemStateException: This item [/test_node/foo?] does not exist anymore
>     at org.apache.jackrabbit.oak.jcr.delegate.ItemDelegate.checkAlive(ItemDelegate.java:86)
>     at org.apache.jackrabbit.oak.jcr.session.operation.ItemOperation.checkPreconditions(ItemOperation.java:34)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
>     at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
>     at org.apache.jackrabbit.oak.jcr.session.ItemImpl.getPath(ItemImpl.java:140)
>     at org.apache.jackrabbit.oak.jcr.session.NodeImpl.getPath(NodeImpl.java:106)
>     at org.apache.jackrabbit.oak.jcr.ValidNamesTest.nameTest(ValidNamesTest.java:271)
>     at org.apache.jackrabbit.oak.jcr.ValidNamesTest.testUnpairedSurrogate(ValidNamesTest.java:259)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source){noformat}
> (test case follows)
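For reference, an unpaired surrogate is a UTF-16 high surrogate (U+D800 to U+DBFF) that is not immediately followed by a low surrogate (U+DC00 to U+DFFF), or a low surrogate with no preceding high surrogate; {{"foo\ud800"}} ends in one. A minimal sketch of a scan that finds such code units without encoding the string (the helper name is hypothetical and not taken from Oak):

```java
// Hypothetical helper class for locating unpaired UTF-16 surrogates.
class SurrogateCheck {

    // Returns the index of the first unpaired surrogate in s, or -1 if every
    // surrogate is part of a valid high/low pair.
    static int indexOfUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                    continue;
                }
                return i; // high surrogate not followed by a low surrogate
            }
            if (Character.isLowSurrogate(c)) {
                return i; // low surrogate without a preceding high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfUnpairedSurrogate("foo\ud800"));       // 3
        System.out.println(indexOfUnpairedSurrogate("foo\ud83d\ude00")); // -1
        System.out.println(indexOfUnpairedSurrogate("\ude00foo"));       // 0
    }
}
```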



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)