Posted to proton@qpid.apache.org by "Dominic Evans (JIRA)" <ji...@apache.org> on 2014/05/01 19:31:17 UTC
[jira] [Comment Edited] (PROTON-576) proton-j: codec support for
UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986753#comment-13986753 ]
Dominic Evans edited comment on PROTON-576 at 5/1/14 5:30 PM:
--------------------------------------------------------------
Whilst I think that was certainly true in Java 5 and 6, there has been a lot of work from 2009 onwards, leading up to Java 7, that massively improved encoder and decoder performance.
e.g.,
https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs
https://blogs.oracle.com/xuemingshen/entry/the_big_overhaul_of_java
I can knock up a quick benchmark of using Proton-J to loop over encoding and decoding some AmqpValues and capture the values before and after the patch, if that'd be of interest?
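As a rough idea of the shape such a benchmark might take, here is a minimal sketch that times UTF-8 round trips using only the JDK's built-in encoder and decoder. It does not touch Proton-J or AmqpValue at all; the class and method names (Utf8RoundTripTiming, roundTrip, timeRoundTrips) are made up for illustration, and a simple System.nanoTime loop like this is only indicative. A proper harness would be needed for trustworthy numbers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Proton-J: times JDK-only UTF-8 round trips.
final class Utf8RoundTripTiming {

    /** Encodes s to UTF-8 bytes and decodes it back using only the JDK. */
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns the elapsed nanoseconds for the given number of round trips. */
    static long timeRoundTrips(String s, int iterations) {
        long sink = 0; // consume each result so the JIT cannot discard the work
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += roundTrip(s).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink != (long) s.length() * iterations) {
            throw new IllegalStateException("round trip changed the string length");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Sample text mixing ASCII with two emoji (U+1F4D4 and U+1F6A2).
        String sample = "hello, \uD83D\uDCD4\uD83D\uDEA2 world";
        timeRoundTrips(sample, 100_000); // warm-up pass, result ignored
        long nanos = timeRoundTrips(sample, 1_000_000);
        System.out.printf("%.1f ns per round trip%n", nanos / 1_000_000.0);
    }
}
```

Running the same loop against the Proton-J codec before and after the patch would be the actual comparison of interest; the JDK-only figure is just a baseline.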
was (Author: dnwe):
Whilst I think that was certainly true in Java 5 and 6, there has been a lot of work from 2009 onwards, leading up to Java 7, that massively improved encoder and decoder performance.
e.g.,
https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs
https://blogs.oracle.com/xuemingshen/entry/the_big_overhaul_of_java
I can knock up a quick benchmark of using Proton-C to loop over encoding and decoding some AmqpValues and capture the values before and after the patch, if that'd be of interest?
> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> -----------------------------------------------------------------------
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
> Issue Type: Bug
> Components: proton-j
> Affects Versions: 0.7
> Reporter: Dominic Evans
> Attachments: 02_fix_stringtype_encode_decode.patch
>
>
> It seems like Proton-J has its own custom UTF-8 encoder, but relies on Java String's built-in UTF-8 decoder. However, the code doesn't seem quite right, and strings containing four-byte UTF-8 sequences such as emoji ('📔🚢🍛🍴🍹🏊🏄') can quite easily fail to parse:
> | | Cause:1 :- java.lang.IllegalArgumentException: Cannot parse String
> | | Message:1 :- Cannot parse String
> | | StackTrace:1 :- java.lang.IllegalArgumentException: Cannot parse String
> | | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:48)
> | | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:36)
> | | at org.apache.qpid.proton.codec.DecoderImpl.readRaw(DecoderImpl.java:945)
> | | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:172)
> | | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:124)
> | | at org.apache.qpid.proton.codec.DynamicTypeConstructor.readValue(DynamicTypeConstructor.java:39)
> | | at org.apache.qpid.proton.codec.DecoderImpl.readObject(DecoderImpl.java:885)
> | | at org.apache.qpid.proton.message.impl.MessageImpl.decode(MessageImpl.java:629)
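For context on why emoji are a good stress test here: each of these characters lies outside the Basic Multilingual Plane, so a Java String holds it as a surrogate pair (two chars), while well-formed UTF-8 encodes it as a single four-byte sequence. The sketch below uses only JDK classes to show this, and to show that bytes produced by encoding each surrogate char separately (three bytes each) are rejected by a strict decoder. This is one plausible way such a failure can arise, not a statement of what the Proton-J code actually does; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Proton-J.
final class Utf8SurrogateDemo {

    /** UTF-8 byte length of s, counted per code point rather than per char. */
    static int utf8Length(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) len += 1;
            else if (cp < 0x800) len += 2;
            else if (cp < 0x10000) len += 3;
            else len += 4; // supplementary characters such as emoji
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return len;
    }

    /** True if bytes decode as UTF-8 with malformed input reported, not replaced. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String notebook = "\uD83D\uDCD4"; // U+1F4D4 as a surrogate pair
        byte[] good = notebook.getBytes(StandardCharsets.UTF_8);

        // Each surrogate encoded on its own as a three-byte sequence.
        byte[] perChar = {
            (byte) 0xED, (byte) 0xA0, (byte) 0xBD,
            (byte) 0xED, (byte) 0xB3, (byte) 0x94
        };

        System.out.println("chars in String:    " + notebook.length());
        System.out.println("UTF-8 bytes (JDK):  " + good.length);
        System.out.println("utf8Length:         " + utf8Length(notebook));
        System.out.println("JDK bytes valid:    " + isWellFormedUtf8(good));
        System.out.println("per-char valid:     " + isWellFormedUtf8(perChar));
    }
}
```

Any hand-written encoder or length calculation therefore has to walk the string by code point (or pair up surrogates explicitly) to stay compatible with the JDK decoder.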