Posted to dev@avro.apache.org by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org> on 2013/06/21 07:08:20 UTC

[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion

    [ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690041#comment-13690041 ] 

Thiruvalluvan M. G. commented on AVRO-1348:
-------------------------------------------

The patch looks fine overall, but it introduces two subtle bugs:

- The patch caches the string output in {{toString()}}. Since Utf8 exposes the underlying byte array through {{getBytes()}}, any change made to the contents of the array after the first invocation of {{toString()}} will not be reflected in later output of {{toString()}}. I don't think there is any simple way to intercept changes to the byte array. One option is: (a) don't cache if {{getBytes()}} has ever been called, (b) invalidate the cache if {{getBytes()}} is called later, and (c) don't cache if the Utf8 is constructed using {{Utf8(byte[] bytes)}}. Hopefully, in the most common cases the byte array is not exposed, so the cache would still work. If all this seems too complicated, we can just drop caching.
- Thread-safety. CharsetDecoder is not thread-safe. If two threads invoke {{toString()}} simultaneously, the behavior is undefined, so thread-safety needs to be added. I'm not sure how expensive {{Charset.newDecoder()}} is. Since we need to serialize access to {{decode()}} anyway, we can use a single static CharsetDecoder and get some additional performance.
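
To make the two points above concrete, here is a rough sketch of what I have in mind. It is not the patch and not the real Utf8 class: {{CachingUtf8Sketch}} and its fields are hypothetical names, the String constructor is only there to model the case where the byte array is never exposed, and the use of {{StandardCharsets}} assumes Java 7 or later. It applies rules (a) to (c) for the cache and serializes access to one static CharsetDecoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for Utf8, used only to illustrate the idea. */
final class CachingUtf8Sketch {
  // One shared decoder. CharsetDecoder is not thread-safe, so every use
  // below happens while holding the decoder's monitor.
  private static final CharsetDecoder DECODER = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE);

  private final byte[] bytes;
  private boolean cacheable; // false once the array may be visible to callers
  private String cached;     // null when there is no valid cached value

  /** Rule (c): the caller keeps a reference to the array, so never cache. */
  CachingUtf8Sketch(byte[] bytes) {
    this.bytes = bytes;
    this.cacheable = false;
  }

  /** The array is private to this object, so caching is safe for now. */
  CachingUtf8Sketch(String s) {
    this.bytes = s.getBytes(StandardCharsets.UTF_8);
    this.cacheable = true;
  }

  /** Rules (a) and (b): exposing the array disables and clears the cache. */
  synchronized byte[] getBytes() {
    cacheable = false;
    cached = null;
    return bytes;
  }

  @Override
  public synchronized String toString() {
    if (cached != null) {
      return cached;
    }
    String s = decode(bytes);
    if (cacheable) {
      cached = s;
    }
    return s;
  }

  private static String decode(byte[] b) {
    synchronized (DECODER) {
      try {
        // decode(ByteBuffer) resets the decoder before decoding, so no
        // state leaks from one call to the next.
        return DECODER.decode(ByteBuffer.wrap(b)).toString();
      } catch (CharacterCodingException e) {
        // Not expected with the REPLACE actions configured above.
        throw new IllegalStateException(e);
      }
    }
  }
}
```

With this shape, an instance that never exposes its array decodes at most once, while an instance whose array has been handed out decodes on every call and therefore always reflects later changes to the bytes. Whether the single lock on the shared decoder is cheaper than creating a decoder per call would need to be measured.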

Apart from these, there are some minor coding-style violations.
                
> Improve Utf8 to String conversion
> ---------------------------------
>
>                 Key: AVRO-1348
>                 URL: https://issues.apache.org/jira/browse/AVRO-1348
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: AVRO1348v1.patch
>
>
> AVRO-1241 found that the existing method of creating Strings from Utf8 byte arrays could be made faster. The same method is being used in the Utf8.toString(), and could likely be sped up by doing the same thing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira