Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2013/12/12 19:08:08 UTC

[jira] [Commented] (AVRO-1411) org.apache.avro.util.Utf8 performance improvement by remove private Charset in class

    [ https://issues.apache.org/jira/browse/AVRO-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846520#comment-13846520 ] 

Doug Cutting commented on AVRO-1411:
------------------------------------

Please contribute a patch with this change.  Also please provide benchmark results.  Ideally these would use the existing performance suite (lang/java/ipc/src/test/java/org/apache/avro/io/Perf.java).  Once we can validate the performance improvement then we can probably get the change committed.

> org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
> ------------------------------------------------------------------------------------
>
>                 Key: AVRO-1411
>                 URL: https://issues.apache.org/jira/browse/AVRO-1411
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Tie Liu
>            Priority: Minor
>
> The org.apache.avro.util.Utf8 class has a private static field defined as: private static final Charset UTF8 = Charset.forName("UTF-8");
> It is used as follows:
>   public static final byte[] getBytesFor(String str) {
>     return str.getBytes(UTF8);
>   }
> I guess the intention of creating this object is to avoid repeated object creation, but when we dig into the String.getBytes code, we see that when it is called with a Charset it actually creates a new StringEncoder in java.lang.StringCoding:
>   static byte[] encode(Charset cs, char[] ca, int off, int len) {
>     StringEncoder se = new StringEncoder(cs, cs.name());
>     char[] c = Arrays.copyOf(ca, ca.length);
>     return se.encode(c, off, len);
>   }
> If we instead call it with the string literal "UTF-8", it reuses the thread-local StringEncoder.
> We tried overriding this class to pass the string literal and confirmed that those short-lived StringEncoder objects are no longer created. We would like Apache to fix this so that we no longer need to override it.
>  
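For reference, the two call styles discussed in the issue can be sketched side by side. This is a minimal illustration, not the Avro source: the class name Utf8EncodingSketch and the method names encodeWithCharset and encodeWithName are hypothetical, and the encoder-caching behaviour described in the issue belongs to the JDK versions current at the time and may differ in other releases. The sketch only shows that both calls yield the same bytes; it does not measure performance.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration class; not part of Avro.
class Utf8EncodingSketch {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Call style quoted in the issue: pass a Charset object.
    static byte[] encodeWithCharset(String str) {
        return str.getBytes(UTF8);
    }

    // Call style proposed in the issue: pass the charset name as a String.
    static byte[] encodeWithName(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform,
            // so this branch is not expected to be reached.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "h\u00e9llo";
        byte[] a = encodeWithCharset(sample);
        byte[] b = encodeWithName(sample);
        // Both variants are expected to produce identical UTF-8 bytes.
        System.out.println(Arrays.equals(a, b));
    }
}
```

Any performance difference between the two variants would still need to be confirmed by benchmarks on the target JDK, as requested in the comment above.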


