Posted to issues@geode.apache.org by "Darrel Schneider (JIRA)" <ji...@apache.org> on 2019/04/03 20:14:00 UTC

[jira] [Commented] (GEODE-6579) Creating a String during deserialization could be optimized

    [ https://issues.apache.org/jira/browse/GEODE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809206#comment-16809206 ] 

Darrel Schneider commented on GEODE-6579:
-----------------------------------------

As of Java 9, using reflection to set the char[] directly is a non-starter. JDK 9 changed the "value" field of String from a char[] to a byte[]. By default (compact strings), if every character in a String fits in Latin-1, each character is stored as a single byte.
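The change in the field's type is easy to observe. The minimal sketch below only looks up the declared type of String's private "value" field. Looking a field up does not require setAccessible, so it runs without --add-opens. The class name StringValueFieldProbe is hypothetical and this is not Geode code. The printed values assume the standard OpenJDK String layout.

```java
import java.lang.reflect.Field;

class StringValueFieldProbe {

    /** Returns the declared type of java.lang.String's private "value" field. */
    static Class<?> valueFieldType() {
        try {
            // Only looks the field up; it is never made accessible, so no --add-opens is needed.
            Field value = String.class.getDeclaredField("value");
            return value.getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("unexpected String layout", e);
        }
    }

    public static void main(String[] args) {
        // Prints "char[]" on JDK 8 and "byte[]" on JDK 9 and later.
        System.out.println(valueFieldType().getSimpleName());
    }
}
```

Any code that assumes a char[] here, such as the reflection-based optimization proposed in this issue, breaks as soon as it runs on JDK 9 or later.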

We also discussed using the package-private String(char[], boolean) constructor, which is called by newStringUnsafe(char[]). In fact, newStringUnsafe would have been the way to do this optimization on JDK 8. On JDK 9, however, that method ends up copying the char[] into a byte[].

Our old code, which used the deprecated String(byte[], int hibyte) constructor, is probably the fastest way to create a String instance on JDK 9. The temporary byte[] it leaves behind uses one byte per character, so it produces half the garbage of code that calls String(char[], boolean), which first needs a char[] at two bytes per character.
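Below is a minimal sketch of that old approach. It assumes a wire format of an unsigned 16-bit length followed by one byte per ASCII character. The names readAsciiString and roundTrip are hypothetical and are not Geode's API. The only temporary object per call is a byte[] of n bytes. The String(char[], boolean) route would instead first need a char[] of 2n bytes, which is where the factor of two comes from.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class AsciiStringReader {

    /**
     * Hypothetical helper: reads a length-prefixed ASCII string.
     * Assumed format: unsigned 16-bit length, then one byte per character.
     */
    @SuppressWarnings("deprecation")
    static String readAsciiString(DataInput in) throws IOException {
        int len = in.readUnsignedShort();
        // Temporary buffer, one byte per character; it becomes garbage once the String exists.
        byte[] buf = new byte[len];
        in.readFully(buf);
        // Deprecated String(byte[] ascii, int hibyte): with hibyte == 0 each byte becomes
        // the low 8 bits of a char. JDK 8 copies into a new char[]; JDK 9+ with compact
        // strings copies into a new byte[].
        return new String(buf, 0);
    }

    /** Writes s in the assumed format, then reads it back with readAsciiString. */
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(s.length());
            out.writeBytes(s); // writes the low byte of each char
            out.flush();
            return readAsciiString(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello geode")); // prints "hello geode"
    }
}
```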

Given how much the internal implementation of String changed from JDK 8 to JDK 9, I think the suggested optimization is a bad idea.

We have also considered avoiding this garbage by reusing a single long-lived byte[] for every read. Given that the temporary byte[] has a very short life and will always be less than 65k in size, I'm not convinced that we should try to avoid this garbage creation.
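For comparison, the reusable-buffer alternative might look like the hypothetical sketch below. The class name and wire format (unsigned 16-bit length, then one byte per character) are assumptions. It also swaps the deprecated constructor for String(byte[], int, int, Charset) with ISO-8859-1, which decodes each byte to the char with the same value. The design trades a short-lived allocation per call for a permanently held 64 KB buffer that must be confined to one thread. That extra state and confinement is the cost weighed against such short-lived garbage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader that reuses one scratch buffer. Not thread-safe: one instance per thread. */
final class ReusableBufferStringReader {
    // Large enough for any length an unsigned 16-bit prefix can encode (0xFFFF = 65535).
    private final byte[] scratch = new byte[0xFFFF];

    String readAsciiString(DataInput in) throws IOException {
        int len = in.readUnsignedShort();
        in.readFully(scratch, 0, len);
        // ISO-8859-1 maps bytes 0-255 to the same code points, so ASCII decodes unchanged.
        // The String still copies the bytes it needs, so scratch can be overwritten next call.
        return new String(scratch, 0, len, StandardCharsets.ISO_8859_1);
    }

    /** Writes each value in the assumed format, then reads them back with one reader. */
    static String[] roundTrip(String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String v : values) {
                out.writeShort(v.length());
                out.writeBytes(v);
            }
            out.flush();
            DataInput in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            ReusableBufferStringReader reader = new ReusableBufferStringReader();
            String[] result = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                result[i] = reader.readAsciiString(in); // same scratch buffer every time
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", roundTrip("alpha", "beta"))); // prints "alpha,beta"
    }
}
```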

> Creating a String during deserialization could be optimized
> -----------------------------------------------------------
>
>                 Key: GEODE-6579
>                 URL: https://issues.apache.org/jira/browse/GEODE-6579
>             Project: Geode
>          Issue Type: Improvement
>          Components: serialization
>            Reporter: Darrel Schneider
>            Assignee: Darrel Schneider
>            Priority: Major
>              Labels: optimization
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When creating a string during deserialization from data that we know is in the ASCII character set (each character can be represented by one byte) we currently read all the bytes into a temporary byte array and then create a String instance by giving it that byte array. The String constructor has to create its own char array and then copy all the bytes into it. After that the byte array is garbage.
> We could instead create a char array directly, fill it by reading each byte from the DataInput into it, and then use reflection to set this char array directly as the value field of the String instance we just created (as an empty String). This avoids an extra copy of the data and reduces garbage creation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)