Posted to hdfs-dev@hadoop.apache.org by "Zhe Zhang (JIRA)" <ji...@apache.org> on 2016/09/01 00:03:22 UTC
[jira] [Reopened] (HDFS-10662) Optimize UTF8 string/byte conversions
[ https://issues.apache.org/jira/browse/HDFS-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhe Zhang reopened HDFS-10662:
------------------------------
Sorry to reopen the JIRA. I'm backporting to branch-2.7 and it was quite messy.
[~daryn] [~kihwal] I'd appreciate it if you could take a look.
> Optimize UTF8 string/byte conversions
> -------------------------------------
>
> Key: HDFS-10662
> URL: https://issues.apache.org/jira/browse/HDFS-10662
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10662-branch-2.7.00.patch, HDFS-10662.patch, HDFS-10662.patch.1
>
>
> String/byte conversions may take either a Charset instance or its canonical name. One might think a Charset instance would be faster because it avoids looking up and instantiating a Charset, but it's not. The canonical string name variants cache the string encoder/decoder (obtained from a Charset), resulting in better performance.
> LOG4J2-935 describes a real-world performance boost. I micro-benched a marginal runtime improvement on jdk 7/8. However, for a 16-byte path, using the canonical name generated 50% less garbage. For a 64-byte path, 25% of the garbage. Given the sheer number of times that paths are (re)parsed, the cost adds up quickly.
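
For readers following the description above, here is a minimal sketch of the two JDK overload families it compares. The class and method names (Utf8Conversions, stringToBytes, bytesToString) are hypothetical and are not taken from the attached patches; only the JDK calls themselves are real. The caching behaviour is as reported in the issue for jdk 7/8 and may not hold on newer JDKs, so measure before relying on it.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not the code from the HDFS-10662 patch.
final class Utf8Conversions {
    // Passing the canonical name as a String selects the name-based JDK
    // overloads, which the issue reports as caching the encoder/decoder.
    private static final String UTF8_NAME = "UTF-8";

    private Utf8Conversions() {}

    // Encode via String.getBytes(String charsetName).
    static byte[] stringToBytes(String s) {
        try {
            return s.getBytes(UTF8_NAME);
        } catch (UnsupportedEncodingException e) {
            // Every Java platform implementation is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    // Decode via new String(byte[], int, int, String charsetName).
    static String bytesToString(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, UTF8_NAME);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        String path = "/user/example/file";  // made-up sample path
        byte[] viaName = stringToBytes(path);
        // The Charset-instance variant, shown only for comparison.
        byte[] viaCharset = path.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(viaName, viaCharset));
        System.out.println(bytesToString(viaName, 0, viaName.length));
    }
}
```

Both variants produce identical bytes; the difference discussed in the issue is allocation behaviour, which this sketch does not measure. The name-based overloads declare a checked UnsupportedEncodingException, which is why the sketch wraps them once in a helper.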
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)