Posted to dev@parquet.apache.org by "Robert Kruszewski (JIRA)" <ji...@apache.org> on 2018/03/29 12:09:00 UTC
[jira] [Created] (PARQUET-1261) Parquet-format interns strings when
reading filemetadata
Robert Kruszewski created PARQUET-1261:
------------------------------------------
Summary: Parquet-format interns strings when reading filemetadata
Key: PARQUET-1261
URL: https://issues.apache.org/jira/browse/PARQUET-1261
Project: Parquet
Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Robert Kruszewski
Parquet-format interns strings when it deserializes metadata. The references I could find suggest this was done early on to reduce memory pressure. Java (and the JVM in particular) has come a long way since then, and interning is now generally discouraged; see [https://shipilev.net/jvm-anatomy-park/10-string-intern/] for a good explanation. What is more, since Java 8 string deduplication has been implemented at the GC level, per [http://openjdk.java.net/jeps/192]. In our usage and testing we found that the interning causes significant GC pressure in long-running applications because of the larger GC root set.
This issue proposes removing the interning, given that it is questionable whether it should be used on modern JVMs.
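For readers unfamiliar with what interning does, here is a minimal sketch in plain Java. It is not the parquet-format code: the class InternSketch and its decode helper are hypothetical stand-ins for reading a string field out of serialized metadata. It only illustrates that two decoded strings with equal contents are separate objects until intern() maps them to a single canonical instance held by the JVM's string table.

```java
import java.nio.charset.StandardCharsets;

final class InternSketch {
    // Hypothetical stand-in for decoding a string field from serialized metadata.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "column_a".getBytes(StandardCharsets.UTF_8);

        // Each decode allocates a fresh String, so the two results
        // have equal contents but are distinct objects.
        String first = decode(raw);
        String second = decode(raw);
        System.out.println(first.equals(second)); // true
        System.out.println(first == second);      // false

        // intern() returns the canonical instance from the JVM string table,
        // so both calls yield the same reference.
        String a = first.intern();
        String b = second.intern();
        System.out.println(a == b);               // true
    }
}
```

The alternative mentioned above needs no code change: per JEP 192, string deduplication is enabled on the G1 collector with the -XX:+UseStringDeduplication flag, and the strings are simply left un-interned.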
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)