Posted to issues@lucene.apache.org by "Armin Braun (Jira)" <ji...@apache.org> on 2022/08/08 12:05:00 UTC
[jira] [Created] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale
Armin Braun created LUCENE-10677:
------------------------------------
Summary: Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale
Key: LUCENE-10677
URL: https://issues.apache.org/jira/browse/LUCENE-10677
Project: Lucene - Core
Issue Type: Bug
Components: core/codecs
Affects Versions: 9.3
Reporter: Armin Braun
Attachments: lucene_duplicate_fields.png
This has the same origin as LUCENE-10676. Running a single process with thousands of fields across many indexes leads to a large number of duplicate strings retained as keys and values in the `attributes` map. This can amount to GBs of heap for thousands of fields across a few thousand segments. In the attached heap dump analysis, these strings account for more than half of the duplicate strings from `FieldInfo` instances (roughly 2/3, and the field names are somewhat unusually long in this example).
If we could deduplicate these obvious, well-known strings when reading `FieldInfo`, we could save GBs of heap for use cases like this.
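To illustrate the idea, here is a minimal sketch of how attribute keys and values could be canonicalized as they are read, so that equal strings across segments share one instance. This is not Lucene code and not a proposed patch: the class `AttributeDeduplicator`, its methods, and the shared pool are all hypothetical names introduced for this example, and it assumes the set of distinct attribute strings is small enough that an unbounded pool is acceptable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of Lucene): canonicalizes attribute strings. */
final class AttributeDeduplicator {
  // Process-wide pool mapping each distinct string to its first-seen instance.
  // Assumption: the number of distinct attribute keys/values is small, so the
  // pool is never cleared.
  private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

  private AttributeDeduplicator() {}

  /** Returns the pooled instance equal to {@code s}, adding {@code s} if absent. */
  static String canonical(String s) {
    if (s == null) {
      return null;
    }
    String previous = POOL.putIfAbsent(s, s);
    return previous != null ? previous : s;
  }

  /** Copies {@code attributes} into a new map whose keys and values are pooled. */
  static Map<String, String> dedupe(Map<String, String> attributes) {
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> e : attributes.entrySet()) {
      result.put(canonical(e.getKey()), canonical(e.getValue()));
    }
    return result;
  }
}
```

With something like this applied at the point where the attributes map is built for each `FieldInfo`, every segment would hold references to the same few `String` objects instead of its own copies. `String.intern()` would be an alternative to the explicit pool; which one is preferable here is a judgment call this sketch does not settle.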
--
This message was sent by Atlassian Jira
(v8.20.10#820010)