
Posted to issues@lucene.apache.org by "Armin Braun (Jira)" <ji...@apache.org> on 2022/08/08 11:24:00 UTC

[jira] [Updated] (LUCENE-10676) FieldInfo#name contributes significantly to heap usage at scale

     [ https://issues.apache.org/jira/browse/LUCENE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Armin Braun updated LUCENE-10676:
---------------------------------
    Attachment: image-2022-08-08-13-23-37-050.png

> FieldInfo#name contributes significantly to heap usage at scale
> ---------------------------------------------------------------
>
>                 Key: LUCENE-10676
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10676
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 9.3
>         Environment: Seen in Lucene 9.3.0 running on Linux using JDK 18, but appears to be independent of the environment.
>            Reporter: David Turner
>            Priority: Minor
>              Labels: heap, scalability
>         Attachments: image-2022-08-08-13-23-37-050.png
>
>
> We encountered an Elasticsearch user with high heap usage, a significant proportion of which was down to the contents of `FieldInfo#name`.
> This user was certainly pushing some scalability boundaries: this single process had thousands of active Lucene indices, many with 10k+ fields, and many indices had hundreds of segments due to an excess of flushes, so in total they had an enormous number of `FieldInfo` instances. Still, the bulk of the heap usage was just field names, and the total number of distinct field names was fairly small. Many copies of a small set of field names is a pretty common pattern, especially for time-based data like logs. Some kind of interning or deduplication of these strings would have reduced their heap usage by many GBs.
> Is there a way we could deduplicate these strings? Deduplicating them across segments within each index would already have helped, but ideally we'd like to deduplicate them across indices too.
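To illustrate the kind of deduplication the issue asks about, here is a minimal sketch of a canonicalizing map for field names. It assumes a hypothetical `FieldNameDeduplicator` class that is not part of Lucene's API, and the issue itself does not prescribe this design. Each name read from a segment would be passed through `dedup`, so equal names share a single `String` instance. Sharing one instance per JVM would also deduplicate across indices.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not part of Lucene: maps each field name to one canonical String instance. */
public final class FieldNameDeduplicator {
    // Key and value are the same canonical instance; entries are never evicted.
    private final ConcurrentHashMap<String, String> canonical = new ConcurrentHashMap<>();

    /** Returns a String equal to {@code name}, reusing a previously seen instance if there is one. */
    public String dedup(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    /** Number of distinct field names seen so far. */
    public int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        FieldNameDeduplicator dedup = new FieldNameDeduplicator();
        // new String(...) forces distinct instances, as if two segments had each decoded the name separately.
        String a = dedup.dedup(new String("host.name"));
        String b = dedup.dedup(new String("host.name"));
        System.out.println(a == b);       // true: both segments now share one instance
        System.out.println(dedup.size()); // 1
    }
}
```

The map is unbounded, which suits the situation described (few distinct names, many copies) but would grow without limit if field names were highly unique. The JDK's built-in `String#intern()` is an alternative that avoids a custom map, at the cost of relying on the JVM's global string table.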



--
This message was sent by Atlassian Jira
(v8.20.10#820010)