Posted to issues@lucene.apache.org by "Adrien Grand (Jira)" <ji...@apache.org> on 2021/10/25 07:59:00 UTC

[jira] [Created] (LUCENE-10203) Improve reuse of StringTokenStream

Adrien Grand created LUCENE-10203:
-------------------------------------

             Summary: Improve reuse of StringTokenStream
                 Key: LUCENE-10203
                 URL: https://issues.apache.org/jira/browse/LUCENE-10203
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand


This issue is a follow-up to https://lists.apache.org/thread.html/rdcc6bd085a0e8ac6db22a1ef7dd3228197481b62bec3c6fe4972e50a%40%3Cdev.lucene.apache.org%3E.

StringField reuses token streams differently from TextField. TextField relies on {{Analyzer#reuseStrategy}} to reuse token streams across inputs, whereas StringField, whose values are emitted through a {{StringTokenStream}}, relies on IndexingChain passing the previously produced token stream as the {{reuse}} parameter of {{IndexableField#tokenStream}}. One downside of this approach is that token streams can only be reused within a single segment. Some nightly profiles suggest that, because token streams are not reused across segments, attribute initialization can still show up as a hotspot.
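To make the difference between the two mechanisms concrete, here is a minimal, self-contained Java sketch. It does not use Lucene: SingleTokenStream, tokenStream() and the allocation counter are hypothetical stand-ins, and constructing a stream is assumed to model the cost of attribute initialization. It only counts how many streams each mechanism allocates when one thread indexes several segments.

```java
/**
 * Hypothetical model of two token-stream reuse mechanisms.
 * None of these classes or methods are Lucene APIs.
 */
public class TokenStreamReuseSketch {

    /** Stand-in for a single-token stream whose construction is the costly step. */
    static final class SingleTokenStream {
        private String value;

        SingleTokenStream(int[] allocationCounter) {
            allocationCounter[0]++; // models attribute initialization cost
        }

        void setValue(String value) {
            this.value = value;
        }
    }

    /** Mechanism 1: reuse whatever stream the caller passes back in, else allocate. */
    static SingleTokenStream tokenStream(String value, SingleTokenStream reuse, int[] counter) {
        SingleTokenStream ts = (reuse != null) ? reuse : new SingleTokenStream(counter);
        ts.setValue(value);
        return ts;
    }

    /** Each segment starts without a previous stream, so reuse never crosses segments. */
    static int allocationsWithReuseParameter(int segments, int docsPerSegment) {
        int[] counter = {0};
        for (int s = 0; s < segments; s++) {
            SingleTokenStream previous = null; // per-segment state is dropped when the segment ends
            for (int d = 0; d < docsPerSegment; d++) {
                previous = tokenStream("value-" + d, previous, counter);
            }
        }
        return counter[0];
    }

    /** Mechanism 2: a per-thread cache that outlives individual segments. */
    static int allocationsWithPerThreadCache(int segments, int docsPerSegment) {
        int[] counter = {0};
        ThreadLocal<SingleTokenStream> cache =
            ThreadLocal.withInitial(() -> new SingleTokenStream(counter));
        for (int s = 0; s < segments; s++) {
            for (int d = 0; d < docsPerSegment; d++) {
                cache.get().setValue("value-" + d);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("reuse parameter: " + allocationsWithReuseParameter(10, 1000)); // 10
        System.out.println("per-thread cache: " + allocationsWithPerThreadCache(10, 1000)); // 1
    }
}
```

In this toy model, the reuse-parameter approach allocates one stream per segment, while the per-thread cache allocates a single stream in total. How much that matters in practice depends on how expensive attribute initialization actually is, which is what the profiles referenced above point at.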



--
This message was sent by Atlassian Jira
(v8.3.4#803005)