You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2014/05/02 14:46:17 UTC

[jira] [Comment Edited] (LUCENE-5634) Reuse TokenStream instances in Field

    [ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987645#comment-13987645 ] 

Uwe Schindler edited comment on LUCENE-5634 at 5/2/14 12:45 PM:
----------------------------------------------------------------

bq. but it's trickier since the precStep is final (maybe we can un-final it and add a setter?)

Please don't do this. It is maybe better to do it like in Elasticsearch: Have a pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can use a static thread local since we are fully in control of the threading model. With Lucene itself, where it can be used in many different environment, then this can cause some unexpected behavior. For example, this might cause Tomcat to warn on leaking resources when unloading a war.

Thanks Shay: This is really the reason why we always refused to use static (!) ThreadLocals in Lucene, especially for those heavy used components.

Maybe we can do a similar thing like with StringField in Mike's patch. Its a bit crazy to move out the TokenStreams from the field, but we can do this for performance here. Just have a lazy init pool of NumericTokenStreams for each precisionStep in each per thread DocumentsWriter (DefaultIndexingChain).

-1 to add thread locals in Lucene here!

Another idea how to manage the pools: Maybe add a protected method to Field that can get the DocumentsWriter instance and add some caching functionality for arbitrary TokenStreams (not just NumericTS or StringTS): Maybe some method on the per thread DocumentsWriter to set aTokenStream for reuse per field. The field (also custom ones) then could use setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor from inside the Field.


was (Author: thetaphi):
bq. but it's trickier since the precStep is final (maybe we can un-final it and add a setter?)

Please donÄt do this. It is maybe better to do it like in Elasticsearch: Have a pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can use a static thread local since we are fully in control of the threading model. With Lucene itself, where it can be used in many different environment, then this can cause some unexpected behavior. For example, this might cause Tomcat to warn on leaking resources when unloading a war.

Thanks Shay: This is really the reason why we always refused to use static (!) ThreadLocals in Lucene, especially for those heavy used components.

Maybe we can do a similar thing like with StringField in Mike's patch. Its a bit crazy to move out the TokenStreams from the field, but we can do this for performance here. Just have a lazy init pool of NumericTokenStreams for each precisionStep in each per thread DocumentsWriter (DefaultIndexingChain).

-1 to add thread locals in Lucene here!

Another idea how to manage the pools: Maybe add a protected method to Field that can get the DocumentsWriter instance and add some caching functionality for arbitrary TokenStreams (not just NumericTS or StringTS): Maybe some method on the per thread DocumentsWriter to set aTokenStream for reuse per field. The field (also custom ones) then could use setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor from inside the Field.

> Reuse TokenStream instances in Field
> ------------------------------------
>
>                 Key: LUCENE-5634
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5634
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org