You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/12/12 22:05:48 UTC

[GitHub] [lucene] vigyasharma opened a new pull request, #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

vigyasharma opened a new pull request, #12013:
URL: https://github.com/apache/lucene/pull/12013

   `UTF8TaxonomyWriterCache` keeps a ThreadLocal `BytesRefBuilder`, the bytes for which, are not removed on thread close. This leads to memory leaks. 
   
   The change uses `CloseableThreadLocal` instead, and closes out the object on cache.close() 
   Addresses #12000 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1347422875

   Thanks, it looks better to use `CloseableThreadLocal` than `ThreadLocal`... but I don't understand what this "cache" is doing and why it actually documents that it never frees memory. Especially as a default!
   
   As a followup can we open an issue to move away from Threadlocal storage here by default? I don't understand why we have this "cache" with this description:
   ```
   /** A "cache" that never frees memory, and stores labels in a BytesRefHash (utf-8 encoding). */
   ```
   
   Also, I'm trying to remove ThreadLocal (and CloseableThreadLocal) usages elsewhere in the codebase (e.g. #11998). These threadlocals are problematic in java apps that have many threads (or high thread churn rate), just as CloseableThreadLocal documents and attempts to workaround.
   
   At a glance, seems to me that using a bounded LRU cache would be much more reasonable...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] vigyasharma commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
vigyasharma commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1386545577

   Created a separate PR - #12092 to remove support for `UTF8TaxonomyWriterCache` from main. Will close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] vigyasharma commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
vigyasharma commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1367688658

   > ... but I don't understand what this "cache" is doing and why it actually documents that it never frees memory.
   
   My understanding is that `TaxonomyWriterCache` caches ordinals for all categories created in the index so far, so that categories use the same ordinals when facet labels are added. 
   
   It seems that we need such a cache to get ordinals for newly added categories from documents that are still pending flush.
   From `TaxonomyWriterCache#put()` docstring:
   ```bash
      * <p>The reason why the caller needs to know if part of the cache was cleared is that in that
      * case it will have to commit its on-disk index (so that all the latest category additions can be
      * searched on disk, if we can't rely on the cache to contain them).
   ```
   
   However, with faster BinaryDocValue fields, maybe we don't need the "never evicting UTF8TaxonomyWriterCache" anymore. And we could make LruTaxonomyWriterCache as the default?
   
   I can close this PR and make that change if it makes sense. Or we can merge this change and take it up in a separate issue. Are there faceting specific benchmarks that can help validate the change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1347439475

   Also (summoning my inner @uschindler), curious what the perf impact is if we simply remove these caches completely. Maybe they are not relevant to the performance anymore due to faceting changes (e.g. storing labels in docvalues rather than payloads) ? Maybe they are even hurting performance?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1374852106

   OK, crazy that having no cache at all could cause unnecessary amounts of flush/reopens. We want at least a small one.
   
   > I can close this PR and make that change if it makes sense. Or we can merge this change and take it up in a separate issue. Are there faceting specific benchmarks that can help validate the change?
   
   +1 to make the LRU cache the default. And please, lets also remove this ThreadLocal-based cache **completely** in main branch, and deprecate in 9.5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] vigyasharma commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
vigyasharma commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1386565076

   PR - https://github.com/apache/lucene/pull/12093 to deprecate `UTF8TaxonomyWriterCache` in 9.x
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] vigyasharma closed pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
vigyasharma closed pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()
URL: https://github.com/apache/lucene/pull/12013


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1348842317

   Yeah, I'm highly suspicious of this cache. I dug into this and found that previously stored fields (!) were used for this lookup. So no surprise there was a cache around it, since this is a slow approach!!!
   
   But @gautamworah96 fixed this, and these days BinaryDocValues are used instead: see #10490
   
   IMO there is no sense in caching BinaryDocValues lookups into the heap, it is already mmap'd and the operating system will do this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1347501489

   Please remove the cache. Totally useless. It hurts more because the escape analysis can't work.
   When tiered compilation stepped in, this is just a useless extra
   
   My rule: Never ever cache object instances to prevent object creation and help GC. This was needed in in Java 6 and java 7, but not anymore. It has opposite problem: it actually creates those objects on heap and stresses GC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org