Posted to issues@lucene.apache.org by "dnhatn (via GitHub)" <gi...@apache.org> on 2023/02/10 06:18:09 UTC

[GitHub] [lucene] dnhatn opened a new issue, #12140: LRUQueryCache disabled for indices with more than 33 segments

dnhatn opened a new issue, #12140:
URL: https://github.com/apache/lucene/issues/12140

   ### Description
   
   An Elasticsearch customer reported a search performance issue. We looked into the segment stats and found that the index has 34 segments of about 5GB each, and LRUQueryCache never caches any of them. The reason is that LRUQueryCache only caches segments that hold [more than 3%](https://github.com/apache/lucene/blob/7baa01b3c2f93e6b172e986aac8ef577a87ebceb/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L139-L148) of the total number of documents in the index, and every segment here holds less than 3% (roughly 1/34, or about 2.9%, if the segments have similar document counts).
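
To make the arithmetic concrete, here is a minimal sketch of the size check as described above. It is plain Java with no Lucene dependency; the class and method names are hypothetical, and whether the 3% boundary is inclusive is an assumption of the sketch rather than something the issue text states.

```java
// Hypothetical stand-in for the size check described in the issue; not Lucene's actual class.
final class RatioThresholdSketch {
    // Ratio taken from the issue text: a segment needs roughly 3% of the index's documents.
    static final double MIN_SIZE_RATIO = 0.03;

    // True when the segment holds at least MIN_SIZE_RATIO of all documents.
    // Treating the boundary as inclusive is an assumption of this sketch.
    static boolean shouldCache(long segmentMaxDoc, long indexMaxDoc) {
        return (double) segmentMaxDoc / indexMaxDoc >= MIN_SIZE_RATIO;
    }

    // Counts cacheable segments in an index made of equally sized segments.
    static int cacheableAmongEqualSegments(int segmentCount, long docsPerSegment) {
        long indexMaxDoc = segmentCount * docsPerSegment;
        int cacheable = 0;
        for (int i = 0; i < segmentCount; i++) {
            if (shouldCache(docsPerSegment, indexMaxDoc)) {
                cacheable++;
            }
        }
        return cacheable;
    }
}
```

With 34 equal segments each one holds 1/34 (about 2.94%) of the documents, so `cacheableAmongEqualSegments(34, 1_000_000L)` returns 0, while with 33 equal segments (about 3.03% each) every segment passes the check.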
   
   I will work on the fix for this issue. Any suggestions are welcome.
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1426700533

   > I like medians better than averages in many cases, but would this require iterating over all segments in the index every time we need to make a caching decision? I worry this could be a bottleneck for indexes with many segments and cheap queries. My reasoning for the average segment size was that it's something that can be computed cheaply as `topLevelReader.maxDoc() / leaves.size()`.
   
   If you have this problem then you have too many segments.




[GitHub] [lucene] rmuir commented on issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1426700879

   > @rmuir Are you okay with Adrien's proposal? If so, I can start working on the fix.
   
   I don't really care so much, but let's please not overengineer it and just keep it simple.




[GitHub] [lucene] jpountz commented on issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1425791875

   Ohhh this is an interesting corner-case for this heuristic. My first reaction was that we could have an exception for segments that reached the maximum tier, but it feels potentially dangerous to iterate over the set of segments to figure out the size of the maximum tier every time we need to make a caching decision. Another idea could be to cache on segments whose number of docs is greater than the average number of docs of segments in the index. This condition would generally be more selective than the 3% threshold, except in the case when there are many segments at the maximum tier.
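
As an illustration of the average-based idea, the following is a rough sketch and not the eventual fix. The class and method names are hypothetical, and the inputs are plain numbers standing in for per-segment and index-wide document counts.

```java
// Hypothetical sketch of "cache segments larger than the average segment".
final class AverageThresholdSketch {
    // True when the segment has more docs than the average segment.
    // Cross-multiplied (seg * count > total) to avoid integer-division rounding.
    static boolean shouldCache(long segmentMaxDoc, long indexMaxDoc, int leafCount) {
        return segmentMaxDoc * leafCount > indexMaxDoc;
    }

    // Counts how many of the given segments would be cached under this rule.
    static int countCacheable(long[] segmentMaxDocs) {
        long indexMaxDoc = 0;
        for (long docs : segmentMaxDocs) {
            indexMaxDoc += docs;
        }
        int cacheable = 0;
        for (long docs : segmentMaxDocs) {
            if (shouldCache(docs, indexMaxDoc, segmentMaxDocs.length)) {
                cacheable++;
            }
        }
        return cacheable;
    }
}
```

For segments of 50, 30, 15 and 5 documents, the average is 25, so only the first two pass, whereas all four hold at least 3% of the index. For 34 similarly sized segments, roughly the half that sits above the average passes. One edge case of a strict comparison is that segments of exactly equal size are never above the average.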




[GitHub] [lucene] dnhatn commented on issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "dnhatn (via GitHub)" <gi...@apache.org>.
dnhatn commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1426141243

   Thanks, Adrien. +1 to what you suggested.
   
   @rmuir Are you okay with Adrien's proposal? If so, I can start working on the fix.




[GitHub] [lucene] jpountz commented on issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1426119587

   I like medians better than averages in many cases, but would this require iterating over all segments in the index every time we need to make a caching decision? I worry this could be a bottleneck for indexes with many segments and cheap queries. My reasoning for the average segment size was that it's something that can be computed cheaply as `topLevelReader.maxDoc() / leaves.size()`. To @dnhatn's point, maybe it should even be half the average segment size to make sure it includes all segments from the upper tier? I'm going below 95% because I'd expect the next tier to have segments on the order of 10x smaller, so with a 50% threshold we'd cover segments from the upper tier with greater confidence while still excluding the next tier?
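
A minimal sketch of the half-average threshold, assuming the two inputs correspond to `topLevelReader.maxDoc()` and `leaves.size()`. The class name, method names, and the strict comparison are assumptions made for illustration.

```java
// Hypothetical sketch of "cache segments larger than half the average segment size".
final class HalfAverageThresholdSketch {
    // Smallest doc count a segment must exceed to be cached.
    // Needs only two numbers, so no per-segment iteration is required.
    static long minDocsToCache(long indexMaxDoc, int leafCount) {
        long averageSegmentSize = indexMaxDoc / leafCount;
        return averageSegmentSize / 2;
    }

    // True when the segment is larger than half the average segment.
    static boolean shouldCache(long segmentMaxDoc, long indexMaxDoc, int leafCount) {
        return segmentMaxDoc > minDocsToCache(indexMaxDoc, leafCount);
    }
}
```

Take an index of 35,000,000 documents spread over 44 segments: 34 upper-tier segments of about 1,000,000 documents and 10 next-tier segments of about 100,000. The average is 795,454 documents, so the threshold is 397,727. An upper-tier segment of 700,000 documents still passes, even though it is below the average, while a 100,000-document segment from the next tier does not.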




[GitHub] [lucene] dnhatn closed issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "dnhatn (via GitHub)" <gi...@apache.org>.
dnhatn closed issue #12140: LRUQueryCache disabled for indices with more than 33 segments
URL: https://github.com/apache/lucene/issues/12140




[GitHub] [lucene] rmuir commented on issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1425800860

   Or maybe use the median instead of the average to help prevent issues. Then it's guaranteed that the biggest segments get cached.




[GitHub] [lucene] dnhatn commented on issue #12140: LRUQueryCache disabled for indices with more than 33 segments

Posted by "dnhatn (via GitHub)" <gi...@apache.org>.
dnhatn commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1426113655

   @jpountz @rmuir Thank you for your suggestions. That means we would only cache 17 out of 34 segments in this case. I wonder if we should also cache segments that are larger than 95% of the median (or average) segment size?
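
The "17 out of 34" count and the effect of a 95% factor can be checked with a small sketch. The names below are hypothetical, and the median is computed by sorting a copy of the segment sizes, which is one straightforward way to do it rather than anything prescribed in this thread.

```java
// Hypothetical sketch of a median-based threshold with a tolerance factor.
final class MedianThresholdSketch {
    // Median of the per-segment doc counts; averages the two middle values for even counts.
    static double median(long[] segmentMaxDocs) {
        long[] sorted = segmentMaxDocs.clone();
        java.util.Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Counts segments strictly larger than factor * median.
    static int countCacheable(long[] segmentMaxDocs, double factor) {
        double threshold = factor * median(segmentMaxDocs);
        int cacheable = 0;
        for (long docs : segmentMaxDocs) {
            if (docs > threshold) {
                cacheable++;
            }
        }
        return cacheable;
    }
}
```

For 34 segments whose sizes alternate between 1,000,100 and 999,900 documents, the median is 1,000,000. A factor of 1.0 caches the 17 segments above the median, while a factor of 0.95 lowers the threshold to 950,000 and caches all 34. Computing the median needs all segment sizes, unlike an average derived from two totals.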

