Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/04/17 05:48:04 UTC

[GitHub] [solr] dsmiley commented on pull request #2: SOLR-14185: introduce DocSet.iterator(LeafReaderContext), replacing Filter where possible

dsmiley commented on pull request #2:
URL: https://github.com/apache/solr/pull/2#issuecomment-821772007


   I finally did some benchmarking.  The new implementation appears to be about 8% faster overall, excluding the optimization I added.  The data set was a million docs with a field whose terms follow a Gaussian distribution.  Each query had a filter query with a term against that field, which resulted in a SortedIntDocSet the vast majority of the time.  The main "q" query was a randomly generated phrase query that would always match some subphrase of a sentence found in many documents.  The benchmark produced 2000 reproducibly random queries; I re-ran it about 10 times and took the average of the fastest 3 runs.
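
For anyone repeating the measurement, the "average of the fastest 3 runs" summary can be sketched as below. This is only an illustration: the class and method names (BenchStatsSketch, meanOfFastest) and the sample timings are made up and are not part of the actual benchmark harness.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing repeated benchmark runs; not part of Solr. */
final class BenchStatsSketch {

    /** Returns the mean of the k smallest timings (the k fastest runs). */
    static double meanOfFastest(double[] runMillis, int k) {
        if (k <= 0 || k > runMillis.length) {
            throw new IllegalArgumentException("k must be in 1.." + runMillis.length);
        }
        double[] sorted = runMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);                 // ascending, so the fastest runs come first
        double sum = 0;
        for (int i = 0; i < k; i++) {
            sum += sorted[i];
        }
        return sum / k;
    }
}
```

Taking only the fastest runs discards those disturbed by GC pauses or other machine noise, at the cost of reporting a somewhat optimistic number for both implementations.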
   
   I added a small optimization to short-circuit intersect(DocSet) when the intersection does not change the set.  With it, the improvement rose to ~11%.  In my benchmark, there was another filter query that matched everything, and SolrIndexSearcher.getProcessedFilter would intersect that with the cached SortedIntDocSet, producing a new SortedIntDocSet every time, so there was _never_ any cache re-use of cachedOrdIdxMap.  Of course, this is highly dependent on the benchmarking scenario; "YMMV" really applies to benchmarking this stuff because so much depends on what the app's usage pattern looks like.  I think that in some future JIRA issue, getProcessedFilter would be better off not intersecting any SortedIntDocSets; they could simply be added as separate filters in a BooleanQuery (dependent on SOLR-14166).
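
To illustrate the idea behind the short-circuit (not the actual patch), here is a minimal sketch. SortedIntsSketch is a hypothetical stand-in for a sorted doc-id set, not Solr's SortedIntDocSet, and it assumes the ids are strictly increasing. The point is only that when the intersection removes nothing, the same instance is returned, so any state cached on that instance can be reused.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a sorted doc-id set; NOT Solr's SortedIntDocSet. */
final class SortedIntsSketch {
    final int[] docs; // assumed strictly increasing doc ids

    SortedIntsSketch(int[] docs) {
        this.docs = docs;
    }

    /** Intersects two sorted sets; returns this same instance when nothing is removed. */
    SortedIntsSketch intersect(SortedIntsSketch other) {
        int[] out = new int[Math.min(docs.length, other.docs.length)];
        int i = 0, j = 0, n = 0;
        // Classic two-pointer merge over both sorted arrays
        while (i < docs.length && j < other.docs.length) {
            int a = docs[i], b = other.docs[j];
            if (a == b) {
                out[n++] = a;
                i++;
                j++;
            } else if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        // Short-circuit: every doc of this set survived, so keep the instance
        if (n == docs.length) {
            return this;
        }
        return new SortedIntsSketch(Arrays.copyOf(out, n));
    }
}
```

With a match-everything filter as the other operand, intersect returns the cached instance itself rather than an equal copy, which is the case that mattered in this benchmark.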
   
   So I think this is ready to merge!

