Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2016/04/05 18:21:25 UTC

[jira] [Commented] (SOLR-8944) Improve geospatial garbage generation

    [ https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226561#comment-15226561 ] 

David Smiley commented on SOLR-8944:
------------------------------------

What should be done is to enhance this query (and most other predicates) to use {{DocIdSetBuilder}}. File an issue if you wish to pursue it; it'd be easy I think.  In LUCENE-6645, performance testing of some new spatial approaches that also needed to build up a BitSet showed that SparseFixedBitSet caused a significant performance hit.  DocIdSetBuilder has an internal sparse sorted array mode, which is used when the number of docs is less than 1/128th of the total docs in a segment.
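To illustrate the sparse-then-dense idea, here is a simplified, hypothetical sketch in plain Java. It is not Lucene's {{DocIdSetBuilder}}, whose API and internals differ; the class name, the doubling buffer, and the exact upgrade rule (more than maxDoc/128 buffered docs) are assumptions. Matches are buffered in an int[] and only copied into a maxDoc-sized long[] once the set becomes dense, so sparse result sets never allocate the large array.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT Lucene's DocIdSetBuilder: buffer doc IDs in a small int[]
// and only allocate a maxDoc-sized bitset once more than maxDoc/128 docs are added.
public class SparseThenDenseDocSetSketch {
    private final int maxDoc;
    private final int threshold;   // assumed upgrade rule: size > maxDoc / 128
    private int[] buffer = new int[16];
    private int size = 0;          // counts duplicates too (a simplification)
    private long[] bits = null;    // allocated lazily, only after the upgrade

    public SparseThenDenseDocSetSketch(int maxDoc) {
        this.maxDoc = maxDoc;
        this.threshold = maxDoc >>> 7;
    }

    public void add(int doc) {
        if (doc < 0 || doc >= maxDoc) throw new IllegalArgumentException("doc out of range: " + doc);
        if (bits != null) {
            bits[doc >>> 6] |= 1L << doc;  // Java uses only the low 6 bits of the shift count
            return;
        }
        if (size == buffer.length) buffer = Arrays.copyOf(buffer, size * 2);
        buffer[size++] = doc;
        if (size > threshold) upgradeToBitSet();
    }

    private void upgradeToBitSet() {
        bits = new long[(int) (((long) maxDoc + 63) >>> 6)];
        for (int i = 0; i < size; i++) bits[buffer[i] >>> 6] |= 1L << buffer[i];
        buffer = null;  // the sparse buffer is no longer needed
    }

    public boolean isDense() { return bits != null; }

    /** Returns the distinct doc IDs in ascending order. */
    public int[] build() {
        if (bits == null) {
            int[] sorted = Arrays.copyOf(buffer, size);
            Arrays.sort(sorted);
            int unique = 0;  // compact duplicates in place
            for (int doc : sorted) {
                if (unique == 0 || sorted[unique - 1] != doc) sorted[unique++] = doc;
            }
            return Arrays.copyOf(sorted, unique);
        }
        int count = 0;
        for (long word : bits) count += Long.bitCount(word);
        int[] out = new int[count];
        int n = 0;
        for (int w = 0; w < bits.length; w++) {
            for (long word = bits[w]; word != 0; word &= word - 1) {  // clear lowest set bit
                out[n++] = (w << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SparseThenDenseDocSetSketch set = new SparseThenDenseDocSetSketch(128_000); // threshold 1000
        for (int d = 0; d < 1000; d++) set.add(d * 3);
        System.out.println(set.isDense()); // false
        set.add(5);
        System.out.println(set.isDense()); // true
    }
}
```

Duplicates count toward the threshold here, which is a simplification; the sketch only aims to show why a sparse result set avoids the maxDoc-sized allocation altogether.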

I hope that helps enough and we can stop there.  I don't like the idea of adding complexity to re-use FixedBitSets.  Instead, perhaps more could be done to enhance the cache-ability of your spatial queries.  I've thought of perhaps using {{TermQueryPrefixTreeStrategy}} for a very coarse/approximate, and thus more cacheable, filter, combined with a non-cached Solr post-filter (perhaps using LatLonType) for the exact check.  LatLonType _can_ be slow, but using projected space (2D) instead of surface-of-sphere might help a lot if your data isn't world-wide.
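The two-phase idea can be sketched in plain Java under loudly stated assumptions: points are already in projected planar coordinates (say, meters), the coarse phase is a hypothetical uniform grid rather than {{TermQueryPrefixTreeStrategy}}, and the exact phase is a simple Euclidean check rather than LatLonType. The per-cell candidate lists depend only on the cell, not on the exact query circle, which is what would make them good cache candidates; the exact check runs uncached and needs no trigonometry because it works in 2D.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-phase filter over points in PROJECTED planar coordinates.
// Phase 1: coarse grid cells (cacheable, independent of the exact query circle).
// Phase 2: exact Euclidean post-filter (uncached; no trigonometry needed in 2D).
public class CoarseThenExactFilterSketch {
    static final class Point {
        final int docId;
        final double x, y;
        Point(int docId, double x, double y) { this.docId = docId; this.x = x; this.y = y; }
    }

    private final double cellSize;  // assumed grid resolution, in projected units
    private final Map<Long, List<Point>> cells = new HashMap<>();

    public CoarseThenExactFilterSketch(double cellSize) { this.cellSize = cellSize; }

    private long cellOf(double v) { return (long) Math.floor(v / cellSize); }

    // Packs two cell coordinates into one key (assumes each fits in 32 bits).
    private static long key(long cx, long cy) { return (cx << 32) ^ (cy & 0xffffffffL); }

    public void index(int docId, double x, double y) {
        cells.computeIfAbsent(key(cellOf(x), cellOf(y)), k -> new ArrayList<>())
             .add(new Point(docId, x, y));
    }

    /** Doc IDs within distance r of (qx, qy). */
    public List<Integer> query(double qx, double qy, double r) {
        List<Integer> hits = new ArrayList<>();
        for (long cx = cellOf(qx - r); cx <= cellOf(qx + r); cx++) {
            for (long cy = cellOf(qy - r); cy <= cellOf(qy + r); cy++) {
                // Phase 1: coarse candidates; a real system might cache these per cell key.
                List<Point> candidates = cells.get(key(cx, cy));
                if (candidates == null) continue;
                for (Point p : candidates) {
                    // Phase 2: exact check using squared planar distance.
                    double dx = p.x - qx, dy = p.y - qy;
                    if (dx * dx + dy * dy <= r * r) hits.add(p.docId);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CoarseThenExactFilterSketch f = new CoarseThenExactFilterSketch(10.0);
        f.index(1, 0, 0);
        f.index(2, 3, 4);
        f.index(3, 6, 8);
        f.index(4, -5, 0);
        System.out.println(f.query(0, 0, 5)); // [4, 1, 2]
    }
}
```

Doc 3 lands in the same coarse cell as docs 1 and 2 but is rejected by the exact check (distance 10 > 5), which is the approximate-then-refine split in miniature.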

> Improve geospatial garbage generation
> -------------------------------------
>
>                 Key: SOLR-8944
>                 URL: https://issues.apache.org/jira/browse/SOLR-8944
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Jeff Wartes
>              Labels: spatialrecursiveprefixtreefieldtype
>
> I’ve been continuing my analysis of JVM garbage sources in my Solr index (Solr 5.4, 86M docs/core, 56k hits at the 99.9th percentile with my query corpus).
> After applying SOLR-8922, I find my biggest source of garbage by a literal order of magnitude (by size) is the long[] allocated by FixedBitSet. From the backtraces, it appears the biggest source of FixedBitSet creation in my case (by two orders of magnitude) is my use of queries that involve geospatial filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent object pools - FixedBitSet size is allocated based on maxDoc, which presumably changes less frequently than queries are issued. If an existing FixedBitSet were not available from a pool, the worst case (create a new one) would be no worse than the current behavior. The complication would be enforcing when to return the object to the pool, but it looks like there are some lifecycle hooks for this already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts considerable effort into allocating smaller chunks only as necessary. Is this not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little more data around the current choices before choosing an approach.
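Option 1 could look roughly like the following hypothetical sketch, which pools bare long[] arrays keyed by maxDoc rather than using Lucene's FixedBitSet API. Arrays are zeroed on release, and the caller is assumed to know when no reader still references an array, which is the lifecycle question raised in option 1. For scale: a bitset over 86M docs needs 86,000,000 / 64 = 1,343,750 longs, about 10.75 MB per allocation before the array header.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical pool of bitset backing arrays keyed by maxDoc; not a Lucene API.
public class BitSetArrayPoolSketch {
    private final Map<Integer, ArrayDeque<long[]>> free = new HashMap<>();
    private final int maxPooledPerSize;  // cap so the pool itself cannot grow unbounded
    private int allocations = 0;         // how many times we fell back to "new long[]"

    public BitSetArrayPoolSketch(int maxPooledPerSize) { this.maxPooledPerSize = maxPooledPerSize; }

    /** One 64-bit word per 64 docs, rounded up. */
    public static int wordsFor(int maxDoc) { return (int) (((long) maxDoc + 63) >>> 6); }

    public synchronized long[] acquire(int maxDoc) {
        ArrayDeque<long[]> queue = free.get(maxDoc);
        long[] words = (queue == null) ? null : queue.pollFirst();
        if (words == null) {
            // Worst case is the same as today: allocate a fresh, zeroed array.
            allocations++;
            words = new long[wordsFor(maxDoc)];
        }
        return words;
    }

    /** The caller must guarantee that nothing still reads this array. */
    public synchronized void release(int maxDoc, long[] words) {
        if (words.length != wordsFor(maxDoc)) throw new IllegalArgumentException("size mismatch");
        ArrayDeque<long[]> queue = free.computeIfAbsent(maxDoc, k -> new ArrayDeque<>());
        if (queue.size() < maxPooledPerSize) {
            Arrays.fill(words, 0L);  // clear so the next acquirer starts with an empty set
            queue.addFirst(words);
        }
        // Otherwise drop the array and let the GC reclaim it.
    }

    public synchronized int allocations() { return allocations; }

    public static void main(String[] args) {
        System.out.println(wordsFor(86_000_000)); // 1343750
        BitSetArrayPoolSketch pool = new BitSetArrayPoolSketch(2);
        long[] a = pool.acquire(1000);
        a[0] = 42L;
        pool.release(1000, a);
        long[] b = pool.acquire(1000);
        System.out.println((a == b) + " " + b[0] + " " + pool.allocations()); // true 0 1
    }
}
```

Zeroing about 1.3M longs on every release is not free either, so whether pooling beats plain allocation would depend on the JVM and garbage collector in use and would need measuring.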



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
