Posted to dev@lucene.apache.org by "Jeff Wartes (JIRA)" <ji...@apache.org> on 2016/05/17 16:23:12 UTC
[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based

    [ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286940#comment-15286940 ] 

Jeff Wartes commented on SOLR-9125:
-----------------------------------

I messed around a little bit, but I don't have a solution for this. I thought I'd file the issue anyway just to shine some light.

I had attempted to use CollapseQParserPlugin on a very large index using a collapse on a field whose cardinality was about 1/7th the doc count... it didn't go well. Worse, the issue didn't come up until pretty late in the game, because at low query rate and/or on smaller indexes, the problem isn't evident. I abandoned the attempt.

Some stuff I tried:

- I thought about replacing the FixedBitSet (FBS) with a DocIdSetBuilder, but DelegatingCollector.finish() gets called twice, and you can't call DocIdSetBuilder.build() twice on the same builder. We'd need to save the first build() result and use it to initialize a new builder for the second call, but I wasn't convinced I understood the distinction between the two passes.
- I did one quick test where I replaced the "ords" and "scores" arrays with an IntIntScatterMap and an IntFloatScatterMap, thinking those would work better for small result sets. That ended up being worse (from a total allocations standpoint) for the queries I was trying, probably because of the map resizing required. It might be possible to set initial size values from statistics and help this case that way. It would also be possible to encode the docId/score into a long and just use one IntLongScatterMap, but I didn't try that.
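
For what it's worth, the docId/score encoding mentioned in the last item could look roughly like the sketch below. This is untested against Solr and is not what CollapsingQParserPlugin does; the class and method names (DocScorePacking, pack, docId, score) are made up for illustration. It only shows the bit packing, with the doc id in the high 32 bits and the raw float bits of the score in the low 32 bits.

```java
// Hypothetical helper, not part of Solr or Lucene: packs a doc id and a score
// into one long so that a single int-to-long map could hold both values.
final class DocScorePacking {

    private DocScorePacking() {}

    // High 32 bits: doc id. Low 32 bits: raw IEEE 754 bits of the score.
    static long pack(int docId, float score) {
        long scoreBits = Float.floatToRawIntBits(score) & 0xFFFFFFFFL; // mask avoids sign extension
        return ((long) docId << 32) | scoreBits;
    }

    // Recovers the doc id from the high 32 bits.
    static int docId(long packed) {
        return (int) (packed >>> 32);
    }

    // Recovers the score from the low 32 bits.
    static float score(long packed) {
        return Float.intBitsToFloat((int) packed);
    }
}
```

The packed value would then be the map value keyed by ordinal, so each ordinal seen by the query costs one map entry instead of two. Whether that actually beats the two plain arrays would still depend on map resizing, as noted above.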

> CollapseQParserPlugin allocations are index based, not query based
> ------------------------------------------------------------------
>
>                 Key: SOLR-9125
>                 URL: https://issues.apache.org/jira/browse/SOLR-9125
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Jeff Wartes
>            Priority: Minor
>              Labels: collapsingQParserPlugin
>
> Among other things, CollapsingQParserPlugin’s OrdScoreCollector allocates space per-query for: 
> 1 int (doc id) per ordinal
> 1 float (score) per ordinal
> 1 bit (FixedBitSet) per document in the index
>  
> So the higher the cardinality of the thing you’re grouping on, and the more documents in the index, the more memory gets consumed per query. Since high cardinality and large indexes are the use-cases CollapseQParserPlugin was designed for, I thought I'd point this out.
> My real issue is that this does not vary based on the number of results in the query, either before or after collapsing, so a query that results in one doc consumes the same amount of memory as one that returns all of them. All of the Collectors suffer from this to some degree, but I think OrdScore is the worst offender.
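
To put rough numbers on the description above, here is a back-of-the-envelope estimate of the three allocations it lists. It is a sketch under assumptions: the class and method names are hypothetical, the bit set is assumed to be backed by 64-bit words, and array headers and every other per-query allocation are ignored. The example figures in the comment are invented to match the "cardinality about 1/7th the doc count" ratio and are not measurements.

```java
// Hypothetical estimator, not Solr code: sums the three per-query allocations
// listed in the issue description (int per ordinal, float per ordinal,
// bit per document).
final class CollapseMemoryEstimate {

    private CollapseMemoryEstimate() {}

    static long estimateBytes(long maxDoc, long numOrdinals) {
        long ordsBytes = numOrdinals * Integer.BYTES;    // 1 int (doc id) per ordinal
        long scoresBytes = numOrdinals * Float.BYTES;    // 1 float (score) per ordinal
        long bitSetWords = (maxDoc + 63) / 64;           // 1 bit per document, rounded up to whole longs
        long bitSetBytes = bitSetWords * Long.BYTES;
        return ordsBytes + scoresBytes + bitSetBytes;
    }

    // Example with invented numbers: 700,000,000 docs and 100,000,000 ordinals
    // gives 400,000,000 + 400,000,000 + 87,500,000 = 887,500,000 bytes,
    // regardless of how many documents the query matches.
}
```

The number of matching documents does not appear anywhere in the formula, which is the point of the issue.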



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
