
Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2016/05/17 18:49:13 UTC

[jira] [Comment Edited] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based

    [ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287265#comment-15287265 ] 

Joel Bernstein edited comment on SOLR-9125 at 5/17/16 6:49 PM:
---------------------------------------------------------------

One approach that might work for switching to primitive maps would be to first estimate the cardinality of the collapse values in the result set using HyperLogLog, and then size the primitive map accordingly. But my guess is that this approach is going to really hurt performance.
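The comment above can be sketched in Java as a two-pass scheme: estimate the number of distinct collapse values among the hits with HyperLogLog, then size a primitive open-addressing map from that estimate rather than from the index's total ordinal count. This is a minimal illustration under stated assumptions, not Solr's code. The HyperLogLog here is a hand-rolled toy, the SplitMix64-style hash is a stand-in, ordinals are assumed non-negative (so -1 can mark an empty slot), and `HllSizedMapSketch`, `capacityFor` and `collapse` are hypothetical names. Note that it makes two passes over the hits, which is presumably where the extra cost the comment worries about would come from.

```java
import java.util.Arrays;

public class HllSizedMapSketch {

    // Minimal HyperLogLog: 2^p one-byte registers, each holding the max rank seen.
    static final class HyperLogLog {
        private final int p;
        private final byte[] registers;

        HyperLogLog(int p) {
            this.p = p;
            this.registers = new byte[1 << p];
        }

        // SplitMix64-style mixing as the 64-bit hash (an assumption; any good hash works).
        private static long mix(long z) {
            z += 0x9E3779B97F4A7C15L;
            z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
            z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
            return z ^ (z >>> 31);
        }

        void add(long value) {
            long h = mix(value);
            int idx = (int) (h >>> (64 - p));              // top p bits choose a register
            int rank = Math.min(Long.numberOfLeadingZeros(h << p) + 1, 64 - p + 1);
            if (rank > registers[idx]) registers[idx] = (byte) rank;
        }

        double estimate() {
            int m = registers.length;
            double sum = 0;
            int zeros = 0;
            for (byte r : registers) {
                sum += Math.pow(2.0, -r);
                if (r == 0) zeros++;
            }
            double e = (0.7213 / (1 + 1.079 / m)) * m * m / sum;
            if (e <= 2.5 * m && zeros > 0) {
                e = m * Math.log((double) m / zeros);      // small-range (linear counting) correction
            }
            return e;
        }
    }

    // Smallest power of two that keeps the expected load factor at or below maxLoad.
    static int capacityFor(double expectedKeys, double maxLoad) {
        long needed = (long) Math.ceil(expectedKeys / maxLoad);
        int cap = 1;
        while (cap < needed && cap < (1 << 30)) cap <<= 1;
        return cap;
    }

    // Primitive open-addressing map (parallel arrays, linear probing) that keeps
    // the highest-scoring doc per collapse ordinal. keys[slot] == -1 means empty.
    static void collapse(int[] keys, int[] docs, float[] scores, int ord, int doc, float score) {
        int mask = keys.length - 1;
        int slot = (ord * 0x9E3779B9) & mask;
        for (int probes = 0; probes < keys.length; probes++, slot = (slot + 1) & mask) {
            if (keys[slot] == -1) {                        // first doc seen for this group
                keys[slot] = ord; docs[slot] = doc; scores[slot] = score;
                return;
            }
            if (keys[slot] == ord) {                       // existing group: keep the higher score
                if (score > scores[slot]) { docs[slot] = doc; scores[slot] = score; }
                return;
            }
        }
        // The estimate was too low; a real implementation would grow and rehash here.
        throw new IllegalStateException("map is full");
    }

    public static void main(String[] args) {
        // Hypothetical result set: doc i has collapse ordinal i % 1000 and a synthetic score.
        int numHits = 50_000;

        // Pass 1: estimate the number of distinct collapse values among the hits.
        HyperLogLog hll = new HyperLogLog(14);
        for (int doc = 0; doc < numHits; doc++) hll.add(doc % 1000);
        int capacity = capacityFor(hll.estimate(), 0.75);

        // Pass 2: size the primitive map from the estimate, not from the index's ordinal count.
        int[] keys = new int[capacity];
        Arrays.fill(keys, -1);
        int[] docs = new int[capacity];
        float[] scores = new float[capacity];
        for (int doc = 0; doc < numHits; doc++) {
            collapse(keys, docs, scores, doc % 1000, doc, (doc * 31 % 97) / 97f);
        }

        long groups = Arrays.stream(keys).filter(k -> k != -1).count();
        System.out.println("capacity=" + capacity + " groups=" + groups);
    }
}
```

The map's memory now tracks the number of groups in the result set. A low estimate is the main risk, which is why the sketch leaves headroom with a 0.75 load factor and fails loudly instead of looping when the table fills.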




was (Author: joel.bernstein):
One approach that might work for switching to primitive maps, would be first to estimate the cardinality of the collapse values in the result set using hyperloglog, and then sizing the primitive map accordingly. But my guess is this approach is going really hurt performance quite a bit. 



> CollapseQParserPlugin allocations are index based, not query based
> ------------------------------------------------------------------
>
>                 Key: SOLR-9125
>                 URL: https://issues.apache.org/jira/browse/SOLR-9125
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Jeff Wartes
>            Priority: Minor
>              Labels: collapsingQParserPlugin
>
> Among other things, CollapsingQParserPlugin’s OrdScoreCollector allocates space per query for: 
> 1 int (doc id) per ordinal
> 1 float (score) per ordinal
> 1 bit (FixedBitSet) per document in the index
>  
> So the higher the cardinality of the thing you’re grouping on, and the more documents in the index, the more memory gets consumed per query. Since high cardinality and large indexes are the use cases CollapsingQParserPlugin was designed for, I thought I'd point this out.
> My real issue is that this does not vary based on the number of results in the query, either before or after collapsing, so a query that results in one doc consumes the same amount of memory as one that returns all of them. All of the Collectors suffer from this to some degree, but I think OrdScore is the worst offender.
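The per-query allocations listed in the description can be turned into a rough byte count. This is a back-of-the-envelope Java sketch assuming 4-byte ints and floats, and a Lucene FixedBitSet backed by a long[] of 64-bit words. It ignores array headers and the collector's other allocations. `CollapseMemoryEstimate` and `ordScoreBytes` are hypothetical names, and the index sizes in `main` are made-up example numbers, not measurements.

```java
public class CollapseMemoryEstimate {

    // Approximate bytes allocated per query for the three items in the issue:
    // one int (doc id) and one float (score) per ordinal, plus one bit per
    // document in the index (FixedBitSet, stored as 64-bit words).
    static long ordScoreBytes(long numOrdinals, long maxDoc) {
        long docIdBytes = 4L * numOrdinals;      // int[] indexed by ordinal
        long scoreBytes = 4L * numOrdinals;      // float[] indexed by ordinal
        long words = (maxDoc + 63) / 64;         // long[] length of the bitset
        return docIdBytes + scoreBytes + 8L * words;
    }

    public static void main(String[] args) {
        // Hypothetical index: 10M distinct collapse values, 100M documents.
        long bytes = ordScoreBytes(10_000_000L, 100_000_000L);
        System.out.println(bytes);               // 92500000 (about 92.5 MB)
    }
}
```

The number of hits never appears in the formula. Under these assumptions, a query matching a single document and one matching all 100M pay the same roughly 92.5 MB, which is the issue's complaint.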



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
