Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2015/04/24 18:46:39 UTC

[jira] [Commented] (SOLR-7461) StatsComponent, calcdistinct, ability to disable distinctValues while keeping countDistinct

    [ https://issues.apache.org/jira/browse/SOLR-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511324#comment-14511324 ] 

Hoss Man commented on SOLR-7461:
--------------------------------

As noted in SOLR-6349...

bq. i think the best approach would be to leave "calcDistinct" alone as it is now but deprecate/discourage it and move towards adding an entirely new stats option for computing an approximated count using hyperloglog (i opened a new issue for this: SOLR-6968)

...the problem is that the "exact" count returned by calcDistinct today requires that all distinctValues be aggregated (from all shards in a distrib setup) and dumped into a giant Set in memory.  returning the distinctValues may seem cumbersome to clients, but not returning them would just mask how painful this feature is on the server side, and the biggest problems with it (notably server OOMs) wouldn't go away, they'd just be harder to understand.
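
a rough sketch of why the exact count is expensive (plain Python, not Solr code; the shard data and the function name are made up purely for illustration). per-shard counts can't simply be summed, because the same value may live on several shards, so the coordinator has to merge the actual values:

```python
# Illustrative only: this is NOT Solr's implementation, just the general idea.
# Each inner list stands in for the field values seen on one (hypothetical) shard.

def exact_count_distinct(shard_values):
    """Merge the values from every shard into one set and count them."""
    merged = set()
    for values in shard_values:
        # Every distinct value from every shard ends up in memory here.
        merged.update(values)
    return len(merged), merged

shards = [
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["d", "e"],
]

# Summing per-shard distinct counts over-counts values shared between shards.
per_shard_counts = [len(set(values)) for values in shards]
count, merged = exact_count_distinct(shards)

print(sum(per_shard_counts))  # 8 (wrong: "b", "c", "d" are counted twice)
print(count)                  # 5 (right, but only because all values were merged)
```

the merged set grows with the number of distinct values, whether or not it is ever sent back to the client.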

so i'm generally opposed to adding more flags to _hide_ what is, in my opinion, a broken "feature" and instead aim to move on and implement a better version of it (hopefully within the next week or so)

> StatsComponent, calcdistinct, ability to disable distinctValues while keeping countDistinct
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7461
>                 URL: https://issues.apache.org/jira/browse/SOLR-7461
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: James Andres
>              Labels: statscomponent
>
> When using calcdistinct with large amounts of data the distinctValues field can be extremely large. In cases where the countDistinct is only required it would be helpful if the server did not return distinctValues in the response.
> I'm no expert, but here are some ideas for how the syntax could look.
> {code}
> # Both countDistinct and distinctValues are returned, along with all other stats
> stats.calcdistinct=true&stats.field=myfield
> # Only countDistinct and distinctValues are returned
> stats.calcdistinct=true&stats.field={!countDistinct=true distinctValues=true}myfield
> # Only countDistinct is returned
> stats.calcdistinct=true&stats.field={!countDistinct=true}myfield
> # Only distinctValues is returned
> stats.calcdistinct=true&stats.field={!distinctValues=true}myfield
> {code}



