Posted to dev@lucene.apache.org by "Erick Erickson (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/12/21 16:47:30 UTC

[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field

    [ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174152#comment-13174152 ] 

Erick Erickson edited comment on SOLR-2242 at 12/21/11 3:45 PM:
----------------------------------------------------------------

OK, it seems like we have several themes here. I'd like to get a reasonable consensus before going forward... I'll put out a straw-man proposal here and we can go from there.

But let's figure out where we're going before revamping stuff yet again.

1> Distributed support. I sure don't see a good way to support this currently. Perhaps some of the future enhancements will make this easier (thinking distributed TF/IDF & such while being totally ignorant of that code), but returning the entire list of constraints (or names or terms or whatever we call it) is just a bad idea. The first time someone tries this on a field with 1,000,000 terms (yes, I've seen this) it'll just blow things up. I'm also slightly anti the min/max idea. I'm not sure what value there is in telling someone "there are between 10,000 and 90,000 distinct values". And if it's a field with just a few pre-defined values, that information is already known anyway.... But if someone can show a use-case here I'm not completely against it. But I'd like to see the use case first, not "someone might find it useful" <G>.

2> Back compat. Cody's suggestion seems to be the slickest in terms of not breaking things, but we use attributes in just a few places. Are there reasons NOT to do it that way? Or does this mess up JSON, PHP, etc?

3> Possibly add a new JIRA for changing the facet response format to be tolerant of sub-fields, but don't do that here.

Again, I want a clearly defined end point for the concerns raised before we dive back in here....
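
To make the distributed concern in point 1 concrete, here is a minimal Python sketch. It is not Solr code: the function name and the sample shard data are hypothetical, and it assumes the min/max idea means bounding the total by the per-shard distinct counts. It shows that an exact distinct count needs the full term set from every shard (the expensive part), while per-shard counts alone only give a range.

```python
def merge_distinct_counts(shard_terms):
    """Combine distinct-term information from several shards.

    shard_terms: list of sets, one set of facet terms per shard
    (hypothetical input; a real shard would have to ship these terms).
    Returns (exact, lower, upper).
    """
    per_shard = [len(terms) for terms in shard_terms]
    # Exact answer: requires every term from every shard to be sent over.
    exact = len(set().union(*shard_terms))
    # Bounds available from the per-shard counts alone.
    lower = max(per_shard, default=0)  # all other shards' terms overlap
    upper = sum(per_shard)             # no terms shared between shards
    return exact, lower, upper


shards = [{"0.0", "11.5", "19.95"}, {"19.95", "74.99"}]
exact, lower, upper = merge_distinct_counts(shards)
print(exact, lower, upper)  # 4 3 5
```

With only the per-shard counts, the coordinating node can report "between 3 and 5" here. With a million terms per shard, the exact path means moving all of them.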


                
> Get distinct count of names for a facet field
> ---------------------------------------------
>
>                 Key: SOLR-2242
>                 URL: https://issues.apache.org/jira/browse/SOLR-2242
>             Project: Solr
>          Issue Type: New Feature
>          Components: Response Writers
>    Affects Versions: 4.0
>            Reporter: Bill Bell
>            Assignee: Erick Erickson
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch
>
>
> When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1.
> The feature is called "namedistinct". Here is an example:
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
> This currently only works on facet.field.
> {code}
> <lst name="facet_fields">
>   <lst name="price">
>     <int name="numFacetTerms">14</int>
>     <int name="0.0">3</int>
>     <int name="11.5">1</int>
>     <int name="19.95">1</int>
>     <int name="74.99">1</int>
>     <int name="92.0">1</int>
>     <int name="179.99">1</int>
>     <int name="185.0">1</int>
>     <int name="279.95">1</int>
>     <int name="329.95">1</int>
>     <int name="350.0">1</int>
>     <int name="399.0">1</int>
>     <int name="479.95">1</int>
>     <int name="649.99">1</int>
>     <int name="2199.0">1</int>
>   </lst>
> </lst>
> {code} 
> Several people use this to get the group.field count (the # of groups).
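
As a rough illustration of how a client might consume the response format shown in the description, the Python sketch below parses that sample XML with the standard library. The helper name read_num_facet_terms is hypothetical, and the numFacetTerms entry exists only with this patch applied, so treat this as an assumption-laden sketch and not a description of a released Solr API.

```python
import xml.etree.ElementTree as ET

# Sample response fragment copied from the issue description.
SAMPLE = """<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
    <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
    <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
    <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
    <int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>"""


def read_num_facet_terms(xml_text, field):
    """Return (reported distinct count, {term: count}) for one facet field."""
    root = ET.fromstring(xml_text)
    field_node = root.find(f"lst[@name='{field}']")
    if field_node is None:
        raise KeyError(field)
    reported = None
    terms = {}
    for node in field_node.findall("int"):
        name = node.get("name")
        if name == "numFacetTerms":
            # The count sits in the same list as the terms, so the
            # client has to special-case this one entry name.
            reported = int(node.text)
        else:
            terms[name] = int(node.text)
    return reported, terms


reported, terms = read_num_facet_terms(SAMPLE, "price")
print(reported, len(terms))  # 14 14
```

The special case in the loop is one way to see the back-compat question: existing clients that treat every entry under the field as a term would pick up numFacetTerms as if it were a price.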
