Posted to solr-user@lucene.apache.org by smock <ha...@gmail.com> on 2009/09/11 21:50:44 UTC

Facet Response Structure

I'd like to propose a change to the facet response structure.  Currently, it
looks like:
{'facet_fields':{'field1':[('value1',count1),('value2',count2),(null,missingCount)]}}

My immediate problem with this structure is that null is not of the same
type as the 'value's.  Also, the meaning of the (null,missingCount) tuple is
not the same as the meaning of the ('value',count) tuples: it is a special
case that represents the documents for which the field has no value.  I'd like
to propose changing the response to:
{'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount}}}
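
As a rough illustration of the difference between the two shapes, here is a
minimal Python sketch that converts the current structure into the proposed
one.  The function name restructure_facets and the sample counts are
hypothetical; this is not Solr code, only a model of the response as plain
Python data.

```python
def restructure_facets(facet_fields):
    """Convert {'field': [(value, count), ..., (None, missing)]}
    into {'field': {'facets': [...], 'missing': missing}}."""
    restructured = {}
    for field, pairs in facet_fields.items():
        # Real facet values are every tuple whose value is not None.
        facets = [(value, count) for value, count in pairs if value is not None]
        # The (None, count) tuple, if present, carries the missing-document count.
        missing = next((count for value, count in pairs if value is None), None)
        entry = {'facets': facets}
        if missing is not None:
            entry['missing'] = missing
        restructured[field] = entry
    return restructured


# Sample data in the current shape (counts are made up for illustration).
current = {'facet_fields': {'field1': [('value1', 12), ('value2', 7), (None, 3)]}}
proposed = {'facet_fields': restructure_facets(current['facet_fields'])}
```

With this shape a client can iterate over 'facets' without special-casing a
null entry.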


In addition to cleaning up the 'null' issue mentioned above, I think this
will allow for greater flexibility moving forward with the facet component. 
For instance, it would be great if the FacetComponent could add an optional
count of the 'hits', or number of distinct facet values contained in the
query result.  If the facet request has a limit on it, this number is not
available via a count of the returned facet values.  The response structure
I've outlined above could accommodate this piece of metadata very easily:
{'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount,'hits':hitsCount}}}
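
To make the 'hits' idea concrete, here is a small Python sketch of why the
number cannot be recovered from a limited facet list.  It models faceting over
an in-memory list of field values; the function facet_field, its parameters,
and the sample data are all hypothetical and do not reflect how the
FacetComponent is implemented.

```python
from collections import Counter


def facet_field(doc_values, limit):
    """doc_values holds one field value per document, or None when the
    document has no value for the field."""
    counts = Counter(value for value in doc_values if value is not None)
    return {
        # Only the top `limit` values are returned, as with a facet limit.
        'facets': counts.most_common(limit),
        # Documents with no value for the field.
        'missing': sum(1 for value in doc_values if value is None),
        # Number of distinct values, independent of the limit.
        'hits': len(counts),
    }


response = facet_field(['a', 'b', 'a', None, 'c', 'a', 'b'], limit=2)
```

Here len(response['facets']) is 2 because of the limit, while
response['hits'] is 3, which is the piece of metadata the client cannot
otherwise obtain.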


What does everyone think?  I'd be happy to submit a patch to Solr (for 1.5,
of course), if the Solr community is in favor of it.


-- 
View this message in context: http://www.nabble.com/Facet-Response-Structure-tp25407363p25407363.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Response Structure

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sat, Sep 12, 2009 at 6:29 PM, smock <ha...@gmail.com> wrote:

>
> As to point 1 - this is not a problem with the response structure I've
> outlined.  This is exactly the problem I'm trying to solve.  NULL is not a
> value in the field; it is a placeholder that indicates how many documents
> have no value for the field.  In my example response structure above,
> 'missing' is placed outside of the 'facets' list, clearing up the confusion.
> 'missing' could indeed be a facet value without any collisions.
>
>
You are right, I missed that.


> To point 2 - I understand it would cause compatibility issues; that is why
> I was suggesting it be incorporated into the next Solr release.  I'd also
> be willing to work
>
>
I'm not convinced that it is something that needs to be changed. I'm also
not sure about the right way to deprecate a widely used response format. Go
ahead and raise an issue if you want and we can collect thoughts from
others.


> Regarding the stats component, it does not do what you think it does.  It
> reports a count of all values, not distinct values.  The stats component
> also works only on numeric fields, which makes it unusable in many cases
> where the FacetComponent does work.
>
>
Yes, my bad. Though it does report the count of missing values.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Facet Response Structure

Posted by smock <ha...@gmail.com>.
As to point 1 - this is not a problem with the response structure I've
outlined.  This is exactly the problem I'm trying to solve.  NULL is not a
value in the field; it is a placeholder that indicates how many documents
have no value for the field.  In my example response structure above,
'missing' is placed outside of the 'facets' list, clearing up the confusion.
'missing' could indeed be a facet value without any collisions.
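
A short Python sketch of why there is no collision, using a hypothetical
response for a field whose values happen to include the literal string
'missing' (the counts are made up):

```python
# Hypothetical per-field entry in the proposed structure.
field1 = {
    # Counts for real field values, one of which is the string 'missing'.
    'facets': [('missing', 4), ('value1', 2)],
    # Count of documents that have no value for the field at all.
    'missing': 9,
}

# The literal value lives inside the 'facets' list...
literal_value_count = dict(field1['facets'])['missing']
# ...while the missing-document count is a sibling key of 'facets'.
no_value_count = field1['missing']
```

The two numbers sit at different levels of the structure, so a field value
named 'missing' and the missing-document count never overwrite each other.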

To point 2 - I understand it would cause compatibility issues; that is why I
was suggesting it be incorporated into the next Solr release.  I'd also be
willing to work

Regarding the stats component, it does not do what you think it does.  It
reports a count of all values, not distinct values.  The stats component
also works only on numeric fields, which makes it unusable in many cases
where the FacetComponent does work.


Shalin Shekhar Mangar wrote:
> 
> On Sat, Sep 12, 2009 at 1:20 AM, smock <ha...@gmail.com> wrote:
> 
>>
>> I'd like to propose a change to the facet response structure.  Currently,
>> it looks like:
>>
>> {'facet_fields':{'field1':[('value1',count1),('value2',count2),(null,missingCount)]}}
>>
>> My immediate problem with this structure is that null is not of the same
>> type as the 'value's.  Also, the meaning of the (null,missingCount) tuple
>> is not the same as the meaning of the ('value',count) tuples: it is a
>> special case that represents the documents for which the field has no
>> value.  I'd like to propose changing the response to:
>>
>> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount}}}
>>
>>
> Well, there are two problems:
> 1. 'missing' can be a value in the field
> 2. Facet support has been there for a long time. This would break
> compatibility with existing clients.
> 
> 
>>
>> In addition to cleaning up the 'null' issue mentioned above, I think this
>> will allow for greater flexibility moving forward with the facet
>> component.  For instance, it would be great if the FacetComponent could
>> add an optional count of the 'hits', or number of distinct facet values
>> contained in the query result.  If the facet request has a limit on it,
>> this number is not available via a count of the returned facet values.
>> The response structure I've outlined above could accommodate this piece
>> of metadata very easily:
>>
>> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount,'hits':hitsCount}}}
>>
>>
> Have you looked at StatsComponent? It gives counts for total distinct
> values and the count of documents missing a value, among other things:
> 
> http://wiki.apache.org/solr/StatsComponent
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: http://www.nabble.com/Facet-Response-Structure-tp25407363p25414267.html


Re: Facet Response Structure

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sat, Sep 12, 2009 at 1:20 AM, smock <ha...@gmail.com> wrote:

>
> I'd like to propose a change to the facet response structure.  Currently,
> it looks like:
>
> {'facet_fields':{'field1':[('value1',count1),('value2',count2),(null,missingCount)]}}
>
> My immediate problem with this structure is that null is not of the same
> type as the 'value's.  Also, the meaning of the (null,missingCount) tuple
> is not the same as the meaning of the ('value',count) tuples: it is a
> special case that represents the documents for which the field has no
> value.  I'd like to propose changing the response to:
>
> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount}}}
>
>
Well, there are two problems:
1. 'missing' can be a value in the field
2. Facet support has been there for a long time. This would break
compatibility with existing clients.


>
> In addition to cleaning up the 'null' issue mentioned above, I think this
> will allow for greater flexibility moving forward with the facet component.
> For instance, it would be great if the FacetComponent could add an optional
> count of the 'hits', or number of distinct facet values contained in the
> query result.  If the facet request has a limit on it, this number is not
> available via a count of the returned facet values.  The response structure
> I've outlined above could accommodate this piece of metadata very easily:
>
> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount,'hits':hitsCount}}}
>
>
Have you looked at StatsComponent? It gives counts for total distinct values
and the count of documents missing a value, among other things:

http://wiki.apache.org/solr/StatsComponent

-- 
Regards,
Shalin Shekhar Mangar.