Posted to solr-user@lucene.apache.org by Alain Rogister <al...@gmail.com> on 2011/10/20 22:57:26 UTC

inconsistent results when faceting on multivalued field

I am surprised by the results I am getting from a search in a Solr 3.4
index.

My schema has a multivalued field of type 'string':

<field name="qua_code" type="string" multiValued="true" indexed="true"
stored="true"/>

The field values are 7-digit or 9-digit integers that encode a hierarchy.
I could have used a numeric type instead of string, but no numerical
operations are performed on the values.

Now, each document contains 0-N values for this field, such as:

8625774
1234567
123456701
123456702
123456703
9384738

When I make a search on facet.qua_code=1234567, I get the counts I
expect (seemingly correct) plus a large number of counts for *other* field
values (e.g. 9384738).

If I reword the query as 'facet.query=qua_code:[1234567 TO 1234567]', I only
get the expected counts.

I can also filter out the extraneous results with a facet.prefix clause.
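
For illustration, the effect of a facet.prefix clause on the sample values
above can be mimicked in a few lines of Python. This is only a sketch of the
idea (keep the facet terms that start with a given string), not Solr's
implementation; the helper name apply_facet_prefix is made up for this example.

```python
# Sample qua_code values from the post; they are indexed as strings.
values = ["8625774", "1234567", "123456701", "123456702", "123456703", "9384738"]


def apply_facet_prefix(terms, prefix):
    """Keep only the terms that start with `prefix` (hypothetical helper)."""
    return [t for t in terms if t.startswith(prefix)]


kept = apply_facet_prefix(values, "1234567")
print(kept)
```

Note that a prefix of 1234567 keeps the 7-digit code and its 9-digit
children, while unrelated codes such as 9384738 are dropped.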

Should I file an issue, or am I misunderstanding something about faceting on
multivalued fields?

Thanks.

Re: inconsistent results when faceting on multivalued field

Posted by Erick Erickson <er...@gmail.com>.
I think the key here is that you are a bit confused about what
multiValued is all about. The fq clause says, essentially,
"restrict all my search results to the documents where 1213206
occurs in sou_codeMetier".
That's *all* the fq clause does.

Now, by saying facet.field=sou_codeMetier you're asking Solr
to count the number of documents that exist for each unique
value in that field. A single document can be counted many
times. Each "bucket" is a unique value in the field.

On the other hand, by saying
facet.query=sou_codeMetier:[1213206 TO 1213206]
you're asking Solr to count all the documents
that make it through your query (*:* in this case) with
*any* value in the indicated range.

Facet queries really have nothing to do with filter queries. That is,
facet queries in no way restrict the documents that are returned;
they just indicate ways of counting documents into buckets.
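
To make the distinction concrete, here is a minimal sketch in plain Python
that imitates the three operations on a tiny in-memory corpus. The documents
and the function names (filter_docs, facet_field, facet_query) are invented
for illustration and say nothing about how Solr implements faceting.

```python
from collections import Counter

# Hypothetical documents: each one holds several values in a multivalued field.
docs = [
    {"id": 1, "codes": ["1213206", "1212104"]},
    {"id": 2, "codes": ["1213206", "121320603"]},
    {"id": 3, "codes": ["1212104"]},
    {"id": 4, "codes": ["1213206", "1212104", "121320603"]},
]


def filter_docs(docs, value):
    """Imitate fq=field:value: keep the documents that contain the value."""
    return [d for d in docs if value in d["codes"]]


def facet_field(docs):
    """Imitate facet.field: one bucket per unique value, counting documents."""
    counts = Counter()
    for d in docs:
        counts.update(set(d["codes"]))  # a document adds 1 to each of its values
    return counts


def facet_query(docs, low, high):
    """Imitate facet.query=field:[low TO high] on a string field.

    The range is inclusive and compared as strings; a document counts once
    if any of its values falls inside the range.
    """
    return sum(1 for d in docs if any(low <= c <= high for c in d["codes"]))


filtered = filter_docs(docs, "1213206")      # fq restricts the result set
field_counts = facet_field(filtered)         # buckets for every value present
query_count = facet_query(docs, "1213206", "1213206")  # a single bucket

print(dict(field_counts))
print(query_count)
```

In this toy corpus the filter keeps three documents, yet facet_field still
reports buckets for the other values those three documents carry, whereas
facet_query returns one number.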

Best
Erick

On Fri, Oct 21, 2011 at 10:01 AM, Darren Govoni <da...@ontrenet.com> wrote:
> My interpretation of your results is that your FQ found 1281 documents
> with the value 1213206 in the sou_codeMetier field. Of those results, 476
> also had 1212104 as a value, and so on. Since ALL the results will have
> the field value in your FQ, I would expect the "other" values to occur
> no more often than that in the result set, which appears to be the case.

Re: inconsistent results when faceting on multivalued field

Posted by Darren Govoni <da...@ontrenet.com>.
My interpretation of your results is that your FQ found 1281 documents
with the value 1213206 in the sou_codeMetier field. Of those results, 476
also had 1212104 as a value, and so on. Since ALL the results will have
the field value in your FQ, I would expect the "other" values to occur
no more often than that in the result set, which appears to be the case.
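
As a quick sanity check of that reading, the seven buckets from the response
excerpt can be parsed and compared in Python. This is only a sketch: the
closing tags are added here so the fragment parses, and only the buckets
shown in the excerpt are included (the elided ones are unknown).

```python
import xml.etree.ElementTree as ET

# The buckets visible in the excerpt, with closing tags added for parsing.
excerpt = """
<lst name="facet_fields">
  <lst name="sou_codeMetier">
    <int name="1213206">1281</int>
    <int name="1212104">476</int>
    <int name="121320603">285</int>
    <int name="1213101">260</int>
    <int name="121320602">208</int>
    <int name="121320605">171</int>
    <int name="1212201">152</int>
  </lst>
</lst>
""".strip()

root = ET.fromstring(excerpt)
counts = {e.get("name"): int(e.text) for e in root.iter("int")}

# Every returned document contains the filtered value, so no other bucket
# can be larger than the bucket for that value.
filtered_value = "1213206"
top = counts[filtered_value]
all_within = all(c <= top for c in counts.values())
print(top, all_within)
```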



On 10/21/2011 03:55 AM, Alain Rogister wrote:
> Pravesh,
>
> Not exactly. Here is the search I do, in more detail (different field name,
> but same issue).
>
> I want to get a count for a specific value of the sou_codeMetier field,
> which is multivalued. I expressed this by including an fq clause:
>
> /select/?q=*:*&facet=true&facet.field=sou_codeMetier&fq=sou_codeMetier:1213206&rows=0
>
> The response (excerpt only):
>
> <lst name="facet_fields">
> <lst name="sou_codeMetier">
> <int name="1213206">1281</int>
> <int name="1212104">476</int>
> <int name="121320603">285</int>
> <int name="1213101">260</int>
> <int name="121320602">208</int>
> <int name="121320605">171</int>
> <int name="1212201">152</int>
> ...
>
> As you see, I get back both the expected results and extra results I would
> expect to be filtered out by the fq clause.
>
> I can eliminate the extra results with a
> 'f.sou_codeMetier.facet.prefix=1213206' clause.
>
> But I wonder if Solr's behavior is correct and how the fq filtering works
> exactly.
>
> If I replace the facet.field clause with a facet.query clause, like this:
>
> /select/?q=*:*&facet=true&facet.query=sou_codeMetier:[1213206 TO 1213206]&rows=0
>
> The results contain a single item:
>
> <lst name="facet_queries">
> <int name="sou_codeMetier:[1213206 TO 1213206]">1281</int>
> </lst>
>
> The 'fq=sou_codeMetier:1213206' clause isn't necessary here and does not
> affect the results.
>
> Thanks,
>
> Alain


Re: inconsistent results when faceting on multivalued field

Posted by Alain Rogister <al...@gmail.com>.
Pravesh,

Not exactly. Here is the search I do, in more detail (different field name,
but same issue).

I want to get a count for a specific value of the sou_codeMetier field,
which is multivalued. I expressed this by including an fq clause:

/select/?q=*:*&facet=true&facet.field=sou_codeMetier&fq=sou_codeMetier:1213206&rows=0

The response (excerpt only):

<lst name="facet_fields">
<lst name="sou_codeMetier">
<int name="1213206">1281</int>
<int name="1212104">476</int>
<int name="121320603">285</int>
<int name="1213101">260</int>
<int name="121320602">208</int>
<int name="121320605">171</int>
<int name="1212201">152</int>
...

As you see, I get back both the expected results and extra results I would
expect to be filtered out by the fq clause.

I can eliminate the extra results with a
'f.sou_codeMetier.facet.prefix=1213206' clause.

But I wonder if Solr's behavior is correct and how the fq filtering works
exactly.

If I replace the facet.field clause with a facet.query clause, like this:

/select/?q=*:*&facet=true&facet.query=sou_codeMetier:[1213206 TO 1213206]&rows=0

The results contain a single item:

<lst name="facet_queries">
<int name="sou_codeMetier:[1213206 TO 1213206]">1281</int>
</lst>

The 'fq=sou_codeMetier:1213206' clause isn't necessary here and does not
affect the results.
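
For anyone who wants to reproduce the three variants, the following Python
sketch builds the corresponding query strings with the standard library. The
/select/ path and the parameter names come from the requests above; the
helper select_url is a made-up name, and host and port are deliberately left
out because they depend on the installation.

```python
from urllib.parse import urlencode, parse_qs

FIELD = "sou_codeMetier"
VALUE = "1213206"


def select_url(params):
    """Build a /select/ path with a URL-encoded query string (hypothetical helper)."""
    return "/select/?" + urlencode(params)


common = [("q", "*:*"), ("facet", "true"), ("rows", "0")]

# 1. Filter the result set with fq, then count every value of the field.
fq_url = select_url(common + [("facet.field", FIELD), ("fq", f"{FIELD}:{VALUE}")])

# 2. Same request, but keep only the buckets whose term starts with VALUE.
prefix_url = select_url(
    common
    + [
        ("facet.field", FIELD),
        ("fq", f"{FIELD}:{VALUE}"),
        (f"f.{FIELD}.facet.prefix", VALUE),
    ]
)

# 3. No fq at all: a single facet.query bucket for the inclusive range.
query_url = select_url(common + [("facet.query", f"{FIELD}:[{VALUE} TO {VALUE}]")])

for url in (fq_url, prefix_url, query_url):
    print(url)

# Round-trip check: the encoded range query decodes back to the original text.
decoded = parse_qs(query_url.split("?", 1)[1])
```

Encoding the parameters avoids problems with the space and the square
brackets in the range query when the URL is pasted into a client.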

Thanks,

Alain

On Fri, Oct 21, 2011 at 9:18 AM, pravesh <su...@yahoo.com> wrote:

> Could you clarify the following:
> >>When I make a search on facet.qua_code=1234567 ??
>
> Are you saying that you fire a fresh search for a facet item, like
> q=qua_code:1234567?
>
> That would fetch documents whose qua_code field contains either the term
> 1234567 alone or 1234567 together with other terms (9384738 and others).
> Since it is a multivalued field, the facet then shows counts for all of
> those terms.
>
> >>If I reword the query as 'facet.query=qua_code:[1234567 TO 1234567]', I
> only
> get the expected counts
>
> You will get a facet count only for documents that contain the term 1234567
> (facet.query determines which facet is computed and shown).
>
> Regards
> Pravesh

Re: inconsistent results when faceting on multivalued field

Posted by pravesh <su...@yahoo.com>.
Could you clarify the following:
>>When I make a search on facet.qua_code=1234567 ??

Are you saying that you fire a fresh search for a facet item, like
q=qua_code:1234567?

That would fetch documents whose qua_code field contains either the term
1234567 alone or 1234567 together with other terms (9384738 and others).
Since it is a multivalued field, the facet then shows counts for all of
those terms.

>>If I reword the query as 'facet.query=qua_code:[1234567 TO 1234567]', I only
get the expected counts

You will get a facet count only for documents that contain the term 1234567
(facet.query determines which facet is computed and shown).

Regards
Pravesh



--
View this message in context: http://lucene.472066.n3.nabble.com/inconsistent-results-when-faceting-on-multivalued-field-tp3438991p3440128.html
Sent from the Solr - User mailing list archive at Nabble.com.