Posted to solr-user@lucene.apache.org by Bastien Latard | MDPI AG <la...@mdpi.com.INVALID> on 2016/10/19 10:23:08 UTC

Facet behavior

Hi everybody,

I just had a question about facets.
*==> Is the facet run on all documents (to pre-process/cache the data) 
or only on returned documents?*

I ask because I have exactly the same index locally and on the prod server
(except that my dev index contains far fewer docs).

When I run a query and request facets for it, it takes much longer on the
production server, even though the query returns fewer documents.

e.g.:
q=nanoparticles AND gold&facet.limit=5&facet.field=author&rows=0&facet=true&wt=xml&facet.offset=0
- live : 4059 documents <=> 11 secs
- local: 22298 documents <=> 1 sec
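
For reference, the same request can be assembled programmatically so that the space-separated query is URL-encoded correctly. This is only a sketch: the endpoint (host, port and the core name "mycore") is a hypothetical placeholder that does not appear in this thread; the parameters are the ones from the query above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders,
# not taken from the thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

# Parameters copied from the example query above.
params = [
    ("q", "nanoparticles AND gold"),
    ("rows", 0),               # no documents returned, only facet counts
    ("facet", "true"),
    ("facet.field", "author"),
    ("facet.limit", 5),
    ("facet.offset", 0),
    ("wt", "xml"),
]

# urlencode escapes the spaces in the q parameter as "+".
query_string = urlencode(params)
request_url = f"{SOLR_SELECT_URL}?{query_string}"
print(request_url)
```

Sending the same URL to both servers and comparing the response times would make the comparison above repeatable.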

Thanks in advance.

Kind regards,
Bastien


Re: Facet behavior

Posted by Bastien Latard | MDPI AG <la...@mdpi.com.INVALID>.
Hi Guys,

Could any of you tell me if I'm right?
Thanks in advance.

kr,
Bast



-------- Forwarded Message --------
Subject: 	Re: Facet behavior
Date: 	Thu, 20 Oct 2016 14:45:23 +0200
From: 	Bastien Latard | MDPI AG <la...@mdpi.com>
To: 	solr-user@lucene.apache.org



Hi Yonik,

Thanks for your answer!
I'm not quite sure I understood everything... please see my comments below.


> On Wed, Oct 19, 2016 at 6:23 AM, Bastien Latard | MDPI AG
> <la...@mdpi.com.invalid> wrote:
>> I just had a question about facets.
>> *==> Is the facet run on all documents (to pre-process/cache the data) or
>> only on returned documents?*
> Yes ;-)
>
> There are sometimes per-field data structures that are cached to
> support faceting.  This can make the first facet request after a new
> searcher take longer.  Unless you're using docValues, then the cost is
> much less.
So how do I force it to use docValues? Simply:
<field name="my_field" type="string" indexed="false" stored="false"
docValues="true" />
Are there any other advantages or drawbacks?

> Then there are per-request data structures (like a count array) that
> are O(field_cardinality) and not O(matching_docs).
> But then for default field-cache faceting, the actual counting part is
> O(matching_docs).
> So yes, at the end of  the day we only facet on the matching
> documents... but what the total field looks like certainly matters.
This would only be the case if I used docValues, right?

If I have a field declaration like this (a dedicated field for faceting,
without stemming), what would be the best setting?
<field name="author_facet" type="text_facet" indexed="true"
stored="true" required="false" multiValued="true" />

Kind regards,
Bastien


Re: Facet behavior

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, Oct 20, 2016 at 8:45 AM, Bastien Latard | MDPI AG
<la...@mdpi.com.invalid> wrote:
> Hi Yonik,
>
> Thanks for your answer!
> I'm not quite sure I understood everything... please see my comments below.
>
>
>> On Wed, Oct 19, 2016 at 6:23 AM, Bastien Latard | MDPI AG
>> <la...@mdpi.com.invalid> wrote:
>>>
>>> I just had a question about facets.
>>> *==> Is the facet run on all documents (to pre-process/cache the data) or
>>> only on returned documents?*
>>
>> Yes ;-)
>>
>> There are sometimes per-field data structures that are cached to
>> support faceting.  This can make the first facet request after a new
>> searcher take longer.  Unless you're using docValues, then the cost is
>> much less.
>
> So how do I force it to use docValues? Simply:
> <field name="my_field" type="string" indexed="false" stored="false"
> docValues="true" />
> Are there any other advantages or drawbacks?

You probably still want the field indexed as well... that supports
fast filtering by specific values (fq=my_field:value1)
without having to do a complete column scan.
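
The difference between the two access paths can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Solr or Lucene is implemented: the list `column` stands in for a docValues-style per-document column, and `build_inverted_index` stands in for what indexed="true" provides. All names here are hypothetical.

```python
from collections import defaultdict

# Toy per-document column: position = doc id, entry = field value.
column = ["value1", "value2", "value1", "value3"]

def filter_by_scan(column, value):
    """Without an index, every document's value has to be inspected."""
    return [doc_id for doc_id, v in enumerate(column) if v == value]

def build_inverted_index(column):
    """Map each value to the list of doc ids containing it (built once)."""
    index = defaultdict(list)
    for doc_id, v in enumerate(column):
        index[v].append(doc_id)
    return index

def filter_by_index(index, value):
    """With an index, matching docs are found by a single lookup."""
    return index.get(value, [])

index = build_inverted_index(column)
print(filter_by_scan(column, "value1"))
print(filter_by_index(index, "value1"))
```

Both functions return the same documents; the scan touches every document on each request, while the lookup only touches the matching ones.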

>> Then there are per-request data structures (like a count array) that
>> are O(field_cardinality) and not O(matching_docs).
>> But then for default field-cache faceting, the actual counting part is
>> O(matching_docs).
>> So yes, at the end of  the day we only facet on the matching
>> documents... but what the total field looks like certainly matters.
>
> This would only be the case if I used docValues, right?

If docValues aren't indexed, then they (or something like them) are
built in memory before they are used.

-Yonik

> If I have a field declaration like this (a dedicated field for faceting,
> without stemming), what would be the best setting?
> <field name="author_facet" type="text_facet" indexed="true" stored="true"
> required="false" multiValued="true" />
>
> Kind regards,
> Bastien
>

Re: Facet behavior

Posted by Yonik Seeley <ys...@gmail.com>.
On Wed, Oct 19, 2016 at 6:23 AM, Bastien Latard | MDPI AG
<la...@mdpi.com.invalid> wrote:
> Hi everybody,
>
> I just had a question about facets.
> *==> Is the facet run on all documents (to pre-process/cache the data) or
> only on returned documents?*

Yes ;-)

There are sometimes per-field data structures that are cached to
support faceting.  This can make the first facet request after a new
searcher take longer.  Unless you're using docValues, then the cost is
much less.

Then there are per-request data structures (like a count array) that
are O(field_cardinality) and not O(matching_docs).
But then for default field-cache faceting, the actual counting part is
O(matching_docs).
So yes, at the end of the day we only facet on the matching
documents... but what the total field looks like certainly matters.

-Yonik
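
The cost breakdown described above can be illustrated with a small Python sketch. It is a simplified model, not Solr's actual code: it assumes a single-valued field, and the names `build_ordinals` and `facet_counts` are hypothetical. The per-field structure is built over all documents, the count array is sized by the number of distinct values, and the counting loop only visits matching documents.

```python
def build_ordinals(field_values):
    """Per-field structure built once over ALL documents:
    each doc id maps to the ordinal of its term."""
    terms = sorted(set(field_values))
    ord_of = {term: i for i, term in enumerate(terms)}
    doc_to_ord = [ord_of[v] for v in field_values]
    return terms, doc_to_ord

def facet_counts(terms, doc_to_ord, matching_docs, limit):
    """Per-request work for one facet field."""
    counts = [0] * len(terms)        # O(field_cardinality)
    for doc_id in matching_docs:     # O(matching_docs)
        counts[doc_to_ord[doc_id]] += 1
    # Rank by count (descending), then by term, and keep non-zero buckets.
    ranked = sorted(range(len(terms)), key=lambda o: (-counts[o], terms[o]))
    return [(terms[o], counts[o]) for o in ranked[:limit] if counts[o] > 0]

# Toy index: one author value per document (doc id = list position).
authors = ["smith", "lee", "smith", "garcia", "lee", "smith"]
terms, doc_to_ord = build_ordinals(authors)

# Pretend the query matched docs 0, 2 and 4.
top = facet_counts(terms, doc_to_ord, matching_docs=[0, 2, 4], limit=5)
print(top)
```

In this model, a field with many distinct values makes both the per-field structure and the count array large even when the query matches only a few documents, which is one way the whole field can matter more than the size of the result set.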