Posted to solr-user@lucene.apache.org by MOUSSA MZE Oussama-ext <ou...@pole-emploi.fr> on 2018/02/19 15:47:39 UTC

Facet performance problem

Hi

We have the following environment:

3-node cluster
1 shard
Replication factor = 2
8GB per node

29 million documents

We facet on the field "motifPresence", defined as follows:

<field name="motifPresence" type="string" docValues="true" indexed="false" stored="true" required="false"/>

Once the user selects a motifPresence filter, we execute the search again with:

fq: (value1 OR value2 OR value3 OR ...)

The problem: the query with facet filtering is too slow. Its response time is greater than that of the main search (without facet filtering).

Thanks in advance!

Re: Facet performance problem

Posted by Shawn Heisey <el...@elyograg.org>.
On 2/20/2018 1:18 AM, LOPEZ-CORTES Mariano-ext wrote:
> We return a facet list of values in "motifPresence" field (person status).
> 	Status:
> 	[ ] status1
> 	[x] status2
> 	[x] status3
>
> The user then selects one or more statuses (this is the step we call "facet filtering").
>
> The query is then re-executed with fq=motifPresence:(status2 OR status3)
>
> We use fq so as not to alter the score of the main query.
>
> We've read that docValues=true is recommended for facet fields.
>
> Do we also need indexed=true?

Facets, grouping, and sorting are more efficient with docValues, but 
searches aren't helped by docValues.  Without indexed="true", searches 
on the field will be VERY slow.  A filter query is still a search.  The 
"filter" in filter query just refers to the fact that it's separate from 
the main query, and that it does not affect relevancy scoring.
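
To illustrate that distinction, here is a minimal Python sketch of how the request described in this thread might be assembled, with the filter passed as its own fq parameter next to the main query. The main query value "*:*" and the helper name build_params are assumptions for illustration only; the thread does not show the real main query.

```python
from urllib.parse import urlencode

def build_params(main_query, selected_statuses=None):
    """Return Solr request parameters as a list of (name, value) pairs.

    build_params is a hypothetical helper, not part of any Solr client.
    """
    params = [
        ("q", main_query),                 # scored main query
        ("facet", "true"),
        ("facet.field", "motifPresence"),  # facet counts for the status list
    ]
    if selected_statuses:
        # The filter is still a search on motifPresence; it only restricts
        # the result set and does not contribute to relevancy scoring.
        # Values are assumed to be simple tokens that need no escaping.
        clause = " OR ".join(selected_statuses)
        params.append(("fq", f"motifPresence:({clause})"))
    return params

# "*:*" stands in for the real main query, which the thread does not show.
params = build_params("*:*", ["status2", "status3"])
query_string = urlencode(params)
print(query_string)
```

Because fq is evaluated as a search, it needs the same inverted index that any other search on the field would use.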

Thanks,
Shawn


RE: Facet performance problem

Posted by LOPEZ-CORTES Mariano-ext <ma...@pole-emploi.fr>.
Our query looks like this:

...facet=true&facet.field=motifPresence

We return a facet list of values in "motifPresence" field (person status).
	Status:
	[ ] status1
	[x] status2
	[x] status3

The user then selects one or more statuses (this is the step we call "facet filtering").

The query is then re-executed with fq=motifPresence:(status2 OR status3)

We use fq so as not to alter the score of the main query.

We've read that docValues=true is recommended for facet fields.

Do we also need indexed=true?
Is there any other problem with our solution?

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Monday, February 19, 2018 18:18
To: solr-user
Subject: Re: Facet performance problem

I'm confused here. What do you mean by "facet filtering"? Your examples have no facets at all, just a _filter query_.

I'll assume you want to use filter query (fq), and faceting has nothing to do with it. This is one of the tricky bits of docValues.
While it's _possible_ to search on a field that's defined as above, it's very inefficient because there's no "inverted index" for the field: you specified indexed="false". So the docValues are searched, and it's essentially a table scan.

If you mean to search against this field, set indexed="true". You'll have to completely reindex your corpus of course.

If you intend to facet, group or sort on this field, you should _also_ have docValues="true".

Best,
Erick


Re: Facet performance problem

Posted by Erick Erickson <er...@gmail.com>.
I'm confused here. What do you mean by "facet filtering"? Your
examples have no facets at all, just a _filter query_.

I'll assume you want to use filter query (fq), and faceting has
nothing to do with it. This is one of the tricky bits of docValues.
While it's _possible_ to search on a field that's defined as above,
it's very inefficient because there's no "inverted index" for the
field: you specified indexed="false". So the docValues are searched,
and it's essentially a table scan.

If you mean to search against this field, set indexed="true". You'll
have to completely reindex your corpus of course.

If you intend to facet, group or sort on this field, you should _also_
have docValues="true".
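
A minimal sketch of the corrected field definition, assuming the collection uses a managed schema that can be changed through the Schema API's replace-field command (with a classic schema.xml, the same attributes would be edited directly in the file). The sketch only builds the JSON payload and does not send it anywhere; all attributes other than indexed are copied from the original post.

```python
import json

# Same field as in the original post, with indexed switched to true.
field_definition = {
    "name": "motifPresence",
    "type": "string",
    "indexed": True,    # builds the inverted index that fq searches rely on
    "docValues": True,  # kept for faceting, grouping and sorting
    "stored": True,
    "required": False,
}

# Body for a Schema API request; where and how it is posted depends on
# the deployment and is deliberately left out of this sketch.
payload = json.dumps({"replace-field": field_definition}, indent=2)
print(payload)
```

Changing the definition does not affect documents that are already indexed, so a full reindex is still required afterwards.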

Best,
Erick
