Posted to solr-user@lucene.apache.org by Midas A <te...@gmail.com> on 2019/07/08 09:08:01 UTC

Facet Query performance

Hi ,

I have enabled docValues on the facet field, but the query is still slow.

How can I improve the query time?
<field name="cat" type="string" indexed="true" stored="true"
docValues="true" multiValued="true" termVectors="true" /> <!--Category ID-->

*Query: *
http://X.X.X.X:PPPP/solr/search/select?df=ttl&ps=0&hl=true&fl=id,upt&f.ind.mincount=1&hl.usePhraseHighlighter=true&f.pref.mincount=1&q.op=OR&fq=NOT+hemp:(%22xgidx29760%22+%22xmwxmonster%22+%22xmwxmonsterindia%22+%22xmwxcom%22+%22xswxmonster+com%22+%22xswxmonster%22+%22xswxmonsterindia+com%22+%22xswxmonsterindia%22)&fq=NOT+cEmp:(%22nomster.com%22+OR+%22utyu%22)&fq=NOT+pEmp:(%22nomster.com%22+OR+%22utyu%22)&fq=ind:(5)&fq=NOT+is_udis:2&fq=NOT+id:(92197+OR+240613+OR+249717+OR+1007148+OR+2500513+OR+2534675+OR+2813498+OR+9401682)&lowercaseOperators=true&ps2=0&bq=is_resume:0^-10000000&bq=upt_date:[*+TO+NOW/DAY-36MONTHS]^2&bq=upt_date:[NOW/DAY-36MONTHS+TO+NOW/DAY-24MONTHS]^3&bq=upt_date:[NOW/DAY-24MONTHS+TO+NOW/DAY-12MONTHS]^4&bq=upt_date:[NOW/DAY-12MONTHS+TO+NOW/DAY-9MONTHS]^5&bq=upt_date:[NOW/DAY-9MONTHS+TO+NOW/DAY-6MONTHS]^10&bq=upt_date:[NOW/DAY-6MONTHS+TO+NOW/DAY-3MONTHS]^15&bq=upt_date:[NOW/DAY-3MONTHS+TO+*]^20&bq=NOT+country:isoin^-1000000000&facet.query=exp:[+10+TO+11+]&facet.query=exp:[+11+TO+13+]&facet.query=exp:[+13+TO+15+]&facet.query=exp:[+15+TO+17+]&facet.query=exp:[+17+TO+20+]&facet.query=exp:[+20+TO+25+]&facet.query=exp:[+25+TO+109+]&facet.query=ctc:[+100+TO+101+]&facet.query=ctc:[+101+TO+101.5+]&facet.query=ctc:[+101.5+TO+102+]&facet.query=ctc:[+102+TO+103+]&facet.query=ctc:[+103+TO+104+]&facet.query=ctc:[+104+TO+105+]&facet.query=ctc:[+105+TO+107.5+]&facet.query=ctc:[+107.5+TO+110+]&facet.query=ctc:[+110+TO+115+]&facet.query=ctc:[+115+TO+10100+]&ps3=0&qf=contents^0.05+currdesig^1.5+predesig^1.5+lng^2+ttl+kw_skl+kw_it&f.cl.mincount=1&sow=false&hl.fl=ttl,kw_skl,kw_it,contents&wt=json&f.cat.mincount=1&qs=0&facet.field=ind&facet.field=cat&facet.field=rol&facet.field=cl&facet.field=pref&debug=timing&qt=/resumesearch&f.rol.mincount=1&start=0&rows=40&version=2&q=*&facet.limit=10&pf=id&hl.q=&facet.mincount=1&pf3=id&pf2=id&facet=true&debugQuery=false

Re: Facet Query performance

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/8/2019 12:00 PM, Midas A wrote:
> Number of Docs :500000+ docs
> Index Size: 300 GB
> RAM: 256 GB
> JVM: 32 GB

Half a million documents producing an index size of 300GB suggests 
*very* large documents.  That typically produces an index with fields 
that have very high cardinality, due to text tokenization.

Is Solr the only thing running on this machine, or does it have other 
memory-hungry software running on it?

The screenshot described at the following URL may provide more insight. 
It will be important to get the sort correct.  If the columns have been 
customized to show information other than the examples, it may need to 
be adjusted:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

Assuming that Solr is the only thing on the machine, you have about
224 GB of memory (256 GB minus the 32 GB heap) available to cache your
index data, which is at least 300GB.  Normally I would think being able
to cache roughly three quarters of the index should be enough for good
performance, but it's always possible that there is something about your
setup that means you don't have enough memory.
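
To make that arithmetic explicit, here is a rough sketch using the numbers
quoted above.  It assumes that everything outside the JVM heap is available
to the operating system's disk cache, which slightly overstates the real
figure because the OS and other processes need some memory too.

```python
# Figures quoted in this thread. Treating all non-heap memory as disk
# cache is a simplifying assumption, not a measurement.
ram_gb = 256
heap_gb = 32
index_gb = 300

cache_gb = ram_gb - heap_gb            # memory left over for the OS disk cache
cached_fraction = cache_gb / index_gb  # share of the index that could be cached

print(f"{cache_gb} GB available for caching")
print(f"{cached_fraction:.0%} of the index fits in cache")
```

If the heap could be reduced, the difference would go straight to the disk
cache, which is why the heap size question below matters.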

Are you sure that you need a 32GB heap?  Half a million documents should 
NOT require anywhere near that much heap.

> Cardinality:
> cat=44
> rol=1005
> ind=504
> cl=2000

These cardinality values are VERY low.  If you are certain about those 
numbers, it is not likely that these fields are significant contributors 
to query time, either with or without docValues.  How did you obtain 
those numbers?

Those are not the only fields referenced in your query.  I also see these:

hemp
cEmp
pEmp
is_udis
id
is_resume
upt_date
country
exp
ctc
contents
currdesig
predesig
lng
ttl
kw_skl
kw_it

> QTime:  2988 ms

Three seconds for a query with so many facets is something I would 
probably be pretty happy to get.

> Our 35% queries takes more than 10 sec.

I have no idea what this sentence means.

> Please suggest the ways to improve response time . Attached queries and 
> schema.xml and solrconfig.xml
> 
> 1. Is there any other ways to rewrite queries that improve our query 
> performance .?

With the information available, the only suggestion I have currently is 
to replace "q=*" with "q=*:*" -- assuming that the intent is to match 
all documents with the main query.  According to what you attached 
(which I am very surprised to see -- attachments usually don't make it 
to the list), your df parameter is "ttl" ... a field that is heavily 
tokenized.  That means that the cardinality of the ttl field is probably 
VERY high, which would make the wildcard query VERY slow.

> 2. can we see the DocValues cache in plugin/ stats->cache-> section on 
> solr UI panel ?

The admin UI only shows Solr caches.  If Lucene even has a docValues 
cache (and I do not know whether it does), it will not be available in 
Solr's statistics.  I am unaware of any cache in Solr for docValues. 
The entire point of docValues is to avoid the need to generate and cache 
large amounts of data, so I suspect there is not going to be anything 
available in this regard.

Thanks,
Shawn

Re: Facet Query performance

Posted by Midas A <te...@gmail.com>.
Thanks, Shawn, and sorry for the short question.

Please find the details below.

Number of Docs :500000+ docs
Index Size: 300 GB
RAM: 256 GB
JVM: 32 GB

Cardinality:
cat=44
rol=1005
ind=504
cl=2000

QTime:  2988 ms


Our 35% queries takes more than 10 sec.

Earlier, docValues were not enabled. We enabled them and reindexed the whole index.

Please suggest ways to improve response time. Attached are the queries,
schema.xml and solrconfig.xml.

1. Are there any other ways to rewrite the queries to improve query
performance?
2. Can we see the docValues cache in the Plugins/Stats -> Cache section of the
Solr admin UI?






Re: Facet Query performance

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/8/2019 3:08 AM, Midas A wrote:
> I have enabled docvalues on facet field but query is still taking time.
> 
> How i can improve the Query time .
> <field name="cat" type="string" indexed="true" stored="true" 
> docValues="true" multiValued="true" termVectors="true" /> <!--Category ID-->
> 
> *Query: *

<snip>

There's very little information here -- only a single field definition 
and the query URL.  No information about how many documents, what sort 
of cardinality there is in the fields being used in the query, no 
information about memory and settings, etc.  You haven't even told us 
how long the query takes.

Your main query is a single * wildcard.  A wildcard query is typically 
quite slow.  If you are aiming for all documents, change that to q=*:* 
instead -- this is special syntax that the query parser understands, and 
is normally executed very quickly.
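
As an illustration only, the change could be applied to a request URL with
a small helper like the one below.  The function name use_match_all and the
shortened localhost URL are made up for this sketch; only the q=* to q=*:*
substitution comes from the advice above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def use_match_all(url):
    """Return url with a bare q=* replaced by q=*:*; other parameters keep their order."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    fixed = [(k, "*:*" if k == "q" and v == "*" else v) for k, v in params]
    # Keep '*' and ':' unescaped so the match-all syntax stays readable.
    return urlunsplit(parts._replace(query=urlencode(fixed, safe="*:")))

# Hypothetical shortened URL; host, port and collection are placeholders.
before = "http://localhost:8983/solr/search/select?df=ttl&q=*&facet=true&hl.q="
after = use_match_all(before)
print(after)
```

The helper re-encodes the other parameters as it rebuilds the query string,
so it is worth comparing the result against the original before using it.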

When a field has DocValues defined, it will automatically be used for 
field-based sorting, field-based facets, and field-based grouping. 
DocValues should not be relied on for queries, because indexed data is 
far faster for that usage.  Queries *can* be done with docValues, but it 
would be VERY slow.  Solr will avoid that usage if it can.

I'm reasonably certain that docValues will NOT be used for facet.query 
as long as the field is indexed.

You do have field-based facets, using the facet.field parameter (five of
them: ind, cat, rol, cl, and pref).  If docValues was present on cat for
ALL of the indexing that has happened, then they will work for that
field, but you have not told us whether the other facet fields have them
defined.
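
One rough way to check is to read the field definitions out of schema.xml.
The sketch below does that for a small snippet: the cat line is the one
posted in this thread, while the rol and pref lines are hypothetical
stand-ins because the real schema is not shown here.  The helper name
docvalues_by_field is also made up for this example.

```python
import xml.etree.ElementTree as ET

# The cat definition is copied from this thread; rol and pref are
# hypothetical placeholders, not the poster's actual schema.
SCHEMA_SNIPPET = """
<fields>
  <field name="cat" type="string" indexed="true" stored="true"
         docValues="true" multiValued="true" termVectors="true" />
  <field name="rol" type="string" indexed="true" stored="true" docValues="true" />
  <field name="pref" type="string" indexed="true" stored="true" />
</fields>
"""

def docvalues_by_field(schema_xml, names):
    """Map each requested field name to whether its <field> element sets docValues="true"."""
    root = ET.fromstring(schema_xml)
    declared = {f.get("name"): f.get("docValues") == "true" for f in root.iter("field")}
    return {name: declared.get(name, False) for name in names}

report = docvalues_by_field(SCHEMA_SNIPPET, ["cat", "rol", "pref"])
for name, enabled in report.items():
    print(f"{name}: docValues {'set' if enabled else 'not set on the field'}")
```

A field that does not set docValues itself may still inherit it from its
fieldType, so "not set on the field" is a prompt to look at the type
definition, not proof that docValues is off.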

You have a lot of faceting in this query.  That can cause things to be slow.

Thanks,
Shawn

Re: Facet Query performance

Posted by Midas A <te...@gmail.com>.
Hi,
How can I tell whether docValues are being used or not?
Please help me here.
