Posted to solr-user@lucene.apache.org by Fer-Bj <fe...@gmail.com> on 2009/12/12 02:17:09 UTC

using q= , adding fq=

We're running an index of 14M documents. For each document we have:
   <field name="id" type="sint" indexed="true" stored="true" required="true"/>
   <field name="title" type="text_ngram" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat_id" type="sint" indexed="true" stored="true"/>
   <field name="geo_id" type="sint" indexed="true" stored="true"/>
   <field name="body" type="text" indexed="true" stored="false" omitNorms="true"/>
   <field name="modified_datetime" type="date" indexed="true" stored="true"/>
(and a few other fields).

Our most common query looks like this:
   q=cat_id:xxx AND geo_id:yyyy&sort=id desc
where cat_id is the "category" (cars, sports, toys, etc.) the item belongs
to, and geo_id is the city/district the item belongs to.
So this query returns the list of documents posted in category xxx, region
yyyy, sorted by id descending to get the newest first.

There are 2 questions I'd like to ask:

1) Would adding a filter, something like q=cat_id:xxx&fq=geo_id:yyyy, boost
performance?

2) We run into problems when we ask for a page at a large offset, i.e.:
q=cat_id:xxx AND geo_id:yyy&start=544545
(note that we limit docs to a maximum of 50 per result set).
When start is 500 or more, QTime is >= 5 seconds, while the average QTime is
< 100 ms.

Any help or tips would be appreciated!

Thanks,




Re: using q= , adding fq=

Posted by Chris Hostetter <ho...@fucit.org>.

: > 1) adding something like:  q=cat_id:xxx&fq=geo_id:yyyy would boost
: > performance?
: 
: 
: For the n > 1 query, yes, adding filters should improve performance 
: assuming it is selective enough.  The tradeoff is memory.

You might even find that something like this is faster...

   q=*:*&fq=cat_id:xxxx&fq=geo_id:yyyy

...but it can vary based on circumstances (it depends a lot on how many 
unique xxxx and yyyy values you have, how big each of those sets is, 
and how big you make your filterCache).
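
As a rough illustration of the two request shapes discussed so far, the sketch below builds both query strings with Python's standard library. The function names are made up for this example, the ids are placeholders for xxxx and yyyy, and nothing is sent to a Solr server.

```python
from urllib.parse import urlencode

def combined_query(cat_id, geo_id):
    """Original form: both clauses live in q and are evaluated together."""
    return urlencode([
        ("q", f"cat_id:{cat_id} AND geo_id:{geo_id}"),
        ("sort", "id desc"),
    ])

def filtered_query(cat_id, geo_id):
    """Suggested form: q matches everything, two separate fq params restrict it.

    Each fq is a separate parameter, so each one can be cached on its own
    and reused by later requests that share the same category or region.
    """
    return urlencode([
        ("q", "*:*"),
        ("fq", f"cat_id:{cat_id}"),
        ("fq", f"geo_id:{geo_id}"),
        ("sort", "id desc"),
    ])

if __name__ == "__main__":
    print(combined_query(12, 34))
    print(filtered_query(12, 34))
```

Passing a list of pairs rather than a dict is what allows the fq key to appear twice in the encoded string.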

: > 2) we do find problems when we ask for a page=large offset!  ie: 
: > q=cat_id:xxx AND geo_id:yyy&start=544545
: > (note that we limit docs to 50 max per resultset).
: > When start is 500 or more, Qtime is >=5 seconds.... while the avg qtime is
: > <100 ms

FWIW: limiting the number of rows per request to 50, but not limiting the 
start doesn't make much sense -- the same amount of work is needed to 
handle start=0&rows=5050 and start=5000&rows=50.
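
The equivalence above can be seen in a toy model of sorted paging. This is only a sketch using Python's heapq, not Lucene's actual collector, and the function name and the random ids are invented for the example: to return positions start..start+rows-1, the best start + rows candidates have to be kept before the first `start` of them can be discarded.

```python
import heapq
import random

def top_page(doc_ids, start, rows):
    """Return (page, candidates_kept) for ids sorted in descending order.

    The page cannot be identified without first knowing the best
    (start + rows) ids, so that many candidates are collected either way.
    """
    needed = start + rows
    best = heapq.nlargest(needed, doc_ids)  # keeps at most `needed` candidates
    return best[start:start + rows], needed

random.seed(0)
ids = random.sample(range(1_000_000), 100_000)  # hypothetical matching doc ids

page_a, kept_a = top_page(ids, start=0, rows=5050)
page_b, kept_b = top_page(ids, start=5000, rows=50)

if __name__ == "__main__":
    print(kept_a, kept_b)                  # same number of candidates kept
    print(page_b == page_a[5000:5050])     # page_b is just the tail of page_a
```

In this model the only difference between the two requests is how many of the collected ids are returned to the caller.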

There are very few use cases for allowing people to iterate through all 
the rows that also require sorting.


-Hoss

Re: using q= , adding fq=

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 11, 2009, at 8:17 PM, Fer-Bj wrote:

> There are 2 questions I'd like to ask:
> 
> 1) adding something like:  q=cat_id:xxx&fq=geo_id:yyyy would boost
> performance?


For the n > 1 query (that is, once the filter has been cached by an earlier request), yes, adding filters should improve performance, assuming the filter is selective enough.  The tradeoff is memory.
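
To put a rough number on the memory tradeoff, here is a back-of-the-envelope sketch. It assumes each cached filter is held as one bit per document in the index; Solr may store small result sets more compactly, so treat this as an upper bound. The cache size of 512 entries is a hypothetical value, not one taken from this thread.

```python
def bitset_filter_bytes(max_doc):
    """Bytes needed for a one-bit-per-document set covering max_doc documents."""
    return (max_doc + 7) // 8

MAX_DOC = 14_000_000  # index size quoted in the thread

def cache_bytes(num_cached_filters, max_doc=MAX_DOC):
    """Upper-bound estimate if every cached filter were a full bitset."""
    return num_cached_filters * bitset_filter_bytes(max_doc)

per_filter = bitset_filter_bytes(MAX_DOC)

if __name__ == "__main__":
    print(f"one filter:  {per_filter / 1e6:.2f} MB")
    print(f"512 filters: {cache_bytes(512) / 1e6:.0f} MB")  # 512 is an assumed cache size
```

The estimate scales with the number of distinct cat_id and geo_id values that end up cached, which is why the number of unique values matters.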

> 
> 2) we do find problems when we ask for a page=large offset!  ie: 
> q=cat_id:xxx AND geo_id:yyy&start=544545
> (note that we limit docs to 50 max per resultset).
> When start is 500 or more, Qtime is >=5 seconds.... while the avg qtime is
> <100 ms

Yes, this is likely the case.  Deep paging is not the typical use case: the further in you page, the more disk accesses you incur, and the priority queue used to collect the top hits has to track start + rows entries rather than just rows.

See http://issues.apache.org/jira/browse/LUCENE-2127


> 
> Any help or tips would be appreciated!

Do you really need "sortable ints" for all those fields?  Are you doing range queries against them?  The name "sortable" X is a bit of a misnomer.  It doesn't mean sortable in the sense of the &sort parameter, it means sortable in the range query sense, as in cat_id:[55 TO 1005].
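
The range-query point can be illustrated with a small sketch. It assumes that a plain integer field is indexed as its decimal string and that range queries compare terms as strings. Zero-padding is used here only as a stand-in for an order-preserving encoding; it is not the encoding Solr uses, and it ignores negative numbers. The function names and sample ids are invented for the example.

```python
def plain_terms_in_range(values, low, high):
    """Range check on plain decimal strings, compared lexicographically."""
    return [v for v in values if str(low) <= str(v) <= str(high)]

def padded_terms_in_range(values, low, high, width=10):
    """Same check after zero-padding, so string order equals numeric order."""
    def pad(n):
        return str(n).zfill(width)
    return [v for v in values if pad(low) <= pad(v) <= pad(high)]

cat_ids = [7, 55, 100, 560, 1005, 2000]  # hypothetical category ids

if __name__ == "__main__":
    # "55" sorts after "1005" as a string, so the plain range matches nothing.
    print(plain_terms_in_range(cat_ids, 55, 1005))
    print(padded_terms_in_range(cat_ids, 55, 1005))
```

With the padded terms the range behaves numerically, which is the property a range query like cat_id:[55 TO 1005] depends on.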

-Grant
