Posted to solr-user@lucene.apache.org by Fer-Bj <fe...@gmail.com> on 2009/12/12 02:17:09 UTC
using q= , adding fq=
We're running a 14M-document index. For each document we have:
<field name="id" type="sint" indexed="true" stored="true"
required="true" />
<field name="title" type="text_ngram" indexed="true"
stored="true" omitNorms="true"/>
<field name="cat_id" type="sint" indexed="true" stored="true"/>
<field name="geo_id" type="sint" indexed="true" stored="true"/>
<field name="body" type="text" indexed="true" stored="false"
omitNorms="true"/>
<field name="modified_datetime" type="date" indexed="true"
stored="true"/>
(and a few other fields).
Our most common query looks like this:
q=cat_id:xxx AND geo_id:yyyy&sort=id desc
where cat_id is the "category" (cars, sports, toys, etc.) the item belongs to,
and geo_id is the city/district the item belongs to.
So this query returns a list of documents posted in category xxx, region
yyyy, sorted by id descending to get the newest first.
There are 2 questions I'd like to ask:
1) Would adding a filter query, something like q=cat_id:xxx&fq=geo_id:yyyy,
boost performance?
2) We run into problems when we ask for a page at a large offset, e.g.:
q=cat_id:xxx AND geo_id:yyyy&start=544545
(note that we limit docs to 50 max per result set).
When start is 500 or more, QTime is >= 5 seconds, while the average QTime is
< 100 ms.
Any help or tips would be appreciated!
Thanks,
Re: using q= , adding fq=
Posted by Chris Hostetter <ho...@fucit.org>.
: > 1) adding something like: q=cat_id:xxx&fq=geo_id=yyyy would boost
: > performance?
:
:
: For the n > 1 query, yes, adding filters should improve performance
: assuming it is selective enough. The tradeoff is memory.
You might even find that something like this is faster...
q=*:*&fq=cat_id:xxxx&fq=geo_id:yyyy
...but it can vary based on circumstances (it depends a lot on how many
unique xxxx and yyyy values you have, how big each of those sets is,
and how big you make your filterCache).
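For anyone comparing the two request shapes, here is a minimal Python sketch that builds both query strings. The function name build_query and the sample ids are made up for illustration; only the q, fq, sort, start and rows parameters come from this thread.

```python
from urllib.parse import parse_qs, urlencode

def build_query(cat_id, geo_id, start=0, rows=50, use_filters=True):
    """Return a URL-encoded Solr query string for one category/region listing.

    build_query is a hypothetical helper, not part of any Solr client.
    """
    if use_filters:
        # Match everything in q and push both restrictions into separate
        # fq parameters, so each one can be cached on its own.
        params = [("q", "*:*"),
                  ("fq", f"cat_id:{cat_id}"),
                  ("fq", f"geo_id:{geo_id}")]
    else:
        # The original form: both restrictions inside the main query.
        params = [("q", f"cat_id:{cat_id} AND geo_id:{geo_id}")]
    params += [("sort", "id desc"), ("start", start), ("rows", rows)]
    return urlencode(params)

filtered = build_query(12, 3456)
unfiltered = build_query(12, 3456, use_filters=False)
print(filtered)
print(unfiltered)
```

Whether the filtered form is actually faster still has to be measured on the real index, for the reasons given above.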
: > 2) we do find problems when we ask for a page=large offset! ie:
: > q=cat_id:xxx and geo_id:yyy&start=544545
: > (note that we limit docs to 50 max per resultset).
: > When start is 500 or more, Qtime is >=5 seconds.... while the avg qtime is
: > <100 ms
FWIW: limiting the number of rows per request to 50, but not limiting the
start doesn't make much sense -- the same amount of work is needed to
handle start=0&rows=5050 and start=5000&rows=50.
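A toy model of that point, in Python: to return a sorted page, the top start+rows entries have to be collected first, so both requests collect the same number of documents. This is only an illustration with made-up ids and a hypothetical fetch_page function; it is not how Solr is implemented internally.

```python
import heapq

def fetch_page(doc_ids, start, rows):
    """Return (page, number of ids collected) for ids sorted descending.

    Hypothetical stand-in for a top-N collector.
    """
    # To know which ids sit at positions start .. start+rows-1, the top
    # (start + rows) ids must all be collected and ordered first.
    top = heapq.nlargest(start + rows, doc_ids)
    return top[start:start + rows], len(top)

ids = range(1, 100001)  # made-up document ids
page_a, collected_a = fetch_page(ids, start=0, rows=5050)
page_b, collected_b = fetch_page(ids, start=5000, rows=50)

# Both requests collect 5050 ids; the second one just throws most away.
print(collected_a, collected_b)
```

The second request returns only 50 ids, but it pays for collecting 5050, which is why capping rows without capping start does not bound the cost.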
There are very few use cases for allowing people to iterate through all
the rows that also require sorting.
-Hoss
Re: using q= , adding fq=
Posted by Grant Ingersoll <gs...@apache.org>.
On Dec 11, 2009, at 8:17 PM, Fer-Bj wrote:
> 1) adding something like: q=cat_id:xxx&fq=geo_id=yyyy would boost
> performance?
For the n > 1 query, yes, adding filters should improve performance assuming it is selective enough. The tradeoff is memory.
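To put a rough number on the memory tradeoff, here is a back-of-the-envelope Python sketch. It assumes each cached filter is held as a bitset with one bit per document in the index, which is the worst case (small result sets may be stored more compactly), and the cache size of 512 entries is a made-up figure, not a recommendation.

```python
def bitset_bytes(num_docs):
    """Bytes needed for a bitset with one bit per document, rounded up."""
    return (num_docs + 7) // 8

def filter_cache_bytes(num_docs, cached_filters):
    """Upper-bound estimate, assuming every cached filter is a full bitset."""
    return bitset_bytes(num_docs) * cached_filters

NUM_DOCS = 14_000_000   # index size mentioned in this thread
CACHE_ENTRIES = 512     # hypothetical filterCache size

per_filter = bitset_bytes(NUM_DOCS)
total = filter_cache_bytes(NUM_DOCS, CACHE_ENTRIES)
print(f"{per_filter / 2**20:.2f} MiB per filter, "
      f"{total / 2**20:.0f} MiB for {CACHE_ENTRIES} filters")
```

With many distinct cat_id and geo_id values, the number of distinct filters is what drives this estimate.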
>
> 2) we do find problems when we ask for a page=large offset! ie:
> q=cat_id:xxx and geo_id:yyy&start=544545
> (note that we limit docs to 50 max per resultset).
> When start is 500 or more, Qtime is >=5 seconds.... while the avg qtime is
> <100 ms
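Yes, that is to be expected. Deep paging is not the typical use case: to return rows start through start+rows, the top start+rows documents have to be collected and ordered in a priority queue, so you get more and more disk accesses plus a growing amount of priority queue work as start increases.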
See http://issues.apache.org/jira/browse/LUCENE-2127
>
> Any help or tips would be appreciated!
Do you really need "sortable ints" for all those fields? Are you doing range queries against them? The name "sortable" X is a bit of a misnomer. It doesn't mean sortable in the sense of the &sort parameter, it means sortable in the range query sense, as in cat_id:[55 TO 1005].
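To illustrate what "sortable in the range query sense" means, here is a small Python sketch. The encoding below (shift by 2**31, zero-pad to 10 digits) is a stand-in chosen for readability and is NOT the encoding Solr's sint type actually uses; the point is only that a range over terms compared as strings behaves numerically once the string order matches the numeric order.

```python
OFFSET = 2 ** 31  # shifts the signed 32-bit range to 0 .. 2**32 - 1

def to_sortable(n):
    """Encode a signed 32-bit int so string order equals numeric order.

    Illustrative encoding only, not Solr's.
    """
    return f"{n + OFFSET:010d}"

def in_term_range(term, low, high):
    """Inclusive range test on terms, compared as plain strings."""
    return low <= term <= high

values = [7, 55, 300, 1005, 20000, -4]

# Range [55 TO 1005] over plain decimal strings: "55" sorts after "1005",
# so the range is inverted and matches nothing.
plain_hits = [v for v in values if in_term_range(str(v), "55", "1005")]

# The same range over the order-preserving encoding.
sortable_hits = [v for v in values
                 if in_term_range(to_sortable(v), to_sortable(55), to_sortable(1005))]

print(plain_hits, sortable_hits)
```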
-Grant