Posted to solr-user@lucene.apache.org by mikopacz <ka...@gmail.com> on 2011/08/19 13:34:32 UTC

Solr performance for query without filter

Hi

I have one instance of Solr running on JBoss with the following schema and
partial config:

Schema:

<schema name="users_szukacz" version="1.4">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>
<fieldType name="int" class="solr.TrieIntField" omitNorms="true"
precisionStep="1" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" omitNorms="true"
positionIncrementGap="0"/>
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="user_id" type="int" indexed="true" required="true"/>
<field name="birth_date" type="date" indexed="true" stored="false"/>
<field name="city" type="text_pl" indexed="true" stored="false"/>
<field name="sex" type="text_pl" indexed="true" stored="false"/>
<field name="show_search" type="int" indexed="true" stored="false"/>
<field name="confirmed" type="int" indexed="true" stored="false"/>
<field name="search_text" type="text_pl" indexed="true"/>
</fields>
<uniqueKey>user_id</uniqueKey>
<defaultSearchField>search_text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
</schema>

Config:

<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

<mergeFactor>10</mergeFactor>

<ramBufferSizeMB>1024</ramBufferSizeMB>

<maxBufferedDocs>1000</maxBufferedDocs>
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>10000</commitLockTimeout>
<filterCache class="solr.FastLRUCache" size="1000000" initialSize="1000000"
autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="13000000" initialSize="13000000"
autowarmCount="0"/>

The index has 41,000,000 documents and is 9 GB in size. For a query like:
1)
*q=Jarecki+Jan*&fq=sex:M&fq=confirmed:1&fq=show_search:3&fl=user_id&start=0&rows=10&wt=json&version=2.2

the server reaches an average of *90 queries/s* on 4 threads, which is too low for me.

For a query with a filter on the city field:
2) e.g.
fl=user_id&indent=on&start=0&q=Tarkowski+Bartłomiej&wt=json&*fq=city:Kwidzyn*&fq=sex:M&fq=confirmed:1&fq=show_search:3&version=2.2&rows=10

the server reaches 800 queries/s.

Do you have any advice on how to speed up the first query? Is this speed
normal?

The server has 32 GB of RAM and 4 Intel Xeon 2.5 GHz processors.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-for-query-without-filter-tp3267785p3267785.html

Re: Solr performance for query without filter

Posted by Chris Hostetter <ho...@fucit.org>.
: The index has 41,000,000 documents and is 9 GB in size. For a query like:
: 1)
: *q=Jarecki+Jan*&fq=sex:M&fq=confirmed:1&fq=show_search:3&fl=user_id&start=0&rows=10&wt=json&version=2.2
: 
: the server reaches an average of *90 queries/s* on 4 threads, which is too low for me.
: 
: For a query with a filter on the city field:
: 2) e.g.
: fl=user_id&indent=on&start=0&q=Tarkowski+Bartłomiej&wt=json&*fq=city:Kwidzyn*&fq=sex:M&fq=confirmed:1&fq=show_search:3&version=2.2&rows=10
: 
: the server reaches 800 queries/s.
: 
: Do you have any advice on how to speed up the first query? Is this speed
: normal?

"norm" is hard to define, but one key element you left out is how many 
docs are (typically) matched by requests of type #1 vs. type #2, and how
good a job your "city" filters do in partitioning the total number of
documents.

I suspect that your city filters are heavily reused (i.e. good cache hit
rates) and do a really good job of cutting down the number of matching
docs (i.e. the number of docs matching
fq=sex:M&fq=confirmed:1&fq=show_search:3 is probably significantly higher
than the number of docs matching
fq=sex:M&fq=confirmed:1&fq=show_search:3&fq=city:Kwidzyn). In that case
it makes sense that type #1 queries would take a lot longer on average:
there are a lot more docs to consider when evaluating the "q" to find
matches.
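
One way to check how well the city filter partitions the index is to ask
Solr for hit counts only (rows=0) with and without fq=city:Kwidzyn and
compare numFound. The Python sketch below is only an illustration, not
something from this thread: the endpoint URL is a hypothetical placeholder,
the helper names count_url and num_found are made up here, and q=*:* is
used so that only the filters determine the count.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: hypothetical endpoint; adjust host, port and path to your setup.
SOLR_SELECT_URL = "http://localhost:8080/solr/select"

# The three filters shared by both query types in the question.
COMMON_FILTERS = ["sex:M", "confirmed:1", "show_search:3"]


def count_url(q, filters, base_url=SOLR_SELECT_URL):
    """Build a select URL that asks only for the hit count (rows=0)."""
    params = [("q", q), ("rows", "0"), ("wt", "json")]
    # Each filter becomes its own fq parameter, as in the original requests.
    params += [("fq", fq) for fq in filters]
    return base_url + "?" + urlencode(params)


def num_found(q, filters, base_url=SOLR_SELECT_URL):
    """Send the request to a running Solr instance and return numFound."""
    with urlopen(count_url(q, filters, base_url)) as resp:
        return json.load(resp)["response"]["numFound"]
```

Calling num_found("*:*", COMMON_FILTERS) and then
num_found("*:*", COMMON_FILTERS + ["city:Kwidzyn"]) against a live instance
gives the two counts to compare. If the second is a small fraction of the
first, the "q" in type #2 requests has far fewer candidate docs to evaluate.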



-Hoss