Posted to solr-user@lucene.apache.org by Dennis Gearon <ge...@sbcglobal.net> on 2010/12/10 06:54:56 UTC

Re: my index has 500 million docs ,how to improve solr search performance?

Late reply on this, but how is that big installation working out?

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: Lance Norskog <go...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Wed, November 17, 2010 10:53:47 PM
Subject: Re: my index has 500 million docs ,how to improve solr search 
performance?

This is pretty standard. I think the problem is basic probability: 
when there are multiple shards, the query waits until the last shard 
responds, then issues a second query which may again wait on more than 
one shard. Some responses will always be "stragglers" (late responses), 
so the response times have a long tail. Chart the response time of a 
single Solr instance (X) vs. the number of samples with that time (Y) 
and you get a raindrop shape: the curve starts at the fastest possible 
search, zooms up, then rounds off into a long tail.

So the time for searching 100 shards on 10 machines is that curve times 
ten, that is, longer and flatter. Virtual machines in general do not 
give solid, consistent performance numbers. There is also no 'fairness' 
in dispatching searches, so some searches get good service and some bad.

Put these together (multiply the probability curves) and you will get 
really variable response times. I don't know how to guide you.
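
To make the straggler argument concrete, here is a rough simulation sketch. It is not a model of Solr itself: the log-normal latency distribution, its parameters, and every function name are assumptions chosen only to produce a long-tailed curve, and it covers just one fan-out phase, not the second query mentioned above. The last function uses the fact that, for independent shards, the chance that at least one is slow is 1 - (1 - p)^N.

```python
import random
import statistics


def shard_latency(rng):
    # Hypothetical per-shard response time in ms: log-normal, so it has
    # a long right tail. The parameters are illustrative, not measured.
    return rng.lognormvariate(4.0, 0.5)


def fanout_latency(rng, num_shards):
    # A distributed query finishes only when the slowest shard answers.
    return max(shard_latency(rng) for _ in range(num_shards))


def percentile(samples, pct):
    # Simple nearest-rank percentile over a list of samples.
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]


def simulate(num_shards, trials=5000, seed=42):
    # Returns (median, 99th percentile) of the fan-out latency.
    rng = random.Random(seed)
    samples = [fanout_latency(rng, num_shards) for _ in range(trials)]
    return statistics.median(samples), percentile(samples, 99)


def prob_any_straggler(p_slow, num_shards):
    # Chance that at least one of num_shards independent shards is slow.
    return 1.0 - (1.0 - p_slow) ** num_shards


if __name__ == "__main__":
    for n in (1, 10, 100):
        median, p99 = simulate(n)
        print(f"{n:>3} shards: median {median:7.1f} ms, p99 {p99:7.1f} ms")
    print(f"P(any straggler): {prob_any_straggler(0.01, 100):.3f}")
```

Under these assumptions, if each shard is slow only 1% of the time, a query over 100 shards hits at least one slow shard about 63% of the time, so what is a rare tail event on one shard becomes the typical case for the whole query.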

lu.rongbin wrote:
> Thanks, Lance Norskog-2. I've tested EBS, but it's no better, so maybe I
> have to optimize my Solr config for the EC2 m2.4xlarge. The specs for this
> instance type are:
>    cpu units: 26 ECUs
>    cpu cores: 8
>    memory: 68G
>
> ----------------
> solrconfig.xml content:
>
> <?xml version="1.0" encoding="UTF-8" ?>
>
> <config>
>
>
>    <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
>
>    <!-- there must be an index dir under this -->
>    <indexDefaults>
>     <!-- Values here affect all index writers and act as a default unless
> overridden. -->
>      <useCompoundFile>false</useCompoundFile>
>
>      <mergeFactor>10</mergeFactor>
>      <ramBufferSizeMB>32</ramBufferSizeMB>
>      <maxFieldLength>10000</maxFieldLength>
>      <writeLockTimeout>1000</writeLockTimeout>
>      <commitLockTimeout>10000</commitLockTimeout>
>
>
>      <!--
>        This option specifies which Lucene LockFactory implementation to use.
>
>        single = SingleInstanceLockFactory - suggested for a read-only index
>                 or when there is no possibility of another process trying
>                 to modify the index.
>        native = NativeFSLockFactory  - uses OS native file locking
>        simple = SimpleFSLockFactory  - uses a plain file for locking
>
>        (For backwards compatibility with Solr 1.2, 'simple' is the default
>         if not specified.)
>      -->
>      <lockType>native</lockType>
>    </indexDefaults>
>
>    <mainIndex>
>      <!-- options specific to the main on-disk lucene index -->
>      <useCompoundFile>false</useCompoundFile>
>      <ramBufferSizeMB>32</ramBufferSizeMB>
>      <mergeFactor>10</mergeFactor>
>      <maxFieldLength>10000</maxFieldLength>
>      <unlockOnStartup>false</unlockOnStartup>
>      <reopenReaders>true</reopenReaders>
>      <deletionPolicy class="solr.SolrDeletionPolicy">
>        <str name="keepOptimizedOnly">false</str>
>        <str name="maxCommitsToKeep">1</str>
>      </deletionPolicy>
>
>    </mainIndex>
>    <jmx />
>
>    <!-- Use the following format to specify a custom IndexReaderFactory -
> allows for alternate
>         IndexReader implementations.
>    <indexReaderFactory name="IndexReaderFactory" class="package.class">
>      Parameters as required by the implementation
>    </indexReaderFactory>
>    -->
>
>
>    <query>
>      <!-- Maximum number of clauses in a boolean query... can affect
>          range or prefix queries that expand to big boolean
>          queries.  An exception is thrown if exceeded.  -->
>      <maxBooleanClauses>1024</maxBooleanClauses>
>       <filterCache
>        class="solr.FastLRUCache"
>        size="5120"
>        initialSize="512"
>        autowarmCount="128"
>        cleanupThread="true"/>
>
>      <queryResultCache
>        class="solr.FastLRUCache"
>        size="20000"
>        initialSize="10240"
>        autowarmCount="320"
>        cleanupThread="true"/>
>
>      <documentCache
>        class="solr.FastLRUCache"
>        size="10240"
>        initialSize="10240"
>        autowarmCount="320"
>        cleanupThread="true"/>
>
>      <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
>
>      <queryResultWindowSize>20</queryResultWindowSize>
>
>       <queryResultMaxDocsCached>20</queryResultMaxDocsCached>
>
>      <HashDocSet maxSize="3000" loadFactor="0.75"/>
>
>      <listener event="firstSearcher" class="solr.QuerySenderListener">
>        <arr name="queries">
>          <lst>  <str name="q">solr rocks</str><str name="start">0</str><str
> name="rows">10</str></lst>
>          <lst><str name="q">static firstSearcher warming query from
> solrconfig.xml</str></lst>
>        </arr>
>      </listener>
>
>      <useColdSearcher>false</useColdSearcher>
>      <maxWarmingSearchers>2</maxWarmingSearchers>
>
>    </query>
>
>    <requestDispatcher handleSelect="true">
>       <requestParsers enableRemoteStreaming="true"
> multipartUploadLimitInKB="2048" />
>      <httpCaching lastModifiedFrom="openTime"
>                   etagSeed="Solr">
>        </httpCaching>
>    </requestDispatcher>
>
>
>   <requestHandler name="standard" class="solr.SearchHandler" default="true">
>      <!-- default values for query parameters -->
>       <lst name="defaults">
>         <str name="echoParams">explicit</str><!--
>         <bool name="hl">true</bool>
>         <str name="hl.fl">name</str>
>         <int name="hl.snippets">1</int>
>         <str name="hl.formatter">html</str>
>         <str name="hl.fragsize">500</str>
>         <str name="hl.simple.pre"><![CDATA[]]></str>
>         <str name="hl.simple.post"><![CDATA[]]></str>
>         <str name="fl">*</str>
>         <int name="rows">10</int>
>         <str name="version">2.1</str>
>          -->
>       </lst>
>    </requestHandler>
>
> -------------------------------------
> schema.xml content:
>
> <schema name="example" version="1.1">
>    <types>
>      <!-- solr.StrField:by default tokenized="false" -->
>      <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="false" />
>      <!-- solr.TextField:by default tokenized="true" -->
>      <fieldType name="text" class="solr.TextField" omitNorms="false">
>        <analyzer
> class="org.apache.lucene.analysis.cn.smart.MySmartChineseAnalyzer" />
>      </fieldType>
>      <fieldType name="w_text" class="solr.TextField" omitNorms="false">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        </analyzer>
>      </fieldType>
>      <fieldType name="t_double" class="solr.TrieDoubleField"
> precisionStep="8" omitNorms="false" positionIncrementGap="0" />
>   </types>
>
>   <fields>
>     <field name="id" type="string" indexed="true" stored="true"
> required="true" />
>     <field name="price"  type="t_double" indexed="true" stored="true"
> required="true" />
>     <field name="name" type="text" indexed="true" stored="false"
> required="true" />
>     <field name="comefrom" type="string" indexed="true" stored="false"/>
>     <field name="seller" type="string" indexed="true" stored="false" />
>     <field name="category" type="string" indexed="true" stored="false" />
>     <field name="detailpath" type="w_text" indexed="true" stored="false" />
>   </fields>
>
>   <uniqueKey>id</uniqueKey>
>   <defaultSearchField>name</defaultSearchField>
>
> I'm looking forward to your opinion.
>
>
>