Posted to solr-user@lucene.apache.org by Toke Eskildsen <te...@statsbiblioteket.dk> on 2010/11/15 11:00:44 UTC

Re: my index has 500 million docs, how to improve solr search performance?

On Mon, 2010-11-15 at 06:35 +0100, lu.rongbin wrote:
> In addition, my index has only two stored fields, id and price; the
> other fields are indexed only. I increased the document and query
> caches. The EC2 m2.4xlarge instance has 8 cores and 68 GB of memory.
> The total index size is about 100 GB.

Looking at http://aws.amazon.com/ec2/instance-types/ I can see that
Amazon recommends using "EBS to get improved storage I/O performance for
disk bound applications". As Lucene/Solr is very often I/O bound (or
more precisely random access I/O bound), you might consider the EBS
option.

I found this article that looks very relevant:
http://www.coreyhulen.org/?p=326
It is about Cassandra (a database), but I'm guessing that the I/O
pattern is fairly similar to Lucene/Solr's, with a lot of random-access reads.

Extrapolating wildly, it would seem that disk I/O latency is a problem
with Amazon's cloud, at least compared with the obvious choice of an
SSD in a local machine. If this holds true, some things you could try
would be better warming of your searches, holding (part of) your index
in RAM, switching to EBS or ... moving away from the cloud.
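
To illustrate the warming part: the stock solrconfig.xml mechanism for
this is a QuerySenderListener. A sketch only (the queries below are
placeholders; real warming queries should mirror your actual traffic):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <!-- placeholder queries: touch the default search field and the
           price field, so index files and caches are primed before
           user queries arrive -->
      <lst><str name="q">some common terms</str><str name="rows">10</str></lst>
      <lst><str name="q">price:[10 TO 100]</str><str name="rows">10</str></lst>
    </arr>
  </listener>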


All this is assuming that it really is I/O that is your problem. Have
you looked at CPU load vs. I/O wait while issuing a batch of queries?
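
A quick way to check, assuming a Linux instance with the sysstat
package installed: run

  iostat -x 2

while replaying the queries. Sustained high %iowait with the CPUs
otherwise idle, or device %util near 100, points at disk latency
rather than CPU.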


Disclaimer: I have no experience with Amazon's cloud service.


Re: my index has 500 million docs, how to improve solr search performance?

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Late reply on this, but how is that big installation working out?

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.




Re: my index has 500 million docs, how to improve solr search performance?

Posted by Lance Norskog <go...@gmail.com>.
This is pretty standard. I think the problem is basic probabilities:
when there are multiple shards, the query waits until the final shard
responds, then issues a second query (to fetch the stored fields for
the top documents), which may again wait on more than one shard. The
nature of probabilities is that there will be "stragglers" (late
responses) and a long tail of response times caused by them. The
response-time distribution of a single Solr instance is shaped like a
raindrop: on a chart of response time (X) vs. number of samples with
that time (Y), the curve starts at the earliest possible search time,
zooms up, then rounds off into a long tail.

So the curve for searching 100 shards on 10 machines is that curve
times ten, that is, longer and flatter. Virtual machines in general do
not give solid, consistent performance numbers. There is also no
'fairness' in dispatching searches, so some searches get good service
and some bad.

Put these together (multiply the probability curves) and you will get 
really variable response times. I don't know how to guide you.
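
To put illustrative numbers on the straggler effect: if a single shard
answers within, say, 500 ms 95% of the time, the chance that all 100
shards do so on one query is 0.95^100, which is about 0.6%. Almost
every distributed query ends up waiting on at least one straggler.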

lu.rongbin wrote:
> Thanks, Lance Norskog. I've tested EBS, but it's not better. [...]

Re: my index has 500 million docs, how to improve solr search performance?

Posted by "lu.rongbin" <lu...@goodhope.net>.
Thanks, Lance Norskog. I've tested EBS, but it's not better, so maybe
I have to optimize my Solr config for the EC2 m2.4xlarge. This
instance type's specs are:
  CPU units: 26 ECUs
  CPU cores: 8
  memory: 68 GB

----------------
solrconfig.xml content:

<?xml version="1.0" encoding="UTF-8" ?>

<config>

 
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  
  <!-- there must be an index dir under this -->
  <indexDefaults>
   <!-- Values here affect all index writers and act as a default unless
overridden. -->
    <useCompoundFile>false</useCompoundFile>

    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>

     
    <!--
      This option specifies which Lucene LockFactory implementation to use.
      
      single = SingleInstanceLockFactory - suggested for a read-only index
               or when there is no possibility of another process trying
               to modify the index.
      native = NativeFSLockFactory  - uses OS native file locking
      simple = SimpleFSLockFactory  - uses a plain file for locking

      (For backwards compatibility with Solr 1.2, 'simple' is the default
       if not specified.)
    -->
    <lockType>native</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <maxFieldLength>10000</maxFieldLength>
    <unlockOnStartup>false</unlockOnStartup>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="keepOptimizedOnly">false</str>
      <str name="maxCommitsToKeep">1</str>
    </deletionPolicy>

  </mainIndex>
  <jmx />
  
  <!-- Use the following format to specify a custom IndexReaderFactory -
allows for alternate
       IndexReader implementations.
  <indexReaderFactory name="IndexReaderFactory" class="package.class">
    Parameters as required by the implementation
  </indexReaderFactory >
  -->


  <query>
    <!-- Maximum number of clauses in a boolean query... can affect
        range or prefix queries that expand to big boolean
        queries.  An exception is thrown if exceeded.  -->
    <maxBooleanClauses>1024</maxBooleanClauses>
     <filterCache
      class="solr.FastLRUCache"
      size="5120"
      initialSize="512"
      autowarmCount="128"
      cleanupThread="true"/>

    <queryResultCache
      class="solr.FastLRUCache"
      size="20000"
      initialSize="10240"
      autowarmCount="320"
      cleanupThread="true"/>

    <documentCache
      class="solr.FastLRUCache"
      size="10240"
      initialSize="10240"
      autowarmCount="320"
      cleanupThread="true"/>

    <enableLazyFieldLoading>true</enableLazyFieldLoading>


    <queryResultWindowSize>20</queryResultWindowSize>

     <queryResultMaxDocsCached>20</queryResultMaxDocsCached>
    
    <HashDocSet maxSize="3000" loadFactor="0.75"/>

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="q">solr rocks</str><str name="start">0</str><str
name="rows">10</str></lst>
        <lst><str name="q">static firstSearcher warming query from
solrconfig.xml</str></lst>
      </arr>
    </listener>

    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>

  </query>

  <requestDispatcher handleSelect="true" >
     <requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048" />
    <httpCaching lastModifiedFrom="openTime"
                 etagSeed="Solr">
      </httpCaching>
  </requestDispatcher>


 <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
     <lst name="defaults">
       <str name="echoParams">explicit</str><!--
       <bool name="hl">true</bool>
       <str name="hl.fl">name</str>
       <int name="hl.snippets">1</int>       
       <str name="hl.formatter">html</str>
       <str name="hl.fragsize">500</str>
       <str name="hl.simple.pre"><![CDATA[]]></str>
       <str name="hl.simple.post"><![CDATA[]]></str>
       <str name="fl">*</str>
       <int name="rows">10</int>
       <str name="version">2.1</str>
        -->
     </lst>
  </requestHandler>

-------------------------------------
schema.xml content:

<schema name="example" version="1.1">
  <types>
    <!-- solr.StrField:by default tokenized="false" -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="false" />
    <!-- solr.TextField:by default tokenized="true" -->
    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer
class="org.apache.lucene.analysis.cn.smart.MySmartChineseAnalyzer" />
    </fieldType>
    <fieldType name="w_text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="t_double" class="solr.TrieDoubleField"
precisionStep="8" omitNorms="false" positionIncrementGap="0" />
 </types>

 <fields>
   <field name="id" type="string" indexed="true" stored="true"
required="true" />
   <field name="price"  type="t_double" indexed="true" stored="true"
required="true" />
   <field name="name" type="text" indexed="true" stored="false"
required="true" />
   <field name="comefrom" type="string" indexed="true" stored="false"/>
   <field name="seller" type="string" indexed="true" stored="false" />
   <field name="category" type="string" indexed="true" stored="false" />
   <field name="detailpath" type="w_text" indexed="true" stored="false" />
 </fields>

 <uniqueKey>id</uniqueKey>
 <defaultSearchField>name</defaultSearchField>

I'm looking forward to your opinions.



Re: my index has 500 million docs, how to improve solr search performance?

Posted by Lance Norskog <go...@gmail.com>.
It's not that EC2 instances have slow disks; it's that there is no
quota system to guarantee you X amount of throughput. I've benchmarked
1x to 3x on the same instance type at different times. That is, a 300%
variation in disk speeds.

Filter queries are only slow once: the first query builds and caches a
filter, and after that they are very fast. You should use one of the
Trie data types (TrieInt, for example) for the prices. The Trie data
types can be tuned for fast range queries, but I don't know how.
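
(If I recall the Lucene javadocs correctly, the knob is precisionStep:
a smaller value indexes more terms per numeric value, which makes
range queries cheaper at the cost of a somewhat larger index. Your
schema uses precisionStep="8"; a variant like

  <fieldType name="t_double" class="solr.TrieDoubleField"
             precisionStep="4" omitNorms="false" positionIncrementGap="0"/>

trades index size for faster price ranges. Also put the price range in
an fq parameter, e.g. q=name:foo&fq=price:[10 TO 100], so it lands in
the filterCache.)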

Post your schema and solrconfig; we might be able to help you.

Lance

Toke Eskildsen wrote:
> All this is assuming that it really is I/O that is your problem. [...]

Re: my index has 500 million docs, how to improve solr search performance?

Posted by "lu.rongbin" <lu...@goodhope.net>.
Thanks, Toke. I've tried EBS and I think it can improve I/O
performance, but the improvement was not obvious, so maybe I/O is not
the main problem. Thanks for your answer.