Posted to solr-user@lucene.apache.org by Danyal Mark <ma...@gmail.com> on 2010/06/09 11:59:49 UTC

Re: Index search optimization for fulltext remote streaming

We have following solr configuration:

java -Xms512M -Xmx1024M -Dsolr.solr.home=<solr home directory> -jar
start.jar

in SolrConfig.xml

 <indexDefaults>  
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>40000</mergeFactor>  
    <maxBufferedDocs>200000</maxBufferedDocs>  
    <ramBufferSizeMB>1024</ramBufferSizeMB>    
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>    
    <lockType>native</lockType>      
 </indexDefaults>


<mainIndex>    
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>1024</ramBufferSizeMB>
    <mergeFactor>40000</mergeFactor>
    <!-- Deprecated -->
    <!--<maxBufferedDocs>10</maxBufferedDocs>-->
    <!--<maxMergeDocs>2147483647</maxMergeDocs>-->    
    <unlockOnStartup>false</unlockOnStartup>  
    <reopenReaders>true</reopenReaders>  
    <deletionPolicy class="solr.SolrDeletionPolicy">      
      <str name="maxCommitsToKeep">1</str>    
      <str name="maxOptimizedCommitsToKeep">0</str>      
    </deletionPolicy>    
     <infoStream file="INFOSTREAM.txt">false</infoStream>
  </mainIndex>


Also, we have set autoCommit=false. Our PC specs are:

Core2-Duo
2GB RAM
Solr Server running in localhost
Index Directory is also in local FileSystem
Input Fulltext files using remoteStreaming from another PC


Here, when we indexed 100,000 fulltext documents, the total time taken was
40 minutes. We want to reduce this time. We have been studying the
UpdateRequestProcessorChain section:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
   <str name="update.processor">dedupe</str>
  </lst>
 </requestHandler>  
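For reference, a chain named by update.processor must also be defined
elsewhere in solrconfig.xml as an updateRequestProcessorChain. The sketch
below follows the Solr Deduplication wiki example; the signatureField and
fields values are placeholders that would need to match our schema:

 <updateRequestProcessorChain name="dedupe">
   <processor class="solr.processor.SignatureUpdateProcessorFactory">
     <bool name="enabled">true</bool>
     <str name="signatureField">id</str>
     <bool name="overwriteDupes">false</bool>
     <str name="fields">name,features,cat</str>
     <str name="signatureClass">solr.processor.Lookup3Signature</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>

Note that the processors in a chain run sequentially for each request; the
chain itself is not a multi-threading mechanism.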

How can we use this UpdateRequestProcessorChain with /update/extract/ to run
indexing in multiple chains (i.e., multiple threads)? Can you suggest whether
we can optimize the process by changing any of these configurations?

with regards,
Danyal Mark 
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Index-search-optimization-for-fulltext-remote-streaming-tp828274p881809.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Index search optimization for fulltext remote streaming

Posted by Lance Norskog <go...@gmail.com>.
'mergeFactor' should be 5 or 10, not 40k. With a value that high, Solr can
end up holding thousands of small segment files open at once, and that will
not work well.

ramBufferSizeMB is 1G, but the entire Solr JVM has only 1G allocated
(-Xmx1024M), so there may be a lot of garbage collection. Try 50 to 100 MB
for ramBufferSizeMB.

1G of heap is also a little small for indexing large numbers of fulltext
documents.
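Concretely, in solrconfig.xml that would look something like this (the
numbers are starting points, not tuned values; make the same change in
<indexDefaults>):

 <mainIndex>
   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>64</ramBufferSizeMB>
   <mergeFactor>10</mergeFactor>
   ...
 </mainIndex>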

On Wed, Jun 9, 2010 at 2:59 AM, Danyal Mark <ma...@gmail.com> wrote:
> [snip]



-- 
Lance Norskog
goksron@gmail.com