Posted to solr-user@lucene.apache.org by Nawab Zada Asad Iqbal <kh...@gmail.com> on 2017/08/06 22:59:50 UTC
Solr 6.6: Configure number of indexing threads
Hi,

I have switched between the solr and lucene user lists while debugging this
issue (details in the thread below). My current hypothesis is that, since a
large number of indexing threads are being created (the maxIndexingThreads
config is now obsolete), each output segment is really small. Reference:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-6659

Is there any config in Solr 6.6 to control this?
If not, why was the old setting considered useless?
Thanks
Nawab
---------- Forwarded message ---------
From: Nawab Zada Asad Iqbal <kh...@gmail.com>
Date: Sun, Aug 6, 2017 at 8:25 AM
Subject: Re: Understanding flush and DocumentsWriterPerThread
To: <ja...@lucene.apache.org>
I think I am hitting this problem. Since maxIndexingThreads is not used
anymore, I see 330+ indexing threads (in the attached log: "334 in-use
non-flushing threads states").

The bugfix recommends using custom code to control concurrency around
IndexWriter; how can I configure that in Solr 6.6?
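[A note on the question above: since LUCENE-6659, Lucene creates roughly one DocumentsWriterPerThread per concurrent indexing thread, so the usual replacement for maxIndexingThreads is to bound concurrency on the client side, in front of whatever sends update requests to Solr. A minimal, Solr-agnostic sketch of that idea; the pool size, document count, and the comment marking where the real update call would go are placeholders, not Solr API:]

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedIndexing {
    // Plays the role the old <maxIndexingThreads>10</maxIndexingThreads> played:
    // no more than this many update requests are ever in flight at once.
    static final int MAX_CONCURRENT = 10;

    // Submits `docs` indexing tasks through a fixed-size pool and returns the
    // peak number of tasks that actually ran concurrently.
    static int runBatch(int docs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_CONCURRENT);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < docs; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                // Here you would call solrClient.add(doc) or send the HTTP
                // update request; concurrency stays capped at MAX_CONCURRENT,
                // which in turn caps the number of DWPTs Lucene spins up.
                inFlight.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrent indexers: " + runBatch(1000));
    }
}
```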
On Sat, Aug 5, 2017 at 12:59 PM, Nawab Zada Asad Iqbal <kh...@gmail.com>
wrote:
> Hi,
>
> I am debugging a bulk indexing performance issue while upgrading from
> 4.5.0 to 6.6. I have commits disabled while indexing a total of 85GB of
> data over 7 hours. At the end of it, I want some 30 or so big segments,
> but I am getting 3000.
> I deleted the index and enabled infoStream logging; I have attached the
> log from when the first segment is flushed. Here are a few questions:
>
> 1. When a segment is flushed, is it permanent, or can more documents be
> written to it (besides the merge scenario)?
> 2. It seems that 330+ threads are writing in parallel. Will each of them
> become one segment when written to disk? In that case, should I decrease
> concurrency?
> 3. One possibility is to delay flushing. The flush is getting triggered at
> 10000MB, probably coming from <ramBufferSizeMB>10000</ramBufferSizeMB>;
> however, the segment that is flushed is only 115MB. Is this limit for the
> combined size of all in-memory segments? If so, is it OK to increase it
> further to use more of my heap (48GB)?
> 4. How can I decrease the concurrency? Maybe the solution is to use fewer
> in-memory segments?
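[On question 3: my understanding from the DWPT design (LUCENE-6659), not from this thread, is that ramBufferSizeMB is indeed a combined budget shared by all DocumentsWriterPerThread instances, and the flush policy flushes the largest DWPT when the total exceeds it. That would explain small segments despite a 10000MB buffer. A back-of-envelope sketch using the numbers reported above:]

```java
public class FlushMath {
    // ramBufferSizeMB is one budget shared across all active DWPTs, so the
    // average flushed segment is roughly budget / activeThreads. The flush
    // policy picks the largest DWPT first, so individual segments (like the
    // observed 115MB one) can exceed this average.
    static double avgFlushedSegmentMB(double ramBufferMB, int activeThreads) {
        return ramBufferMB / activeThreads;
    }

    public static void main(String[] args) {
        // Numbers from this thread: 10000MB buffer, "334 in-use non-flushing
        // threads states".
        System.out.printf("~%.0f MB per flushed segment on average%n",
                avgFlushedSegmentMB(10000, 334));  // prints ~30 MB
    }
}
```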
>
> In the previous run, there were 110k files in the index folder after I
> stopped indexing. Before committing, I noticed that the file count
> continued to decrease every few minutes until it came down to 27k or so.
> (I committed after it stabilized.)
>
>
> My indexConfig is this:
>
> <indexConfig>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>   <maxIndexingThreads>10</maxIndexingThreads>
>   <useCompoundFile>false</useCompoundFile>
>   <ramBufferSizeMB>10000</ramBufferSizeMB>
>   <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>     <int name="maxMergeAtOnce">5</int>
>     <int name="segmentsPerTier">3000</int>
>     <int name="maxMergeAtOnceExplicit">10</int>
>     <int name="floorSegmentMB">16</int>
>     <!-- 200GB since we want few big segments during full indexing -->
>     <double name="maxMergedSegmentMB">200000</double>
>     <double name="forceMergeDeletesPctAllowed">1</double>
>   </mergePolicyFactory>
>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>     <int name="maxThreadCount">10</int>
>     <int name="maxMergeCount">10</int>
>   </mergeScheduler>
>   <lockType>${solr.lock.type:native}</lockType>
>   <reopenReaders>true</reopenReaders>
>   <deletionPolicy class="solr.SolrDeletionPolicy">
>     <str name="maxCommitsToKeep">1</str>
>     <str name="maxOptimizedCommitsToKeep">0</str>
>   </deletionPolicy>
>   <infoStream>true</infoStream>
>   <applyAllDeletesOnFlush>false</applyAllDeletesOnFlush>
> </indexConfig>
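[A side note on the config above: with segmentsPerTier set to 3000, TieredMergePolicy tolerates on the order of 3000 segments per tier before it starts merging, which lines up with the ~3000 segments observed. A sketch of a possible adjustment, with illustrative values, assuming the goal is fewer and larger segments; this is not a recommendation from the thread itself:]

```xml
<!-- Illustrative sketch: a much lower segmentsPerTier makes the merge
     policy merge far earlier, so fewer segments accumulate between
     commits. Values here are placeholders to tune, not tested settings. -->
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <double name="maxMergedSegmentMB">200000</double>
</mergePolicyFactory>
```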
>
>
> Thanks
> Nawab