Posted to solr-user@lucene.apache.org by yasoobhaider <ya...@gmail.com> on 2017/09/07 10:25:26 UTC

Re: CommitScheduler Thread blocked due to excessive number of Merging Threads

So I did a little more digging around why the merging is taking so long, and
it looks like merging postings is the culprit.

On the 5.4 version, merging 500 docs is taking approximately 100 msec, while
on the 6.6 version, it is taking more than 3000 msec. The difference seems
to get worse when more docs are being merged.

Any ideas why this may be the case?

Yasoob




Re: CommitScheduler Thread blocked due to excessive number of Merging Threads

Posted by yasoobhaider <ya...@gmail.com>.
Hi Shawn

Thanks for putting the settings in context. This definitely helps.

Before applying these settings, I did a bit more digging to really
understand why the merging was so slow. Looking at the thread dumps of
Solr 6.6 and Solr 5.4, I found that in the merging process, the merging of
"postings" is taking a lot of time on 6.6 compared to 5.4.

In a merge containing 500 docs, it took on average 100msec on 5.4, vs
3500msec on 6.6.

I compared the source code for the two versions and found that different
merge methods were being used to merge the postings. In 5.4, the default
merge method of the FieldsConsumer class was being used, while in 6.6,
PerFieldPostingsFormat's merge method is used. I checked, and it looks
like this change went into Solr 6.3. So I replaced the 6.6 instance with
6.2.1 and re-indexed all the data, and it is working very well, even with
the settings I had initially used.

This is the issue that prompted the change:
https://issues.apache.org/jira/browse/LUCENE-7456

I plan to experiment with the settings you provided and see if they
further help our case. But out of curiosity, I wanted to understand what
changed between the two merge implementations that has such a drastic
effect on merging speed.

Yasoob




Re: CommitScheduler Thread blocked due to excessive number of Merging Threads

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/7/2017 4:25 AM, yasoobhaider wrote:
> So I did a little more digging around why the merging is taking so
> long, and it looks like merging postings is the culprit. On the 5.4
> version, merging 500 docs is taking approximately 100 msec, while on
> the 6.6 version, it is taking more than 3000 msec. The difference
> seems to get worse when more docs are being merged. Any ideas why this
> may be the case? 

The rest of this thread is completely lost here; I only found the info
by going to Nabble, which is a mirror of the mailing list in forum
format.  The mailing list is the canonical repository.

Setting the ramBufferSizeMB to nearly 5 gigabytes is only going to be
helpful if the docs you are indexing into Solr are enormous -- many
megabytes of text data in each one.  Testing by Solr developers has
shown that values above about 128MB do not typically provide any
performance advantage with normal sized documents.  The commit
characteristics will have more to do with how large each segment is than
the ramBufferSizeMB.  The default ramBufferSizeMB value in modern Solr
versions is 100.

Assuming we are dealing with relatively small documents, I would
recommend these settings, removing ramBufferSizeMB, mergePolicyFactory,
and maxBufferedDocs from indexConfig entirely:

<autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
      <maxTime>600000</maxTime>
</autoSoftCommit>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>
      <int name="maxThreadCount">1</int>
</mergeScheduler>
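
For reference, here is a minimal sketch of where these elements live in
solrconfig.xml (fragments only, not a complete config).  The autoCommit
and autoSoftCommit blocks belong under updateHandler, and the merge
settings belong under indexConfig:

<updateHandler class="solr.DirectUpdateHandler2">
      <!-- autoCommit and autoSoftCommit blocks from above go here -->
</updateHandler>

<indexConfig>
      <!-- mergeScheduler (and mergePolicyFactory, if used) goes here -->
</indexConfig>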

If your data is on standard disks, then you want maxThreadCount at one. 
If it's on SSD, then you can raise it a little bit, but I wouldn't go
beyond about 2 or 3.  On standard disks with many threads writing merged
segments, the disk will begin thrashing excessively and I/O will slow to
a crawl.
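
For example, on SSD storage the scheduler block could look like this
(same structure as above, just a slightly higher thread count; treat
the exact number as a starting point rather than a tuned value):

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>
      <int name="maxThreadCount">2</int>
</mergeScheduler>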

If the documents are huge, then you can raise ramBufferSizeMB, but five
gigabytes is REALLY BIG and will require a very large heap.
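
If you do raise it for huge documents, something far smaller than five
gigabytes is normally plenty.  As an illustration only (the value here
is arbitrary, not a measured recommendation), this would go in
indexConfig:

<ramBufferSizeMB>512</ramBufferSizeMB>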

If there is good reason to increase the values in mergePolicy, then this
is what I would recommend:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">30</int>
      <int name="segmentsPerTier">30</int>
      <int name="maxMergeAtOnceExplicit">90</int>
</mergePolicyFactory>
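
Roughly speaking, segmentsPerTier controls how many similarly-sized
segments are allowed to accumulate in a tier before a merge is
triggered, maxMergeAtOnce is how many segments a normal merge will
combine at a time, and maxMergeAtOnceExplicit applies to explicit
merges such as optimize/forceMerge.  Raising these means merges happen
less often, but each one does more work and more segments stay open in
the meantime.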

The settings I've described here may help, or they may do nothing.  If
they don't help, then the problem may be memory-related, which is a
whole separate discussion.

When Lucene says "too many merge threads, stalling" it means there are
many merges scheduled at the same time, which usually means that there
are multiple *levels* of merging scheduled -- one that combines a bunch
of initial level segments into one second level segment, one that
combines multiple second level segments into third-level segments, and
so on.  The "stalling" means that the *indexing* thread is paused until
the number of merges drops below maxMergeCount.  If this is happening
with maxMergeCount at eight, it is likely because of the current
autoCommit maxDocs setting of 10000 -- each of the initial segments is
very small, so there are a LOT of segments that need merging.  The
autoCommit and autoSoftCommit settings that I provided will hopefully
make that less of a problem.
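
As a rough back-of-the-envelope example, assuming the TieredMergePolicy
defaults of maxMergeAtOnce=10 and segmentsPerTier=10: a hard commit
every 10000 docs while indexing a million docs writes about 100 tiny
first-level segments, which schedules roughly ten first-level merges,
and the resulting segments then qualify for second-level merges.  If
more of those merges are queued at once than maxMergeCount allows,
indexing pauses until the backlog drains.  Time-based commits produce
fewer, larger initial segments, so the merge backlog stays much
shallower.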

Merging segments goes slower than the speed of your disks.  This is
because Lucene must collect a lot of information from each source
segment and combine it in memory to write a new segment.  The gathering
and combining is much slower than modern disk speeds.

Thanks,
Shawn