Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/06/16 08:28:00 UTC

[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

    [ https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364135#comment-17364135 ] 

ASF subversion and git services commented on LUCENE-9535:
---------------------------------------------------------

Commit 803d131fd08cee1765613b81a305187fbc841616 in lucene's branch refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=803d131 ]

LUCENE-9535: Try to do larger flushes.

DWPTPool currently always returns the last DWPT that was added to the
pool. By returning the largest DWPT instead, we could try to do larger
flushes by finishing DWPTs that are close to being full instead of the
last one that was added to the pool, which might be close to being
empty.
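To make the change concrete, here is a minimal Java sketch of the two selection strategies. It is not Lucene's DocumentsWriterPerThreadPool. The class `LargestFirstPool`, its nested `Writer` type and the methods `release`, `pollLast` and `pollLargest` are all hypothetical names chosen for illustration. `Writer` is a stand-in for a DWPT that only tracks buffered bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's DocumentsWriterPerThreadPool: a pool of
// per-thread writers that can hand out either the most recently released
// writer (LIFO) or the one that has buffered the most bytes.
final class LargestFirstPool {

    // Hypothetical stand-in for a DWPT; it only tracks buffered bytes.
    static final class Writer {
        final String name;
        final long ramBytesUsed;

        Writer(String name, long ramBytesUsed) {
            this.name = name;
            this.ramBytesUsed = ramBytesUsed;
        }
    }

    private final List<Writer> free = new ArrayList<>();

    // Called when an indexing thread is done with a writer.
    synchronized void release(Writer w) {
        free.add(w);
    }

    // Previous behavior: return the writer that was added to the pool last.
    synchronized Writer pollLast() {
        return free.isEmpty() ? null : free.remove(free.size() - 1);
    }

    // New behavior: return the writer with the largest RAM usage, so that it
    // keeps filling up and gets flushed closer to full.
    synchronized Writer pollLargest() {
        Writer largest = null;
        for (Writer w : free) {
            if (largest == null || w.ramBytesUsed > largest.ramBytesUsed) {
                largest = w;
            }
        }
        if (largest != null) {
            free.remove(largest); // remove(Object): identity-based here
        }
        return largest;
    }

    // Builds a pool where the nearly empty writer was released last.
    static LargestFirstPool sample() {
        LargestFirstPool pool = new LargestFirstPool();
        pool.release(new Writer("nearly-full", 50_000_000L));
        pool.release(new Writer("nearly-empty", 1_000_000L));
        return pool;
    }

    public static void main(String[] args) {
        System.out.println("LIFO picks:          " + sample().pollLast().name);    // nearly-empty
        System.out.println("largest-first picks: " + sample().pollLargest().name); // nearly-full
    }
}
```

The linear scan keeps the sketch short. In this sketch a writer's size cannot change while it sits in the pool. A production pool could therefore keep free writers in a priority queue ordered by RAM usage instead.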

When indexing wikimediumall, this change did not seem to improve the
indexing rate significantly, but it didn't slow things down either, and
the number of flushes went from 224-226 to 216, about 4% fewer.
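As a quick check of the "about 4%" figure, the following Java snippet computes the relative reduction for each of the reported baseline flush counts. The class name `FlushReduction` is illustrative.

```java
// Checks the "about 4% fewer flushes" figure: flush counts went from
// 224-226 before the change to 216 after it.
final class FlushReduction {

    // Relative reduction, in percent, going from `before` to `after` flushes.
    static double percentFewer(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int after = 216;
        for (int before : new int[] {224, 225, 226}) {
            System.out.printf("%d -> %d flushes: %.1f%% fewer%n",
                    before, after, percentFewer(before, after));
        }
        // prints 3.6%, 4.0% and 4.4% respectively
    }
}
```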

My expectation is that our nightly benchmarks are a best-case scenario
for DWPTPool, since the same number of threads is dedicated to indexing
over time. In contrast, when e.g. a single fixed thread pool is
responsible for indexing into several indices, the number of indexing
threads that contribute to a given index might vary greatly over time.


> Investigate recent indexing slowdown for wikimedium documents
> -------------------------------------------------------------
>
>                 Key: LUCENE-9535
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9535
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 8.7
>
>         Attachments: cpu_profile.svg
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields to DWPT accounting (LUCENE-9511), so I first thought this could be due to smaller flushed segments and more merging, but I still wonder whether there's something else. The benchmark runs with 8GB of heap, 2GB of RAM buffer and 36 indexing threads, so in the worst case, where all DWPTs fill up at the same time, each thread gets about 2GB/36 = 57MB of RAM buffer. Stored fields account for about 0.7MB of memory, or roughly 1% of that per-thread buffer. How can a 1% reduction of buffering capacity explain a 10% indexing slowdown? I looked into this further by running indexing benchmarks locally with 8 indexing threads and 128MB of indexing buffer memory, which should make the issue even more apparent if the smaller RAM buffer were the cause. However, I'm not seeing a regression, and I actually see a similar number of flushes when I disable memory accounting for stored fields.
> I ran indexing under a profiler to see whether something else could cause this slowdown, e.g. slow implementations of ramBytesUsed on stored fields writers, but nothing surprising showed up and the profile looked just like I would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.
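The buffer arithmetic from the description can be reproduced with a short Java sketch. It rests on three assumptions. First, 2GB means 2048MB. Second, the ~0.7MB of stored-fields memory applies per DWPT. Third, the same 0.7MB figure holds for the local 8-thread, 128MB run. The class and method names are illustrative.

```java
// Back-of-the-envelope check of the RAM buffer arithmetic.
// Assumptions: 2GB = 2048MB, and ~0.7MB of stored-fields memory per DWPT.
final class BufferMath {

    // RAM buffer available per DWPT if all of them fill up at the same time.
    static double perThreadMB(double ramBufferMB, int indexingThreads) {
        return ramBufferMB / indexingThreads;
    }

    // Share of the per-thread buffer, in percent, taken by stored fields.
    static double sharePercent(double partMB, double wholeMB) {
        return 100.0 * partMB / wholeMB;
    }

    public static void main(String[] args) {
        double nightly = perThreadMB(2048, 36);
        System.out.printf("nightly per-thread buffer: %.1f MB%n", nightly);                // 56.9 MB
        System.out.printf("nightly stored-fields share: %.1f%%%n", sharePercent(0.7, nightly)); // 1.2%

        // Local run from the description: 128MB shared by 8 indexing threads.
        double local = perThreadMB(128, 8);
        System.out.printf("local per-thread buffer: %.1f MB%n", local);
        System.out.printf("local stored-fields share: %.1f%%%n", sharePercent(0.7, local));
    }
}
```

Under these assumptions, the local setup leaves only 16MB per thread. Stored fields would then take roughly 4% of each buffer rather than about 1%. This is why that run would be expected to amplify any effect of the smaller RAM buffer.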



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
