You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/03/02 17:20:00 UTC

[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

    [ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049458#comment-17049458 ] 

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------

Commit 043c5dff6f44c9bb2415005ac97db3c2c561ab45 in lucene-solr's branch refs/heads/master from msfroh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=043c5df ]

LUCENE-8962: Add ability to selectively merge on commit (#1155)

* LUCENE-8962: Add ability to selectively merge on commit

This adds a new "findCommitMerges" method to MergePolicy, which can
specify merges to be executed before the
IndexWriter.prepareCommitInternal method returns.

If we have many index writer threads, they will flush their DWPT buffers
on commit, resulting in many small segments, which can be merged before
the commit returns.

* Add missing Javadoc

* Fix incorrect comment

* Refactoring and fix intermittent test failure

1. Made some changes to the callback to update toCommit, leveraging
SegmentInfos.applyMergeChanges.
2. I realized that we'll never end up with 0 registered merges, because
we throw an exception if we fail to register a merge.
3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before
we call MergeScheduler.merge, since we may not be merging on another
thread.
4. There was an intermittent test failure due to randomness in the time
it takes for merges to complete. Before doing the final commit, we wait
for pending merges to finish. We may still end up abandoning the final
merge, but we can detect that and assert that either the merge was
abandoned (and we have > 1 segment) or we did merge down to 1 segment.

* Fix typo

* Fix/improve comments based on PR feedback

* More comment improvements from PR feedback

* Rename method and add new MergeTrigger

1. Renamed findCommitMerges -> findFullFlushMerges.
2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to
   MergeScheduler when merging on commit.

* Update renamed method name in strings and comments


> Can we merge small segments during refresh, for faster searching?
> -----------------------------------------------------------------
>
>                 Key: LUCENE-8962
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8962
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Priority: Major
>         Attachments: LUCENE-8962_demo.png
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory segments to disk and open an {{IndexReader}} to search them, and this is typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} will accumulate write many small segments during {{refresh}} and this then adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if given a little time ... so, could we somehow improve {{IndexWriter'}}s refresh to optionally kick off merge policy to merge segments below some threshold before opening the near-real-time reader?  It'd be a bit tricky because while we are waiting for merges, indexing may continue, and new segments may be flushed, but those new segments shouldn't be included in the point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, and some hackity logic to have the merge policy target small segments just written by refresh, but it's tricky to then open a near-real-time reader, excluding newly flushed but including newly merged segments since the refresh originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for discussion!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org