You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2010/12/02 10:13:10 UTC

[jira] Commented: (LUCENE-2755) Some improvements to CMS

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966029#action_12966029 ] 

Shai Erera commented on LUCENE-2755:
------------------------------------

Getting rid of the IndexWriter member in CMS is not trivial w/o API change. The IndexWriter member is used for verbosing purposes, which is accessed by some public/protected API, like sync() and doMerge(). So on 3x it'd mean to deprecate methods, which IMO does not justify it. On trunk it's easier.

On the other hand, we can consider two things:
# Make IndexWriter ThreadLocal -- the thread who calls merge() will own its ThreadLocal, and if different threads index to different indexes, then it should work. But it won't help in case the indexing threads are taken from a pool.
# Think whether we want CMS to be shareable across several IndexWriters at all. I haven't heard that requirement coming up on the list, and definitely if someone attempted to do it, things would break, so I guess no one really does it. Therefore maybe we should leave it to the users to develop something like that on their own, and maybe even contribute back. A MS which might even be simplified by not implementing all of CMS functionality today (controlling threads' priority, pause merge threads etc.).

bq, The proper route is to take a handful of dirt and sticks and slap together some working code to illustrate my point. And that's what I'm gonna do.

It'd be great if you will do that ! Sometimes it's indeed easier to fight over a concrete example then theoretical "can and can't work" arguments.

bq. MP will recieve SegmentInfos and return OneMerge.

>From whom will it receive it? In the case of cascading merges, the merge threads need to continuously pull MP for getNextMerge(MergeType), but they don't have the global picture IW holds about the existing segments (SegmentInfos). Also, IW keeps track of the segments that existed when you first called optimize() and doesn't allow the cascading merges to include segments that didn't exist at the time. Who will do that accounting now?

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got me to read CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the MergeThreads taking merges from the IndexWriter until they are exhausted, and only then that blocked merge will run. I think it's unnecessary that that merge will be blocked.
> * CMS sorts merges by segments size, doc-based and not bytes-based. Since the default MP is LogByteSizeMP, and I hardly believe people care about doc-based size segments anymore, I think we should switch the default impl. There are two ways to make it extensible, if we want:
> ** Have an overridable member/method in CMS that you can extend and override - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs, calibrate deletes etc.). Better, but will need to tap into several places in the code, so more risky and complicated.
> On the go, I'd like to add some documentation to CMS - it's not very easy to read and follow.
> I'll work on a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org