Posted to dev@lucene.apache.org by "Earwin Burrfoot (JIRA)" <ji...@apache.org> on 2010/05/21 01:50:19 UTC
[jira] Commented: (LUCENE-2355) Refactor Directory/Multi/SegmentReader creation/reopening/cloning/closing

    [ https://issues.apache.org/jira/browse/LUCENE-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869813#action_12869813 ] 

Earwin Burrfoot commented on LUCENE-2355:
-----------------------------------------

* Norms are now in fact loaded upfront, except when a reader is opened only to apply deletions. The norm-handling code is isolated, so lazy loading can be brought back if really needed.
* IndexWriter gets a new Config parameter, readerTermsIndexDivisor. It is used when opening every reader that IW pools or otherwise uses. The same parameter on IW.getReader() is removed because it was nondeterministic.
* IndexWriter no longer does partial SR sharing when pooling is off. It brought no real benefit, had the potential to blow memory usage up to pooling=on levels, and also made the ReaderPool.get() code look buggy. : )
* Added a notion of SR.RunLevel that governs an SR's intended usage (two variants of merging, applying deletes, searching). SR.upgradeRunlevel(..) replaces loadTermsIndex/openDocStores/lazy-norms.
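
To make the run-level idea concrete, here is a minimal Java sketch of how a single "upgrade" call could replace several separate lazy-loading entry points. It is not the code from the patch: only the names RunLevel and upgradeRunlevel come from the comment above. The level constants, their ordering, the resource attached to each level, and the SketchReader class are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical run levels, ordered from fewest to most resources needed.
 * The constant names and the resource tied to each level are assumptions.
 */
enum RunLevel {
    APPLY_DELETES("deletions"),
    MERGE_A("norms"),
    MERGE_B("docStores"),
    SEARCH("termsIndex");

    final String resource;

    RunLevel(String resource) {
        this.resource = resource;
    }
}

/** Stand-in for a segment reader; it only records what it has "loaded". */
final class SketchReader {
    private RunLevel level;                       // null until first upgrade
    private final List<String> loaded = new ArrayList<>();

    SketchReader(RunLevel initial) {
        upgradeRunlevel(initial);
    }

    /**
     * Loads whatever is missing between the current level and the target.
     * A target at or below the current level is a no-op.
     */
    synchronized void upgradeRunlevel(RunLevel target) {
        int from = (level == null) ? 0 : level.ordinal() + 1;
        for (int i = from; i <= target.ordinal(); i++) {
            loaded.add(RunLevel.values()[i].resource);
        }
        if (level == null || target.ordinal() > level.ordinal()) {
            level = target;
        }
    }

    synchronized RunLevel level() {
        return level;
    }

    synchronized List<String> loaded() {
        return Collections.unmodifiableList(new ArrayList<>(loaded));
    }
}
```

With this shape, a reader opened for applying deletes can later be promoted for searching with one call, and asking for a lower level than the current one changes nothing. Whether the real levels form a strict linear order like this is a guess.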

> Refactor Directory/Multi/SegmentReader creation/reopening/cloning/closing
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2355
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2355
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Earwin Burrfoot
>
> *Reader lifecycle has evolved over time into a heavily tangled mess. It's hard to understand what's going on there, and it's even harder to add fields or logic while ensuring that every possible code path preserves those fields and interacts with the logic properly. While some of said mess is justified by the task at hand, a big part is just badly done copy-paste and can be removed.
> I am currently refactoring this and intended to open an issue with a working patch, but the task wound up somewhat bigger than I expected, so I'm opening it earlier to track stuff encountered/changed/fixed.
> The list is by no means exhaustive.
> - an iteration to create SRs is copy-pasted several times, one of them (IW) with a wrong iteration bound
> - it is also overly complex and can be folded for create/reopen cases
> - readers sent to IndexReaderWarmer are termindexless/docstoreless on some occasions
> - it is possible to clone() your way to a read-write NRT reader
> - IndexDeletionPolicy is not always preserved through clones/reopens
> - cloned readers share CoreReaders and, consequently, updated termsIndex/docStores
> - threadlocal versions of fieldsReader/termsVector are bound to SR, not CoreReaders, and thus are recreated on clone/reopen
> - double-initialization for some fields (someone got lost and did this to be sure I guess), stupid assert checks ( qwe = new(); assert qwe != null )
> - SR is not always recreated when compound status of underlying segment changes
> - deleting an already deleted doc marks deletions dirty and rewrites them
> - lots of synchronization is done around Reader, while it can be narrowed down to norms/deletions/whatever
> I did some structural modifications:
> - CompositeReader extracts common code from DirectoryReader and MultiReader (complete)
> - ReadonlyDirectoryReader and ReadonlySegmentReader are dead, MutableD/SReaders are introduced and carry all modification logic/fields (DR complete, SR in progress)
> - WriterBackedReader encapsulates NRT reader logic (complete)
> - CoreReaders split into CoreReaders, DocStores, TermInfos. All of these are immutable and SR is cloned when you need to change its mode (in progress)
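
As an illustration of the "deleting an already deleted doc" item in the quoted description, the following Java sketch shows one way to avoid dirtying (and therefore rewriting) deletions when nothing actually changed. SketchDeletions is a hypothetical class, not a Lucene type, and markClean() is an assumed hook for whatever writes deletions to disk.

```java
import java.util.BitSet;

/** Hypothetical holder for per-segment deletions; not a Lucene class. */
final class SketchDeletions {
    private final BitSet deleted = new BitSet();
    private boolean dirty = false;

    /**
     * Marks docId as deleted.
     * Returns true only if the document was not deleted before.
     */
    boolean delete(int docId) {
        if (deleted.get(docId)) {
            return false;   // already deleted: leave the dirty flag alone
        }
        deleted.set(docId);
        dirty = true;       // state changed, so deletions need rewriting
        return true;
    }

    boolean isDirty() {
        return dirty;
    }

    /** Assumed to be called once deletions have been written out. */
    void markClean() {
        dirty = false;
    }
}
```

The point is only that the dirty flag follows an actual state change rather than the mere fact that delete was called.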
