Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2006/11/10 14:06:38 UTC

[jira] Commented: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size

    [ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12448725 ] 
            
Yonik Seeley commented on LUCENE-709:
-------------------------------------

Thanks Chuck, I think I like this additional view/control into IndexWriter, and I don't think opening this up further constrains future implementation.  I'll wait a few days to see if others have comments, though.

I think there might be a thread safety issue with your patch: you use an unsynchronized fail-fast iterator in RAMDirectory.sizeInBytes().  I think using an Enumeration here should work, right?
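
To make the concern concrete, here is a minimal sketch of two ways to total file sizes held in a Hashtable without tripping a fail-fast iterator. It is not the patch's code: SketchFile and SizeSketch are hypothetical stand-ins rather than Lucene classes, and generics are used only for readability.

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Hypothetical stand-in for a RAM-resident file; not Lucene's own class.
class SketchFile {
    volatile long length;
    SketchFile(long length) { this.length = length; }
}

class SizeSketch {
    // Option 1: Hashtable.elements() returns an Enumeration, which is not
    // fail-fast, so a concurrent put/remove does not raise
    // ConcurrentModificationException. The total is only approximate if
    // the table changes during the walk.
    static long sizeViaEnumeration(Hashtable<String, SketchFile> files) {
        long size = 0;
        Enumeration<SketchFile> e = files.elements();
        while (e.hasMoreElements()) {
            size += e.nextElement().length;
        }
        return size;
    }

    // Option 2: keep the fail-fast iterator but hold the table's monitor,
    // which blocks Hashtable's synchronized put/remove for the whole walk.
    static long sizeViaLockedIterator(Hashtable<String, SketchFile> files) {
        long size = 0;
        synchronized (files) {
            for (SketchFile f : files.values()) {
                size += f.length;
            }
        }
        return size;
    }
}
```

The first option avoids the exception at the cost of a possibly stale total; the second gives a consistent snapshot of the table's entries but holds the lock for the full traversal.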

Too bad there doesn't seem to be an easy way to incrementally maintain sizeInBytes... walking over the whole Hashtable for each document addition isn't pretty for large maxBufferedDocs, especially if the number of indexed fields is large.  At least this only affects people using this functionality though.

> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache using maxBufferedDocs, which limits it to a fixed number of documents.  When document sizes vary substantially, especially when documents cannot be truncated, this leads either to inefficiencies from a too-small value or OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access to size information about IndexWriter.ramDirectory so that an application can manage this based on total number of bytes consumed by the in-memory cache, thereby allowing a larger number of smaller documents or a smaller number of larger documents.  This can lead to much better performance while eliminating the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is left up to the application.
> The addition of synchronized to flushRamSegments() is only for safety of an external call.  It has no significant effect on internal calls since they all come from a synchronized caller.
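
As an illustration of the application-level policy the description leaves to the caller, the following sketch flushes whenever the buffered bytes reach a limit. Only flushRamSegments() is named in the issue; the RamBufferedWriter interface, its ramSizeInBytes() accessor, the String-based addDocument(), and the FakeWriter class are assumptions made so the example is self-contained, not Lucene's API.

```java
// Hypothetical view of the writer. Only flushRamSegments() comes from the
// issue text; the other method names and signatures are assumptions.
interface RamBufferedWriter {
    void addDocument(String doc);
    long ramSizeInBytes();
    void flushRamSegments();
}

// Application-side policy: bound the in-memory cache by bytes, not by
// document count.
class ByteBoundedIndexer {
    private final RamBufferedWriter writer;
    private final long maxRamBytes;
    private int flushes;

    ByteBoundedIndexer(RamBufferedWriter writer, long maxRamBytes) {
        this.writer = writer;
        this.maxRamBytes = maxRamBytes;
    }

    void add(String doc) {
        writer.addDocument(doc);
        // Flush once the buffered size reaches the limit, however many
        // documents that took.
        if (writer.ramSizeInBytes() >= maxRamBytes) {
            writer.flushRamSegments();
            flushes++;
        }
    }

    int flushCount() { return flushes; }
}

// Toy writer for demonstration: counts characters as "bytes".
class FakeWriter implements RamBufferedWriter {
    private long buffered;
    public void addDocument(String doc) { buffered += doc.length(); }
    public long ramSizeInBytes() { return buffered; }
    public void flushRamSegments() { buffered = 0; }
}
```

With a policy like this, many small documents or a few large ones fill the buffer to roughly the same memory footprint before a flush.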

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira