Posted to dev@lucene.apache.org by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2012/04/03 16:16:24 UTC

[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

    [ https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245343#comment-13245343 ] 

Michael McCandless commented on LUCENE-2357:
--------------------------------------------

Hi Iulius,

The basic idea is to replace the fixed int[] that we now have (in oal.index.MergeState's docMaps array) with a PackedInts store (see oal.util.packed.PackedInts.getMutable).  This should be fairly simple, since a PackedInts store is conceptually just like an int[].

I think that (a rote swap) would be phase one.
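
To make "conceptually just like an int[]" concrete, here is a rough sketch of what the swap amounts to. ToyPackedInts and buildDocMap are hypothetical names made up for this illustration; the class is a toy stand-in written in plain Java so the example runs on its own, not Lucene's PackedInts implementation. In the real patch the store would presumably come from PackedInts.getMutable instead.

```java
/** Toy fixed-width packed array (hypothetical; NOT Lucene's PackedInts). */
final class ToyPackedInts {
    private final long[] blocks;     // values are packed back to back into 64-bit words
    private final int bitsPerValue;  // width of each stored value, 1..32
    private final long mask;         // low bitsPerValue bits set

    ToyPackedInts(int size, int bitsPerValue) {
        if (size < 0 || bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bad size or bitsPerValue");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    /** Smallest bit width that can hold every value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // the value straddles two words; pull the high bits from the next one
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // write the bits that did not fit into the first word
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                    | (v >>> bitsInFirst);
        }
    }

    /**
     * Same remap an int[] docMap would hold: old docID -> new docID for live docs.
     * Entries for deleted docs are left at 0 and must be filtered by the caller.
     */
    static ToyPackedInts buildDocMap(boolean[] deleted) {
        int liveCount = 0;
        for (boolean d : deleted) {
            if (!d) liveCount++;
        }
        int bits = bitsRequired(Math.max(0, liveCount - 1)); // largest new docID
        ToyPackedInts map = new ToyPackedInts(deleted.length, bits);
        int newDocID = 0;
        for (int oldDocID = 0; oldDocID < deleted.length; oldDocID++) {
            if (!deleted[oldDocID]) map.set(oldDocID, newDocID++);
        }
        return map;
    }
}
```

The calling code barely changes: docMap[i] becomes map.get(i) and docMap[i] = x becomes map.set(i, x). The saving comes from the bit width: an int[] always spends 32 bits per doc, while a map whose largest new docID is 899,999 needs only 20.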

After that, we can save more RAM by storing either the new docID (what we do today) or, inverting that, the number of deleted docs seen so far, whichever requires fewer bits.  E.g., if we are merging 1M docs but only 100K are deleted, it's cheaper to store the number of deletes...
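
The arithmetic behind that choice can be sketched as follows. DocMapChoice and its methods are hypothetical names used only for this illustration, and the per-doc delete counts are kept in a plain int[] here for readability; in practice they would live in the packed store.

```java
/** Hypothetical helper showing the bits-per-value trade-off; not Lucene code. */
final class DocMapChoice {

    /** Smallest bit width that can hold every value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Bits per entry if we store the new docID; the largest is liveCount - 1. */
    static int bitsForNewDocIDs(int maxDoc, int delCount) {
        return bitsRequired(Math.max(0, maxDoc - delCount - 1));
    }

    /** Bits per entry if we store the deletes seen so far; the largest is delCount. */
    static int bitsForDelCounts(int delCount) {
        return bitsRequired(delCount);
    }

    /** True when storing delete counts needs fewer bits than storing new docIDs. */
    static boolean preferDelCounts(int maxDoc, int delCount) {
        return bitsForDelCounts(delCount) < bitsForNewDocIDs(maxDoc, delCount);
    }

    /** delsSoFar[i] = number of deleted docs with docID <= i. */
    static int[] delsSoFar(boolean[] deleted) {
        int[] counts = new int[deleted.length];
        int seen = 0;
        for (int i = 0; i < deleted.length; i++) {
            if (deleted[i]) seen++;
            counts[i] = seen;
        }
        return counts;
    }

    /** The remap becomes a lookup followed by a subtract (valid for live docs only). */
    static int remap(int[] delsSoFar, int oldDocID) {
        return oldDocID - delsSoFar[oldDocID];
    }
}
```

For the 1M / 100K case above, bitsForNewDocIDs(1_000_000, 100_000) is 20 and bitsForDelCounts(100_000) is 17, so the delete counts win. If 950K of the 1M docs were deleted it flips: 16 bits for new docIDs versus 20 for the counts.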
                
> Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2357
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2357
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.0
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int array... and maybe instead of storing abs docID in the mapping, we could store the number of del docs seen so far (so the remap would do a lookup then a subtract).  This may add some CPU cost to merging but should bring down transient RAM usage quite a bit.
