Posted to dev@lucene.apache.org by "Karl Wettin (JIRA)" <ji...@apache.org> on 2008/01/14 16:48:34 UTC

[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

    [ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558640#action_12558640 ] 

Karl Wettin commented on LUCENE-550:
------------------------------------

I was poking around in the javadocs of this and came to the conclusion that InstantiatedIndexWriter is deprecated code: it is enough that one can construct an InstantiatedIndex from an optimized IndexReader. This makes all InstantiatedIndexes immutable, which makes the no-locks caveat go away.
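
To illustrate why building everything once from a reader removes the need for locks, here is a minimal, self-contained sketch in plain Java. It does not use the Lucene or InstantiatedIndex API; ImmutableIndexSketch, docsFor and the whitespace tokenization are hypothetical names and simplifications made up for this example. All state is populated in the constructor from a read-only source and never modified afterwards, so concurrent searches need no synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the Lucene InstantiatedIndex API.
final class ImmutableIndexSketch {
    // term -> ascending, read-only list of document numbers
    private final Map<String, List<Integer>> postings;

    // Build the whole structure once from a read-only snapshot of documents.
    ImmutableIndexSketch(List<String> documents) {
        Map<String, List<Integer>> tmp = new HashMap<>();
        for (int doc = 0; doc < documents.size(); doc++) {
            for (String term : documents.get(doc).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> docs = tmp.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (docs.isEmpty() || docs.get(docs.size() - 1) != doc) {
                    docs.add(doc);
                }
            }
        }
        // Freeze every posting list and the map itself.
        Map<String, List<Integer>> frozen = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : tmp.entrySet()) {
            frozen.put(e.getKey(), Collections.unmodifiableList(e.getValue()));
        }
        this.postings = Collections.unmodifiableMap(frozen);
    }

    // Read-only lookup; safe to call from many threads without locking.
    List<Integer> docsFor(String term) {
        List<Integer> docs = postings.get(term.toLowerCase());
        return docs != null ? docs : Collections.<Integer>emptyList();
    }

    public static void main(String[] args) {
        ImmutableIndexSketch index = new ImmutableIndexSketch(
                Arrays.asList("fast memory index", "memory consuming index"));
        System.out.println(index.docsFor("memory"));
        System.out.println(index.docsFor("lucene"));
    }
}
```

Since there is no add or delete method at all, there is also nothing that has to be kept behaving like IndexWriter.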

Also, it is a hassle to make sure that InstantiatedIndexWriter works just as IndexWriter does.

In the future, a segmented Directory-facade could be built on top of this, where each InstantiatedIndex is a segment created by an IndexWriter flush. It would potentially be slower to populate, but it would be compatible with everything. Adding more than one segment will require merging and optimizing indices back and forth in RAMDirectories a bit, but InstantiatedIndexes are usually quite small.

It feels like much of that code is already there.
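
A rough sketch of the segmented idea, again in plain Java and not against any Lucene class: SegmentSketch and SegmentedFacadeSketch are hypothetical names, and the merging and optimizing through RAMDirectories is left out entirely. Each flush adds one immutable segment, and a search walks all segments and offsets the segment-local document numbers by a running document base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these classes exist in Lucene.
final class SegmentSketch {
    private final List<String> docs; // immutable snapshot of one flush

    SegmentSketch(List<String> flushedDocs) {
        this.docs = Collections.unmodifiableList(new ArrayList<>(flushedDocs));
    }

    int size() {
        return docs.size();
    }

    // Segment-local numbers of the documents containing the term.
    List<Integer> matches(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (Arrays.asList(docs.get(i).split("\\s+")).contains(term)) {
                hits.add(i);
            }
        }
        return hits;
    }
}

final class SegmentedFacadeSketch {
    private final List<SegmentSketch> segments = new ArrayList<>();

    // Each flush appends a new immutable segment; existing ones stay untouched.
    void flush(List<String> bufferedDocs) {
        segments.add(new SegmentSketch(bufferedDocs));
    }

    int segmentCount() {
        return segments.size();
    }

    // Search every segment and map local document numbers to global ones.
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        int docBase = 0;
        for (SegmentSketch segment : segments) {
            for (int local : segment.matches(term)) {
                hits.add(docBase + local);
            }
            docBase += segment.size();
        }
        return hits;
    }

    public static void main(String[] args) {
        SegmentedFacadeSketch facade = new SegmentedFacadeSketch();
        facade.flush(Arrays.asList("a b", "c"));
        facade.flush(Arrays.asList("b c"));
        System.out.println(facade.search("b"));
    }
}
```

Note that the facade itself is mutable (the segment list grows on flush) and is not thread-safe as written; only the individual segments are immutable.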

On the matter of RAM consumption, using a profiler I recently noticed that a 3.2MB directory of 3-5;3-3;3-5 ngrams with term vectors consumed something like 35MB RAM when loaded into an InstantiatedIndex.




> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>            Reporter: Karl Wettin
>            Assignee: Grant Ingersoll
>         Attachments: HitCollectionBench.jpg, LUCENE-550_20071021_no_core_changes.txt, test-reports.zip
>
>
> Represented as a coupled graph of class instances, this all-in-memory index store implementation delivers search results up to 100 times faster than the file-centric RAMDirectory at the cost of greater RAM consumption.
> Performance seems to be a little bit better than log2n (binary search). No real data on that, just my eyes.
> Populated with a single document InstantiatedIndex is almost, but not quite, as fast as MemoryIndex.    
> At 20,000 documents 10-50 characters long, InstantiatedIndex outperforms RAMDirectory some 30x,
> 15x at 100 documents of 2000 characters length,
> and is linear to RAMDirectory at 10,000 documents of 2000 characters length.
> Mileage may vary depending on term saturation.
