You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by eks dev <ek...@yahoo.co.uk> on 2008/02/13 14:22:17 UTC

Stored Field vs "offset plus external file"?

I would like to try to replace our external storage of documents with Lucene stored field, so a few questions before we proceed:

Background: We store currently complete documents  in a simple binary file and only keep offsets into this file as a Stored field   in Lucene index. Documents (compressed) are short, avg 300-400bytes per document

What we would like to try is to store these documents as binary (compressed with simple/fast static huffman + dictionary) stored Fields in order to make maintenance of this setup easier as Lucene already does maintain updates and fetching of these fields, also indexing works very well from multiple threads now. Complete Document would be the only Stored field we have in index (I know about lazy loading). 
Simply what we do today is , search index, get offsets from found documents and than fetch them from binary file. 

So the questions would be:

1. Does anyone see any theoretical/practical reason why our homemade "fetch offset from lucene stored field-> jump to this offset in document" would be faster than "fetch stored field from Lucene directly"

2.  We use Lucene Index wit MMAP directory now, so the concern is that index could grow too large for MMAP with stored field like that.  Is there a way to say, "do not use MMAP Directory for stored Fields, rather FSDirectory". I think not, but it is worth to ask as I think this could be useful thing... Imagine to be able to say "Load terms with RAM Directory, postings wit MMAP and stored fields with FSDirectory"... of course this is only for searching, not indexing.

Thanks a lot.
Eks
 



      ___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/ 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Stored Field vs "offset plus external file"?

Posted by Andrzej Bialecki <ab...@getopt.org>.
eks dev wrote:

> 2.  We use Lucene Index wit MMAP directory now, so the concern is
> that index could grow too large for MMAP with stored field like that.
> Is there a way to say, "do not use MMAP Directory for stored Fields,
> rather FSDirectory". I think not, but it is worth to ask as I think
> this could be useful thing... Imagine to be able to say "Load terms
> with RAM Directory, postings wit MMAP and stored fields with
> FSDirectory"... of course this is only for searching, not indexing.


IMHO it should be possible to do this by implementing a Directory 
subclass that can be configured to use mmap / ram / fs for specific 
index file types (e.g. tis, tii, fdt, prx and so on) - you should be 
able to cut & paste large chunks of each directory code to start the 
implementation.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org