You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Doron Cohen <DO...@il.ibm.com> on 2006/10/24 07:49:37 UTC

Re: "Catalog" backend for document stored fields?

> I'm indexing logs from a transaction-based application.
> ...
> millions documents per month, the size of the indices is ~35 gigs per
month
> (that's the lower bound).  I have no choice but to 'store' each field
values
> (as well as indexing/tokenizing them) because I'll need to retrieve them
in
> order to create various reports.  Also, I have a backlog of ~2 years of
logs
> to index!
> ...
> 1-       is there someone out there that already wrote an extension to
> Lucene so that 'stored' string for each document/field is in fact stored
in
> a centralized repository? Meaning, only an 'index' is actually stored in
the
> document and the real data is put somewhere else.

Do you gain anything from storing the document fields within Lucene?  In
case not, especially if log files are kept somewhere, you cuold make all
'content' fields unstored (reduce index size), and add a stored non-indexed
ID field. It can also be a POINTER field - e.g. <log file name + start
offset + length>.  At search time, for found documents you can retrieve
this ID/POINTER field and then fetch the document from the (original) log
file. Makes sense?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org