Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2008/03/25 02:34:48 UTC

Re: how to control the disk size of the indices

Hi Yannis,

I don't think there is anything of that sort in Lucene, but this shouldn't be hard to do with a process outside Lucene.  Of course, optimizing an index increases its size temporarily, so your external process would have to take that into account and play it safe.  You could also set mergeFactor to 1, which should keep your index in a fully optimized state if you don't do any deletions and near-optimized state if you do deletions.
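A minimal sketch of the "external process" idea, assuming the index lives in an ordinary filesystem directory (e.g. an FSDirectory) and that every file under that directory belongs to the index. The class and method names (`IndexSizeMonitor`, `directorySize`, `overSoftLimit`) are hypothetical and not part of Lucene; the check uses only `java.nio.file`. The soft limit is deliberately a separate parameter, so the caller can set it well below the real disk cap to "play it safe" while an optimize is running.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not a Lucene API: measures an index directory from outside Lucene.
// Assumes a plain filesystem directory whose files all belong to the index.
final class IndexSizeMonitor {

    private IndexSizeMonitor() {}

    /** Sums the sizes of all regular files under indexDir, in bytes. */
    static long directorySize(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.walk(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                // A merge may delete a file between listing and stat; count it as 0.
                                return 0L;
                            }
                        })
                        .sum();
        }
    }

    /** True when the on-disk index exceeds the caller's soft limit (time to delete old documents). */
    static boolean overSoftLimit(Path indexDir, long softLimitBytes) throws IOException {
        return directorySize(indexDir) > softLimitBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir + ": " + directorySize(dir) + " bytes");
    }
}
```

Polling like this only observes the size after the fact; it cannot stop a merge that is already in progress, which is why the soft limit needs headroom.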

You should discuss this on the java-user list, though, so I'm CCing that list where you can continue the discussion.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Yannis Pavlidis <yp...@me.dium.com>
To: general@lucene.apache.org
Sent: Monday, March 24, 2008 7:33:26 PM
Subject: how to control the disk size of the indices


Hi all,

I wanted to ask the list whether there is an easy and efficient way to manage the size (in bytes) of a Lucene index stored on disk.

Basically, I would like to limit Lucene to storing only 100 GB of data. When Lucene reaches that limit, I would delete documents (using an LRU algorithm based on timestamps), but in no case should the disk space occupied by Lucene exceed 100 GB.
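The LRU step can be sketched as "evict the oldest documents until enough bytes are freed". This assumes the application keeps its own per-document bookkeeping (an id, a timestamp, and an estimated size in bytes) outside Lucene; `DocEntry`, `LruEviction`, and `selectVictims` are hypothetical names, and Lucene itself does not report per-document byte sizes. Note also that deleting documents in Lucene only marks them as deleted; the disk space is reclaimed when the affected segments are later merged (or the index is optimized).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-document bookkeeping kept by the application, not by Lucene.
final class DocEntry {
    final String id;
    final long timestampMillis;
    final long estimatedBytes;

    DocEntry(String id, long timestampMillis, long estimatedBytes) {
        this.id = id;
        this.timestampMillis = timestampMillis;
        this.estimatedBytes = estimatedBytes;
    }
}

final class LruEviction {
    private LruEviction() {}

    /** Picks the oldest documents whose estimated sizes add up to at least bytesToFree; ids oldest first. */
    static List<String> selectVictims(List<DocEntry> docs, long bytesToFree) {
        List<DocEntry> byAge = new ArrayList<>(docs);
        byAge.sort(Comparator.comparingLong((DocEntry d) -> d.timestampMillis));
        List<String> victims = new ArrayList<>();
        long freed = 0;
        for (DocEntry d : byAge) {
            if (freed >= bytesToFree) {
                break; // enough (estimated) space has been selected
            }
            victims.add(d.id);
            freed += d.estimatedBytes;
        }
        return victims;
    }

    public static void main(String[] args) {
        List<DocEntry> docs = new ArrayList<>();
        docs.add(new DocEntry("a", 3L, 10L));
        docs.add(new DocEntry("b", 1L, 40L));
        System.out.println(selectVictims(docs, 20L));
    }
}
```

The returned ids would then be passed to whatever deletion call the application uses; because space only comes back after a merge, the estimate should err on the side of freeing more than strictly needed.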

I experimented with Lucene 2.3.1, and the only way I could accomplish that was by calling the optimize method on the IndexWriter after the index size exceeded the maximum. I was looking for a more performant way to (perhaps) control when Lucene merges segments, so as not to exceed the preset limit.

Any ideas or suggestions would be highly appreciated.

Thanks in advance,

Yannis.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: how to control the disk size of the indices

Posted by Yonik Seeley <yo...@apache.org>.
On Mon, Mar 24, 2008 at 9:34 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Hi Yannis,
>
>  I don't think there is anything of that sort in Lucene, but this shouldn't be hard to do with a process outside Lucene.  Of course, optimizing an index increases its size temporarily, so your external process would have to take that into account and play it safe.  You could also set mergeFactor to 1, which should keep your index in a fully optimized state

MergeFactor must be >= 2

You will always need to allow for double the index size due to
increased temporary disk usage during segment merges (including
optimize). Peak use on a system being searched and indexed
concurrently will often be even higher, since currently open readers
still reference files that have already been deleted, and that space
is not freed until those readers are closed.
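Applied to the 100 GB cap from the original question, the "allow for double the index size" rule gives a simple budget: peak ≈ 2 × live index size + whatever open readers are still holding. A back-of-the-envelope sketch, assuming the factor of 2 is the whole merge overhead and that the reader reserve is a number the operator picks (`DiskBudget`, `softLimit`, and `readerReserveBytes` are hypothetical names; GB here means 2^30 bytes):

```java
// Hypothetical helper: derives the largest live index size that keeps peak disk use under a hard cap.
// Assumes peak ~= 2 * live + readerReserve, per the "allow for double the index size" rule.
final class DiskBudget {
    static final long GB = 1024L * 1024L * 1024L;

    private DiskBudget() {}

    /** Largest live index size such that 2 * live + readerReserve <= hardLimit. */
    static long softLimit(long hardLimitBytes, long readerReserveBytes) {
        if (readerReserveBytes < 0 || readerReserveBytes >= hardLimitBytes) {
            throw new IllegalArgumentException("reserve must be in [0, hardLimit)");
        }
        return (hardLimitBytes - readerReserveBytes) / 2;
    }

    public static void main(String[] args) {
        long soft = softLimit(100L * GB, 0L);
        System.out.println("soft limit: " + (soft / GB) + " GB");
    }
}
```

With a 100 GB cap and no reader reserve, the live index has to stay at or below 50 GB; reserving 10 GB for open readers lowers that to 45 GB.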

-Yonik
