Posted to java-user@lucene.apache.org by "Kauler, Leto S" <le...@education.tas.gov.au> on 2005/01/28 00:18:19 UTC

Disk space used by optimize

Just a quick question:  after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into many
files of about 145MB total, before compressing down to three files of
about 47MB total.  That must be a lot of disk activity for the people
with multi-gigabyte indexes!

Regards,
Leto

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Disk space used by optimize

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Morus,

That description of three sets of index files is what I was imagining, too.
I'll have to test this and add it to the book errata, it seems.

Thanks for the info,
Otis


Re: Disk space used by optimize

Posted by Morus Walter <mo...@tanto.de>.
Otis Gospodnetic writes:
> Hello,
> 
> Yes, that is how optimize works: it copies all existing index segments
> into one unified index segment, thus optimizing it.
> 
> see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> 
> However, three times the space sounds like a bit too much, or I made a
> mistake in the book. :)
> 
I cannot explain why, but roughly three times the size of the final index is
what I observed when I logged disk usage during optimize of an index
in compound index format.
The test was on Linux; I simply ran 'du -s' every few seconds in parallel
with the optimize.
I didn't test the non-compound format. Optimizing a compound-format index
probably requires storing the different parts of the compound file separately
before joining them into the compound file (sounds reasonable; otherwise
you would need to know the sizes before creating the parts). In that case
the disk usage peak consists of the original index, the separate files, and
the new compound file.
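
For anyone who wants to repeat this measurement without a shell, the
'du -s' polling described above can be approximated in plain Java. This is
only a sketch under assumptions: the class name IndexDiskUsage, the
command-line arguments and the two-second default interval are made up for
illustration, the index directory is assumed to be flat (no
subdirectories), and the code sums apparent file sizes rather than
allocated disk blocks, so the numbers may differ slightly from du.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Rough stand-in for running 'du -s' on an index directory every few seconds. */
class IndexDiskUsage {

    /** Sums the sizes of all regular files directly inside dir. */
    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(IndexDiskUsage::sizeOrZero)
                          .sum();
        }
    }

    /** Files can vanish while optimize is running, so a missing file counts as 0. */
    private static long sizeOrZero(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            return 0L;
        }
    }

    public static void main(String[] args) throws Exception {
        Path indexDir = Paths.get(args[0]);   // hypothetical argument: index directory
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 2000L;
        long peak = 0L;
        while (true) {                        // stop with Ctrl-C once optimize is done
            long now = totalBytes(indexDir);
            peak = Math.max(peak, now);
            System.out.println("current=" + now + " bytes, peak=" + peak + " bytes");
            Thread.sleep(intervalMs);
        }
    }
}
```

Run it in a second terminal while optimize is in progress and compare the
reported peak with the size of the directory after optimize has finished.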

So IMHO the book is wrong.

Morus


Re: Disk space used by optimize

Posted by Morus Walter <mo...@tanto.de>.
Bernhard Messer writes:
> 
> >However, three times the space sounds like a bit too much, or I made a
> >mistake in the book. :)
> >  
> >
> There was already a discussion about disk usage during index optimize.
> Please have a look at the developers list:
> http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=1797569
> where I made some measurements of the disk usage within Lucene.
> At that time I proposed a patch that reduced the total disk space used
> from 3 times to a little more than 2 times the final index size.
> Together with Christoph, we implemented some improvements to the
> optimization patch and finally committed the changes.
> 
Hmm. If the index is in use (open reader), I doubt your patch makes a
difference. In that case the disk space used by the non-optimized index
will still be occupied even after the files are deleted (on Unix/Linux).
What happens if disk space runs out during creation of the compound index?
Will the non-compound files be a usable index?
Otherwise you risk losing the index.
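
The point above about an open reader keeping deleted files alive can be
seen with a small stand-alone Java program. It is only an illustration
under assumptions: the class OpenButDeleted and the temp file standing in
for an old segment are hypothetical, no Lucene code is involved, and
Unix-like delete semantics are assumed. As long as the channel is open,
the data stays readable, which means the file system cannot have released
its blocks yet.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OpenButDeleted {

    /**
     * Opens the file for reading, deletes it, then reads it through the
     * still-open channel and returns what was read.
     * Assumes Unix-like semantics; other platforms may refuse the delete.
     */
    static String readAfterDelete(Path file) throws IOException {
        try (FileChannel reader = FileChannel.open(file, StandardOpenOption.READ)) {
            Files.delete(file);   // the directory entry disappears here
            ByteBuffer buf = ByteBuffer.allocate((int) reader.size());
            while (buf.hasRemaining() && reader.read(buf) > 0) {
                // keep reading until the buffer is full or EOF is reached
            }
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a segment file of the non-optimized index.
        Path segment = Files.createTempFile("old-segment", ".cfs");
        Files.write(segment, "postings".getBytes(StandardCharsets.UTF_8));
        String data = readAfterDelete(segment);
        System.out.println("exists after delete: " + Files.exists(segment));
        System.out.println("read via open channel: " + data);
    }
}
```

The space is only returned once the last open handle is closed, so a
searcher that stays open across an optimize keeps the old segments on disk
for that long.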

Morus


Re: Disk space used by optimize

Posted by Bernhard Messer <be...@intrafind.de>.
>However, three times the space sounds like a bit too much, or I made a
>mistake in the book. :)
>  
>
There was already a discussion about disk usage during index optimize.
Please have a look at the developers list:
http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=1797569
where I made some measurements of the disk usage within Lucene.
At that time I proposed a patch that reduced the total disk space used
from 3 times to a little more than 2 times the final index size.
Together with Christoph, we implemented some improvements to the
optimization patch and finally committed the changes.
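
To put the two factors next to the numbers from the start of this thread,
here is a small back-of-the-envelope calculation. It is a sketch, not a
measurement: the class and method names are made up, and 2.0 is used as a
lower bound for "a little more than 2 times". With a final index of 47 MB,
a factor of 3 gives 141 MB, which is close to the 145 MB that was
observed, while a factor of 2 gives 94 MB.

```java
class OptimizePeakEstimate {

    /** Estimated peak disk usage in MB for a final index size and a peak factor. */
    static double peakMb(double finalIndexMb, double peakFactor) {
        return finalIndexMb * peakFactor;
    }

    public static void main(String[] args) {
        double finalIndexMb = 47.0;   // final size reported in the original question
        double observedMb = 145.0;    // peak size reported in the original question
        System.out.println("observed factor: " + observedMb / finalIndexMb);
        System.out.println("peak at 3x: " + peakMb(finalIndexMb, 3.0) + " MB");
        System.out.println("peak at 2x (lower bound): " + peakMb(finalIndexMb, 2.0) + " MB");
    }
}
```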

Bernhard


Re: Disk space used by optimize

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

Yes, that is how optimize works: it copies all existing index segments
into one unified index segment, thus optimizing it.

see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds like a bit too much, or I made a
mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?

Otis
