You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by aurora <au...@gmail.com> on 2004/12/21 05:49:03 UTC

index size doubled?

I'm testing the rebuilding of the index. I add several hundred documents,  
optimize and add another few hundred and so on. Right now I have around  
7000 files. I observed after the index gets to certain size. Everytime  
after optimize, the are two files roughly the same size like below:

12/20/2004  01:57p                  13 deletable
12/20/2004  01:57p                  29 segments
12/20/2004  01:53p          14,460,367 _5qf.cfs
12/20/2004  01:57p          15,069,013 _5zr.cfs

The index total index is double of what I expect. This is not always  
reproducible. (I'm constantly tuning my program and the set of document).  
Sometime I get a decent single document after optimize. What was happening?


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: index size doubled?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
You don't need to optimize to simulate an incremental update.  You just
have to re-open your index with the IndexSearcher to see newly added
documents.

Otis

--- aurora <au...@gmail.com> wrote:

> Thanks for the heads up. I'm using Lucene 1.4.2.
> 
> I tried to do optimize() again but it has no effect. Adding a just
> tiny  
> dummy document would get rid of it.
> 
> I'm doing optimize every few hundred documents because I tried to
> simulate  
> incremental update. This lead to another question I would post
> separately.
> 
> Thanks.
> 
> 
> > Another possibility is that you are using an older version of
> Lucene,
> > which was known to have a bug with similar symptoms.  Get the
> latest
> > version of Lucene.
> >
> > You shouldn't really have multiple .cfs files after optimizing your
> > index.  Also, optimize only at the end, if you care about indexing
> > speed.
> >
> > Otis
> >
> > --- Paul Elschot <pa...@xs4all.nl> wrote:
> >
> >> On Tuesday 21 December 2004 05:49, aurora wrote:
> >> > I'm testing the rebuilding of the index. I add several hundred
> >> documents,
> >> > optimize and add another few hundred and so on. Right now I have
> >> around
> >> > 7000 files. I observed after the index gets to certain size.
> >> Everytime
> >> > after optimize, the are two files roughly the same size like
> below:
> >> >
> >> > 12/20/2004  01:57p                  13 deletable
> >> > 12/20/2004  01:57p                  29 segments
> >> > 12/20/2004  01:53p          14,460,367 _5qf.cfs
> >> > 12/20/2004  01:57p          15,069,013 _5zr.cfs
> >> >
> >> > The index total index is double of what I expect. This is not
> >> always
> >> > reproducible. (I'm constantly tuning my program and the set of
> >> document).
> >> > Sometime I get a decent single document after optimize. What was
> >> happening?
> >>
> >> Lucene tried to delete the older version (_5cf.cfs above), but got
> an
> >> error
> >> back from the file system. After that it has put the name of that
> >> segment in
> >> the deletable file, so it can try later to delete that segment.
> >>
> >> This is known behaviour on FAT file systems. These randomly take
> some
> >> time
> >> for themselves to finish closing a file after it has been
> correctly
> >> closed by
> >> a program.
> >>
> >> Regards,
> >> Paul Elschot
> >>
> >>
> >>
> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >> For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> >>
> >>
> 
> 
> 
> -- 
> Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: index size doubled?

Posted by aurora <au...@gmail.com>.
Thanks for the heads up. I'm using Lucene 1.4.2.

I tried to do optimize() again but it has no effect. Adding a just tiny  
dummy document would get rid of it.

I'm doing optimize every few hundred documents because I tried to simulate  
incremental update. This lead to another question I would post separately.

Thanks.


> Another possibility is that you are using an older version of Lucene,
> which was known to have a bug with similar symptoms.  Get the latest
> version of Lucene.
>
> You shouldn't really have multiple .cfs files after optimizing your
> index.  Also, optimize only at the end, if you care about indexing
> speed.
>
> Otis
>
> --- Paul Elschot <pa...@xs4all.nl> wrote:
>
>> On Tuesday 21 December 2004 05:49, aurora wrote:
>> > I'm testing the rebuilding of the index. I add several hundred
>> documents,
>> > optimize and add another few hundred and so on. Right now I have
>> around
>> > 7000 files. I observed after the index gets to certain size.
>> Everytime
>> > after optimize, the are two files roughly the same size like below:
>> >
>> > 12/20/2004  01:57p                  13 deletable
>> > 12/20/2004  01:57p                  29 segments
>> > 12/20/2004  01:53p          14,460,367 _5qf.cfs
>> > 12/20/2004  01:57p          15,069,013 _5zr.cfs
>> >
>> > The index total index is double of what I expect. This is not
>> always
>> > reproducible. (I'm constantly tuning my program and the set of
>> document).
>> > Sometime I get a decent single document after optimize. What was
>> happening?
>>
>> Lucene tried to delete the older version (_5cf.cfs above), but got an
>> error
>> back from the file system. After that it has put the name of that
>> segment in
>> the deletable file, so it can try later to delete that segment.
>>
>> This is known behaviour on FAT file systems. These randomly take some
>> time
>> for themselves to finish closing a file after it has been correctly
>> closed by
>> a program.
>>
>> Regards,
>> Paul Elschot
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>



-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: index size doubled?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Another possibility is that you are using an older version of Lucene,
which was known to have a bug with similar symptoms.  Get the latest
version of Lucene.

You shouldn't really have multiple .cfs files after optimizing your
index.  Also, optimize only at the end, if you care about indexing
speed.

Otis

--- Paul Elschot <pa...@xs4all.nl> wrote:

> On Tuesday 21 December 2004 05:49, aurora wrote:
> > I'm testing the rebuilding of the index. I add several hundred
> documents,  
> > optimize and add another few hundred and so on. Right now I have
> around  
> > 7000 files. I observed after the index gets to certain size.
> Everytime  
> > after optimize, the are two files roughly the same size like below:
> > 
> > 12/20/2004  01:57p                  13 deletable
> > 12/20/2004  01:57p                  29 segments
> > 12/20/2004  01:53p          14,460,367 _5qf.cfs
> > 12/20/2004  01:57p          15,069,013 _5zr.cfs
> > 
> > The index total index is double of what I expect. This is not
> always  
> > reproducible. (I'm constantly tuning my program and the set of
> document).  
> > Sometime I get a decent single document after optimize. What was
> happening?
> 
> Lucene tried to delete the older version (_5cf.cfs above), but got an
> error
> back from the file system. After that it has put the name of that
> segment in
> the deletable file, so it can try later to delete that segment.
> 
> This is known behaviour on FAT file systems. These randomly take some
> time
> for themselves to finish closing a file after it has been correctly
> closed by
> a program.
> 
> Regards,
> Paul Elschot
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: index size doubled?

Posted by Paul Elschot <pa...@xs4all.nl>.
On Tuesday 21 December 2004 05:49, aurora wrote:
> I'm testing the rebuilding of the index. I add several hundred documents,  
> optimize and add another few hundred and so on. Right now I have around  
> 7000 files. I observed after the index gets to certain size. Everytime  
> after optimize, the are two files roughly the same size like below:
> 
> 12/20/2004  01:57p                  13 deletable
> 12/20/2004  01:57p                  29 segments
> 12/20/2004  01:53p          14,460,367 _5qf.cfs
> 12/20/2004  01:57p          15,069,013 _5zr.cfs
> 
> The index total index is double of what I expect. This is not always  
> reproducible. (I'm constantly tuning my program and the set of document).  
> Sometime I get a decent single document after optimize. What was happening?

Lucene tried to delete the older version (_5cf.cfs above), but got an error
back from the file system. After that it has put the name of that segment in
the deletable file, so it can try later to delete that segment.

This is known behaviour on FAT file systems. These randomly take some time
for themselves to finish closing a file after it has been correctly closed by
a program.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org