You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by sol myr <so...@yahoo.com> on 2011/01/10 18:56:36 UTC

Newbie question: optimized files?

Hi, 

I'm new to Lucene (using 3.0.3), and just started to check out the behavior of the 'optimize()' method (which is quite important for our application).
Could it be that 'optimize' cancels out the 'compoundFile' mode? Or am I doing something wrong? 

Here's my test: I create an indexWriter with compoundFile=true, then perform some writes+commits (which generates several 'cfs' files).
Then I call 'optimize()'... I expected this to yield a single optimized 'cfs' file, but instead I get lots of different files - 'fdt', 'fdx', 'fnm' etc...

The detailed code:
// Create indexWriter with 'compoundfile=true':
Version version=Version.LUCENE_30;
Directory dir = FSDirectory.open(new File("c:/luceneTemp"));
Analyzer analyzer = new StandardAnalyzer(version);
IndexWriter writer = new IndexWriter(
         dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
writer.setUseCompoundFile(true);

// Perform some writes + commits.
// This yields several 'cfs' files as expected:
writer.addDocument(...);
writer.commit();
writer.addDocument(...);

writer.commit();
Thread.sleep(20000);  // give myself time to see the generated 'cfs' files

// Optimize - why does it yield lots of separate files ('fdt', 'fdx', etc)?
writer.optimize();


Thanks.



      

Re: Newbie question: optimized files?

Posted by sol myr <so...@yahoo.com>.
That was it - thank :)

--- On Tue, 1/11/11, Simon Willnauer <si...@googlemail.com> wrote:

From: Simon Willnauer <si...@googlemail.com>
Subject: Re: Newbie question: optimized files?
To: java-user@lucene.apache.org
Date: Tuesday, January 11, 2011, 12:06 AM

Hey,

this looks like you are hitting the optimization done in LUCENE-2773
(https://issues.apache.org/jira/browse/LUCENE-2773) that prevents
merged segments that are larger than 10% (by default - see
LogMergePolicy#noCFSRatio) of the index size it will be left in
non-compound format even if compound format is on.

for some background see:
http://lucene.apache.org/java/3_0_3/changes/Changes.html#3.0.3.changes_in_runtime_behavior

so its a feature not a bug :)

Simon

On Tue, Jan 11, 2011 at 7:46 AM, sol myr <so...@yahoo.com> wrote:
> Hi,
>
> Continuing my question - I now suspect a bug in Lucene 3.0.3, because I ran the test with Lucene 3.0.0 and it worked okay (no junk files)... could anyone please confirm?
>
> --- On Mon, 1/10/11, sol myr <so...@yahoo.com> wrote:
>
> From: sol myr <so...@yahoo.com>
> Subject: Newbie question: optimized files?
> To: java-user@lucene.apache.org
> Date: Monday, January 10, 2011, 9:56 AM
>
> Hi,
>
> I'm new to Lucene (using 3.0.3), and just started to check out the behavior of the 'optimize()' method (which is quite important for our application).
> Could it be that 'optimize' cancels out the 'compoundFile' mode? Or am I doing something wrong?
>
> Here's my test: I create an indexWriter with compoundFile=true, then perform some writes+commits (which generates several 'cfs' files).
> Then I call 'optimize()'... I expected this to yield a single optimized 'cfs' file, but instead I get lots of different files - 'fdt', 'fdx', 'fnm' etc...
>
> The detailed code:
> // Create indexWriter with 'compoundfile=true':
> Version version=Version.LUCENE_30;
> Directory dir = FSDirectory.open(new File("c:/luceneTemp"));
> Analyzer analyzer = new StandardAnalyzer(version);
> IndexWriter writer = new IndexWriter(
>          dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
> writer.setUseCompoundFile(true);
>
> // Perform some writes + commits.
> // This yields several 'cfs' files as expected:
> writer.addDocument(...);
> writer.commit();
> writer.addDocument(...);
>
> writer.commit();
> Thread.sleep(20000);  // give myself time to see the generated 'cfs' files
>
> // Optimize - why does it yield lots of separate files ('fdt', 'fdx', etc)?
> writer.optimize();
>
>
> Thanks.
>
>
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




      

Re: Newbie question: optimized files?

Posted by Simon Willnauer <si...@googlemail.com>.
Hey,

this looks like you are hitting the optimization done in LUCENE-2773
(https://issues.apache.org/jira/browse/LUCENE-2773) that prevents
merged segments that are larger than 10% (by default - see
LogMergePolicy#noCFSRatio) of the index size it will be left in
non-compound format even if compound format is on.

for some background see:
http://lucene.apache.org/java/3_0_3/changes/Changes.html#3.0.3.changes_in_runtime_behavior

so its a feature not a bug :)

Simon

On Tue, Jan 11, 2011 at 7:46 AM, sol myr <so...@yahoo.com> wrote:
> Hi,
>
> Continuing my question - I now suspect a bug in Lucene 3.0.3, because I ran the test with Lucene 3.0.0 and it worked okay (no junk files)... could anyone please confirm?
>
> --- On Mon, 1/10/11, sol myr <so...@yahoo.com> wrote:
>
> From: sol myr <so...@yahoo.com>
> Subject: Newbie question: optimized files?
> To: java-user@lucene.apache.org
> Date: Monday, January 10, 2011, 9:56 AM
>
> Hi,
>
> I'm new to Lucene (using 3.0.3), and just started to check out the behavior of the 'optimize()' method (which is quite important for our application).
> Could it be that 'optimize' cancels out the 'compoundFile' mode? Or am I doing something wrong?
>
> Here's my test: I create an indexWriter with compoundFile=true, then perform some writes+commits (which generates several 'cfs' files).
> Then I call 'optimize()'... I expected this to yield a single optimized 'cfs' file, but instead I get lots of different files - 'fdt', 'fdx', 'fnm' etc...
>
> The detailed code:
> // Create indexWriter with 'compoundfile=true':
> Version version=Version.LUCENE_30;
> Directory dir = FSDirectory.open(new File("c:/luceneTemp"));
> Analyzer analyzer = new StandardAnalyzer(version);
> IndexWriter writer = new IndexWriter(
>          dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
> writer.setUseCompoundFile(true);
>
> // Perform some writes + commits.
> // This yields several 'cfs' files as expected:
> writer.addDocument(...);
> writer.commit();
> writer.addDocument(...);
>
> writer.commit();
> Thread.sleep(20000);  // give myself time to see the generated 'cfs' files
>
> // Optimize - why does it yield lots of separate files ('fdt', 'fdx', etc)?
> writer.optimize();
>
>
> Thanks.
>
>
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Newbie question: optimized files?

Posted by sol myr <so...@yahoo.com>.
Hi,

Continuing my question - I now suspect a bug in Lucene 3.0.3, because I ran the test with Lucene 3.0.0 and it worked okay (no junk files)... could anyone please confirm?

--- On Mon, 1/10/11, sol myr <so...@yahoo.com> wrote:

From: sol myr <so...@yahoo.com>
Subject: Newbie question: optimized files?
To: java-user@lucene.apache.org
Date: Monday, January 10, 2011, 9:56 AM

Hi, 

I'm new to Lucene (using 3.0.3), and just started to check out the behavior of the 'optimize()' method (which is quite important for our application).
Could it be that 'optimize' cancels out the 'compoundFile' mode? Or am I doing something wrong? 

Here's my test: I create an indexWriter with compoundFile=true, then perform some writes+commits (which generates several 'cfs' files).
Then I call 'optimize()'... I expected this to yield a single optimized 'cfs' file, but instead I get lots of different files - 'fdt', 'fdx', 'fnm' etc...

The detailed code:
// Create indexWriter with 'compoundfile=true':
Version version=Version.LUCENE_30;
Directory dir = FSDirectory.open(new File("c:/luceneTemp"));
Analyzer analyzer = new StandardAnalyzer(version);
IndexWriter writer = new IndexWriter(
         dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
writer.setUseCompoundFile(true);

// Perform some writes + commits.
// This yields several 'cfs' files as expected:
writer.addDocument(...);
writer.commit();
writer.addDocument(...);

writer.commit();
Thread.sleep(20000);  // give myself time to see the generated 'cfs' files

// Optimize - why does it yield lots of separate files ('fdt', 'fdx', etc)?
writer.optimize();


Thanks.