Posted to java-user@lucene.apache.org by saisantoshi <sa...@gmail.com> on 2013/01/24 17:41:48 UTC

List of files that Lucene 4.0 generates during indexing

Is there any documentation on how many files Lucene generates during indexing
(with 4.0), and what those files are? Once we migrate to 4.0, we will need to
validate, by looking at the index directory, that the files that need to be
generated were actually created. This helps for debugging purposes.
Can someone post a link to the docs for the files generated by 4.0?

Thanks,
Sai.



--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: List of files that Lucene 4.0 generates during indexing

Posted by saisantoshi <sa...@gmail.com>.
Thanks. Could you please also comment on the following?

http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-td4035806.html

Thanks, I really appreciate your help.

Thanks,
Sai.





Re: List of files that Lucene 4.0 generates during indexing

Posted by saisantoshi <sa...@gmail.com>.
The following files are created upon initial indexing:

        _0.fdt
        _0.fdx
        _0.fnm
        _0.si
        _0_Lucene40_0.frq
        _0_Lucene40_0.prx
        _0_Lucene40_0.tim
        _0_Lucene40_0.tip
        _0_nrm.cfe
        _0_nrm.cfs
        index.v0008
        segments.gen
        segments_1


But in one case, when I added a new document, several other files were
generated in addition to the above:

        _0.fdt
        _0.fdx
        _0.fnm
        _0.si
        _0_Lucene40_0.frq
        _0_Lucene40_0.prx
        _0_Lucene40_0.tim
        _0_Lucene40_0.tip
        _0_nrm.cfe
        _0_nrm.cfs
        _2.fdx          // what is the significance of these _2 prefix files?
        _2.fnm
        _2.si
        _2_Lucene40_0.frq
        _2_Lucene40_0.prx
        _2_Lucene40_0.tim
        _2_Lucene40_0.tip
        _2_nrm.cfe
        _2_nrm.cfs
        segments_3


Sometimes it creates the _2 prefix files in addition to incrementing the
segments_<N> version. Could anyone please tell me why those _2 prefix files
are generated in the index directory in the first place, and what their
significance is?

I haven't seen them generated for other updates, so I would like to
understand the concept behind them.

Thanks,
Sai.
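
A note on the _2 files asked about above: in Lucene's index layout, files
that share the same _N prefix belong to the same segment, and segments_N
records which segments make up the current commit. A second prefix such as
_2 therefore typically means that newly added documents were written as a
separate segment that has not (yet) been merged with _0. The sketch below
is one way to check this when eyeballing an index directory. It uses only
the Java standard library; the class and method names are made up for
illustration and are not part of Lucene, and the file names are the ones
from the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups index file names by the segment they appear to belong to. */
class SegmentFileGrouper {

    /**
     * Returns the segment prefix ("_0", "_2", ...) of a per-segment file,
     * "commit" for segments_N / segments.gen, or "other" for anything else.
     */
    static String segmentOf(String fileName) {
        if (fileName.startsWith("segments")) {
            return "commit";
        }
        if (!fileName.startsWith("_")) {
            return "other";
        }
        // The segment name ends at the first '.' or at the second '_'.
        int end = fileName.length();
        int dot = fileName.indexOf('.');
        int underscore = fileName.indexOf('_', 1);
        if (dot >= 0) {
            end = Math.min(end, dot);
        }
        if (underscore >= 0) {
            end = Math.min(end, underscore);
        }
        return fileName.substring(0, end);
    }

    /** Maps each segment prefix to the file names that carry it. */
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            groups.computeIfAbsent(segmentOf(name), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // A few of the file names from the listing in this thread.
        List<String> names = Arrays.asList(
                "_0.fdt", "_0.si", "_0_Lucene40_0.frq", "_0_nrm.cfs",
                "_2.fdx", "_2.si", "_2_Lucene40_0.frq", "_2_nrm.cfs",
                "segments_3");
        groupBySegment(names).forEach(
                (segment, files) -> System.out.println(segment + " -> " + files));
    }
}
```

Running it on the second listing puts the files into a _0 group, a _2
group, and a commit group, which makes it easier to see that the directory
holds two segments rather than a set of unexpected extra files.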







Re: List of files that Lucene 4.0 generates during indexing

Posted by Michael McCandless <lu...@mikemccandless.com>.
You get/set the merge policy on IndexWriterConfig (which you pass to
IndexWriter).

And then you can set this CFS ratio via that merge policy.
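
As a minimal sketch of the steps described above, assuming the Lucene 4.0
API (where these settings live may differ in later releases): create a
TieredMergePolicy, set its no-CFS ratio, and hand it to IndexWriterConfig
before constructing the IndexWriter. The index path, the class name, and
the choice of StandardAnalyzer are placeholders for illustration, and the
code needs the Lucene core and analyzers-common jars on the classpath.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

class AlwaysCompoundFileExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical index location; replace with your own path.
        Directory dir = FSDirectory.open(new File("/tmp/example-index"));

        // A ratio of 1.0 asks for the compound file format for all segments.
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setNoCFSRatio(1.0);

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
        config.setMergePolicy(mergePolicy);

        IndexWriter writer = new IndexWriter(dir, config);
        try {
            // writer.addDocument(...) calls would go here.
        } finally {
            writer.close();
            dir.close();
        }
    }
}
```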

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 24, 2013 at 5:35 PM, saisantoshi <sa...@gmail.com> wrote:

> Thanks a lot. One last question, how do we set it? IndexWriter.???
>
> Thanks,
> Ranjith.

Re: List of files that Lucene 4.0 generates during indexing

Posted by saisantoshi <sa...@gmail.com>.
Thanks a lot. One last question, how do we set it? IndexWriter.???

Thanks,
Ranjith.





Re: List of files that Lucene 4.0 generates during indexing

Posted by Michael McCandless <lu...@mikemccandless.com>.
I would leave the default until/unless something goes wrong ...

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jan 24, 2013 at 5:28 PM, saisantoshi <sa...@gmail.com> wrote:

> Thanks. Are there any best practices to follow here, or should we leave
> the default (which is the hybrid approach, as you mentioned)?

Re: List of files that Lucene 4.0 generates during indexing

Posted by saisantoshi <sa...@gmail.com>.
Thanks. Are there any best practices to follow here, or should we leave the
default (which is the hybrid approach, as you mentioned)?





Re: List of files that Lucene 4.0 generates during indexing

Posted by Michael McCandless <lu...@mikemccandless.com>.
4.0 has a hybrid approach by default: "big" segments (> 10% of index size,
by default) are non-compound-files and small segments are compound files.

See TieredMergePolicy.setNoCFSRatio if you want to always use compound file
format.
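
To make the size rule above concrete, here is a small standalone sketch of
the decision as described in this message. It is an illustration only, not
Lucene's own code; the class and method names and the byte sizes are
invented for the example, and 0.1 stands for the 10% default mentioned
above.

```java
/** Illustrates the size-based compound-file rule; not Lucene's own code. */
class CompoundFileRule {

    /**
     * Returns true if a segment of the given size would be written as a
     * compound file, given the total index size and the "no CFS" ratio.
     */
    static boolean useCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio >= 1.0) {
            return true; // a ratio of 1.0 means: always use compound files
        }
        // Segments larger than ratio * index size stay non-compound.
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long total = 1_000_000_000L; // hypothetical 1 GB index
        System.out.println(useCompoundFile(50_000_000L, total, 0.1));  // true: small segment
        System.out.println(useCompoundFile(300_000_000L, total, 0.1)); // false: big segment
        System.out.println(useCompoundFile(300_000_000L, total, 1.0)); // true: always compound
    }
}
```

With the default ratio, only segments above 10% of the index size are
written as separate files, which is why a small, freshly built index can
look different from a large one.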

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 24, 2013 at 4:39 PM, saisantoshi <sa...@gmail.com> wrote:

> Thanks Michael. The additional file in the list is just a typo.
>
> One more question: we were using 2.4 before, and it generated only a few
> files:
>
> _0.cfs
> _0.cfx
> // segment files
>
> I am assuming that version 2.4 has the compound index structure enabled
> by default. Do we need to set it explicitly in 4.0? Is it not enabled by
> default? 4.0 seems to be using the multifile index structure. Has the
> behavior changed in the latest version?
>
> Thanks,
> Sai.

Re: List of files that Lucene 4.0 generates during indexing

Posted by saisantoshi <sa...@gmail.com>.
Thanks Michael. The additional file in the list is just a typo.

One more question: we were using 2.4 before, and it generated only a few
files:

_0.cfs
_0.cfx
// segment files

I am assuming that version 2.4 has the compound index structure enabled
by default. Do we need to set it explicitly in 4.0? Is it not enabled by
default? 4.0 seems to be using the multifile index structure. Has the
behavior changed in the latest version?

Thanks,
Sai.





Re: List of files that Lucene 4.0 generates during indexing

Posted by Michael McCandless <lu...@mikemccandless.com>.
That looks correct, except I don't know what index.v0008 is.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 24, 2013 at 1:22 PM, saisantoshi <sa...@gmail.com> wrote:

> Thanks. I checked it out.
>
> Here is the list of files that were generated:
>
>         _0.fdt
>         _0.fdx
>         _0.fnm
>         _0.si
>         _0_Lucene40_0.frq
>         _0_Lucene40_0.prx
>         _0_Lucene40_0.tim
>         _0_Lucene40_0.tip
>         _0_nrm.cfe
>         _0_nrm.cfs
>         index.v0008
>         segments.gen
>         segments_1
>
> My question is: are the above files the right set of files that need to
> be generated? I just want to know whether there is a checklist to verify
> which files must be there, or whether any files are missing from the
> above. I am looking to validate the above. The docs describe a number of
> different file formats but do not explicitly mention the necessary files
> for 4.0.
>
> Thanks,
> Sai.

Re: List of files that Lucene 4.0 generates during indexing

Posted by saisantoshi <sa...@gmail.com>.
Thanks. I checked it out.

Here is the list of files that were generated:

        _0.fdt
        _0.fdx
        _0.fnm
        _0.si
        _0_Lucene40_0.frq
        _0_Lucene40_0.prx
        _0_Lucene40_0.tim
        _0_Lucene40_0.tip
        _0_nrm.cfe
        _0_nrm.cfs
        index.v0008
        segments.gen
        segments_1

My question is: are the above files the right set of files that need to be
generated? I just want to know whether there is a checklist to verify which
files must be there, or whether any files are missing from the above. I am
looking to validate the above. The docs describe a number of different file
formats but do not explicitly mention the necessary files for 4.0.

Thanks,
Sai.





Re: List of files that Lucene 4.0 generates during indexing

Posted by Steve Rowe <sa...@gmail.com>.
Hi saisantoshi,

Check out the documentation: <http://lucene.apache.org/core/4_1_0/index.html> - particularly the "File Formats" link under "Reference Documents".

Steve

On Jan 24, 2013, at 11:41 AM, saisantoshi <sa...@gmail.com> wrote:

> Is there any documentation on how many files Lucene generates during indexing
> (with 4.0), and what those files are? Once we migrate to 4.0, we will need to
> validate, by looking at the index directory, that the files that need to be
> generated were actually created. This helps for debugging purposes.
> Can someone post a link to the docs for the files generated by 4.0?
> 
> Thanks,
> Sai.