Posted to java-user@lucene.apache.org by saisantoshi <sa...@gmail.com> on 2013/01/24 17:41:48 UTC
List of files that Lucene 4.0 generates during indexing
Is there any doc on how many files Lucene generates during indexing
(with 4.0), and what those files are? Once we migrate to 4.0, we would need
to validate, by looking at the index directory, that the files that need to
be generated were actually created. It helps for debugging purposes.
Can someone post a link to the doc for the files 4.0 generates?
Thanks,
Sai.
--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: List of files that Lucene 4.0 generates during indexing
Posted by saisantoshi <sa...@gmail.com>.
Thanks. Could you please also comment on the following?
http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-td4035806.html
Thanks, I really appreciate your help.
Thanks,
Sai.
--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4036098.html
Re: List of files that Lucene 4.0 generates during indexing
Posted by saisantoshi <sa...@gmail.com>.
The following files are created during the initial indexing:
_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
index.v0008
segments.gen
segments_1
But when I added a new document, in one case, several other files were
generated in addition to the above:
_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
_2.fdx    (what is the significance of these _2 prefix files?)
_2.fnm
_2.si
_2_Lucene40_0.frq
_2_Lucene40_0.prx
_2_Lucene40_0.tim
_2_Lucene40_0.tip
_2_nrm.cfe
_2_nrm.cfs
segments_3
Sometimes it creates the _2 prefix files in addition to incrementing the
segments_<N> generation. Could anyone please explain why these _2 prefix
files appear in the index directory in the first place, and what their
significance is? I haven't seen them generated for other updates, so I would
like to understand the concept behind it.
Thanks,
Sai.
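The _2 question is left open in this thread. A likely reading, based on Lucene's file-naming convention, is that every file belonging to one segment shares that segment's name as a prefix (_0, _1, _2, ... where the suffix is a base-36 counter), so the _2* files would be a second segment written by the later commit, while segments_N is the commit point whose generation N is also base 36. The sketch below is a hypothetical helper (IndexFileGrouper, segmentOf, generationOf and groupDirectory are names made up here, not Lucene API) that groups a directory listing by that prefix, which may help with the kind of validation described above. Files Lucene does not know about, such as index.v0008, simply land in the non-segment group.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Lucene: groups index files by segment prefix. */
public class IndexFileGrouper {

    /** Key for files that do not start with "_" (segments_N, segments.gen, unknown files). */
    public static final String NON_SEGMENT = "(commit/other)";

    /** Returns the segment prefix ("_0", "_2", "_a", ...) of a per-segment file, or null. */
    public static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = 1;
        // Assumes segment names are "_" plus base-36 digits, so stop at the next '_' or '.'.
        while (end < fileName.length() && Character.isLetterOrDigit(fileName.charAt(end))) {
            end++;
        }
        return fileName.substring(0, end);
    }

    /** Parses N from "segments_N" (N written in base 36); returns -1 for any other name. */
    public static long generationOf(String fileName) {
        String prefix = "segments_";
        if (!fileName.startsWith(prefix)) {
            return -1;
        }
        return Long.parseLong(fileName.substring(prefix.length()), Character.MAX_RADIX);
    }

    /** Groups file names by segment prefix; keys come back sorted. */
    public static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            String seg = segmentOf(name);
            groups.computeIfAbsent(seg == null ? NON_SEGMENT : seg, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    /** Lists an index directory on disk and groups its files. */
    public static Map<String, List<String>> groupDirectory(Path indexDir) throws IOException {
        try (Stream<Path> paths = Files.list(indexDir)) {
            List<String> names = paths.map(p -> p.getFileName().toString())
                                      .sorted()
                                      .collect(Collectors.toList());
            return groupBySegment(names);
        }
    }

    public static void main(String[] args) throws IOException {
        // With an argument, group a real directory; otherwise use a sample listing.
        Map<String, List<String>> groups = args.length > 0
            ? groupDirectory(Paths.get(args[0]))
            : groupBySegment(List.of("_0.fdt", "_0_Lucene40_0.tim", "_2.fdx",
                                     "_2_nrm.cfs", "segments.gen", "segments_3"));
        groups.forEach((seg, names) -> System.out.println(seg + " -> " + names));
    }
}
```

Run against the listing above, the _0 and _2 files end up in two separate groups, and generationOf("segments_3") gives 3.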
--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4037530.html
Re: List of files that Lucene 4.0 generates during indexing
Posted by Michael McCandless <lu...@mikemccandless.com>.
You get/set the merge policy on IndexWriterConfig (which you pass to
IndexWriter).
And then you can set this CFS ratio via that merge policy.
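A minimal sketch of this against the Lucene 4.0 API, assuming the lucene-core and lucene-analyzers-common 4.0 jars are on the classpath. The class and method names (CompoundFileConfig, alwaysCompound) are made up for illustration. It installs a TieredMergePolicy (the 4.0 default) explicitly so the settings are sure to apply, and sets noCFSRatio to 1.0, which should pack every segment into compound files regardless of its size.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class CompoundFileConfig {

    /** Builds a writer config whose merge policy always writes compound files. */
    public static IndexWriterConfig alwaysCompound() {
        IndexWriterConfig config =
            new IndexWriterConfig(Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setUseCompoundFile(true); // compound files enabled (already the default)
        mergePolicy.setNoCFSRatio(1.0);       // 1.0: use CFS whatever the segment size
        config.setMergePolicy(mergePolicy);   // the config is then passed to IndexWriter
        return config;
    }

    public static void main(String[] args) throws IOException {
        Directory dir = FSDirectory.open(new File(args[0]));
        IndexWriter writer = new IndexWriter(dir, alwaysCompound());
        try {
            // add documents here, then commit
            writer.commit();
        } finally {
            writer.close();
            dir.close();
        }
    }
}
```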
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 24, 2013 at 5:35 PM, saisantoshi <sa...@gmail.com>wrote:
> Thanks a lot. One last question, how do we set it? IndexWriter.???
>
> Thanks,
> Ranjith.
Re: List of files that Lucene 4.0 generates during indexing
Posted by saisantoshi <sa...@gmail.com>.
Thanks a lot. One last question, how do we set it? IndexWriter.???
Thanks,
Ranjith.
--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4036091.html
Re: List of files that Lucene 4.0 generates during indexing
Posted by Michael McCandless <lu...@mikemccandless.com>.
I would leave the default until/unless something goes wrong ...
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 24, 2013 at 5:28 PM, saisantoshi <sa...@gmail.com>wrote:
> Thanks. Are there any best practices to follow here, or should we leave the
> default (which is the hybrid approach, as you mentioned)?
Re: List of files that Lucene 4.0 generates during indexing
Posted by saisantoshi <sa...@gmail.com>.
Thanks. Are there any best practices to follow here, or should we leave the
default (which is the hybrid approach, as you mentioned)?
--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4036086.html
Re: List of files that Lucene 4.0 generates during indexing
Posted by Michael McCandless <lu...@mikemccandless.com>.
4.0 has a hybrid approach by default: "big" segments (> 10% of the index size,
by default) are written as non-compound files, and small segments are written
as compound files. See TieredMergePolicy.setNoCFSRatio if you want to always
use the compound file format.
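To make the hybrid rule concrete, here is a simplified model of it in plain Java. It is not Lucene's code: CfsRuleModel and useCompoundFile are hypothetical names, and the model only captures the size comparison described here (a segment stays compound if it is at most noCFSRatio times the index size, with 0.1 as the default). The real TieredMergePolicy has additional inputs, such as its useCompoundFile switch.

```java
/** Simplified model (not Lucene code) of the 4.0 default compound-file decision. */
public class CfsRuleModel {

    /** Default ratio: segments larger than 10% of the index are not compound. */
    public static final double DEFAULT_NO_CFS_RATIO = 0.1;

    /** Returns true if a segment of segmentBytes would be written as a compound file. */
    public static boolean useCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio >= 1.0) {
            return true; // ratio 1.0: always compound, whatever the size
        }
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long total = 1_000_000_000L; // hypothetical 1 GB index
        System.out.println(useCompoundFile(50_000_000L, total, DEFAULT_NO_CFS_RATIO));  // true: 5% of index
        System.out.println(useCompoundFile(300_000_000L, total, DEFAULT_NO_CFS_RATIO)); // false: 30% of index
        System.out.println(useCompoundFile(300_000_000L, total, 1.0));                  // true: ratio 1.0
    }
}
```

With the 10% default, a 50 MB segment in a 1 GB index stays compound while a 300 MB one does not; a ratio of 1.0 makes every segment compound, which is the effect setNoCFSRatio is meant to provide.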
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 24, 2013 at 4:39 PM, saisantoshi <sa...@gmail.com>wrote:
> Thanks Michael. The additional file in the list is just a typo.
>
> One more question: we were using 2.4 before, and it only generated a few
> files:
>
> _0.cfs
> _0.cfx
> // segment files
>
> I am assuming that 2.4 has the compound index structure enabled
> by default. Do we need to set it explicitly with 4.0? Is it not
> enabled by default? 4.0 seems to be using the multifile index structure. Has
> the behavior changed in the latest version?
>
> Thanks,
> Sai.
Re: List of files that Lucene 4.0 generates during indexing
Posted by saisantoshi <sa...@gmail.com>.
Thanks Michael. The additional file in the list is just a typo.
One more question: we were using 2.4 before, and it only generated a few
files:
_0.cfs
_0.cfx
// segment files
I am assuming that 2.4 has the compound index structure enabled
by default. Do we need to set it explicitly with 4.0? Is it not
enabled by default? 4.0 seems to be using the multifile index structure. Has
the behavior changed in the latest version?
Thanks,
Sai.
--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4036075.html
Re: List of files that Lucene 4.0 generates during indexing
Posted by Michael McCandless <lu...@mikemccandless.com>.
That looks correct, except I don't know what index.v0008 is.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 24, 2013 at 1:22 PM, saisantoshi <sa...@gmail.com>wrote:
> Thanks. I checked it out.
>
> Here is the list of files that have been generated:
>
> _0.fdt
> _0.fdx
> _0.fnm
> _0.si
> _0_Lucene40_0.frq
> _0_Lucene40_0.prx
> _0_Lucene40_0.tim
> _0_Lucene40_0.tip
> _0_nrm.cfe
> _0_nrm.cfs
> index.v0008
> segments.gen
> segments_1
>
> My question is: are the above files the right set of files that need
> to be generated? I just want to make sure, with a checklist, which files
> must be there and whether any files are missing from the above. I am
> looking to validate the above. The docs describe a number of different
> file formats but do not explicitly mention which files are necessary
> for 4.0.
>
> Thanks,
> Sai.
Re: List of files that Lucene 4.0 generates during indexing
Posted by saisantoshi <sa...@gmail.com>.
Thanks. I checked it out.
Here is the list of files that have been generated:
_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
index.v0008
segments.gen
segments_1
My question is: are the above files the right set of files that need to
be generated? I just want to make sure, with a checklist, which files
must be there and whether any files are missing from the above. I am
looking to validate the above. The docs describe a number of different
file formats but do not explicitly mention which files are necessary for 4.0.
Thanks,
Sai.
--
View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4036028.html
Re: List of files that Lucene 4.0 generates during indexing
Posted by Steve Rowe <sa...@gmail.com>.
Hi saisantoshi,
Check out the documentation: <http://lucene.apache.org/core/4_1_0/index.html> - particularly the "File Formats" link under "Reference Documents".
Steve
On Jan 24, 2013, at 11:41 AM, saisantoshi <sa...@gmail.com> wrote:
> Is there any doc on how many files Lucene generates during indexing
> (with 4.0), and what those files are? Once we migrate to 4.0, we would need
> to validate, by looking at the index directory, that the files that need to
> be generated were actually created. It helps for debugging purposes.
> Can someone post a link to the doc for the files 4.0 generates?
>
> Thanks,
> Sai.