Posted to java-user@lucene.apache.org by AarKay <ks...@gmail.com> on 2013/05/08 10:23:48 UTC
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing
positions and offsets/
I see that Lucene 4.x has FieldInfo.IndexOptions, which can be used to tell
Lucene whether to index documents, frequencies, positions, and offsets.
We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was
wondering whether the older versions (2.9) offered a way to tell Lucene whether
to index docs/freqs/pos/offsets, or whether they always indexed positions and
offsets by default.
I also see that Lucene 4.x has FieldType.setStoreTermVectorPositions and
FieldType.setStoreTermVectorOffsets.
Can someone please tell me a use case for storing positions and offsets in
the index?
Is it necessary to store term vector positions and offsets when using
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?
Thanks
-AarKay
Re: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing
positions and offsets/
Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, May 8, 2013 at 9:03 AM, AarKay <ks...@gmail.com> wrote:
> Thanks Mike. This is a little clearer to me now.
>
> Just to make sure I got it right, do you mean that we need to store just
> the offsets and set IndexOptions to DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
> to be able to use PostingsHighlighter?
> Also we don't need to store TermVectors and Positions. Correct?
That's correct.
> I believe the use case for storing TermVectors and Positions is to use the
> other highlighter (FastVectorHighlighter)
Yes. The original highlighter will also use TermVectors if they were indexed.
Mike McCandless
http://blog.mikemccandless.com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing
positions and offsets/
Posted by AarKay <ks...@gmail.com>.
Thanks Mike. This is a little clearer to me now.
Just to make sure I got it right, do you mean that we need to store just
the offsets and set IndexOptions to DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
to be able to use PostingsHighlighter?
Also we don't need to store TermVectors and Positions. Correct?
I believe the use case for storing TermVectors and Positions is to use the
other highlighter (FastVectorHighlighter)
On Wed, May 8, 2013 at 5:59 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:
> On Wed, May 8, 2013 at 4:23 AM, AarKay <ks...@gmail.com> wrote:
> > I see that Lucene 4.x has FieldInfo.IndexOptions, which can be used to tell
> > Lucene whether to index documents, frequencies, positions, and offsets.
> >
> > We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was
> > wondering whether the older versions (2.9) offered a way to tell Lucene whether
> > to index docs/freqs/pos/offsets, or whether they always indexed positions and
> > offsets by default.
>
> I believe in 2.9 you could only say "docs"
> (omitTermFreqAndPositions=true), or "docs+freqs+positions". Offsets
> are new in 4.x.
>
> > I also see that Lucene 4.x has FieldType.setStoreTermVectorPositions and
> > FieldType.setStoreTermVectorOffsets.
> > Can someone please tell me a use case for storing positions and offsets in
> > the index?
>
> Storing offsets in the index (postings) lets you use the new
> PostingsHighlighter. It should be faster than the other two
> highlighters which rely on term vectors or on re-analysis at search
> time.
>
> > Is it necessary to store term vector positions and offsets when using
> > IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?
>
> No.
>
> Term vectors are stored separately from postings (IndexOptions
> controls what's put into the postings).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
Re: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing
positions and offsets/
Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, May 8, 2013 at 4:23 AM, AarKay <ks...@gmail.com> wrote:
> I see that Lucene 4.x has FieldInfo.IndexOptions, which can be used to tell
> Lucene whether to index documents, frequencies, positions, and offsets.
>
> We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was
> wondering whether the older versions (2.9) offered a way to tell Lucene whether
> to index docs/freqs/pos/offsets, or whether they always indexed positions and
> offsets by default.
I believe in 2.9 you could only say "docs"
(omitTermFreqAndPositions=true), or "docs+freqs+positions". Offsets
are new in 4.x.
> I also see that Lucene 4.x has FieldType.setStoreTermVectorPositions and
> FieldType.setStoreTermVectorOffsets.
> Can someone please tell me a use case for storing positions and offsets in
> the index?
Storing offsets in the index (postings) lets you use the new
PostingsHighlighter. It should be faster than the other two
highlighters which rely on term vectors or on re-analysis at search
time.
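To illustrate why offsets recorded at index time make highlighting cheap, here
is a minimal toy sketch in plain Java. It is not Lucene code and does not use
the PostingsHighlighter API: the class OffsetPostingsSketch, its Occurrence
type, the simple letter/digit tokenizer, and the <b> markup are all assumptions
made for illustration. The point it shows is that once each occurrence carries
its character offsets, the highlighter only slices the original text and never
re-tokenizes it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: NOT Lucene's postings format or highlighter API.
class OffsetPostingsSketch {

    /** One occurrence of a term: token position plus character offsets. */
    static final class Occurrence {
        final int position;
        final int startOffset;
        final int endOffset;

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** "Index time": tokenize once and record position and offsets per term. */
    static Map<String, List<Occurrence>> index(String text) {
        Map<String, List<Occurrence>> postings = new HashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip separators (anything that is not a letter or digit).
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                postings.computeIfAbsent(term, k -> new ArrayList<>())
                        .add(new Occurrence(position++, start, i));
            }
        }
        return postings;
    }

    /** "Search time": wrap each hit using stored offsets, with no re-analysis. */
    static String highlight(String text, Map<String, List<Occurrence>> postings, String term) {
        List<Occurrence> hits =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Occurrence o : hits) { // occurrences were recorded in text order
            out.append(text, last, o.startOffset)
               .append("<b>").append(text, o.startOffset, o.endOffset).append("</b>");
            last = o.endOffset;
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "Lucene stores offsets; offsets help.";
        Map<String, List<Occurrence>> postings = index(text);
        // prints: Lucene stores <b>offsets</b>; <b>offsets</b> help.
        System.out.println(highlight(text, postings, "offsets"));
    }
}
```

Without stored offsets, the highlight step would have to run the tokenizer over
the text again to find where each match starts and ends, which is the
re-analysis cost mentioned above.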
> Is it necessary to store term vector positions and offsets when using
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?
No.
Term vectors are stored separately from postings (IndexOptions
controls what's put into the postings).
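The separation can be pictured with a small toy model in plain Java. This is a
sketch under stated assumptions, not Lucene's on-disk format or API: the class
PostingsVsTermVectorsSketch, its add method, and the storeTermVector flag are
hypothetical names chosen for illustration. Postings answer "which documents
contain this term?", while a term vector answers "which terms does this
document contain?", and the second structure is optional per document here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: NOT how Lucene lays out postings or term vectors.
class PostingsVsTermVectorsSketch {

    // Inverted view: term -> ids of the documents that contain it.
    final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Forward view: doc id -> (term -> frequency within that document).
    final Map<Integer, Map<String, Integer>> termVectors = new HashMap<>();

    /** Always updates postings; keeps a term vector only when asked to. */
    void add(int docId, String text, boolean storeTermVector) {
        Map<String, Integer> vector = new TreeMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            vector.merge(term, 1, Integer::sum);
        }
        if (storeTermVector) {
            termVectors.put(docId, vector);
        }
    }

    public static void main(String[] args) {
        PostingsVsTermVectorsSketch idx = new PostingsVsTermVectorsSketch();
        idx.add(0, "fast fast highlighter", true);
        idx.add(1, "postings highlighter", false);
        System.out.println(idx.postings.get("highlighter")); // prints: [0, 1]
        System.out.println(idx.termVectors.containsKey(1));  // prints: false
    }
}
```

Document 1 is fully searchable through the postings even though no term vector
was kept for it, which mirrors the answer above: what goes into the postings
and whether term vectors are stored are independent choices.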
Mike McCandless
http://blog.mikemccandless.com