Posted to java-user@lucene.apache.org by AarKay <ks...@gmail.com> on 2013/05/08 10:23:48 UTC

IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing positions and offsets/

I see that Lucene 4.x has FieldInfo.IndexOptions, which can be used to tell
Lucene whether to index documents, frequencies, positions, and offsets.

We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was
wondering whether the older versions (2.9) had a way to tell Lucene whether
to index docs/freqs/pos/offsets, or did they always index positions and
offsets by default?

Also I see that Lucene 4.x has FieldType.setStoreTermVectorPositions and
FieldType.setStoreTermVectorOffsets.
Can someone please tell me a use case for storing positions and offsets in
the index?
Is it necessary to store term vector positions and offsets when using
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?

Thanks
-AarKay

Re: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing positions and offsets/

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, May 8, 2013 at 9:03 AM, AarKay <ks...@gmail.com> wrote:
> Thanks Mike. This is a little clearer to me now.
>
> Just to make sure I got it right, do you mean that we need to store just
> the offsets and set IndexOptions to DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
> to be able to use PostingsHighlighter?
> Also we don't need to store TermVectors and Positions. Correct?

That's correct.

> I believe the use case for storing term vectors and positions is to use
> another highlighter (FastVectorHighlighter).

Yes.  The original highlighter will also use TermVectors if they were indexed.

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing positions and offsets/

Posted by AarKay <ks...@gmail.com>.
Thanks Mike. This is a little clearer to me now.

Just to make sure I got it right, do you mean that we need to store just
the offsets and set IndexOptions to DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
to be able to use PostingsHighlighter?
Also we don't need to store TermVectors and Positions. Correct?

I believe the use case for storing term vectors and positions is to use
another highlighter (FastVectorHighlighter).




Re: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS vs storing positions and offsets/

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, May 8, 2013 at 4:23 AM, AarKay <ks...@gmail.com> wrote:
> I see that Lucene 4.x has FieldInfo.IndexOptions, which can be used to tell
> Lucene whether to index documents, frequencies, positions, and offsets.
>
> We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was
> wondering whether the older versions (2.9) had a way to tell Lucene whether
> to index docs/freqs/pos/offsets, or did they always index positions and
> offsets by default?

I believe in 2.9 you could only say "docs"
(omitTermFreqAndPositions=true), or "docs+freqs+positions".  Offsets
are new in 4.x.

> Also I see that Lucene 4.x has FieldType.setStoreTermVectorPositions and
> FieldType.setStoreTermVectorOffsets.
> Can someone please tell me a use case for storing positions and offsets in
> the index?

Storing offsets in the index (postings) lets you use the new
PostingsHighlighter.  It should be faster than the other two
highlighters which rely on term vectors or on re-analysis at search
time.
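
As a rough illustration of why offsets kept in the postings help, here is a
toy model in plain Java. It is not Lucene's API: the class
OffsetPostingsSketch, its Occurrence type, and the highlight method are
invented for this sketch, and the tokenizer is a simplistic stand-in for a
real analyzer. The point it tries to show is that the text is tokenized once
at "index time", and highlighting later uses only the recorded character
offsets, with no re-analysis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model (hypothetical names): postings that record character offsets. */
class OffsetPostingsSketch {

    /** One occurrence of a term: start (inclusive) and end (exclusive) offsets. */
    static final class Occurrence {
        final int start;
        final int end;
        Occurrence(int start, int end) { this.start = start; this.end = end; }
    }

    // term -> occurrences in the single indexed text
    // (a real index would also key by document)
    private final Map<String, List<Occurrence>> postings = new HashMap<>();
    private final String text;

    /** "Index time": tokenize once and record the offsets of every term. */
    OffsetPostingsSketch(String text) {
        this.text = text;
        int i = 0;
        while (i < text.length()) {
            // skip separators, then consume one run of letters/digits
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                postings.computeIfAbsent(term, k -> new ArrayList<>())
                        .add(new Occurrence(start, i));
            }
        }
    }

    /** "Search time": wrap each hit in <b>...</b> using only the stored offsets. */
    String highlight(String term) {
        List<Occurrence> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) return text;
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Occurrence o : hits) { // occurrences were recorded in text order
            out.append(text, last, o.start)
               .append("<b>").append(text, o.start, o.end).append("</b>");
            last = o.end;
        }
        return out.append(text, last, text.length()).toString();
    }
}
```

For example, indexing a sentence and then calling highlight("offsets") wraps
each occurrence of that word in bold tags without tokenizing the text again.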

> Is it necessary to store term vector positions and offsets when using
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?

No.

Term vectors are stored separately from postings (IndexOptions
controls what's put into the postings).
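
To make the separation concrete, here is a small conceptual sketch in plain
Java. Again, this is not Lucene code: PostingsVsTermVectorsSketch and its
storeTermVector flag are made-up names, and whitespace splitting stands in
for analysis. It models postings as a term-to-documents map and term vectors
as an optional document-to-terms map, populated independently of each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical names): postings and term vectors as separate structures. */
class PostingsVsTermVectorsSketch {

    // Postings: term -> (docId -> positions). "Which documents contain this term?"
    final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Term vectors: docId -> (term -> positions). "Which terms does this document contain?"
    final Map<Integer, Map<String, List<Integer>>> termVectors = new TreeMap<>();

    /** Always writes postings; writes a term vector only when asked to. */
    void add(int docId, String text, boolean storeTermVector) {
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
            if (storeTermVector) {
                termVectors.computeIfAbsent(docId, d -> new TreeMap<>())
                           .computeIfAbsent(tokens[pos], t -> new ArrayList<>())
                           .add(pos);
            }
        }
    }
}
```

In this model a document added with storeTermVector=false is still fully
searchable through the postings; it simply has no per-document entry in
termVectors.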

Mike McCandless

http://blog.mikemccandless.com
