Posted to dev@lucene.apache.org by bu...@apache.org on 2004/08/19 14:10:10 UTC
DO NOT REPLY [Bug 18927] - [PATCH] Term Vector support
http://issues.apache.org/bugzilla/show_bug.cgi?id=18927
[PATCH] Term Vector support
------- Additional Comments From grant_ingersoll@yahoo.com 2004-08-19 12:10 -------
Term Vector support can now optionally store Token.getPositionIncrement(),
Token.startOffset() and Token.endOffset() information. This is controlled
through the standard Field creation methods. All options are backward
compatible (position and offset information will _not_ be stored by default).
Added many new test cases to demonstrate the functionality. Two new files are
needed: SegmentTermPositionVector and TermVectorOffsetInfo. All tests pass as
of the morning of 8/19/04.
Attached should be one patch file plus a zip containing the two new files.
What is this info good for?
1. I think the highlighter could use the offset info instead of re-analyzing
every document at runtime.
2. Many IR algorithms need character position, etc.
3. Others??
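To make point 1 concrete, here is a rough sketch of highlighting from stored offsets rather than re-tokenizing the text at search time. It is plain Java and does not use Lucene or the patch's classes: OffsetHighlightSketch, Offset, buildOffsets and highlight are hypothetical names invented for illustration, and the whitespace tokenizer merely stands in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, not Lucene code: a per-document map from term to
// the character offsets recorded for that term at index time.
public class OffsetHighlightSketch {

    // One occurrence of a term: [start, end) character offsets in the text.
    static final class Offset {
        final int start;
        final int end;
        Offset(int start, int end) { this.start = start; this.end = end; }
    }

    // "Index time": split on whitespace and remember where each term occurred.
    static Map<String, List<Offset>> buildOffsets(String text) {
        Map<String, List<Offset>> vector = new LinkedHashMap<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                vector.computeIfAbsent(term, k -> new ArrayList<>()).add(new Offset(start, i));
            }
        }
        return vector;
    }

    // "Search time": wrap each stored occurrence in brackets, using only the
    // stored offsets (the text is never tokenized again).
    static String highlight(String text, Map<String, List<Offset>> vector, String term) {
        List<Offset> hits = vector.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) return text;
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Offset o : hits) { // occurrences were added in increasing offset order
            out.append(text, last, o.start).append('[').append(text, o.start, o.end).append(']');
            last = o.end;
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "Term vectors store term data";
        Map<String, List<Offset>> vector = buildOffsets(text);
        System.out.println(highlight(text, vector, "term"));
    }
}
```

The trade-off is the usual one: a larger index in exchange for skipping analysis when displaying hits.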
Remember, the stored values are whatever your Analyzer sets on each Token
(i.e. Token.startOffset(), Token.endOffset() and
Token.getPositionIncrement()). These values are controlled by the application
author and can vary by application.
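As an illustration of how analyzer-chosen values shape what gets stored, the sketch below turns per-token position increments into absolute positions. It is plain Java, not the patch's code: Tok, PositionIncrementSketch and absolutePositions are hypothetical names, and it assumes positions are counted from 0 with the first token's increment applied to a starting value of -1. The sample tokens imagine an analyzer that dropped the stop word "the" from "the quick fox" and signalled the gap with an increment of 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code.
public class PositionIncrementSketch {

    // Minimal token: text, position increment, and [start, end) character offsets.
    static final class Tok {
        final String text;
        final int positionIncrement;
        final int startOffset;
        final int endOffset;
        Tok(String text, int positionIncrement, int startOffset, int endOffset) {
            this.text = text;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Accumulate increments into absolute positions (assumed to start at 0).
    static int[] absolutePositions(List<Tok> tokens) {
        int[] positions = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            positions[i] = position;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Tokens for "the quick fox" after an imagined stop filter removed "the".
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("quick", 2, 4, 9));
        tokens.add(new Tok("fox", 1, 10, 13));
        int[] positions = absolutePositions(tokens);
        for (int i = 0; i < positions.length; i++) {
            Tok t = tokens.get(i);
            System.out.println(t.text + " pos=" + positions[i]
                    + " offsets=" + t.startOffset + "-" + t.endOffset);
        }
    }
}
```

A different analyzer over the same text (say, one that keeps stop words) would yield different increments and therefore different stored positions, which is why the stored values can vary by application.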