
Posted to dev@lucene.apache.org by Christoph Goller <go...@detego-software.de> on 2004/10/05 19:46:05 UTC

Term vector patch

Hi All,

I applied Grant's new term vector patch. Term vectors can
now optionally be stored with positions (token numbers) and/or
offsets (start and end character offsets of each token in the original text).

I made a few modifications to Grant's patch:

*) The IndexReader.getTermVectors(...) methods now always return null
if there are no term vectors for the specified input. If they throw an
IOException, this indicates an error while accessing the index.
Previously, IOExceptions were caught inside TermVectorsReader.
This is a small change to the old API, but I think it's more consistent.

*) I made some low-level changes to how positions and offsets are
read and written. Most importantly, I switched to delta encoding
where possible, which should save some space.

*) I changed the public term vector API slightly. For example, IndexReader
now also uses Field.TermVector.VALUE instead of boolean variables.

*) I did some code restructuring and removed some unused methods.
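To illustrate the calling convention from the first point (null for "no term vectors", IOException only for index access errors), here is a minimal Java sketch. The Reader class, its getVector method and the "doc/field" storage scheme are hypothetical stand-ins, not Lucene's IndexReader API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the caller-side contract; names are illustrative only,
// not Lucene's actual API.
public class TermVectorContract {

    // Hypothetical reader: maps "doc/field" keys to stored term vectors (term -> freq).
    static class Reader {
        private final Map<String, Map<String, Integer>> stored = new HashMap<>();
        private boolean failing = false;

        void store(int doc, String field, Map<String, Integer> vector) {
            stored.put(doc + "/" + field, vector);
        }

        // Simulates a broken index so the exception path can be exercised.
        void simulateIoFailure() {
            failing = true;
        }

        // Returns null if no term vector was stored for (doc, field);
        // throws IOException only when the underlying index cannot be read.
        Map<String, Integer> getVector(int doc, String field) throws IOException {
            if (failing) {
                throw new IOException("error while reading term vectors");
            }
            return stored.get(doc + "/" + field);
        }
    }

    // Caller pattern: null is a normal "absent" result, an exception is a real error.
    static String describe(Reader reader, int doc, String field) {
        try {
            Map<String, Integer> tv = reader.getVector(doc, field);
            return tv == null ? "no term vector" : "terms: " + tv.size();
        } catch (IOException e) {
            return "index error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader();
        Map<String, Integer> vector = new HashMap<>();
        vector.put("lucene", 2);
        vector.put("vector", 1);
        reader.store(0, "body", vector);

        System.out.println(describe(reader, 0, "body")); // terms: 2
        System.out.println(describe(reader, 1, "body")); // no term vector
        reader.simulateIoFailure();
        System.out.println(describe(reader, 0, "body")); // index error: error while reading term vectors
    }
}
```

Because exceptions are no longer swallowed inside the reader, a caller can treat "absent" and "failed" differently, which is the consistency argument made in the post.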
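The delta encoding mentioned in the second point can be sketched as follows. This is a minimal, self-contained Java illustration. It assumes ascending token positions written as variable-length ints (7 data bits per byte, with the high bit meaning "more bytes follow"). The class and method names are hypothetical, and this is not the actual TermVectorsWriter code. Storing small gaps instead of absolute positions lets most values fit in a single byte.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: delta-encode ascending positions as variable-length ints.
public class DeltaPositions {

    // Write a non-negative int as a VInt: low 7 bits first, high bit set on all but the last byte.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode each position as the gap from the previous one (the first gap is from 0).
    static byte[] encode(int[] positions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int p : positions) {
            writeVInt(out, p - last);
            last = p;
        }
        return out.toByteArray();
    }

    // Decode by reading VInts and accumulating the running sum of gaps.
    static int[] decode(byte[] bytes) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        int last = 0;
        while (i < bytes.length) {
            int value = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[i++] & 0xFF;
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            last += value;
            result.add(last);
        }
        int[] out = new int[result.size()];
        for (int k = 0; k < out.length; k++) {
            out[k] = result.get(k);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] positions = {1000, 1003, 1010, 1200};
        byte[] encoded = encode(positions);
        System.out.println(encoded.length);                  // 6
        System.out.println(Arrays.toString(decode(encoded))); // [1000, 1003, 1010, 1200]
    }
}
```

In this example the absolute positions would take 8 bytes as VInts, while the gaps (1000, 3, 7, 190) take 6. Offsets can be treated similarly, for instance by storing each start offset relative to the previous token, though the exact scheme used in the patch is not described here.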
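The third point replaces several boolean flags with a single typed option value. The Java enum below is a hypothetical sketch of that design choice. Its name and constants are chosen for illustration and are not Lucene's actual Field.TermVector class. A single value covers every valid combination and makes nonsensical combinations impossible, such as positions without a stored term vector.

```java
// Hypothetical option type bundling the term vector flags into one value.
public enum TermVectorOption {
    NO(false, false, false),
    YES(true, false, false),
    WITH_POSITIONS(true, true, false),
    WITH_OFFSETS(true, false, true),
    WITH_POSITIONS_OFFSETS(true, true, true);

    final boolean stored;
    final boolean positions;
    final boolean offsets;

    TermVectorOption(boolean stored, boolean positions, boolean offsets) {
        this.stored = stored;
        this.positions = positions;
        this.offsets = offsets;
    }

    // Map old-style boolean flags onto a single option; if nothing is stored,
    // the positions/offsets flags are ignored.
    static TermVectorOption fromFlags(boolean stored, boolean positions, boolean offsets) {
        if (!stored) return NO;
        if (positions && offsets) return WITH_POSITIONS_OFFSETS;
        if (positions) return WITH_POSITIONS;
        if (offsets) return WITH_OFFSETS;
        return YES;
    }

    public static void main(String[] args) {
        System.out.println(fromFlags(true, true, false)); // WITH_POSITIONS
        System.out.println(fromFlags(false, true, true)); // NO
    }
}
```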


All unit tests still pass. I hope everything I did was correct :-)
Looking forward to feedback.

Christoph

