Posted to dev@lucene.apache.org by Christoph Goller <go...@detego-software.de> on 2004/07/02 14:10:44 UTC

Performance of TermVectors and skipTo

Hi folks,

I have done some performance tests for TermVectors and the new
TermDocs.skipTo() implementation, both introduced with 1.4.
I am very pleased with the results. I did these tests with the
Reuters news corpus (roughly 800000 documents).

*) I compared TermVectors with the solution of storing the
respective fields and re-analyzing the documents in order to
get their terms. According to my measurements, TermVectors speed
up access to the terms by a factor of 7!

*) For testing skipTo, I used my implementation for getting highly
correlated terms. For computing the correlation measure I have to
compare a lot of TermDocs lists with each other or other lists of
document ids. According to my measurements on an optimized index,
skipTo speeds up my term correlation implementation by a factor of
2. The benefit of skipTo probably increases with index size.
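
To illustrate why skipTo helps when comparing document lists, here is
a minimal sketch of a skip-based intersection of two sorted lists of
document ids. It does not use Lucene itself: DocList is a hypothetical
in-memory stand-in for TermDocs, and its skipTo does a binary search
over an array where Lucene would read skip data from the index. It is
meant to follow the TermDocs.skipTo contract (move to the first entry
beyond the current one whose document number is >= target). It is not
the correlation implementation mentioned above, only the list
comparison step under these assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class SkipToSketch {

    /** Hypothetical stand-in for TermDocs: iterates a sorted array of doc ids. */
    static final class DocList {
        private final int[] docs; // must be sorted ascending, no duplicates
        private int pos = -1;     // positioned before the first entry

        DocList(int... sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Current document id; only valid after next()/skipTo() returned true. */
        int doc() {
            return docs[pos];
        }

        /** Moves to the next entry; returns false when the list is exhausted. */
        boolean next() {
            pos++;
            return pos < docs.length;
        }

        /**
         * Moves to the first entry beyond the current one whose doc id is
         * >= target. Binary search is an assumption of this sketch, used
         * here in place of on-disk skip data.
         */
        boolean skipTo(int target) {
            int lo = pos + 1;
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = lo;
            return pos < docs.length;
        }
    }

    /** Returns the doc ids present in both lists, letting the lagging list skip ahead. */
    static List<Integer> intersect(DocList a, DocList b) {
        List<Integer> common = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return common;
        }
        while (true) {
            int da = a.doc();
            int db = b.doc();
            if (da == db) {
                common.add(da);
                if (!a.next() || !b.next()) {
                    return common;
                }
            } else if (da < db) {
                // a is behind: jump straight to the first candidate >= db
                if (!a.skipTo(db)) {
                    return common;
                }
            } else {
                // b is behind: jump straight to the first candidate >= da
                if (!b.skipTo(da)) {
                    return common;
                }
            }
        }
    }

    public static void main(String[] args) {
        DocList a = new DocList(1, 3, 5, 7, 9, 100);
        DocList b = new DocList(5, 9, 50, 100, 200);
        System.out.println(intersect(a, b)); // prints [5, 9, 100]
    }
}
```

Without skipTo, the lagging list would have to call next() once per
entry. With it, long runs of non-matching document ids are jumped over
in one call.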

regards,
Christoph



