Posted to dev@lucene.apache.org by Tony Bowden <to...@kasei.com> on 2004/03/31 11:07:12 UTC
TermInfosReader optimisation?
An interesting thing has come up with Plucene:
The code for TermInfosReader.get has an optimisation so that in
sequential access it doesn't need to keep seeking:
    final synchronized TermInfo get(Term term) throws IOException {
      if (size == 0) return null;

      // optimize sequential access: first try scanning cached enum w/o seeking
      if (enum.term() != null                       // term is at or past current
          && ((enum.prev != null && term.compareTo(enum.prev) > 0)
              || term.compareTo(enum.term()) >= 0)) {
        int enumOffset = (enum.position/TermInfosWriter.INDEX_INTERVAL)+1;
        if (indexTerms.length == enumOffset         // but before end of block
            || term.compareTo(indexTerms[enumOffset]) < 0)
          return scanEnum(term);                    // no need to seek
      }

      // random-access: must seek
      seekEnum(getIndexOffset(term));
      return scanEnum(term);
    }
In the Perl version, this whole middle section slows everything down
considerably (by almost 50%). I'm not sure whether this is because the
bottlenecks are in different places in Perl vs Java, but I'm curious
as to what impact this optimisation has in the Java version.
I can't easily test it from here at the minute. Are there any
benchmarks on the effect of having that optimisation vs not having
it?
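
For anyone who wants a first feel for the question before running a real benchmark, here is a toy model of the same decision in plain Java. It is only a sketch under stated assumptions, not Lucene code: the class name SequentialLookupSketch, the in-memory sorted String array, the tiny INDEX_INTERVAL of 4 and the seeks counter are all made up for illustration, and the enum.prev check is left out. It counts how often a lookup has to seek, which says nothing about wall-clock time in either Java or Perl.

```java
class SequentialLookupSketch {
    // Hypothetical interval, kept tiny so the effect is visible on a small array.
    static final int INDEX_INTERVAL = 4;

    private final String[] terms;      // sorted stand-in for the term dictionary
    private final String[] indexTerms; // every INDEX_INTERVAL-th term
    private final boolean optimise;    // true: try scanning from the cursor first
    private int position = -1;         // cursor into terms, -1 before the first lookup
    int seeks = 0;                     // number of times we had to seek

    SequentialLookupSketch(String[] sortedTerms, boolean optimise) {
        this.terms = sortedTerms;
        this.optimise = optimise;
        int n = (sortedTerms.length + INDEX_INTERVAL - 1) / INDEX_INTERVAL;
        indexTerms = new String[n];
        for (int i = 0; i < n; i++) indexTerms[i] = sortedTerms[i * INDEX_INTERVAL];
    }

    /** Returns the position of term in the sorted array, or -1 if absent. */
    int get(String term) {
        if (terms.length == 0) return -1;
        // Sequential case: term is at or past the cursor ...
        if (optimise && position >= 0 && term.compareTo(terms[position]) >= 0) {
            int enumOffset = position / INDEX_INTERVAL + 1;
            // ... and still before the start of the next indexed block.
            if (indexTerms.length == enumOffset
                    || term.compareTo(indexTerms[enumOffset]) < 0) {
                return scan(term); // no need to seek
            }
        }
        // Random access: must seek to the start of the right block.
        seek(indexOffset(term));
        return scan(term);
    }

    // Binary search for the last index term that is <= term.
    private int indexOffset(String term) {
        int lo = 0, hi = indexTerms.length - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            int delta = term.compareTo(indexTerms[mid]);
            if (delta < 0) hi = mid - 1;
            else if (delta > 0) lo = mid + 1;
            else return mid;
        }
        return Math.max(hi, 0);
    }

    private void seek(int indexOffset) {
        position = indexOffset * INDEX_INTERVAL;
        seeks++;
    }

    // Advance the cursor until it reaches term or passes where term would be.
    private int scan(String term) {
        while (position < terms.length - 1 && terms[position].compareTo(term) < 0) {
            position++;
        }
        return terms[position].equals(term) ? position : -1;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"};
        SequentialLookupSketch with = new SequentialLookupSketch(terms, true);
        SequentialLookupSketch without = new SequentialLookupSketch(terms, false);
        for (String t : terms) {
            with.get(t);
            without.get(t);
        }
        System.out.println("seeks with optimisation:    " + with.seeks);
        System.out.println("seeks without optimisation: " + without.seeks);
    }
}
```

In this model a fully sequential pass seeks once per block when the check is on and once per term when it is off. Whether that saving outweighs the extra comparisons depends on how expensive a seek is relative to a compareTo call, which may well differ between the Java and Perl implementations.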
Thanks,
Tony