Posted to dev@lucene.apache.org by Tony Bowden <to...@kasei.com> on 2004/03/31 11:07:12 UTC

TermInfosReader optimisation?

An interesting thing has come up with Plucene:

The code for TermInfosReader.get has an optimisation so that in
sequential access it doesn't need to keep seeking:

  final synchronized TermInfo get(Term term) throws IOException {
    if (size == 0) return null;

    // optimize sequential access: first try scanning cached enum w/o seeking
    if (enum.term() != null                          // term is at or past current
        && ((enum.prev != null && term.compareTo(enum.prev) > 0)
            || term.compareTo(enum.term()) >= 0)) {
      int enumOffset = (enum.position/TermInfosWriter.INDEX_INTERVAL)+1;
      if (indexTerms.length == enumOffset          // but before end of block
          || term.compareTo(indexTerms[enumOffset]) < 0)
        return scanEnum(term);                          // no need to seek
    }

    // random-access: must seek
    seekEnum(getIndexOffset(term));
    return scanEnum(term);
  }

In the Perl version, this whole middle section (the sequential-access
check) slows everything down considerably, by almost 50%. I'm not sure
whether this is because the bottlenecks are in different places in Perl
and Java, but I'm curious as to what impact this optimisation has in the
Java version.

I can't easily test it from here at the minute, but are there any
benchmarks on the effect of having that optimisation vs not having it?
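
As a starting point for such a comparison, here is a minimal,
self-contained sketch of the idea behind the sequential-access check.
It is not Lucene's code: the class SeqAccessSketch, its fields, and the
generated term names are all hypothetical, the enum.prev test is left
out, and it counts seeks rather than measuring time, so it says nothing
about where the real cost lies in either Perl or Java.

```java
import java.util.Arrays;

/** Toy model of a sorted term list with a sparse index (hypothetical, not Lucene code). */
class SeqAccessSketch {
    final String[] terms;       // all terms, in sorted order
    final String[] indexTerms;  // every interval-th term, standing in for the term index
    final int interval;         // plays the role of INDEX_INTERVAL
    final boolean optimise;     // whether to try the sequential-access shortcut
    int pos = -1;               // cursor position; -1 means "not positioned yet"
    int seeks = 0;              // number of times the cursor had to be repositioned

    SeqAccessSketch(int n, int interval, boolean optimise) {
        this.interval = interval;
        this.optimise = optimise;
        terms = new String[n];
        for (int i = 0; i < n; i++) terms[i] = String.format("term%06d", i);
        int blocks = (n + interval - 1) / interval;
        indexTerms = new String[blocks];
        for (int i = 0; i < blocks; i++) indexTerms[i] = terms[i * interval];
    }

    /** Returns the position of the term, or -1 if it is absent. */
    int get(String term) {
        if (terms.length == 0) return -1;
        // Shortcut: term is at or past the cursor and before the next index entry.
        if (optimise && pos >= 0 && pos < terms.length
                && term.compareTo(terms[pos]) >= 0) {
            int next = pos / interval + 1;
            if (next == indexTerms.length || term.compareTo(indexTerms[next]) < 0)
                return scan(term);  // same block: no need to seek
        }
        seek(indexOffset(term));    // random access: must seek
        return scan(term);
    }

    /** Index of the greatest index term that is <= term (0 if there is none). */
    int indexOffset(String term) {
        int i = Arrays.binarySearch(indexTerms, term);
        return i >= 0 ? i : Math.max(0, -i - 2);
    }

    void seek(int block) {
        pos = block * interval;
        seeks++;
    }

    /** Moves the cursor forward to the first term >= the target. */
    int scan(String term) {
        while (pos < terms.length && terms[pos].compareTo(term) < 0) pos++;
        return (pos < terms.length && terms[pos].equals(term)) ? pos : -1;
    }

    public static void main(String[] args) {
        for (boolean optimise : new boolean[] {true, false}) {
            SeqAccessSketch s = new SeqAccessSketch(1000, 128, optimise);
            for (String t : s.terms) s.get(t);  // purely sequential lookups
            System.out.println("optimise=" + optimise + " seeks=" + s.seeks);
        }
    }
}
```

For purely sequential lookups the sketch seeks once per index block with
the shortcut enabled and once per lookup without it. Whether the saved
seeks outweigh the extra comparisons on every call is exactly what a
real benchmark against the actual index files would have to show.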

Thanks,

Tony





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org