Posted to dev@lucene.apache.org by Wolfgang Hoschek <wh...@lbl.gov> on 2005/05/19 07:23:32 UTC
Add Term.createTerm to avoid 99% of String.intern() calls
For the MemoryIndex, I'm seeing large performance overheads due to
repetitive temporary string interning of o.a.l.index.Term.
For example, consider a FuzzyTermQuery or similar, scanning all terms
via TermEnum in the index: 40% of the time is spent in String.intern()
called from new Term(). (Allocating temporary memory and
FuzzyTermEnum.termCompare are less of a problem according to profiling.)
Note that the field name would only need to be interned once, not
time and again for each term. But the non-interning Term constructor
is private and hence not accessible from o.a.l.index.memory.*.
TermBuffer isn't what I'm looking for, and it's private anyway. The
best solution I came up with is to have an additional safe public
method in Term.java:
    /**
     * Constructs a term with the given text and the same interned field name as
     * this term (minimizes interning overhead).
     */
    public Term createTerm(String txt) { // WH
        return new Term(field, txt, false);
    }
Besides dramatically improving performance, this has the benefit of
keeping the non-interning constructor private.
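To make the idea concrete outside of Lucene, here is a minimal, self-contained sketch. SimpleTerm is a hypothetical stand-in for o.a.l.index.Term (it is not the real class), and it assumes the same three-argument constructor with an intern flag as in the method above. It only illustrates that terms made via createTerm share the already interned field name without calling String.intern() again.

```java
// Hypothetical, simplified stand-in for o.a.l.index.Term; not the real Lucene class.
final class SimpleTerm {
    final String field; // invariant: always an interned string
    final String text;

    /** Normal constructor: interns the field name on every call. */
    SimpleTerm(String field, String text) {
        this(field, text, true);
    }

    /** Private constructor: skips interning when the field is known to be interned. */
    private SimpleTerm(String field, String text, boolean intern) {
        this.field = intern ? field.intern() : field;
        this.text = text;
    }

    /** Same field as this term, new text; no String.intern() call. */
    SimpleTerm createTerm(String text) {
        // Safe because this.field is already interned (see invariant above).
        return new SimpleTerm(field, text, false);
    }
}

class CreateTermDemo {
    public static void main(String[] args) {
        // new String(...) forces a non-interned instance, so the constructor must intern it.
        SimpleTerm template = new SimpleTerm(new String("contents"), "");
        SimpleTerm a = template.createTerm("apple");
        SimpleTerm b = template.createTerm("banana");

        // Both derived terms reuse the very same interned field reference.
        System.out.println(a.field == template.field && b.field == template.field);
        // The field is identical to the interned literal, so == comparisons keep working.
        System.out.println(a.field == "contents");
    }
}
```

Because createTerm is an instance method, a caller can only obtain a non-interned-path term from an existing term whose field has already been interned, which is why the flag-taking constructor can stay hidden.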
Comments/opinions, anyone?
Here's a sketch of how it can be used:
    public Term term() {
        ...
        if (cachedTerm == null)
            cachedTerm = new Term((String) sortedFields[j].getKey(), "");
        return cachedTerm.createTerm((String) info.sortedTerms[i].getKey());
    }

    public boolean next() {
        ...
        if (...) cachedTerm = null;
    }
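For readers who want to run the pattern, the following is a rough, self-contained version of the sketch above. Everything in it is an assumption made for illustration: FieldTerm stands in for Term, SketchTermEnum stands in for the MemoryIndex enumerator, and plain arrays replace sortedFields and sortedTerms. The elided parts of the original sketch are not reconstructed here; the reset condition used below (advancing to the next field) is just one plausible choice.

```java
// Hypothetical stand-in for o.a.l.index.Term; not the real Lucene class.
final class FieldTerm {
    final String field; // always interned
    final String text;

    FieldTerm(String field, String text) { this(field, text, true); }

    private FieldTerm(String field, String text, boolean intern) {
        this.field = intern ? field.intern() : field;
        this.text = text;
    }

    FieldTerm createTerm(String text) { return new FieldTerm(field, text, false); }
}

// Hypothetical enumerator over sorted fields and, per field, sorted terms.
final class SketchTermEnum {
    private final String[] fields;
    private final String[][] terms; // terms[j] holds the terms of fields[j]
    private int j = 0;              // index of the current field
    private int i = -1;             // index of the current term; -1 means "before first"
    private FieldTerm cachedTerm;   // template term for the current field

    SketchTermEnum(String[] fields, String[][] terms) {
        this.fields = fields;
        this.terms = terms;
    }

    /** Advances to the next term; returns false when all fields are exhausted. */
    boolean next() {
        i++;
        while (j < fields.length && i >= terms[j].length) {
            j++;               // move on to the next field
            i = 0;
            cachedTerm = null; // assumed reset condition: the field changed
        }
        return j < fields.length;
    }

    /** Current term; only valid after next() has returned true. */
    FieldTerm term() {
        // Intern the field name once per field, not once per term.
        if (cachedTerm == null) cachedTerm = new FieldTerm(fields[j], "");
        return cachedTerm.createTerm(terms[j][i]);
    }
}

class SketchTermEnumDemo {
    public static void main(String[] args) {
        SketchTermEnum e = new SketchTermEnum(
            new String[] {"body", "title"},
            new String[][] {{"lucene", "memory"}, {"index"}});
        while (e.next()) {
            FieldTerm t = e.term();
            System.out.println(t.field + ":" + t.text);
        }
    }
}
```

With this layout, the interning cost is paid once per field rather than once per term, which is the saving the proposal is after.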
I'll send the full patch for MemoryIndex if this is accepted.
Wolfgang.