Posted to dev@lucene.apache.org by Wolfgang Hoschek <wh...@lbl.gov> on 2005/05/19 07:23:32 UTC

Add Term.createTerm to avoid 99% of String.intern() calls

For the MemoryIndex, I'm seeing large performance overheads due to
repeated interning of temporary strings in o.a.l.index.Term.
For example, consider a FuzzyTermQuery or similar that scans all terms
in the index via TermEnum: 40% of the time is spent in String.intern()
called from new Term(). [Allocating temporary memory and
FuzzyTermEnum.termCompare are less of a problem according to profiling.]

Note that the field name would only need to be interned once, not
again for each term. But the non-interning Term constructor
is private and hence not accessible from o.a.l.index.memory.*.
TermBuffer isn't what I'm looking for, and it's private anyway. The
best solution I came up with is an additional safe public
method in Term.java:

   /** Constructs a term with the given text and the same interned field
    * name as this term (minimizes interning overhead). */
   public Term createTerm(String txt) { // WH
       return new Term(field, txt, false);
   }

Besides dramatically improving performance, this has the benefit of  
keeping the non-interning constructor private.
Comments/opinions, anyone?

Here's a sketch of how it can be used:

public Term term() {
    ...
    if (cachedTerm == null)
        cachedTerm = new Term((String) sortedFields[j].getKey(), "");
    return cachedTerm.createTerm((String) info.sortedTerms[i].getKey());
}

public boolean next() {
    ...
    if (...) cachedTerm = null;
}
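
For anyone who wants to try the idea outside of Lucene, here is a
minimal self-contained sketch. Note that the nested Term class below is a
simplified stand-in I made up for illustration, not the real
o.a.l.index.Term, and termsFor() and TermSketch are hypothetical names.
It only shows the pattern: intern the field name once through the public
constructor, then derive all further terms via createTerm().

```java
import java.util.ArrayList;
import java.util.List;

public class TermSketch {

    /** Simplified stand-in for o.a.l.index.Term; NOT the real Lucene class. */
    static final class Term {
        final String field;
        final String text;

        /** Public-style constructor: interns the field name on every call. */
        Term(String field, String text) {
            this(field, text, true);
        }

        /** Private constructor: allows skipping the intern() call. */
        private Term(String field, String text, boolean intern) {
            this.field = intern ? field.intern() : field;
            this.text = text;
        }

        /** Creates a term that reuses this term's already-interned field name. */
        Term createTerm(String text) {
            return new Term(field, text, false);
        }
    }

    /** Hypothetical helper: one term per text, field name interned only once. */
    static List<Term> termsFor(String fieldName, String[] texts) {
        Term template = new Term(fieldName, ""); // the only intern() call
        List<Term> terms = new ArrayList<>(texts.length);
        for (String text : texts) {
            terms.add(template.createTerm(text));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<Term> terms = termsFor(new String("title"),
                new String[] {"lucene", "memory", "index"});
        // Every derived term shares the template's interned field instance,
        // so identity comparison on the field name still works.
        System.out.println(terms.get(0).field == terms.get(2).field);
    }
}
```

The non-interning constructor stays private in this sketch as well; the
only way to bypass intern() is through a term whose field name has
already been interned.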

I'll send the full patch for MemoryIndex if this is accepted.

Wolfgang.
