Posted to dev@lucene.apache.org by bu...@apache.org on 2004/03/23 12:05:07 UTC

DO NOT REPLY [Bug 27868] New: - Bad performance in PrefixQuery for large indices.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27868

           Summary: Bad performance in PrefixQuery for large indices.
           Product: Lucene
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Search
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: jorgen@polopoly.com


[The version is lucene-1.3-final, but that was not selectable in the Version field above.]

In org.apache.lucene.search.PrefixQuery.rewrite(IndexReader):

1.  term.text().startsWith(prefixText) is evaluated before
    term.field() == prefixField, although the prefix comparison is
    much more expensive than the field comparison. Why check the
    text at all when the term belongs to the wrong field?

2.  If there are many matches in the index, a large number of
    potentially identical TermQuery objects are added to the
    BooleanQuery. This could be solved here by first adding the
    TermQuery objects to a HashSet (so that all entries are unique)
    and then traversing the set to add them to the BooleanQuery.
    Alternatively, BooleanQuery's add method could be modified so
    that it only adds a clause not already contained in "clauses".
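
The two suggestions above can be illustrated with a minimal sketch. It
deliberately does not use Lucene: SketchTerm and PrefixRewriteSketch are
hypothetical stand-ins invented for this example, and the interning of
field names is an assumption. The sketch only shows the order of the two
checks and the de-duplication through a set. It is not the actual
PrefixQuery.rewrite code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration only; this is NOT Lucene's Term class.
final class SketchTerm {
    final String field;
    final String text;

    SketchTerm(String field, String text) {
        // Assumption: field names are interned, so comparing them is cheap.
        this.field = field.intern();
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchTerm)) return false;
        SketchTerm other = (SketchTerm) o;
        return field.equals(other.field) && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return 31 * field.hashCode() + text.hashCode();
    }
}

// Hypothetical helper; the class and method names are not part of Lucene.
class PrefixRewriteSketch {

    // Returns the distinct terms that are in the prefix's field and
    // whose text starts with the prefix's text, in encounter order.
    static Set<SketchTerm> matchingTerms(List<SketchTerm> terms, SketchTerm prefix) {
        Set<SketchTerm> unique = new LinkedHashSet<>();
        for (SketchTerm t : terms) {
            // Cheap field check first: && short-circuits, so startsWith
            // is never evaluated for a term in the wrong field.
            if (t.field.equals(prefix.field) && t.text.startsWith(prefix.text)) {
                unique.add(t); // the set silently drops duplicates
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<SketchTerm> terms = new ArrayList<>();
        terms.add(new SketchTerm("body", "apple"));
        terms.add(new SketchTerm("title", "app"));
        terms.add(new SketchTerm("title", "apple"));
        terms.add(new SketchTerm("title", "apple")); // duplicate entry
        terms.add(new SketchTerm("title", "banana"));

        for (SketchTerm t : matchingTerms(terms, new SketchTerm("title", "app"))) {
            System.out.println(t.field + ":" + t.text);
        }
    }
}
```

In real code, each element of the resulting set would become one
TermQuery clause. A LinkedHashSet is used here instead of a plain HashSet
only so that the clause order stays predictable; that choice is part of
the sketch, not of the original proposal.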
