Posted to dev@lucene.apache.org by David Seltzer <ds...@TVEyes.com> on 2009/04/17 21:18:38 UTC

PrefixQuery.rewrite

Hello,

I'm writing a small function to enumerate all the values of a field. I
decided that I'd base my code off the PrefixQuery.rewrite() code. This
code, as I understand it, is designed to expand a prefix wildcard and
rewrite the query as a long boolean series of ORs (SHOULD clauses).

To improve performance the code has a break statement designed to kick
out of the loop once the TermEnum starts enumerating on another field.

  //FROM /src/java/org/apache/lucene/search/PrefixQuery.java
  public Query rewrite(IndexReader reader) throws IOException {
    BooleanQuery query = new BooleanQuery(true);
    TermEnum enumerator = reader.terms(prefix);
    try {
      String prefixText = prefix.text();
      String prefixField = prefix.field();
      do {
        Term term = enumerator.term();
        if (term != null &&
            term.text().startsWith(prefixText) &&
            term.field() == prefixField) // interned comparison 
        {
          TermQuery tq = new TermQuery(term);         // found a match
          tq.setBoost(getBoost());                    // set the boost
          query.add(tq, BooleanClause.Occur.SHOULD);  // add to query
          //System.out.println("added " + term);
        } else {
          break;
        }
      } while (enumerator.next());
    } finally {
      enumerator.close();
    }
    return query;
  }
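
A side note on the "// interned comparison" remark in the code above: in Java,
== on strings compares object references, not contents, so the field check is
only safe if both field names are interned, which is what the comment implies.
The following is a minimal standalone sketch in plain Java (no Lucene
involved); the class name InternSketch and the helper sameRef are made up for
illustration.

```java
final class InternSketch {
    /** True only if both references point at the same String object. */
    static boolean sameRef(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "title";            // string literals are interned by the JVM
        String built = new String("title");  // distinct object with the same contents

        System.out.println(sameRef(literal, built));           // false
        System.out.println(sameRef(literal, built.intern()));  // true
        System.out.println(literal.equals(built));             // true
    }
}
```

When it is not certain that both strings are interned, equals() is the safe
choice.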

I think that there may be a logic problem here: it seems to me that
if I performed a prefix query on a field that wasn't first in line
in the TermEnum's output, my prefix would never be expanded.
I may be misunderstanding the ordering that IndexReader.terms(Term)
produces. 

This seems like a pretty serious issue, which is why I think there is
some error in my understanding of the Enumeration process. Can anyone
correct me?

Thanks!

-Dave

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: PrefixQuery.rewrite

Posted by David Seltzer <ds...@TVEyes.com>.
Thanks for the explanation! I was mistaken in my understanding of the
sort order of TermEnum.

-Dave

-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: Friday, April 17, 2009 5:08 PM
To: java-dev@lucene.apache.org
Subject: RE: PrefixQuery.rewrite





RE: PrefixQuery.rewrite

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Dave,

The code is correct; here are my comments:

> This
> code, as I understand it, is designed to expand a prefix wildcard and
> rewrite the query as a long boolean series of ORs (SHOULD clauses).
> 
> To improve performance the code has a break statement designed to kick
> out of the loop once the TermEnum starts enumerating on another field.
> 
>   //FROM /src/java/org/apache/lucene/search/PrefixQuery.java
>   public Query rewrite(IndexReader reader) throws IOException {
>     BooleanQuery query = new BooleanQuery(true);

Here a new TermEnum is created, which starts at the term
prefix = new Term(field, prefixText). The TermEnum is ordered by
(field, termtext). reader.terms(term) returns a TermEnum that is positioned
exactly at the given term or, if that term does not exist, at the next one
following the requested term (in the order described above):

>     TermEnum enumerator = reader.terms(prefix);
>     try {
>       String prefixText = prefix.text();
>       String prefixField = prefix.field();
>       do {
>         Term term = enumerator.term();
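
To illustrate the positioning described above, here is a minimal standalone
sketch that models the term dictionary with a java.util.TreeSet instead of a
real IndexReader. The class TermSeekSketch, the nested type T and the sample
terms are all made up for illustration; only the (field, termtext) ordering
and the "exact term or the next one" positioning are taken from the
explanation above.

```java
import java.util.TreeSet;

final class TermSeekSketch {
    /** Hypothetical stand-in for a Lucene Term: a (field, text) pair. */
    static final class T implements Comparable<T> {
        final String field;
        final String text;

        T(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(T other) {
            int byField = field.compareTo(other.field);                 // field name first
            return byField != 0 ? byField : text.compareTo(other.text); // then term text
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Models reader.terms(target): the first term >= target, or null if none. */
    static T seek(TreeSet<T> dict, T target) {
        return dict.ceiling(target);
    }

    /** Small made-up term dictionary spanning three fields, added out of order. */
    static TreeSet<T> sampleDict() {
        TreeSet<T> dict = new TreeSet<>();
        dict.add(new T("title", "lucid"));
        dict.add(new T("body", "query"));
        dict.add(new T("author", "lucas"));
        dict.add(new T("title", "lucene"));
        dict.add(new T("body", "luke"));
        dict.add(new T("title", "search"));
        dict.add(new T("body", "lucene"));
        return dict;
    }

    public static void main(String[] args) {
        TreeSet<T> dict = sampleDict();
        System.out.println(dict);                              // all terms in (field, text) order
        System.out.println(seek(dict, new T("title", "lu")));  // title:lucene
        System.out.println(seek(dict, new T("body", "x")));    // title:lucene (next field)
        System.out.println(seek(dict, new T("title", "zz")));  // null (past the end)
    }
}
```

Seeking into the "title" field lands directly on its first candidate term,
regardless of how many other fields sort before it.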

This check does exactly what you think; it is the exit condition:
If the term is from another field, exit.
If the term is null, the enumeration is exhausted, exit.
If the term does not start with the prefix, also exit. This condition is
sufficient. If the enum was initially positioned exactly on the prefix term
itself, that term is really the first one, and no term was skipped. If the
initial term is not the prefix term but a greater one, there are two cases:
a) it starts with the prefix -> iterate further
b) it does not start with the prefix -> there can never be a term with that
prefix.
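
The exit condition above can be sketched as a small standalone program. This
is only a model under stated assumptions: the term dictionary is a
java.util.TreeSet ordered by (field, termtext), tailSet() plays the role of
reader.terms(prefix), and the names PrefixScanSketch, T and expand are
hypothetical, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class PrefixScanSketch {
    /** Hypothetical stand-in for a Lucene Term, ordered by (field, text). */
    static final class T implements Comparable<T> {
        final String field;
        final String text;

        T(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(T other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Returns the texts of all terms in `field` that start with `prefix`. */
    static List<String> expand(TreeSet<T> dict, String field, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(start, true) models reader.terms(prefix): iteration begins
        // at the first term >= (field, prefix) in (field, text) order.
        for (T term : dict.tailSet(new T(field, prefix), true)) {
            if (term.field.equals(field) && term.text.startsWith(prefix)) {
                matches.add(term.text);
            } else {
                break; // matches are contiguous, so the first miss ends the scan
            }
        }
        return matches;
    }

    /** Small made-up term dictionary spanning three fields. */
    static TreeSet<T> sampleDict() {
        TreeSet<T> dict = new TreeSet<>();
        String[][] raw = {
            {"author", "lucas"},
            {"body", "lucene"}, {"body", "luke"}, {"body", "query"},
            {"title", "lucene"}, {"title", "lucid"}, {"title", "search"},
        };
        for (String[] pair : raw) {
            dict.add(new T(pair[0], pair[1]));
        }
        return dict;
    }

    public static void main(String[] args) {
        TreeSet<T> dict = sampleDict();
        System.out.println(expand(dict, "title", "lu")); // [lucene, lucid]
        System.out.println(expand(dict, "body", "x"));   // []
        System.out.println(expand(dict, "body", ""));    // [lucene, luke, query]
    }
}
```

The "title" field sorts last in this dictionary, yet its prefix is still
expanded, because the scan starts at the seek position rather than at the
beginning of the enumeration. The last call, with an empty prefix, lists every
value of one field in this model.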

>         if (term != null &&
>             term.text().startsWith(prefixText) &&
>             term.field() == prefixField) // interned comparison
>         {
>           TermQuery tq = new TermQuery(term);         // found a match
>           tq.setBoost(getBoost());                    // set the boost
>           query.add(tq, BooleanClause.Occur.SHOULD);  // add to query
>           //System.out.println("added " + term);
>         } else {
>           break;
>         }
>       } while (enumerator.next());
>     } finally {
>       enumerator.close();
>     }
>     return query;
>   }
> 
> I think that there may be a logic problem here: it seems to me that
> if I performed a prefix query on a field that wasn't first in line
> in the TermEnum's output, my prefix would never be expanded.
> I may be misunderstanding the ordering that IndexReader.terms(Term)
> produces.


Uwe

