Posted to java-user@lucene.apache.org by Eric Jain <Er...@isb-sib.ch> on 2004/08/06 16:54:55 UTC

Weighted queries

Is it possible to expand a query such as

   foo bar

into

   (title:foo^4 OR abstract:foo^2 OR content:foo) AND
   (title:bar^4 OR abstract:bar^2 OR content:bar)

?

I can assign weights to individual fields when indexing, and could use 
the MultiFieldQueryParser - but it seems this parser can't be configured 
to use AND as default!
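
One way to get the expansion asked about above, without touching the parser at all, is to build the expanded query string up front and then hand it to a regular QueryParser. The following is only a minimal sketch in plain Java: the class name WeightedQueryExpander and its boost map are hypothetical and not part of Lucene, and it assumes the user input is a simple list of whitespace-separated words (no phrases, operators or characters that need escaping).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class WeightedQueryExpander {
    // Hypothetical helper, not a Lucene class: field name -> boost.
    // A boost of 1 is left implicit in the generated string.
    private final Map<String, Integer> boosts;

    public WeightedQueryExpander(Map<String, Integer> boosts) {
        this.boosts = boosts;
    }

    /** Expands whitespace-separated terms into one OR group per term, joined by AND. */
    public String expand(String userQuery) {
        StringJoiner all = new StringJoiner(" AND ");
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields an empty query string
            }
            StringJoiner perTerm = new StringJoiner(" OR ", "(", ")");
            for (Map.Entry<String, Integer> e : boosts.entrySet()) {
                String clause = e.getKey() + ":" + term;
                if (e.getValue() != 1) {
                    clause += "^" + e.getValue();
                }
                perTerm.add(clause);
            }
            all.add(perTerm.toString());
        }
        return all.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("title", 4);
        boosts.put("abstract", 2);
        boosts.put("content", 1);
        System.out.println(new WeightedQueryExpander(boosts).expand("foo bar"));
    }
}
```

For the input "foo bar" this produces the string shown in the question. The obvious limitation is that anything beyond plain words in the user input is passed through unescaped, so this is only reasonable when you control the input format.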

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Weighted queries

Posted by Eric Jain <Er...@isb-sib.ch>.
Zilverline info wrote:
> I have implemented this in Zilverline. What I do is the following: 
> subclass QueryParser and override getFieldQuery:

Thanks; as you can see I ended up with a similar but slightly simpler 
solution, as I do not need to specify weights at query time.




Re: Weighted queries

Posted by Zilverline info <in...@zilverline.org>.
Hi Eric,

I have implemented this in Zilverline. What I do is the following: 
subclass QueryParser and override getFieldQuery:

    protected Query getFieldQuery(String field, Analyzer analyzer, String queryText)
            throws ParseException {

        // for field that contain 'contents' add boostfactors for other
        // terms specified in BoostFactor
        if (defaultField.equals(field)) {
            TokenStream source = analyzer.tokenStream(field, new StringReader(queryText));
            Vector v = new Vector();
            org.apache.lucene.analysis.Token t;
            while (true) {
                try {
                    t = source.next();
                } catch (IOException e) {
                    t = null;
                }
                if (t == null)
                    break;
                v.addElement(t.termText());
                log.debug(field + " , " + t.termText());
            }
            try {
                source.close();
            } catch (IOException e) { // ignore
            }

            if (v.size() == 0) {
                return null;
            }
            else {
                // create a new composed query
                BooleanQuery bq = new BooleanQuery();
                // get the static BoostFactors through non static getter
                BoostFactor bf = new BoostFactor();
                // For all boostfactors create a new PhraseQuery
                Iterator iter = bf.getFactors().entrySet().iterator();
                while (iter.hasNext()) {
                    Map.Entry element = (Map.Entry) iter.next();
                    String thisField = ((String) element.getKey()).toLowerCase();
                    Float boost = (Float) element.getValue();
                    PhraseQuery q = new PhraseQuery();
                    // and add all the terms of the query
                    for (int i = 0; i < v.size(); i++) {
                        q.add(new Term(thisField, (String) v.elementAt(i)));
                    }
                    // boost the query
                    q.setBoost(boost.floatValue());
                    // and add it to the composed query
                    bq.add(q, false, false);
                }
                log.debug("Query: " + bq);
                return bq;
            }
        } else {
            return super.getFieldQuery(field, analyzer, queryText);
        }
    }

Read the boost factors from an external source. I'm using an object with a 
HashMap; see Boostfactors at www.zilverline.org
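
As an illustration of reading boost factors from an external source, here is a minimal sketch that parses "field=boost" lines in java.util.Properties format into a map. It is not the Zilverline BoostFactor class: the class name BoostFactorLoader, the method name and the file format are all assumptions made for this example. Field names are lowercased, as in the getFieldQuery code above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class BoostFactorLoader {
    /**
     * Parses text in java.util.Properties format ("field=boost" per line)
     * and returns a map from lowercased field name to boost.
     */
    public static Map<String, Float> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        // TreeMap gives a stable (alphabetical) iteration order.
        Map<String, Float> factors = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            String value = props.getProperty(name).trim();
            factors.put(name.toLowerCase(Locale.ROOT), Float.valueOf(value));
        }
        return factors;
    }

    public static void main(String[] args) {
        Map<String, Float> factors = parse("title=4.0\nabstract=2.0\ncontents=1.0\n");
        System.out.println(factors);
    }
}
```

In a real application the text would come from a file or other configuration source; a malformed number makes Float.valueOf throw a NumberFormatException, which this sketch does not handle.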

Cheers,

   Michael Franken

Eric Jain wrote:

> Is it possible to expand a query such as
>
>   foo bar
>
> into
>
>   (title:foo^4 OR abstract:foo^2 OR content:foo) AND
>   (title:bar^4 OR abstract:bar^2 OR content:bar)
>
> ?
>
> I can assign weights to individual fields when indexing, and could use 
> the MultiFieldQueryParser - but it seems this parser can't be 
> configured to use AND as default!





Re: Weighted queries

Posted by Eric Jain <Er...@isb-sib.ch>.
>>   (title:foo^4 OR abstract:foo^2 OR content:foo) AND
>>   (title:bar^4 OR abstract:bar^2 OR content:bar)

> That's not the way MultiFieldQueryParser will rewrite your query.

You are right - what happens is this:

   (title:foo OR title:bar) OR
   (abstract:foo OR abstract:bar) OR
   (content:foo OR content:bar)

Looks like a dead end... On the other hand, I just realized I could 
subclass the QueryParser, e.g.:

import java.util.Vector;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;

public class CustomQueryParser
   extends QueryParser
{
   private String[] fields;

   public CustomQueryParser(String[] fields, Analyzer analyzer)
   {
     super(null, analyzer);
     this.fields = fields;
   }


   protected Query getFieldQuery(String field, Analyzer analyzer, String queryText)
     throws ParseException
   {
     if (field == null)
     {
       Vector clauses = new Vector();
       for (int i = 0; i < fields.length; i++)
         clauses.add(new BooleanClause(
           super.getFieldQuery(fields[i], analyzer, queryText), false, false));
       return getBooleanQuery(clauses);
     }

     return super.getFieldQuery(field, analyzer, queryText);
   }
}

Now:

String[] fields = new String[] { "title", "abstract", "content" };
QueryParser parser = new CustomQueryParser(fields, new SimpleAnalyzer());
parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
Query query = parser.parse("foo -bar (baz OR title:bla)");
System.out.println("? " + query);

Produces:

? +(title:foo abstract:foo content:foo) -(title:bar abstract:bar content:bar) +((title:baz abstract:baz content:baz) title:bla)

Perfect!



Re: Weighted queries

Posted by Daniel Naber <da...@t-online.de>.
On Friday 06 August 2004 16:54, Eric Jain wrote:

>    (title:foo^4 OR abstract:foo^2 OR content:foo) AND
>    (title:bar^4 OR abstract:bar^2 OR content:bar)

That's not the way MultiFieldQueryParser will rewrite your query. To get this 
kind of query you have to parse it with QueryParser and then iterate 
recursively (in case of BooleanQuery) over it, using Java's "instanceof". Each 
term needs to be replaced with a BooleanQuery over all the fields you want to 
search in.
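
The recursive rewrite described above can be sketched roughly as follows. To keep the example self-contained it does not use Lucene at all: TermNode and BoolNode are simplified, hypothetical stand-ins for TermQuery and BooleanQuery, and a null field stands for "term in the default field". With real Lucene classes the traversal would follow the same pattern, but the exact calls depend on the version.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldExpansionSketch {
    // Simplified stand-ins for Lucene's Query hierarchy (hypothetical, not the real API).
    interface Node {}

    static final class TermNode implements Node {
        final String field; // null means the user did not name a field
        final String text;
        TermNode(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    static final class BoolNode implements Node {
        final boolean and; // true: all children required; false: any child may match
        final List<Node> children = new ArrayList<>();
        BoolNode(boolean and) { this.and = and; }
        BoolNode add(Node n) { children.add(n); return this; }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(and ? " AND " : " OR ");
                sb.append(children.get(i));
            }
            return sb.append(")").toString();
        }
    }

    /** Recursively replaces every field-less term with an OR over all given fields. */
    static Node expand(Node node, String[] fields) {
        if (node instanceof BoolNode) {
            BoolNode in = (BoolNode) node;
            BoolNode out = new BoolNode(in.and);
            for (Node child : in.children) {
                out.add(expand(child, fields));
            }
            return out;
        }
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            if (t.field != null) {
                return t; // an explicitly named field is kept as is
            }
            BoolNode or = new BoolNode(false);
            for (String f : fields) {
                or.add(new TermNode(f, t.text));
            }
            return or;
        }
        return node; // any other node type is left untouched
    }

    public static void main(String[] args) {
        Node parsed = new BoolNode(true)
            .add(new TermNode(null, "foo"))
            .add(new TermNode(null, "bar"));
        System.out.println(expand(parsed, new String[] {"title", "abstract", "content"}));
    }
}
```

Per-field boosts could be attached at the point where the per-field TermNode is created; that part is left out here to keep the traversal itself visible.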

Regards
 Daniel




Re: Index and Search question in Lucene.

Posted by Ernesto De Santis <er...@colaborativa.net>.
Hi Dimitri

Which analyzer do you use?

You need to be careful with Keyword fields and analyzers. When you
index a Document, fields that have tokenized = false, like
Keyword, are not analyzed. 
At search time you need to parse the query with your analyzer, but not
analyze the untokenized fields, like your filename.
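
To illustrate why this matters, here is a small plain-Java simulation (no Lucene involved). The analyze method is a hypothetical stand-in that does roughly what a letter-based, lowercasing analyzer such as SimpleAnalyzer does: it splits at non-letters and lowercases. A Keyword field is indexed as one exact term, so the analyzed pieces of the query text never equal it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordFieldSketch {
    /** Rough stand-in for a letter-based, lowercasing analyzer: splits at non-letters. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A Keyword field is indexed as a single, exact term.
        String indexedTerm = "/my/path/myfile.ext";
        String queryText = "/my/path/myfile.ext";

        // Analyzing the query text breaks it into pieces; none equals the indexed term.
        List<String> analyzed = analyze(queryText);
        System.out.println(analyzed + " contains indexed term: " + analyzed.contains(indexedTerm));

        // Using the query text as one unanalyzed term matches exactly.
        System.out.println("exact term matches: " + queryText.equals(indexedTerm));
    }
}
```

In Lucene terms, the usual way around this is to build the clause for the untokenized field as a single exact term rather than passing its value through the analyzer.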

> I can do a search as this
> "+contents:SomeWord  +filename:SomePath"
> 

The syntax is right, but if you search for +filename:somepath, you will
find only that file.

For example, 
+content:version +filename:/my/path/myfile.ext

can only find myfile.ext, and if that file doesn't contain "version", the
search finds nothing. This is because you use +, which makes a term
required.
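
The effect of marking both clauses as required can be shown with a tiny plain-Java simulation. This is not how Lucene evaluates queries internally; matchesAll and the map-based "document" are hypothetical and only model the rule that every required clause must match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RequiredClauseSketch {
    /** A document matches only if every required {field, term} pair is present. */
    static boolean matchesAll(Map<String, Set<String>> doc, String[][] required) {
        for (String[] clause : required) {
            Set<String> terms = doc.get(clause[0]);
            if (terms == null || !terms.contains(clause[1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical document: field name -> set of indexed terms.
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("contents", new HashSet<>(Arrays.asList("release", "notes")));
        doc.put("filename", new HashSet<>(Arrays.asList("/my/path/myfile.ext")));

        // Models: +contents:version +filename:/my/path/myfile.ext
        String[][] query = {
            {"contents", "version"},
            {"filename", "/my/path/myfile.ext"}
        };
        // The filename matches, but "version" is missing, so the document is rejected.
        System.out.println(matchesAll(doc, query)); // prints false
    }
}
```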

You can see the query syntax on the Lucene site.

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q5

good luck.

Bye
Ernesto.


On Sun, 15 Aug 2004 at 17:13, Dmitrii PapaGeorgio wrote:
> Ok so when I index a file such as below
> 
> Document doc = new Document();
> doc.Add(Field.Text("contents", new StreamReader(dataDir)));
> doc.Add(Field.Keyword("filename", dataDir));
> 
> I can do a search as this
> "+contents:SomeWord  +filename:SomePath"
> 
> Correct?




Index and Search question in Lucene.

Posted by Dmitrii PapaGeorgio <te...@woh.rr.com>.
Ok so when I index a file such as below

Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));

I can do a search as this
"+contents:SomeWord  +filename:SomePath"

Correct?
