Posted to java-user@lucene.apache.org by Eric Jain <Er...@isb-sib.ch> on 2004/08/06 16:54:55 UTC
Weighted queries
Is it possible to expand a query such as
foo bar
into
(title:foo^4 OR abstract:foo^2 OR content:foo) AND
(title:bar^4 OR abstract:bar^2 OR content:bar)
?
I can assign weights to individual fields when indexing, and could use
the MultiFieldQueryParser - but it seems this parser can't be configured
to use AND as default!
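As an aside, if the input is limited to plain whitespace-separated words, the target query can also be produced by rewriting the query string before it is handed to QueryParser. The sketch below is plain Java with no Lucene dependency. The class name QueryExpander and the field/boost map are illustrative assumptions, and it deliberately does not handle operators, quoted phrases, field prefixes or characters that QueryParser treats specially.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryExpander {

    // Expands "foo bar" into
    // (title:foo^4 OR abstract:foo^2 OR content:foo) AND (title:bar^4 OR ...)
    // Assumption: the input contains only plain words separated by whitespace.
    public static String expand(String userQuery, Map<String, Float> boosts) {
        StringJoiner perTerm = new StringJoiner(" AND ");
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) continue; // empty input yields an empty query string
            StringJoiner perField = new StringJoiner(" OR ", "(", ")");
            for (Map.Entry<String, Float> e : boosts.entrySet()) {
                float boost = e.getValue();
                String clause = e.getKey() + ":" + term;
                // A boost of 1 is the default, so it is left implicit.
                if (boost != 1.0f) clause += "^" + formatBoost(boost);
                perField.add(clause);
            }
            perTerm.add(perField.toString());
        }
        return perTerm.toString();
    }

    // Prints 4.0 as "4" but keeps fractional boosts such as 0.5.
    static String formatBoost(float boost) {
        return boost == Math.rint(boost) ? String.valueOf((int) boost) : String.valueOf(boost);
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps field order stable
        boosts.put("title", 4f);
        boosts.put("abstract", 2f);
        boosts.put("content", 1f);
        System.out.println(expand("foo bar", boosts));
    }
}
```

The resulting string would then be parsed as usual; the drawback compared with subclassing the parser is that any real query syntax in the user's input is not understood.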
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Weighted queries
Posted by Eric Jain <Er...@isb-sib.ch>.
Zilverline info wrote:
> I have implemented this in Zilverline. What I do is the following:
> subclass QueryParser and override getFieldQuery:
Thanks; as you can see I ended up with a similar but slightly simpler
solution, as I do not need to specify weights at query time.
Re: Weighted queries
Posted by Zilverline info <in...@zilverline.org>.
Hi Eric,
I have implemented this in Zilverline. What I do is the following:
subclass QueryParser and override getFieldQuery:
protected Query getFieldQuery(String field, Analyzer analyzer,
        String queryText) throws ParseException {
    // For the default ('contents') field, add a boosted clause for every
    // field listed in BoostFactor. defaultField and log are members of
    // the surrounding class that are not shown in this post.
    if (defaultField.equals(field)) {
        // run the query text through the analyzer and collect its terms
        TokenStream source = analyzer.tokenStream(field,
                new StringReader(queryText));
        Vector v = new Vector();
        org.apache.lucene.analysis.Token t;
        while (true) {
            try {
                t = source.next();
            } catch (IOException e) {
                t = null;
            }
            if (t == null)
                break;
            v.addElement(t.termText());
            log.debug(field + " , " + t.termText());
        }
        try {
            source.close();
        } catch (IOException e) { // ignore
        }
        if (v.size() == 0) {
            return null;
        }
        // create a new composed query
        BooleanQuery bq = new BooleanQuery();
        // get the static BoostFactors through a non-static getter
        BoostFactor bf = new BoostFactor();
        // for every boost factor, create a PhraseQuery on that field
        Iterator iter = bf.getFactors().entrySet().iterator();
        while (iter.hasNext()) {
            Map.Entry element = (Map.Entry) iter.next();
            String thisField = ((String) element.getKey()).toLowerCase();
            Float boost = (Float) element.getValue();
            PhraseQuery q = new PhraseQuery();
            // and add all the terms of the query
            for (int i = 0; i < v.size(); i++) {
                q.add(new Term(thisField, (String) v.elementAt(i)));
            }
            // boost the query
            q.setBoost(boost.floatValue());
            // and add it to the composed query as an optional clause
            bq.add(q, false, false);
        }
        log.debug("Query: " + bq);
        return bq;
    }
    return super.getFieldQuery(field, analyzer, queryText);
}
I read the boost factors from an external source; I'm using an object
with a HashMap. See BoostFactors at www.zilverline.org.
Cheers,
Michael Franken
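The BoostFactor class itself is not shown in the thread. As a rough illustration of "reading the boost factors from an external source into a map", here is a hypothetical stand-in that loads field=boost lines in java.util.Properties format. The class name BoostFactors, the load() method and the file format are assumptions, not Zilverline's actual code; only getFactors() returning a map of String to Float matches how the code above uses it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical stand-in for Zilverline's BoostFactor: field name -> boost. */
public class BoostFactors {
    private final Map<String, Float> factors;

    public BoostFactors(Map<String, Float> factors) {
        this.factors = Collections.unmodifiableMap(factors);
    }

    /** Reads lines such as "title=4" from a properties-style source. */
    public static BoostFactors load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        Map<String, Float> map = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            // Field names are lower-cased, mirroring the toLowerCase() call in getFieldQuery.
            map.put(name.toLowerCase(), Float.parseFloat(props.getProperty(name).trim()));
        }
        return new BoostFactors(map);
    }

    public Map<String, Float> getFactors() {
        return factors;
    }

    public static void main(String[] args) throws IOException {
        BoostFactors bf = load(new StringReader("title=4\nabstract=2\ncontents=1\n"));
        System.out.println(bf.getFactors()); // {abstract=2.0, contents=1.0, title=4.0}
    }
}
```

Keeping the weights in a file like this means they can be tuned without recompiling the parser.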
Eric Jain wrote:
> Is it possible to expand a query such as
>
> foo bar
>
> into
>
> (title:foo^4 OR abstract:foo^2 OR content:foo) AND
> (title:bar^4 OR abstract:bar^2 OR content:bar)
>
> ?
>
> I can assign weights to individual fields when indexing, and could use
> the MultiFieldQueryParser - but it seems this parser can't be
> configured to use AND as default!
Re: Weighted queries
Posted by Eric Jain <Er...@isb-sib.ch>.
>> (title:foo^4 OR abstract:foo^2 OR content:foo) AND
>> (title:bar^4 OR abstract:bar^2 OR content:bar)
> That's not the way MultiFieldQueryParser will rewrite your query.
You are right - what happens is this:
(title:foo OR title:bar) OR
(abstract:foo OR abstract:bar) OR
(content:foo OR content:bar)
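To see why this shape is not equivalent to the one asked for, here is a plain-Java sketch that evaluates both as pure boolean matching (scores and boosts ignored) on a hypothetical document whose title contains only "foo". The class and method names are illustrative; this is not Lucene code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class QueryShapes {
    static final List<String> FIELDS = Arrays.asList("title", "abstract", "content");

    // A toy document: field name -> set of terms in that field.
    static boolean has(Map<String, Set<String>> doc, String field, String term) {
        Set<String> terms = doc.get(field);
        return terms != null && terms.contains(term);
    }

    // Shape produced by MultiFieldQueryParser: any term in any field matches.
    public static boolean anyTermAnyField(Map<String, Set<String>> doc, List<String> terms) {
        for (String f : FIELDS)
            for (String t : terms)
                if (has(doc, f, t)) return true;
        return false;
    }

    // Desired shape: every term must occur in at least one of the fields.
    public static boolean everyTermSomeField(Map<String, Set<String>> doc, List<String> terms) {
        for (String t : terms) {
            boolean found = false;
            for (String f : FIELDS)
                if (has(doc, f, t)) found = true;
            if (!found) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("title", new HashSet<>(Arrays.asList("foo")));
        List<String> query = Arrays.asList("foo", "bar");
        System.out.println(anyTermAnyField(doc, query));    // true
        System.out.println(everyTermSomeField(doc, query)); // false
    }
}
```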
Looks like a dead end... On the other hand, I just realized I could
subclass QueryParser instead, e.g.:
import java.util.Vector;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;

public class CustomQueryParser
    extends QueryParser
{
    private String[] fields;

    public CustomQueryParser(String[] fields, Analyzer analyzer)
    {
        // No default field: unqualified terms reach getFieldQuery with field == null
        super(null, analyzer);
        this.fields = fields;
    }

    protected Query getFieldQuery(String field, Analyzer analyzer,
                                  String queryText)
        throws ParseException
    {
        if (field == null)
        {
            // Unqualified term: OR together one query per configured field
            Vector clauses = new Vector();
            for (int i = 0; i < fields.length; i++)
            {
                Query q = super.getFieldQuery(fields[i], analyzer, queryText);
                if (q != null) // null if the analyzer dropped the term, e.g. a stop word
                    clauses.add(new BooleanClause(q, false, false));
            }
            return getBooleanQuery(clauses);
        }
        return super.getFieldQuery(field, analyzer, queryText);
    }
}
Now:
String[] fields = new String[] { "title", "abstract", "content" };
QueryParser parser = new CustomQueryParser(fields, new SimpleAnalyzer());
parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
Query query = parser.parse("foo -bar (baz OR title:bla)");
System.out.println("? " + query);
Produces:
? +(title:foo abstract:foo content:foo) -(title:bar abstract:bar content:bar) +((title:baz abstract:baz content:baz) title:bla)
Perfect!
Re: Weighted queries
Posted by Daniel Naber <da...@t-online.de>.
On Friday 06 August 2004 16:54, Eric Jain wrote:
> (title:foo^4 OR abstract:foo^2 OR content:foo) AND
> (title:bar^4 OR abstract:bar^2 OR content:bar)
That's not the way MultiFieldQueryParser will rewrite your query. To get this
kind of query you have to parse it with QueryParser and then iterate over it
recursively (descending into each BooleanQuery), using Java's "instanceof".
Each term needs to be replaced with a BooleanQuery over all the fields you
want to search in.
Regards
Daniel
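The recursive rewrite described above can be sketched without Lucene by using a tiny stand-in query model. The classes Query, TermQuery, Clause and BooleanQuery below are simplified hypothetical mirrors of Lucene's classes, loosely following the add(query, required, prohibited) style used in this thread; they are not the real API. With real Lucene the same recursion would walk the clauses returned by BooleanQuery.getClauses().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldExpander {
    // Simplified stand-ins for Lucene's query classes (NOT the real API).
    public interface Query {}

    public static final class TermQuery implements Query {
        final String field, text;
        final float boost;
        public TermQuery(String field, String text, float boost) {
            this.field = field; this.text = text; this.boost = boost;
        }
        @Override public String toString() {
            return field + ":" + text + (boost == 1f ? "" : "^" + boost);
        }
    }

    public static final class Clause {
        final Query query;
        final boolean required, prohibited;
        Clause(Query query, boolean required, boolean prohibited) {
            this.query = query; this.required = required; this.prohibited = prohibited;
        }
    }

    public static final class BooleanQuery implements Query {
        final List<Clause> clauses = new ArrayList<>();
        public BooleanQuery add(Query q, boolean required, boolean prohibited) {
            clauses.add(new Clause(q, required, prohibited));
            return this;
        }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < clauses.size(); i++) {
                Clause c = clauses.get(i);
                if (i > 0) sb.append(' ');
                if (c.required) sb.append('+');
                if (c.prohibited) sb.append('-');
                sb.append(c.query);
            }
            return sb.append(')').toString();
        }
    }

    // Recursively copies the tree, replacing every term on defaultField
    // with an OR (all clauses optional) over the boosted fields.
    public static Query expand(Query q, String defaultField, Map<String, Float> boosts) {
        if (q instanceof BooleanQuery) {
            BooleanQuery out = new BooleanQuery();
            for (Clause c : ((BooleanQuery) q).clauses)
                out.add(expand(c.query, defaultField, boosts), c.required, c.prohibited);
            return out;
        }
        if (q instanceof TermQuery && ((TermQuery) q).field.equals(defaultField)) {
            String text = ((TermQuery) q).text;
            BooleanQuery anyField = new BooleanQuery();
            for (Map.Entry<String, Float> e : boosts.entrySet())
                anyField.add(new TermQuery(e.getKey(), text, e.getValue()), false, false);
            return anyField;
        }
        return q; // explicitly fielded terms and other query types stay as they are
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 4f);
        boosts.put("abstract", 2f);
        boosts.put("content", 1f);
        // Hand-built stand-in for parsing "foo bar" on field "content" with AND as default
        Query parsed = new BooleanQuery()
                .add(new TermQuery("content", "foo", 1f), true, false)
                .add(new TermQuery("content", "bar", 1f), true, false);
        // (+(title:foo^4.0 abstract:foo^2.0 content:foo) +(title:bar^4.0 abstract:bar^2.0 content:bar))
        System.out.println(expand(parsed, "content", boosts));
    }
}
```

Because the required/prohibited flags are copied from each original clause, operators such as + and - in the user's query survive the rewrite.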
Re: Index and Search question in Lucene.
Posted by Ernesto De Santis <er...@colaborativa.net>.
Hi Dimitri
Which analyzer are you using?
You need to be careful with Keyword fields and analyzers. When you
index a Document, fields that are not tokenized (tokenized = false), such
as Keyword fields, are not analyzed.
At search time you need to parse the query with your analyzer, but without
analyzing the untokenized fields, such as your filename field.
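To illustrate the pitfall: a Keyword field is indexed as one exact term, while an analyzed query term is split and lower-cased, so the two no longer match. In this plain-Java sketch, analyze() is a rough stand-in for a letter-tokenizing, lower-casing analyzer such as SimpleAnalyzer (it is not Lucene's code), and the path is a made-up example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordMismatch {
    // Rough stand-in for a lower-casing analyzer: splits on non-letters
    // and lower-cases each token.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+"))
            if (!part.isEmpty()) tokens.add(part.toLowerCase(Locale.ROOT));
        return tokens;
    }

    public static void main(String[] args) {
        String indexedKeyword = "/my/path/MyFile.ext"; // Keyword field: indexed as one untouched term
        System.out.println(analyze(indexedKeyword));   // [my, path, myfile, ext]
        // None of the analyzed query tokens equals the single indexed term.
        System.out.println(analyze(indexedKeyword).contains(indexedKeyword)); // false
    }
}
```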
> I can do a search as this
> "+contents:SomeWord +filename:SomePath"
>
The syntax is right, but if you search for +filename:somepath, you will
only find this file.
For example,
+contents:version +filename:/my/path/myfile.ext
can only find myfile.ext, and if that file doesn't contain "version", it
won't find anything at all. This is because you use +, which makes the
term required.
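The effect of + can be shown with plain boolean matching (scores ignored). In this plain-Java sketch the two documents are hypothetical, and matches() is a simplified model of how a BooleanQuery treats required and optional clauses, not Lucene code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequiredClauses {
    // Hypothetical document: field name -> set of indexed terms.
    public static Map<String, Set<String>> doc(String filename, String... words) {
        Map<String, Set<String>> d = new HashMap<>();
        d.put("filename", new HashSet<>(Arrays.asList(filename))); // Keyword: one term
        d.put("contents", new HashSet<>(Arrays.asList(words)));
        return d;
    }

    static boolean has(Map<String, Set<String>> d, String[] clause) {
        Set<String> terms = d.get(clause[0]);
        return terms != null && terms.contains(clause[1]);
    }

    // Every required (+) clause must match; if there are no required
    // clauses, at least one optional clause must match.
    public static boolean matches(Map<String, Set<String>> d,
                                  List<String[]> required, List<String[]> optional) {
        for (String[] c : required)
            if (!has(d, c)) return false;
        if (!required.isEmpty()) return true; // optional clauses would only affect the score
        for (String[] c : optional)
            if (has(d, c)) return true;
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> a = doc("/my/path/myfile.ext", "release", "notes");
        Map<String, Set<String>> b = doc("/my/path/other.ext", "version", "history");
        String[] version = {"contents", "version"};
        String[] file = {"filename", "/my/path/myfile.ext"};
        List<String[]> none = Collections.emptyList();

        // +contents:version +filename:/my/path/myfile.ext
        List<String[]> both = Arrays.asList(version, file);
        System.out.println(matches(a, both, none) + " " + matches(b, both, none)); // false false

        // +contents:version filename:/my/path/myfile.ext
        List<String[]> req = Collections.singletonList(version);
        List<String[]> opt = Collections.singletonList(file);
        System.out.println(matches(a, req, opt) + " " + matches(b, req, opt)); // false true
    }
}
```

Dropping the + in front of the filename clause turns it into an optional clause, so other files containing "version" are found as well.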
You can find the query syntax on the Lucene site:
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q5
Good luck.
Bye
Ernesto.
On Sun, 15/08/2004 at 17:13, Dmitrii PapaGeorgio wrote:
> Ok so when I index a file such as below
>
> Document doc = new Document();
> doc.Add(Field.Text("contents", new StreamReader(dataDir)));
> doc.Add(Field.Keyword("filename", dataDir));
>
> I can do a search as this
> "+contents:SomeWord +filename:SomePath"
>
> Correct?
Index and Search question in Lucene.
Posted by Dmitrii PapaGeorgio <te...@woh.rr.com>.
Ok so when I index a file such as below
Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));
I can do a search as this
"+contents:SomeWord +filename:SomePath"
Correct?