Posted to java-user@lucene.apache.org by blazingwolf7 <bl...@gmail.com> on 2009/07/28 10:28:50 UTC

Generating Query for Multiple Clauses in a Single Field

Hi,

I am currently building a search engine and need to generate a query
like the following:
title:(+chemistry +"national curriculum")

It is mentioned that this can be done using the QueryParser, but unfortunately
I can't find any reference on how to use it. Can anyone help me with this?

Thanks
-- 
View this message in context: http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-tp24694748p24694748.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Generating Query for Multiple Clauses in a Single Field

Posted by blazingwolf7 <bl...@gmail.com>.
Thanks a lot. It was indeed caused by the length normalization. I followed
your suggestion and changed lengthNorm to return 1.0f. Now it works properly.

Thanks again







Re: Generating Query for Multiple Clauses in a Single Field

Posted by AHMET ARSLAN <io...@yahoo.com>.
> Yes, before this I used default Lucene, but I don't know
> what went wrong: some results with only a single word matching went to
> the top of the results.

Hmm, interesting. It seems that length normalization is causing this. Very short documents with only a single word matching are getting high scores due to length normalization. The documents containing all of the query terms are probably very long and are getting lower scores. Lucene penalizes long documents and favors short ones.

Can you verify my guess by looking at the document lengths in the result set? Also, org.apache.lucene.search.Explanation describes the score computation for a given document and query.

There is an excellent publication [1] [2] (see sections 4.1 and 4.2) about modifying Lucene scoring. SweetSpotSimilarity [3] with appropriate parameters (steepness, min, and max) can solve your problem.
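
To make the effect of those three parameters concrete, here is a minimal plain-Java sketch of the plateau-shaped length norm, written from the formula as I understand it from the SweetSpotSimilarity Javadoc [3]: 1 / sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1). The class and method names are hypothetical and the parameter values are only illustrative; check the formula against the Javadoc before relying on it.

```java
// Plain-Java illustration; it does not use Lucene itself.
// SweetSpotNormDemo and sweetSpotNorm are hypothetical names for this sketch.
class SweetSpotNormDemo {

    // Assumed formula (taken from the SweetSpotSimilarity Javadoc):
    // 1 / sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1)
    static float sweetSpotNorm(int numTerms, int min, int max, float steepness) {
        // Zero for any length inside [min, max]; grows with distance outside it.
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0f));
    }

    public static void main(String[] args) {
        int min = 10;            // illustrative lower edge of the "sweet spot"
        int max = 100;           // illustrative upper edge
        float steepness = 0.5f;  // illustrative steepness
        int[] lengths = {4, 50, 400};
        for (int n : lengths) {
            System.out.printf("numTerms=%d norm=%.3f%n", n, sweetSpotNorm(n, min, max, steepness));
        }
    }
}
```

Under this formula every field whose length falls between min and max gets the same norm of 1, so very short documents lose their advantage over documents of ordinary length.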

Alternatively, if this requirement is very important (and you don't mind long documents taking over), you can extend DefaultSimilarity so that it ignores the document length. Just return 1.

// In a subclass of DefaultSimilarity: ignore the field length entirely.
public float lengthNorm(String fieldName, int numTerms) {
    return 1.0f;
}
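
As a rough illustration of why this override changes the ranking, the sketch below compares the default length norm, which in DefaultSimilarity is 1 / sqrt(numTerms), with the flat value 1.0f. It is plain Java with no Lucene dependency; the class and method names are hypothetical and the field lengths are made up for the example.

```java
// Plain-Java illustration; it does not use Lucene itself.
// LengthNormDemo and its method names are hypothetical.
class LengthNormDemo {

    // Same formula as the default lengthNorm: 1 / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The override suggested above: every field length gets the same norm.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int[] lengths = {4, 100, 2500}; // made-up field lengths, in terms
        for (int n : lengths) {
            System.out.printf("numTerms=%d default=%.3f flat=%.1f%n",
                    n, defaultLengthNorm(n), flatLengthNorm(n));
        }
    }
}
```

With the default formula a 4-term field is weighted 25 times more heavily than a 2500-term field (0.5 versus 0.02), which is enough for a single matching term in a tiny document to outrank a long document that matches every term. Note that norms are written at index time, so the custom Similarity has to be set on the IndexWriter and the documents reindexed for the change to show up.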


> I assumed this is due to the score of those results being too
> high. That's why I am trying to add an additional boost.

I don't think such a boosting mechanism exists.

Ahmet

[1] http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team
[2] http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
[3] http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/misc/SweetSpotSimilarity.html




      



Re: Generating Query for Multiple Clauses in a Single Field

Posted by blazingwolf7 <bl...@gmail.com>.
Yes, before this I used default Lucene, but I don't know what went wrong:
some results with only a single word matching went to the top of the
results.

I assumed this is due to the score of those results being too high. That's
why I am trying to add an additional boost.







Re: Generating Query for Multiple Clauses in a Single Field

Posted by AHMET ARSLAN <io...@yahoo.com>.
: I am trying to create a query that will first return a set
: of results, and then
: give a boost to the results that contain all the
: keywords entered by the user.

If I understand you correctly: the user will enter multiple keywords, let's say a b c d, and you want documents that contain all of the keywords (a b c d) to get higher scores (boosted). In other words, if there are documents in the collection that contain all of a b c d, you want to see them at the top of the result set. The result set may also contain documents that have only one or two of the keywords, at the end of the list. Am I correct?

If that's what you want, you don't need to do anything special; Lucene does it by default. Use the default operator OR. The more query terms appear in a document, the more relevant that document is to the query.
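
One part of the default scoring that produces this behaviour is the coordination factor, which in DefaultSimilarity is the number of matching query terms divided by the total number of query terms. The sketch below shows just that factor in plain Java; the class and method names are hypothetical, and the real score also multiplies in tf, idf, boosts and the length norm.

```java
// Plain-Java illustration; it does not use Lucene itself.
// CoordDemo is a hypothetical name for this sketch.
class CoordDemo {

    // Same formula as the default coord: fraction of the query terms that matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        int queryTerms = 4; // the query "a b c d" with OR as the default operator
        for (int matched = 1; matched <= queryTerms; matched++) {
            System.out.printf("matched=%d of %d coord=%.2f%n",
                    matched, queryTerms, coord(matched, queryTerms));
        }
    }
}
```

A document matching all four terms gets a factor of 1.0 and one matching a single term gets 0.25. Because this factor is multiplied with the other parts of the score, a large enough difference in length norm can still push a short single-term match above a long document that matches everything.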



      



Re: Generating Query for Multiple Clauses in a Single Field

Posted by blazingwolf7 <bl...@gmail.com>.
I am trying to create a query that will first return a set of results, and
then give a boost to the results that contain all the keywords entered by the
user.


Ahmet Arslan wrote:
> 
> 
>> generate a query like the following:
>> title:(+chemistry +"national curriculum")
> 
> I didn't understand exactly what you are asking, but the query string is
> already well-formed. You can pass this string directly to the parse
> method of QueryParser. The following four examples yield the same Query
> object.
> 
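> // QueryParser.main parses its first argument and prints the resulting Query.
> String[] ar = {"title:(+chemistry +\"national curriculum\")"};
> org.apache.lucene.queryParser.QueryParser.main(ar);
> 
> // The same query written with the AND operator.
> String[] ar1 = {"title:(chemistry AND \"national curriculum\")"};
> org.apache.lucene.queryParser.QueryParser.main(ar1);
> 
> // "title" is the default field here, so the field prefix can be dropped.
> QueryParser qp = new QueryParser("title", new StandardAnalyzer());
> Query q = qp.parse("chemistry AND \"national curriculum\"");
> System.out.println(q.toString());
> 
> // With AND as the default operator, it does not need to be written out.
> qp.setDefaultOperator(QueryParser.AND_OPERATOR);
> q = qp.parse("chemistry \"national curriculum\"");
> System.out.println(q.toString());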
> 
>> It is mentioned that this can be done using the QueryParser, but
>> unfortunately I can't find any reference on how to use it.
> 
> http://lucene.apache.org/java/2_4_1/queryparsersyntax.html
> Just prepare a String according to the descriptions there and pass it to
> the parse method of QueryParser.
> 
> 
> 
>       
> 
> 
> 
> 
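
For the original question of generating such a string from a list of clauses, a minimal sketch of the "prepare a String" step might look like the following. It is plain Java and only builds the text to hand to QueryParser's parse method; the class and method names are hypothetical, and it assumes the clauses contain no query-syntax special characters (for raw user input they would need escaping first, for example with QueryParser.escape).

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration; RequiredClauseQueryString is a hypothetical helper.
class RequiredClauseQueryString {

    // Builds field:(+clause1 +"multi word clause" ...), marking every clause as required.
    // Assumes the clauses contain no special query-syntax characters.
    static String build(String field, List<String> clauses) {
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String clause = clauses.get(i);
            sb.append('+');
            if (clause.indexOf(' ') >= 0) {
                // Multi-word clauses become quoted phrases.
                sb.append('"').append(clause).append('"');
            } else {
                sb.append(clause);
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<String> clauses = Arrays.asList("chemistry", "national curriculum");
        // prints: title:(+chemistry +"national curriculum")
        System.out.println(build("title", clauses));
    }
}
```

The resulting string can then be passed to the parse method of a QueryParser, as in the examples quoted above.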


