Posted to java-user@lucene.apache.org by Lahiru Samarakoon <la...@gmail.com> on 2011/01/18 13:12:15 UTC

Lucene Ranking Problem

Dear All,

 I have two documents. Their analyzed, tokenized contents are listed
below.

 *Document 1 :*

 *when*, null_1, *my*, null_1, money,

fund, amount, payment, creditcard, credit,

card, *bank, account*, debit, deduct,

*charge*, null_1, my, mobile, usage,

*service*, connection


 *Document 2:*

 *when*, what, time, what, day,

null_1, money, fund, cash, payment,

null_1, i, do, you, i,

null_1, deduct, *charge*, reduce, debit,

from, *my*, *bank, account*, credit,

card, null_1, *adsl*, adsl1, adsl-2,

adsl-1, adsl2, adsl, 1, adsl,

2, usage, connection, *service*


 Then, I searched for the following text.

 *Query:* when my bank account charge adsl service

 *Scores*

Document 1 = 0.74406385

Document 2 = 0.66067594

 I was expecting Document 2 to be the top-ranked document, but Document 1
is ranked first even though it does not contain the term “adsl”.

 The word order of Document 1 matches the query very well. Could that
be the reason?

If it is, how can I ignore word order when searching? (I am not using
phrase queries.)

My search code is shown below; it is very simple.


 QueryParser parser = new QueryParser(Version.LUCENE_30,
         "pattern",
         new StandardAnalyzer(Version.LUCENE_30));
 org.apache.lucene.search.Query query1 =
         parser.parse(this.query.getQuestion());
 TopDocs hits = is.search(query1, 10);

 Please advise.


Thanks,

Lahiru

Re: Lucene Ranking Problem

Posted by Lahiru Samarakoon <la...@gmail.com>.

Hi Ian & Umesh,

This is what I was looking for.
Thanks a lot.

Regards,
Lahiru

Re: Lucene Ranking Problem

Posted by Umesh Prasad <um...@gmail.com>.

Hi Lahiru,
   Comments are inline:


On Tue, Jan 18, 2011 at 5:42 PM, Lahiru Samarakoon <la...@gmail.com> wrote:

> Dear All,
>
>  I have two documents. The analyzed and the tokenized contents are
> mentioned
> below.
>
>  *Document 1 :*
>
>  *when*, null_1, *my*, null_1, money,
>
> fund, amount, payment, creditcard, credit,
>
> card, *bank, account*, debit, deduct,
>
> *charge*, null_1, my, mobile, usage,
>
> *service*, connection
>
>
>  *Document 2:*
>
>  *when*, what, time, what, day,
>
> null_1, money, fund, cash, payment,
>
> null_1, i, do, you, i,
>
> null_1, deduct, *charge*, reduce, debit,
>
> from, *my*, *bank, account*, credit,
>
> card, null_1, *adsl*, adsl1, adsl-2,
>
> adsl-1, adsl2, adsl, 1, adsl,
>
> 2, usage, connection, *service*
>
>
>  Then, I searched for the following text.
>
>  *Query:* when my bank account charge adsl service
>
>  *Scores
> *
>
> Document 1 = 0.74406385
>
> Document 2 = Score = 0.66067594
>
Please read the Lucene scoring documentation:
http://lucene.apache.org/java/2_9_1/scoring.html
That will help you understand the bigger picture.
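
For orientation, the classic formula documented on that page has roughly the shape below. This is a summary sketch: the default factor definitions on the second and third lines are those of DefaultSimilarity in the 2.9/3.0 era as best recalled, and boost(t) stands for the query-time term boost, so check the Similarity Javadoc for the version you run.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^2\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)

\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}, \qquad
\mathrm{lengthNorm}(d) = \frac{1}{\sqrt{\mathrm{numTerms}(d)}}
```

No factor depends on token positions. norm(t,d) folds in lengthNorm along with any index-time boosts, so a longer field lowers the contribution of every matching term.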


>  I was expecting to have Document 2 as the top ranked document. But I get
> Document 1 as the top ranked even it does not contains  the term “adsl”.
>
>  The word order of the Document 1 matches with the query very well. Can it
> be the reason ?
>
Word order doesn't matter. However, tf/idf, norms, and other factors do
matter, as described in the link above.

You can see how each document was assigned its score by using

IndexSearcher.explain(query, docId), as described in
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Searcher.html#explain%28org.apache.lucene.search.Query,%20int%29
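
To make the point about word order and norms concrete, here is a minimal, hand-rolled sketch of classic TF-IDF scoring in plain Java. It is not Lucene code and will not reproduce the scores in this thread: the class `ClassicScoreSketch`, its methods, the toy documents, and the `filler` tokens are all hypothetical, and it assumes a two-document index, distinct query terms, non-empty documents, no boosts, and no lossy one-byte norm encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified classic TF-IDF scoring over token lists; an illustration only. */
final class ClassicScoreSketch {

    // tf(t in d) = sqrt(number of occurrences of the term in the document)
    static double tf(String term, List<String> doc) {
        return Math.sqrt(Collections.frequency(doc, term));
    }

    // idf(t) = 1 + ln(numDocs / (docFreq + 1))
    static double idf(String term, List<List<String>> index) {
        int docFreq = 0;
        for (List<String> d : index) {
            if (d.contains(term)) {
                docFreq++;
            }
        }
        return 1.0 + Math.log((double) index.size() / (docFreq + 1));
    }

    // coord * queryNorm * sum(tf * idf^2) * lengthNorm, assuming distinct query terms
    static double score(List<String> query, List<String> doc, List<List<String>> index) {
        double sumOfSquaredWeights = 0.0;
        double sum = 0.0;
        int matched = 0;
        for (String term : query) {
            double idf = idf(term, index);
            sumOfSquaredWeights += idf * idf;
            double tf = tf(term, doc);
            if (tf > 0) {
                matched++;
            }
            sum += tf * idf * idf;
        }
        double coord = (double) matched / query.size();        // fraction of query terms matched
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);
        double lengthNorm = 1.0 / Math.sqrt(doc.size());       // longer documents score lower
        return coord * queryNorm * sum * lengthNorm;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("bank", "account", "charge", "adsl");

        // Short document that lacks "adsl", and the same tokens in another order.
        List<String> shortDoc = Arrays.asList("bank", "account", "charge");
        List<String> shuffled = Arrays.asList("charge", "bank", "account");

        // Long document that contains every query term plus 26 unrelated tokens.
        List<String> longDoc = new ArrayList<String>(query);
        for (int i = 0; i < 26; i++) {
            longDoc.add("filler" + i);
        }

        List<List<String>> index = new ArrayList<List<String>>();
        index.add(shortDoc);
        index.add(longDoc);

        System.out.println("short    = " + score(query, shortDoc, index));
        System.out.println("shuffled = " + score(query, shuffled, index));
        System.out.println("long     = " + score(query, longDoc, index));
    }
}
```

In this toy setup the shuffled document scores exactly the same as the short one, and the short document outranks the long one even though it is missing a query term, because length normalization outweighs the extra match. Whether that is what happens in a real index is something only explain() can show.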


> If it is, how can I neglect the word order when searching. (I am not using
> phase queries).



-- 
---
Thanks & Regards
Umesh Prasad

Re: Lucene Ranking Problem

Posted by Ian Lea <ia...@gmail.com>.

See what Searcher.explain() says for each hit. I don't think that word
order will matter with the query you give.  There are several factors
in scoring - see org.apache.lucene.search.Similarity or google
"lucene scoring".

Or have a play with Luke: invaluable for investigating things with
lucene and will tell you everything about your index.


--
Ian.


