You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Lahiru Samarakoon <la...@gmail.com> on 2011/01/18 13:12:15 UTC
Lucene Ranking Problem
Dear All,
I have two documents. The analyzed and the tokenized contents are mentioned
below.
*Document 1 :*
*when*, null_1, *my*, null_1, money,
fund, amount, payment, creditcard, credit,
card, *bank, account*, debit, deduct,
*charge*, null_1, my, mobile, usage,
*service*, connection
*Document 2:*
*when*, what, time, what, day,
null_1, money, fund, cash, payment,
null_1, i, do, you, i,
null_1, deduct, *charge*, reduce, debit,
from, *my*, *bank, account*, credit,
card, null_1, *adsl*, adsl1, adsl-2,
adsl-1, adsl2, adsl, 1, adsl,
2, usage, connection, *service*
Then, I searched for the following text.
*Query:* when my bank account charge adsl service
*Scores
*
Document 1 = 0.74406385
Document 2 = Score = 0.66067594
I was expecting to have Document 2 as the top ranked document. But I get
Document 1 as the top ranked even it does not contains the term “adsl”.
The word order of the Document 1 matches with the query very well. Can it
be the reason ?
If it is, how can I neglect the word order when searching. (I am not using
phase queries).
My searching code look like below and it is very simple.
*QueryParser parser = new QueryParser(Version.LUCENE_30, *
*"pattern", *
*new StandardAnalyzer(Version.LUCENE_30)); *
*org.apache.lucene.search.Query query1 =
parser.parse(this.query.getQuestion()); *
*TopDocs hits = is.search(query1, 10); *
Please advice
Thanks,
Lahiru
Re: Lucene Ranking Problem
Posted by Lahiru Samarakoon <la...@gmail.com>.
HI Ian & Umesh.
This is what I was looking for.
Thank a lot.
Regards,
Lahiru
Re: Lucene Ranking Problem
Posted by Umesh Prasad <um...@gmail.com>.
Hi Lahiru,
Comments are inline:
On Tue, Jan 18, 2011 at 5:42 PM, Lahiru Samarakoon <la...@gmail.com>wrote:
> Dear All,
>
> I have two documents. The analyzed and the tokenized contents are
> mentioned
> below.
>
> *Document 1 :*
>
> *when*, null_1, *my*, null_1, money,
>
> fund, amount, payment, creditcard, credit,
>
> card, *bank, account*, debit, deduct,
>
> *charge*, null_1, my, mobile, usage,
>
> *service*, connection
>
>
> *Document 2:*
>
> *when*, what, time, what, day,
>
> null_1, money, fund, cash, payment,
>
> null_1, i, do, you, i,
>
> null_1, deduct, *charge*, reduce, debit,
>
> from, *my*, *bank, account*, credit,
>
> card, null_1, *adsl*, adsl1, adsl-2,
>
> adsl-1, adsl2, adsl, 1, adsl,
>
> 2, usage, connection, *service*
>
>
> Then, I searched for the following text.
>
> *Query:* when my bank account charge adsl service
>
> *Scores
> *
>
> Document 1 = 0.74406385
>
> Document 2 = Score = 0.66067594
>
> Please read the documentation of lucene scoring.
http://lucene.apache.org/java/2_9_1/scoring.html.
That will help you understand the bigger picture.
> I was expecting to have Document 2 as the top ranked document. But I get
> Document 1 as the top ranked even it does not contains the term “adsl”.
>
> The word order of the Document 1 matches with the query very well. Can it
> be the reason ?
>
> Word order doesn't matter. However tf/idf , norms and other factors do
matter as described in above link.
You can get see how , documents got assigned score by using
IndexSearcher.explain(query,docId); as described in
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Searcher.html#explain%28org.apache.lucene.search.Query,%20int%29
If it is, how can I neglect the word order when searching. (I am not using
> phase queries).
>
> My searching code look like below and it is very simple.
>
>
> *QueryParser parser = new QueryParser(Version.LUCENE_30, *
>
> *"pattern", *
>
> *new StandardAnalyzer(Version.LUCENE_30)); *
>
> *org.apache.lucene.search.Query query1 =
> parser.parse(this.query.getQuestion()); *
>
> *TopDocs hits = is.search(query1, 10); *
>
> Please advice
>
>
> Thanks,
>
> Lahiru
>
--
---
Thanks & Regards
Umesh Prasad
Re: Lucene Ranking Problem
Posted by Ian Lea <ia...@gmail.com>.
See what Searcher.explain() says for each hit. I don't think that word
order will matter with the query you give. There are several factors
in scoring - see oal.search.Similarity or google lucene scoring.
Or have a play with Luke: invaluable for investigating things with
lucene and will tell you everything about your index.
--
Ian.
On Tue, Jan 18, 2011 at 12:12 PM, Lahiru Samarakoon <la...@gmail.com> wrote:
> Dear All,
>
> I have two documents. The analyzed and the tokenized contents are mentioned
> below.
>
> *Document 1 :*
>
> *when*, null_1, *my*, null_1, money,
>
> fund, amount, payment, creditcard, credit,
>
> card, *bank, account*, debit, deduct,
>
> *charge*, null_1, my, mobile, usage,
>
> *service*, connection
>
>
> *Document 2:*
>
> *when*, what, time, what, day,
>
> null_1, money, fund, cash, payment,
>
> null_1, i, do, you, i,
>
> null_1, deduct, *charge*, reduce, debit,
>
> from, *my*, *bank, account*, credit,
>
> card, null_1, *adsl*, adsl1, adsl-2,
>
> adsl-1, adsl2, adsl, 1, adsl,
>
> 2, usage, connection, *service*
>
>
> Then, I searched for the following text.
>
> *Query:* when my bank account charge adsl service
>
> *Scores
> *
>
> Document 1 = 0.74406385
>
> Document 2 = Score = 0.66067594
>
> I was expecting to have Document 2 as the top ranked document. But I get
> Document 1 as the top ranked even it does not contains the term “adsl”.
>
> The word order of the Document 1 matches with the query very well. Can it
> be the reason ?
>
> If it is, how can I neglect the word order when searching. (I am not using
> phase queries).
>
> My searching code look like below and it is very simple.
>
>
> *QueryParser parser = new QueryParser(Version.LUCENE_30, *
>
> *"pattern", *
>
> *new StandardAnalyzer(Version.LUCENE_30)); *
>
> *org.apache.lucene.search.Query query1 =
> parser.parse(this.query.getQuestion()); *
>
> *TopDocs hits = is.search(query1, 10); *
>
> Please advice
>
>
> Thanks,
>
> Lahiru
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org