Posted to java-user@lucene.apache.org by bourne71 <ga...@live.com> on 2009/07/31 12:01:36 UTC

Boosting Search Results

Hi, new here.

I recently started using Lucene and have run into a problem. I crawled and
indexed a number of documents.
When I perform a search, let's say "tall fat", the results that match all
the keywords should by rights be on top and displayed first.

But in my search results, some documents that match only one of the
keywords, like 'tall', are displayed first. Why is that? What have I done wrong?

Can anyone advise me on this? Thanks
-- 
View this message in context: http://www.nabble.com/Boosting-Search-Results-tp24753954p24753954.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Boosting Search Results

Posted by Ian Lea <ia...@gmail.com>.
Hi


It's not quite that simple.  Other things being equal, results that
match all keywords are likely to come first, but there are other
factors, such as term frequency and the length of the document.

Searcher.explain(query, doc) will give you the gory details of how a
document's score was calculated.  Luke will let you see what is in your
index.  Googling for Lucene relevancy or scoring will give you more info.
DefaultSimilarity is the default scoring implementation.
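To make the effect of those factors concrete, here is a toy re-creation of the classic DefaultSimilarity formula in plain Java (a sketch, not Lucene code): tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(number of terms), and coord = matched query terms / total query terms. It assumes made-up index statistics (100 documents, 10 of which contain "tall" and 10 of which contain "fat"), leaves out queryNorm (identical for every document of a given query), and ignores the lossy one-byte encoding Lucene applies to norms. Under those assumptions, a 4-term document containing "tall" twice scores roughly 3.6, while a 400-term document containing both words once scores roughly 1.0.

```java
import java.util.List;
import java.util.Map;

/**
 * Toy version of the classic DefaultSimilarity scoring formula, for illustration only.
 * Simplifications (assumptions, not Lucene behavior):
 *  - queryNorm is left out: it is the same for every document of one query,
 *    so it does not change the ranking;
 *  - length norms are used as plain doubles (Lucene encodes them in one lossy byte);
 *  - all boosts are 1.
 */
public class ToyScoring {

    // tf = sqrt(number of occurrences of the term in the document)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // lengthNorm = 1 / sqrt(number of terms in the field): shorter fields score higher
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // coord = matched query terms / total query terms
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    /** coord * sum over matching terms of tf * idf^2 * lengthNorm */
    static double score(List<String> queryTerms, Map<String, Integer> termFreqs, int docLength,
                        Map<String, Integer> docFreqs, int numDocs) {
        double sum = 0.0;
        int overlap = 0;
        for (String term : queryTerms) {
            int freq = termFreqs.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // query word absent from this document
            }
            overlap++;
            double termIdf = idf(docFreqs.get(term), numDocs);
            sum += tf(freq) * termIdf * termIdf * lengthNorm(docLength);
        }
        return coord(overlap, queryTerms.size()) * sum;
    }

    public static void main(String[] args) {
        List<String> query = List.of("tall", "fat");
        Map<String, Integer> docFreqs = Map.of("tall", 10, "fat", 10); // made-up statistics
        int numDocs = 100;

        // 4-term document containing "tall" twice and no "fat"
        double shortDoc = score(query, Map.of("tall", 2), 4, docFreqs, numDocs);
        // 400-term document containing each word once
        double longDoc = score(query, Map.of("tall", 1, "fat", 1), 400, docFreqs, numDocs);

        System.out.printf("short doc, 'tall' only: %.3f%n", shortDoc);
        System.out.printf("long doc, both words:   %.3f%n", longDoc);
    }
}
```

The short document's higher tf and much larger length norm outweigh the coord penalty of 1/2 for matching only one word, which is the kind of result described in the original question.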


--
Ian.


Re: Boosting Search Results

Posted by bourne71 <ga...@live.com>.
Sorry... I mean the double-search part. That is the part I don't understand
how to do, since after retrieving the first set of results I am not sure how
to search them again.


Re: Boosting Search Results

Posted by Ian Lea <ia...@gmail.com>.
Sorry, I'm not clear what you don't know how to do.


To spell out the double search suggestion a bit more:

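import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

QueryParser qp = new QueryParser(...);

// First search: both words required
Query q1 = qp.parse("+word1 +word2");
TopDocs td1 = searcher.search(q1, ...);

// Second search: either word will do
Query q2 = qp.parse("word1 word2");
TopDocs td2 = searcher.search(q2, ...);

ScoreDoc[] sd1 = td1.scoreDocs;
ScoreDoc[] sd2 = td2.scoreDocs;

// Grab all docids from first search
List<Integer> docidl = new ArrayList<Integer>();
for (int i1 = 0; i1 < sd1.length; i1++) {
  docidl.add(sd1[i1].doc);
}

// Add any docids from second search that are not already on the list
for (int i2 = 0; i2 < sd2.length; i2++) {
  int docid = sd2[i2].doc;
  if (!docidl.contains(docid)) {
    docidl.add(docid);
  }
}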

(code just a suggestion, off the top of my head, may not work, may be
full of bugs, there will be other maybe better ways to do it).
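One variation on the merge step (not from the original post): List.contains() scans the whole list on every check, whereas a LinkedHashSet gives constant-time duplicate checks and still iterates in insertion order, so the both-words hits stay on top. The doc ids below are made-up stand-ins for the values you would read from sd1[i].doc and sd2[i].doc.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeHits {

    /**
     * Concatenates two ranked doc-id lists, keeping first-seen order and dropping
     * duplicates: ids from the "all words required" search come first, followed by
     * any extra ids from the "any word" search.
     */
    public static List<Integer> mergeRanked(int[] allWordsHits, int[] anyWordHits) {
        // LinkedHashSet: O(1) membership test, iteration in insertion order
        Set<Integer> merged = new LinkedHashSet<>();
        for (int doc : allWordsHits) {
            merged.add(doc);
        }
        for (int doc : anyWordHits) {
            merged.add(doc); // no-op if the id is already present
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        int[] required = {7, 3};       // e.g. hits for +word1 +word2
        int[] optional = {5, 7, 9, 3}; // e.g. hits for word1 word2
        System.out.println(mergeRanked(required, optional)); // prints [7, 3, 5, 9]
    }
}
```

Note that the scores from the two searches are not comparable with each other, which is why this sketch merges by position rather than by score.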

If that doesn't help, perhaps you could rephrase the question.


--
Ian.


On Mon, Aug 3, 2009 at 10:51 AM, bourne71<ga...@live.com> wrote:
>
> Hey, thanks for the suggestion.
> I thought of performing 2 searches as well. Unfortunately I don't know how to
> perform a search on the results returned by the first one. Could you guide me
> a little? I tried to look around for information but found none.
>
> Thanks


Re: Boosting Search Results

Posted by bourne71 <ga...@live.com>.
Hey, thanks for the suggestion.
I thought of performing 2 searches as well. Unfortunately I don't know how to
perform a search on the results returned by the first one. Could you guide me
a little? I tried to look around for information but found none.

Thanks


Re: Boosting Search Results

Posted by Ian Lea <ia...@gmail.com>.
You could write your own Similarity, extending DefaultSimilarity and
overriding whichever methods will help you achieve your aims.
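As a rough illustration of the custom Similarity route, the toy scorer below (plain Java, not the Lucene API) makes the length normalization pluggable. It assumes made-up numbers: 100 documents, docFreq 10 for each word, a 4-term document containing only "tall" twice, and a 400-term document containing both words once. With the default 1/sqrt(length) norm the short document wins; with a flat norm of 1.0, which is roughly what overriding lengthNorm() in a DefaultSimilarity subclass to return 1 would amount to, the document matching both words wins. Whether length is really what hurts your results is an assumption that explain() can confirm or rule out. Norms are computed at index time, so a changed lengthNorm() would only take effect after reindexing.

```java
import java.util.List;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

public class FlatLengthNorm {

    // Classic DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Default length norm: 1 / sqrt(number of terms in the field)
    static final IntToDoubleFunction DEFAULT_NORM = numTerms -> 1.0 / Math.sqrt(numTerms);
    // Hypothetical override: ignore document length entirely
    static final IntToDoubleFunction FLAT_NORM = numTerms -> 1.0;

    /**
     * Simplified classic score: coord * sum(sqrt(tf) * idf^2 * lengthNorm).
     * queryNorm and boosts are left out (assumption: they do not affect this comparison).
     */
    static double score(List<String> queryTerms, Map<String, Integer> termFreqs, int docLength,
                        Map<String, Integer> docFreqs, int numDocs, IntToDoubleFunction lengthNorm) {
        double sum = 0.0;
        int overlap = 0;
        for (String term : queryTerms) {
            int freq = termFreqs.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // this query word is not in the document
            }
            overlap++;
            double termIdf = idf(docFreqs.get(term), numDocs);
            sum += Math.sqrt(freq) * termIdf * termIdf * lengthNorm.applyAsDouble(docLength);
        }
        // coord: fraction of the query words that matched
        return sum * overlap / queryTerms.size();
    }

    static void report(String label, IntToDoubleFunction norm) {
        List<String> query = List.of("tall", "fat");
        Map<String, Integer> docFreqs = Map.of("tall", 10, "fat", 10);
        double shortDoc = score(query, Map.of("tall", 2), 4, docFreqs, 100, norm);
        double longDoc = score(query, Map.of("tall", 1, "fat", 1), 400, docFreqs, 100, norm);
        System.out.printf("%s: short ('tall' only)=%.3f long (both)=%.3f%n", label, shortDoc, longDoc);
    }

    public static void main(String[] args) {
        report("default norm", DEFAULT_NORM);
        report("flat norm", FLAT_NORM);
    }
}
```

Ignoring length entirely has its own cost: long documents that mention a word once in passing will rank as high as short, focused ones, so this is a trade-off rather than a fix.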

Or how about running 2 searches, the first with both words required
(+word1 +word2) and then a second search where they aren't both
required (word1 word2).  Then merge/dedup the two lists of hits,
keeping the ones from the first search at the top.


--
Ian.

On Mon, Aug 3, 2009 at 4:14 AM, bourne71<ga...@live.com> wrote:
>
> Thanks for all the replies. They helped me understand the problem better, but
> is it possible to create a query that gives an additional boost to a result
> if and only if both of the words are found in it? That would make sure such
> results end up higher in the list.
>
> Can this type of query be created?


Re: Boosting Search Results

Posted by henok sahilu <he...@yahoo.com>.
Hello there,
I would like to know more about this Boosting Search Results topic.
Thanks


--- On Sun, 8/2/09, bourne71 <ga...@live.com> wrote:

From: bourne71 <ga...@live.com>
Subject: Re: Boosting Search Results
To: java-user@lucene.apache.org
Date: Sunday, August 2, 2009, 8:14 PM


Thanks for all the replies. They helped me understand the problem better, but
is it possible to create a query that gives an additional boost to a result
if and only if both of the words are found in it? That would make sure such
results end up higher in the list.

Can this type of query be created?

Re: Boosting Search Results

Posted by bourne71 <ga...@live.com>.
Thanks for all the replies. They helped me understand the problem better, but
is it possible to create a query that gives an additional boost to a result
if and only if both of the words are found in it? That would make sure such
results end up higher in the list.

Can this type of query be created?
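One way to express "boost only when both words are present" is to add an extra optional clause that requires both words, so that only documents matching both receive its contribution. With QueryParser syntax that might look like tall fat (+tall +fat)^5, where the boost of 5 is an arbitrary value you would need to tune. The toy below (plain Java, not Lucene) simplifies this to a constant bonus added to a hit's existing score when it contains every query word; in real Lucene the extra clause contributes its own tf/idf-based score rather than a constant. The hits and scores are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class ConjunctionBonus {

    /** A toy search hit: a name, the query words it contains, and its original score. */
    static final class Hit {
        final String name;
        final Set<String> matchedTerms;
        final double baseScore;

        Hit(String name, Set<String> matchedTerms, double baseScore) {
            this.name = name;
            this.matchedTerms = matchedTerms;
            this.baseScore = baseScore;
        }
    }

    /** Original score, plus a constant bonus only if the hit contains every query word. */
    static double boostedScore(Hit hit, Set<String> queryTerms, double bonus) {
        boolean matchesAll = hit.matchedTerms.containsAll(queryTerms);
        return hit.baseScore + (matchesAll ? bonus : 0.0);
    }

    /** Returns hit names ordered by boosted score, highest first. */
    static List<String> rank(List<Hit> hits, Set<String> queryTerms, double bonus) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> boostedScore(h, queryTerms, bonus)).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("tall", "fat");
        // Invented scores: the short "tall"-only document starts out on top.
        List<Hit> hits = List.of(
                new Hit("shortDoc", Set.of("tall"), 3.6),
                new Hit("longDoc", Set.of("tall", "fat"), 1.0),
                new Hit("otherDoc", Set.of("fat"), 2.0));

        System.out.println("no bonus:   " + rank(hits, query, 0.0));
        System.out.println("bonus of 5: " + rank(hits, query, 5.0));
    }
}
```

With no bonus the order is [shortDoc, otherDoc, longDoc]; with a bonus of 5 it becomes [longDoc, shortDoc, otherDoc]. A bonus that is too small relative to the other scores will not be enough to lift every both-words document above the single-word ones.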


Re: Boosting Search Results

Posted by prashant ullegaddi <pr...@gmail.com>.
It might be because hardly any documents contain both words.
Try an exact phrase search: "\"tall fat\""
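In Java source, "\"tall fat\"" is the query text "tall fat" including the double quotes, which QueryParser turns into a phrase query: the words must appear next to each other, in that order. The toy matcher below (plain Java, assuming simple lower-casing and whitespace tokenization in place of a real Analyzer) contrasts containing both words anywhere with containing the exact phrase.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PhraseMatch {

    // Stand-in for a Lucene Analyzer (assumption): lower-case, split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // True if all query words occur somewhere in the text, in any order or position.
    static boolean containsAllWords(String text, String query) {
        return tokenize(text).containsAll(tokenize(query));
    }

    // True if the query words occur consecutively and in order (an exact, slop-0 phrase match).
    static boolean containsPhrase(String text, String phrase) {
        return Collections.indexOfSubList(tokenize(text), tokenize(phrase)) >= 0;
    }

    public static void main(String[] args) {
        String[] docs = {"A tall fat man", "The man is tall but not fat", "fat and tall"};
        for (String doc : docs) {
            System.out.printf("%-28s all words: %-5b phrase: %b%n",
                    doc, containsAllWords(doc, "tall fat"), containsPhrase(doc, "tall fat"));
        }
    }
}
```

All three sample documents contain both words, but only the first contains the phrase. The trade-off is that a phrase search drops documents that contain only one of the words, or both words far apart.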


Re: Boosting Search Results

Posted by AHMET ARSLAN <io...@yahoo.com>.
> When I perform a search, let's say "tall fat", the results that
> match all the keywords should by rights be on top and displayed first.

The answer to your question is at the end of this thread:

http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-td24694748.html