Posted to java-user@lucene.apache.org by EDMOND KEMOKAI <ek...@gmail.com> on 2007/04/14 14:55:44 UTC

Re: PAGE RANKING IN LUCENE?

You'll have to implement your own ranking on top of Lucene. Lucene only
gives you document scores, which measure how well a document matches your
query. PageRank instead estimates how important a document is from the links
pointing to it. A document might score well by containing many of the query
words and still not be what you're looking for.
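
A minimal sketch of what a ranking layer on top of Lucene could look like,
assuming you have link data for your documents. It does not use any Lucene
API: it computes a PageRank-style value by power iteration over a small link
graph and blends it with a text score that is assumed to come from a Lucene
hit. The class name, the damping factor 0.85, and the blending formula are
illustrative assumptions, not anything Lucene provides.

```java
import java.util.Arrays;

/** Standalone sketch: PageRank by power iteration; no Lucene classes involved. */
final class PageRankSketch {

    /**
     * links[i] lists the pages that page i links to.
     * Returns one rank per page; the ranks sum to 1.
     */
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];          // page without out-links
                } else {
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;    // pass rank along each link
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // Rank of dangling pages is spread evenly over all pages.
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    /** Hypothetical blend of a query-dependent text score with a static rank. */
    static double combined(double textScore, double rank, double rankWeight) {
        return textScore * (1.0 + rankWeight * rank);
    }

    public static void main(String[] args) {
        // Page 0 links to 1 and 2, page 1 to 2, page 2 to 0, page 3 to 2.
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In this toy graph page 2 has the most inbound links and page 3 has none, so
page 2 ends up ranked above page 1, and page 1 above page 3. How the rank is
folded into the final ordering is a design choice; the multiplicative blend
above is only one option.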

On 4/14/07, karl wettin <ka...@gmail.com> wrote:
>
>
> 14 Apr 2007 at 06:19, supereric wrote:
>
> > I want to change the page ranking algorithm in Lucene, but I do not
> > know where to start or which files I should change.
> > I do not know what classes are involved. I have only a few days to
> > do this, so please help me with a complete explanation, as a big
> > favor!
>
> Eric,
>
> Lucene has no built-in PageRank; however, you might mean something
> else. It is easier to help if you explain what you want to achieve.
>
> http://wiki.apache.org/jakarta-lucene/PageRank
>
>
> --
> karl
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
"talk trash and carry a small stick."
PAUL KRUGMAN (NYT)

Re: Scoring results?!

Posted by Peter Keegan <pe...@gmail.com>.

If I use BoostingTermQuery on a query containing terms without payloads, I
get very different results than doing the same query with TermQuery.
Presumably, this is because the BoostingSpanScorer/SpanScorer compute scores
differently than TermScorer. Is there a way to make BoostingTermQuery behave
like TermQuery for terms without payloads?

Peter


On 5/9/07, Grant Ingersoll <gs...@apache.org> wrote:
>
> Hi Eric,
>
> On May 9, 2007, at 2:39 AM, supereric wrote:
>
> >
> > How can I get the score of a tag word in Lucene? Suppose that you have
> > searched for a tag word and 3 hit documents are now found.
> > 1- How can someone find the number of occurrences in each document, so
> > that the results can be sorted by it?
>
> Span queries tell you where the matches occur in the document, by token
> position, but I am not sure what your sorting criteria would be.  The
> explain method can also tell you why a particular document received a
> particular score.
>
>
> > I also want to apply some other policies for ranking the results.
> > What should I do to handle that? For example, I want to score
> > boldfaced tag words in an HTML document twice as high as normal text.
>
> Although totally experimental at this stage, the new Payload stuff in
> the trunk version of Lucene (or nightly builds) is designed for such
> a scenario.  Check out the BoostingTermQuery which can boost term
> scores based on the contents of a payload located at a particular
> term.  Feedback on the APIs is very much appreciated.
>
> > 2- How can I omit some tag words from the index, for example common
> > words in another language?
>
> See the StopFilter token filter and/or the StopAnalyzer.
>
>
> >
> >
>
> HTH,
> Grant
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
>
>
>

Re: Scoring results?!

Posted by Grant Ingersoll <gs...@apache.org>.

Hi Eric,

On May 9, 2007, at 2:39 AM, supereric wrote:

>
> How can I get the score of a tag word in Lucene? Suppose that you have
> searched for a tag word and 3 hit documents are now found.
> 1- How can someone find the number of occurrences in each document, so
> that the results can be sorted by it?

Span queries tell you where the matches occur in the document, by token
position, but I am not sure what your sorting criteria would be.  The
explain method can also tell you why a particular document received a
particular score.
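
To make the "number of occurrences" idea concrete, here is a minimal
standalone sketch that counts how often a term appears in each document and
orders the documents by that count. It is an assumption-laden illustration,
not Lucene code: it tokenizes with a naive regular expression instead of an
analyzer, and the class and method names are made up for the example. Inside
Lucene the per-document term frequency is already stored in the index, so you
would normally read it from there rather than re-scan the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Standalone sketch: count a term's occurrences per document, sort by count. */
final class TermCountSketch {

    /** Counts case-insensitive whole-word occurrences of term in text. */
    static int countOccurrences(String text, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        int count = 0;
        // Naive tokenizer: split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    /** Returns document ids ordered by descending occurrence count. */
    static List<String> sortByCount(Map<String, String> docs, String term) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            counts.put(e.getKey(), countOccurrences(e.getValue(), term));
        }
        List<String> ids = new ArrayList<>(counts.keySet());
        // Stable sort: documents with equal counts keep their input order.
        ids.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Lucene is a search library.");
        docs.put("doc2", "Lucene, lucene and more LUCENE!");
        docs.put("doc3", "Nothing relevant here.");
        System.out.println(sortByCount(docs, "lucene"));
    }
}
```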

> I also want to apply some other policies for ranking the results.
> What should I do to handle that? For example, I want to score
> boldfaced tag words in an HTML document twice as high as normal text.

Although totally experimental at this stage, the new Payload stuff in  
the trunk version of Lucene (or nightly builds) is designed for such  
a scenario.  Check out the BoostingTermQuery which can boost term  
scores based on the contents of a payload located at a particular  
term.  Feedback on the APIs is very much appreciated.
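
The idea behind payload-based boosting can be sketched without Lucene: each
token carries a small piece of extra data (here just a "bold" flag), and an
occurrence contributes more to the score when the flag is set. The Token
class, the weights, and the method name below are hypothetical; how
BoostingTermQuery actually folds payloads into the score is not shown here.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone sketch: bold occurrences of a term count twice; no Lucene classes. */
final class BoldBoostSketch {

    /** Hypothetical token: its text plus a flag playing the role of a payload. */
    static final class Token {
        final String text;
        final boolean bold;

        Token(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    static final float BOLD_WEIGHT = 2.0f;    // assumed: bold text counts twice
    static final float NORMAL_WEIGHT = 1.0f;

    /** Sums the per-occurrence weights of the given term. */
    static float weightedFrequency(List<Token> tokens, String term) {
        float total = 0f;
        for (Token t : tokens) {
            if (t.text.equalsIgnoreCase(term)) {
                total += t.bold ? BOLD_WEIGHT : NORMAL_WEIGHT;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // "<b>lucene</b> is fast lucene": one bold and one plain occurrence.
        List<Token> doc = Arrays.asList(
                new Token("lucene", true),
                new Token("is", false),
                new Token("fast", false),
                new Token("lucene", false));
        System.out.println(weightedFrequency(doc, "lucene"));
    }
}
```

The weighted frequency could then replace the plain term count in whatever
scoring formula you use.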

> 2- How can I omit some tag words from the index, for example common
> words in another language?

See the StopFilter token filter and/or the StopAnalyzer.
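
What a stop filter does can be sketched in a few lines of plain Java. This is
not the StopFilter API itself, only an illustration of the technique: tokens
found in a stop-word set are dropped before indexing. The class name and the
small German word list are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Standalone sketch: drop stop words from a token list; no Lucene classes. */
final class StopWordSketch {

    /** Returns the tokens that are not in stopWords (compared in lower case). */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed stop list: a few common German words, stored in lower case.
        Set<String> german = new HashSet<>(Arrays.asList("der", "die", "und"));
        List<String> tokens = Arrays.asList("Der", "Hund", "und", "die", "Katze");
        System.out.println(removeStopWords(tokens, german));
    }
}
```

With Lucene you would supply your own word list for the other language to the
stop filter or analyzer instead of filtering by hand.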

>
>

HTH,
Grant

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


Scoring results?!

Posted by supereric <er...@yahoo.com>.

How can I get the score of a tag word in Lucene? Suppose that you have
searched for a tag word and 3 hit documents are now found.
1- How can someone find the number of occurrences in each document, so that
the results can be sorted by it?
I also want to apply some other policies for ranking the results. What should
I do to handle that? For example, I want to score boldfaced tag words in an
HTML document twice as high as normal text.
2- How can I omit some tag words from the index, for example common words
in another language?



EDMOND KEMOKAI wrote:
> 
> You'll have to implement your own ranking on top of Lucene. Lucene only
> gives you document scores, which measure how well a document matches your
> query. PageRank instead estimates how important a document is from the
> links pointing to it. A document might score well by containing many of
> the query words and still not be what you're looking for.
> 
> On 4/14/07, karl wettin <ka...@gmail.com> wrote:
>>
>>
>> 14 Apr 2007 at 06:19, supereric wrote:
>>
>> > I want to change the page ranking algorithm in Lucene, but I do not
>> > know where to start or which files I should change.
>> > I do not know what classes are involved. I have only a few days to
>> > do this, so please help me with a complete explanation, as a big
>> > favor!
>>
>> Eric,
>>
>> Lucene has no built-in PageRank; however, you might mean something
>> else. It is easier to help if you explain what you want to achieve.
>>
>> http://wiki.apache.org/jakarta-lucene/PageRank
>>
>>
>> --
>> karl
>>
> 
> 
> -- 
> "talk trash and carry a small stick."
> PAUL KRUGMAN (NYT)
> 
> 


