Posted to java-user@lucene.apache.org by EDMOND KEMOKAI <ek...@gmail.com> on 2007/04/14 14:55:44 UTC
Re: PAGE RANKING IN LUCENE?
You'll have to implement your own ranking on top of Lucene. Lucene only
gives you document scores, which are a measure of how well your query matches
a document. Page rank determines how relevant a document is to your query; a
document might score well by having a lot of the query words, but it might
not be what you're looking for.
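
To make "your own ranking on top of Lucene" concrete, here is a minimal
sketch in plain Java; it makes no Lucene calls. It assumes you already have a
match score per hit (for example the score Lucene reports) and a link-based
rank computed elsewhere. The names RerankSketch, Hit and linkWeight, and the
multiplicative blend itself, are illustrative assumptions, not anything
Lucene provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: re-ranks hits by blending the
// query-dependent match score with a query-independent link-based rank.
class RerankSketch {

    // One search hit: document id, match score, precomputed link rank.
    static final class Hit {
        final String docId;
        final double matchScore;
        final double linkRank;

        Hit(String docId, double matchScore, double linkRank) {
            this.docId = docId;
            this.matchScore = matchScore;
            this.linkRank = linkRank;
        }
    }

    // Assumed blend: matchScore * (1 + linkWeight * linkRank).
    static double combined(Hit h, double linkWeight) {
        return h.matchScore * (1.0 + linkWeight * h.linkRank);
    }

    // Returns a new list sorted by combined score, best first.
    static List<Hit> rerank(List<Hit> hits, final double linkWeight) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> combined(h, linkWeight)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("many-query-words", 0.9, 0.1),
            new Hit("well-linked-page", 0.7, 0.8));
        for (Hit h : rerank(hits, 1.0)) {
            System.out.println(h.docId + " " + combined(h, 1.0));
        }
    }
}
```

With the two sample hits in main, the page with the weaker text match but the
stronger link rank sorts first.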
On 4/14/07, karl wettin <ka...@gmail.com> wrote:
>
>
> On 14 Apr 2007, at 06:19, supereric wrote:
>
> > I want to change the page ranking algorithm in Lucene, and I do not know
> > where to start or which file I should change.
> > I do not know which classes are involved. I have only a few days to do
> > this, so please help me with a complete explanation as a big favor!
>
> Eric,
>
> Lucene has no built-in page rank; however, you might mean something
> else. It is easier to help if you explain what it is you want to achieve.
>
> http://wiki.apache.org/jakarta-lucene/PageRank
>
>
> --
> karl
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
--
"talk trash and carry a small stick."
PAUL KRUGMAN (NYT)
Re: Scoring results?!
Posted by Peter Keegan <pe...@gmail.com>.
If I use BoostingTermQuery on a query containing terms without payloads, I
get very different results than doing the same query with TermQuery.
Presumably, this is because the BoostingSpanScorer/SpanScorer compute scores
differently than TermScorer. Is there a way to make BoostingTermQuery behave
like TermQuery for terms without payloads?
Peter
On 5/9/07, Grant Ingersoll <gs...@apache.org> wrote:
>
> Hi Eric,
>
> On May 9, 2007, at 2:39 AM, supereric wrote:
>
> >
> > How can I get the score of a tag word in Lucene? Suppose that you have
> > searched for a tag word and 3 matching documents are found.
> > 1- How can one find the number of occurrences in each document, so that
> > the results can be sorted by it?
>
> Span Queries tell you where the matches occur in the document by
> offset, but I am not sure what your sorting criteria would be. The
> explain method also can give you information about why a particular
> document scored a particular way.
>
>
> > I also want to apply some other policies for ranking the results. What
> > should I do to handle that? For example, I want to score boldfaced tag
> > words in an HTML document twice as high as normal text.
>
> Although totally experimental at this stage, the new Payload stuff in
> the trunk version of Lucene (or nightly builds) is designed for such
> a scenario. Check out the BoostingTermQuery which can boost term
> scores based on the contents of a payload located at a particular
> term. Feedback on the APIs is very much appreciated.
>
> > 2- How can I omit some tag words from the index, for example common
> > words in another language?
>
> See the StopFilter token filter and/or the StopAnalyzer.
>
>
> >
> >
>
> HTH,
> Grant
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
Re: Scoring results?!
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Eric,
On May 9, 2007, at 2:39 AM, supereric wrote:
>
> How can I get the score of a tag word in Lucene? Suppose that you have
> searched for a tag word and 3 matching documents are found.
> 1- How can one find the number of occurrences in each document, so that
> the results can be sorted by it?
Span Queries tell you where the matches occur in the document by
offset, but I am not sure what your sorting criteria would be. The
explain method also can give you information about why a particular
document scored a particular way.
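
If the goal is simply to order hits by how often the term occurs, the idea
can be sketched in plain Java as below. This is not the span query API and
not how Lucene scores; the tokenization (lowercase, split on anything that is
not a letter or digit) and the names OccurrenceSortSketch, countOccurrences
and sortByOccurrences are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class OccurrenceSortSketch {

    // Counts how often `term` occurs in `text`, ignoring case.
    // Tokens are split on runs of characters that are not letters or digits.
    static int countOccurrences(String text, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    // Returns the ids of documents containing the term, most occurrences first.
    // `docs` maps a document id to its text.
    static List<String> sortByOccurrences(Map<String, String> docs, String term) {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            int n = countOccurrences(e.getValue(), term);
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        List<String> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Lucene is a search library");
        docs.put("doc2", "Lucene, Lucene and more Lucene");
        docs.put("doc3", "nothing relevant here");
        System.out.println(sortByOccurrences(docs, "lucene"));
    }
}
```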
> I also want to apply some other policies for ranking the results. What
> should I do to handle that? For example, I want to score boldfaced tag
> words in an HTML document twice as high as normal text.
Although totally experimental at this stage, the new Payload stuff in
the trunk version of Lucene (or nightly builds) is designed for such
a scenario. Check out the BoostingTermQuery which can boost term
scores based on the contents of a payload located at a particular
term. Feedback on the APIs is very much appreciated.
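
The "bold counts twice" policy can be illustrated without any Lucene classes.
In the sketch below each token carries a flag saying whether it came from
bold markup, and a bold occurrence contributes a weight of 2.0 instead of
1.0. With payloads, a weight like this would typically be stored alongside
the term at index time; the Token class, BOLD_WEIGHT and
weightedTermFrequency here are hypothetical names, not the payload API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not part of Lucene: weights bold occurrences double.
class BoldBoostSketch {

    // A token plus a flag telling whether it appeared inside bold markup.
    static final class Token {
        final String text;
        final boolean bold;

        Token(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    // Assumed policy from the question: bold text counts twice normal text.
    static final double BOLD_WEIGHT = 2.0;

    // Sums the weights of all occurrences of `term` (case-insensitive).
    static double weightedTermFrequency(List<Token> tokens, String term) {
        double sum = 0.0;
        for (Token t : tokens) {
            if (t.text.equalsIgnoreCase(term)) {
                sum += t.bold ? BOLD_WEIGHT : 1.0;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Token> doc = Arrays.asList(
            new Token("Lucene", true),    // from bold markup
            new Token("is", false),
            new Token("lucene", false));  // plain text
        System.out.println(weightedTermFrequency(doc, "lucene"));
    }
}
```

A weighted frequency like this could then stand in for the raw term count in
whatever scoring formula you use.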
> 2- How can I omit some tag words from the index, for example common
> words in another language?
See the StopFilter token filter and/or the StopAnalyzer.
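
Conceptually, a stop filter drops every token that appears in a stop-word set
before it reaches the index. A minimal plain-Java sketch of that idea
follows; it does not use Lucene's classes, and the German sample words and
the name removeStopWords are only illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class StopWordSketch {

    // Returns the tokens that are not in `stopWords`, preserving their order.
    // The stop-word set is expected to hold lowercase entries.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample stop words for another language (German), chosen for illustration.
        Set<String> german = new HashSet<>(Arrays.asList("der", "die", "das", "und"));
        List<String> tokens = Arrays.asList("Der", "Hund", "und", "die", "Katze");
        System.out.println(removeStopWords(tokens, german));
    }
}
```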
>
>
HTH,
Grant
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
Scoring results?!
Posted by supereric <er...@yahoo.com>.
How can I get the score of a tag word in Lucene? Suppose that you have
searched for a tag word and 3 matching documents are found.
1- How can one find the number of occurrences in each document, so that the
results can be sorted by it?
I also want to apply some other policies for ranking the results. What should
I do to handle that? For example, I want to score boldfaced tag words in an
HTML document twice as high as normal text.
2- How can I omit some tag words from the index, for example common words
in another language?
EDMOND KEMOKAI wrote:
>
> You'll have to implement your own ranking on top of Lucene. Lucene only
> gives you document scores, which are a measure of how well your query
> matches a document. Page rank determines how relevant a document is to your
> query; a document might score well by having a lot of the query words, but
> it might not be what you're looking for.
>
> On 4/14/07, karl wettin <ka...@gmail.com> wrote:
>>
>>
>> On 14 Apr 2007, at 06:19, supereric wrote:
>>
>> > I want to change the page ranking algorithm in Lucene, and I do not know
>> > where to start or which file I should change.
>> > I do not know which classes are involved. I have only a few days to do
>> > this, so please help me with a complete explanation as a big favor!
>>
>> Eric,
>>
>> Lucene has no built-in page rank; however, you might mean something
>> else. It is easier to help if you explain what it is you want to achieve.
>>
>> http://wiki.apache.org/jakarta-lucene/PageRank
>>
>>
>> --
>> karl
>>
>
>
> --
> "talk trash and carry a small stick."
> PAUL KRUGMAN (NYT)
>
>
--
View this message in context: http://www.nabble.com/PAGE-RANKING-IN-LUCENE----%3CNEED-URGENT-HELP%21%3E-tf3574992.html#a10389201
Sent from the Lucene - Java Users mailing list archive at Nabble.com.