Posted to users@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2012/05/03 10:24:28 UTC
LARQ scores not normalized (Was: [ANN] Release of Apache Jena LARQ 1.0.0-incubating)
Tao wrote:
> Hi Paolo,
>
> I just noticed a change in the LARQ score. Previously the score seemed to
> be normalized to the range [0, 1]. Now it can be higher than 1. Is this a
> change in Lucene or in LARQ?
>
> How can I get the good old [0, 1] LARQ score now?
>
> Thanks
> Tao
Hi Tao,
First of all, thanks.
I see... LARQ now uses Lucene 3.x, so either something changed there or
something went wrong while porting LARQ to the new Lucene 3.x APIs.
Do you want to raise a JIRA issue for this?
https://issues.apache.org/jira/browse/JENA
The good news is that it should not be that difficult to fix, and if you want,
you can try submitting a patch for this.
All searches call the IndexLARQ.search(...) [1] method which does something like
this (reformatted):
    TopDocs topDocs = ...
    Map1<ScoreDoc,HitLARQ> converter = new Map1<ScoreDoc,HitLARQ>() {
        public HitLARQ map1(ScoreDoc object) {
            return new HitLARQ(searcher, object) ;
        }
    } ;
    Iterator<ScoreDoc> iterScoreDoc =
        Arrays.asList(topDocs.scoreDocs).iterator() ;
    Iterator<HitLARQ> iter =
        new Map1Iterator<ScoreDoc, HitLARQ>(converter, iterScoreDoc) ;
    return iter ;
Lucene's TopDocs has a getMaxScore() method [2], which we can use to
normalize the scores within the results of a single query.
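As a rough illustration of the getMaxScore idea, here is a minimal, Lucene-free sketch. The class ScoreNormalizer and its normalize method are hypothetical names, not part of LARQ or Lucene. The float array stands in for the ScoreDoc.score values of one TopDocs, and the maxScore argument stands in for the value returned by TopDocs.getMaxScore().

```java
// Hypothetical helper for illustration only; not part of LARQ or Lucene.
final class ScoreNormalizer {
    private ScoreNormalizer() {}

    /**
     * Divides every raw score by the maximum score of the same query,
     * so that the best hit of that query maps to 1.0.
     *
     * rawScores: assumed to be the ScoreDoc.score values of one result set.
     * maxScore:  assumed to be the maximum score of that same result set.
     */
    static float[] normalize(float[] rawScores, float maxScore) {
        // If there is no usable maximum (NaN, zero or negative),
        // return the scores unchanged rather than dividing by it.
        if (Float.isNaN(maxScore) || maxScore <= 0f) {
            return rawScores.clone();
        }
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] / maxScore;
        }
        return normalized;
    }
}
```

With raw scores 2.4, 1.2 and 0.6 and a maximum of 2.4, this yields 1.0, 0.5 and 0.25. The top hit of every query maps to 1.0, so the values are only meaningful relative to other hits of the same query.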
Paolo
[1] http://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/src/main/java/org/apache/jena/larq/IndexLARQ.java
[2] http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/TopDocs.html#getMaxScore%28%29
RE: LARQ scores not normalized (Was: [ANN] Release of Apache Jena LARQ 1.0.0-incubating)
Posted by "Tao (陶信东)" <ta...@myhexin.com>.
Thanks Paolo, I have created the issue. Please check.
https://issues.apache.org/jira/browse/JENA-242
Re: LARQ scores not normalized (Was: [ANN] Release of Apache Jena LARQ 1.0.0-incubating)
Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Tao,
please, go ahead and open a JIRA issue for this.
(I can do that if you prefer, but you found it and you should be the 'reporter'
of the issue).
Thanks,
Paolo
RE: LARQ scores not normalized (Was: [ANN] Release of Apache Jena LARQ 1.0.0-incubating)
Posted by "Tao (陶信东)" <ta...@myhexin.com>.
Thanks Paolo. I want normalized scores to filter SPARQL results (so that
only items above a certain quality are shown).
I know Lucene scores cannot guarantee the quality of a match against RDF
literals. So maybe we should re-score LARQ hits with something else, e.g.
minimum edit distance?
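One way to read the edit-distance suggestion is as Levenshtein distance between the query string and the matched literal. The sketch below is only an assumption about what such re-scoring could look like; it is not something LARQ provides, and EditDistance, levenshtein and similarity are hypothetical names. Unlike a Lucene score, the resulting similarity always lies in [0, 1] and does not depend on the index or on other hits.

```java
// Hypothetical helper for illustration only; not part of LARQ or Lucene.
final class EditDistance {
    private EditDistance() {}

    /**
     * Levenshtein distance: the minimum number of single-character
     * insertions, deletions and substitutions turning a into b.
     * Works on UTF-16 code units (String.charAt), keeping two DP rows.
     */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Maps the distance to [0, 1]: 1.0 for identical strings, 0.0 for nothing in common. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }
}
```

For example, levenshtein("kitten", "sitting") is 3, so similarity("kitten", "sitting") is 1 - 3/7, roughly 0.571. A filter could then keep only the hits whose similarity exceeds a fixed threshold.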
Thanks
Tao
Re: LARQ scores not normalized (Was: [ANN] Release of Apache Jena LARQ 1.0.0-incubating)
Posted by Paolo Castagna <ca...@googlemail.com>.
By the way, Tao, why do you want/need normalized scores?
"score values are meaningful only for purposes of comparison between
other documents for the exact same query and the exact same index.
when you try to compute a percentage, you are setting up an implicit
comparison with scores from other queries."
-- http://wiki.apache.org/lucene-java/ScoresAsPercentages
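To make the quoted point concrete, here is a small sketch of a threshold filter applied to scores normalized by the best score of the same query. The class and method names are hypothetical, and dividing by the maximum score is assumed as the normalization scheme.

```java
// Hypothetical helper for illustration only; not part of LARQ or Lucene.
final class ThresholdPitfall {
    private ThresholdPitfall() {}

    /**
     * Counts the hits whose score, divided by the best score of the
     * same result list, reaches the given threshold.
     */
    static int countAboveThreshold(float[] rawScores, float threshold) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        if (max <= 0f) {
            return 0; // no hits, or no positive score to normalize by
        }
        int kept = 0;
        for (float s : rawScores) {
            if (s / max >= threshold) {
                kept++;
            }
        }
        return kept;
    }
}
```

Two result lists with very different raw scores, say 8.0, 4.0, 1.0 and 0.2, 0.1, 0.05, both keep exactly one hit at a threshold of 0.8. The best hit of any query normalizes to 1.0, so such a filter says nothing about how good that hit is in absolute terms.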
So, perhaps, we should just keep it as it is and return scores to users
as we get them from Lucene (i.e. not normalized).
What do you think?
I imagine people would use scores to sort results and/or find the
best match. Tao, are you using the scores for something else?
Paolo