Posted to java-user@lucene.apache.org by "Armbrust, Daniel C." <Ar...@mayo.edu> on 2004/04/14 18:16:25 UTC

Result scoring question

I know that the Lucene scoring algorithm is pretty complicated, and I know I don't understand all the pieces.  But given these documents:

A) - <preferred_designation> left renal calculus
B) - <other_designation> renal calculus

Should a query of 

other_designation:("renal calculus") OR preferred_designation:("renal calculus")

Score document B higher than document A?

Those documents are a made-up example.  Here are the documents and scores I am getting back from the query on my real index:

Score 1.0 - Document<Text<first_word:left> Text<preferred_designation:left renal calculus in calyceal diverticulum> Unindexed<frequency:4> Text<codeTokenized:M00004001> Keyword<code:M00004001> Keyword<UNIQUE_DOCUMENT_IDENTIFIER_FIELD:48270>>

Score 0.85714287 - Document<Keyword<UNIQUE_DOCUMENT_IDENTIFIER_FIELD:514631> Keyword<code:M00035214> Text<codeTokenized:M00035214> Unindexed<frequency:4> Text<preferred_designation:left renal calculus in a solitary left kidney> Text<first_word:left>>

Score 0.7409672 - Document<Text<first_word:renal> Text<other_designation:renal calculus> Unindexed<frequency:3> Text<codeTokenized:M00032753> Keyword<code:M00032753> Keyword<UNIQUE_DOCUMENT_IDENTIFIER_FIELD:481129>>


Am I just making a dumb mistake somewhere?

Thanks, 

Dan

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: Software for suggesting alternative words or sentences

Posted by Tate Avery <ta...@nstein.com>.
Also...

http://jazzy.sourceforge.net/


-----Original Message-----
From: Felix Huber [mailto:huberfelix@webtopia.de]
Sent: Friday, April 16, 2004 1:17 PM
To: Lucene Users List
Subject: Re: Software for suggesting alternative words or sentences


Check http://www.iu.hio.no/~frodes/sprell/sprell.html - it includes a German
and a Norwegian dictionary.

Regards,
Felix Huber



Venu Durgam wrote:
> I was wondering if there is any open source software for suggesting
> alternative words or sentences for search queries like Google.
>
> Thanks
> Venu Durgam




Re: Software for suggesting alternative words or sentences

Posted by Felix Huber <hu...@webtopia.de>.
Check http://www.iu.hio.no/~frodes/sprell/sprell.html - it includes a German
and a Norwegian dictionary.

Regards,
Felix Huber



Venu Durgam wrote:
> I was wondering if there is any open source software for suggesting
> alternative words or sentences for search queries like Google.
>
> Thanks
> Venu Durgam




Software for suggesting alternative words or sentences

Posted by Venu Durgam <vd...@yahoo.com>.
I was wondering if there is any open source software for suggesting alternative words or sentences for search queries like Google. 

Thanks
Venu Durgam


Re: Result scoring question

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Try using IndexSearcher.explain (and then a toString on the resulting 
Explanation object) to see the details of why things are scoring how 
they are.  This can be most enlightening!
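
For readers following along, the factors that an Explanation breaks a score into can be sketched outside Lucene. The block below is a minimal sketch, assuming the classic DefaultSimilarity formulas (tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, lengthNorm = 1 / sqrt(numTerms)) and a phrase idf equal to the sum of its terms' idf values. It is not Lucene code: the class ScoreSketch and its methods are hypothetical names, and every index statistic in it is invented for illustration. It shows one way a longer field could outscore a shorter one: idf is computed per field, so a phrase that is rarer in preferred_designation than in other_designation can outweigh the length penalty.

```java
// Minimal sketch of the classic DefaultSimilarity factors; not Lucene code.
// ScoreSketch and all statistics below are hypothetical, for illustration only.
final class ScoreSketch {

    // Term frequency factor: grows with the square root of the match count.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency, computed per field: rarer terms weigh more.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Length normalization: fields with fewer terms weigh more.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Score of one matching clause, leaving out queryNorm and coord, which are
    // equal for two documents that each match exactly one clause of the query.
    // idf enters twice: once in the query weight and once in the field weight.
    static double rawScore(double freq, double idfSum, int numTerms) {
        return tf(freq) * idfSum * idfSum * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        int numDocs = 100000; // hypothetical index size

        // Hypothetical document frequencies: the phrase terms are assumed to be
        // rarer in preferred_designation than in other_designation.
        double idfPreferred = idf(200, numDocs) + idf(50, numDocs);
        double idfOther = idf(2000, numDocs) + idf(500, numDocs);

        // Document A: 3 terms in preferred_designation.
        // Document B: 2 terms in other_designation.
        double scoreA = rawScore(1, idfPreferred, 3);
        double scoreB = rawScore(1, idfOther, 2);

        System.out.println("A (preferred_designation, 3 terms): " + scoreA);
        System.out.println("B (other_designation, 2 terms):     " + scoreB);

        // With identical idf in both fields, only lengthNorm differs.
        System.out.println("B with equal idf: " + rawScore(1, idfPreferred, 2));
    }
}
```

Whether this is what happens in the index above cannot be told from the thread; the output of searcher.explain(query, docId).toString() for each hit would show the actual idf and fieldNorm values to compare.
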

	Erik


On Apr 14, 2004, at 12:16 PM, Armbrust, Daniel C. wrote:

> I know that the Lucene scoring algorithm is pretty complicated, and I
> know I don't understand all the pieces.  But given these documents:
>
> A) - <preferred_designation> left renal calculus
> B) - <other_designation> renal calculus
>
> Should a query of
>
> other_designation:("renal calculus") OR preferred_designation:("renal 
> calculus")
>
> Score document B higher than document A?
>
> Those documents are a made-up example.  Here are the documents and 
> scores I am getting back from the query on my real index:
>
> Score 1.0 - Document<Text<first_word:left> 
> Text<preferred_designation:left renal calculus in calyceal 
> diverticulum> Unindexed<frequency:4> Text<codeTokenized:M00004001> 
> Keyword<code:M00004001> 
> Keyword<UNIQUE_DOCUMENT_IDENTIFIER_FIELD:48270>>
>
> Score 0.85714287 - 
> Document<Keyword<UNIQUE_DOCUMENT_IDENTIFIER_FIELD:514631> 
> Keyword<code:M00035214> Text<codeTokenized:M00035214> 
> Unindexed<frequency:4> Text<preferred_designation:left renal calculus 
> in a solitary left kidney> Text<first_word:left>>
>
> Score 0.7409672 - Document<Text<first_word:renal> 
> Text<other_designation:renal calculus> Unindexed<frequency:3> 
> Text<codeTokenized:M00032753> Keyword<code:M00032753> 
> Keyword<UNIQUE_DOCUMENT_IDENTIFIER_FIELD:481129>>
>
>
> Am I just making a dumb mistake somewhere?
>
> Thanks,
>
> Dan